Are you looking forward to Struct types in Java?

In Java, almost everything is an object, with the exception of primitive types.
The overhead of using objects is usually not significant, but there are cases where structs/value types would be nice.
Take, for example, an array of Vector3f, where each vector consists of 3 floats:

//vector data: 12000 bytes = 1000 * 3 * 4
//object headers: 12 extra bytes per object = 12000 bytes
//array references: 4000 bytes = 4 * 1000
//array object header: 16 bytes
Vector3f[] arr = new Vector3f[1000];
//Total: 28016 bytes
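
For reference, the per-object accounting above assumes a Vector3f that is just a plain class with three float fields (a minimal sketch; jME's actual Vector3f has the same shape, plus methods):

//Minimal sketch of the class being counted: 12 bytes of float data
//per instance, plus the per-object header on the heap.
public class Vector3f {
    public float x, y, z;
}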

That is plenty of extra bytes; it's about 2.33x more than structs would take.
Memory is not the only overhead.
Since the array contains pointers, each vector is accessed indirectly.
If your algorithm is memory bound, using structs could improve performance by at least 2.33x.

There is an OpenJDK project, OpenJDK: Valhalla,
which aims to bring value types to Java. However, it is not known when it will be finished.

Would you use struct types if Java had them? :slight_smile:

3 Likes

For situations like this, totally. In addition to the memory savings value types are a big plus for taking advantage of CPU caching. Right now if you want to operate on arrays of objects in a cache-friendly manner, you need to split the object values into primitive arrays and operate on those to avoid the pointer indirection and the resulting (highly probable) cache misses. Value types would eliminate that requirement and let you operate on the “objects” directly without losing cache locality.
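
A minimal sketch of that workaround (the names here are just illustrative, not from any particular library): instead of an array of objects, keep one primitive array per field so that a linear pass streams through contiguous memory:

//"Array of objects" layout: the array holds references, the actual
//vectors can live anywhere on the heap.
Vector3f[] positions = new Vector3f[1000];

//"Split into primitive arrays" layout: the same data as three flat
//float arrays, so iterating over one field touches contiguous memory.
float[] posX = new float[1000];
float[] posY = new float[1000];
float[] posZ = new float[1000];

//Example pass: move every position up along the y axis.
for( int i = 0; i < posY.length; i++ ) {
    posY[i] += 1.0f;
}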

I think we could also take advantage of value types in an Entity Component System design, for components which are pure data and immutable objects.
I am interested to hear the pros' ideas about this.
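
As a rough illustration of the kind of component I mean (a hypothetical class, not taken from any specific ECS library), a pure-data, immutable component is just final fields with no behavior, which is exactly the shape a value type could flatten:

//Hypothetical immutable ECS component: pure data, no identity needed.
public final class Position {
    public final float x, y, z;

    public Position( float x, float y, float z ) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}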

Well, I'd really like to know how you can deduce this ^. How can you tell your algorithm is memory bound in Java?

I've been experimenting with data oriented design lately. For those who don't know what it is: it's based on the idea that your CPU uses its L1, L2, and L3 caches to store chunks of memory for fast access while processing. The caches work in such a way that when your program accesses data at a memory address, the nearby/consecutive data is also loaded into the cache, on the assumption that the program may access that nearby data next.
The problem is that copying data into the cache is slow, so if your program works on chunks of data that are scattered throughout RAM, it will constantly load useless data into the cache and throw it away right after, and the data it actually needs will often not be there when it's wanted. That is a cache miss.
Basically there are a lot of programs where the CPU is just waiting for memory to be copied into the cache instead of processing it. That's why CPUs do branch prediction and speculative execution during that "down" time to get ahead of future execution (that's what led to the Meltdown thing not so long ago, but that's another story).

When you think of an OOP structure you think of objects within objects within objects, and this usually ends up with objects allocated far apart in memory (like your example of an array of objects: the pointers to the objects sit together in one chunk of memory, but there is no guarantee of where the actual objects will end up).
The idea with data oriented design is to build your design around the data, and make sure all data that will be processed at the same time is contiguous in memory to maximize cache efficiency.

So I experimented, in Java of course, and the results were… totally unexpected, in the sense that the DoD approach didn't give me any perf boost… quite the opposite. I tried many things: primitive type arrays, ByteBuffers, even sun.misc.Unsafe to allocate memory myself… and I barely got on par with the OOP design…
So I may have done something wrong… but I'm not exactly a Java newbie… so I don't know.
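
For reference, the ByteBuffer variant of that idea looks roughly like this (a sketch of the general approach, not my exact test code): pack the x/y/z triples into one direct buffer so the data is guaranteed to be contiguous off-heap:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class PackedVectors {
    public static void main( String[] args ) {
        //Store 1000 "vectors" as packed floats in a single direct buffer.
        FloatBuffer vectors = ByteBuffer.allocateDirect(1000 * 3 * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();

        //Write element 10 and read it back; its three floats sit next to
        //each other in memory, so a linear pass streams through the buffer.
        vectors.put(10 * 3, 1f).put(10 * 3 + 1, 2f).put(10 * 3 + 2, 3f);
        float x = vectors.get(10 * 3);
        float y = vectors.get(10 * 3 + 1);
        float z = vectors.get(10 * 3 + 2);
        System.out.println(x + " " + y + " " + z);
    }
}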

My assumption is this (DISCLAIMER: this may be totally wrong, it's just my idea based on empirical observation and I have no proof, so take it as it is): HotSpot lays out memory to minimize cache misses. There is no guarantee of where the JVM will put your variables in memory. IMO that's because they may move, and they move according to which memory is accessed the most.

So… if this is true… having structs won't help that much. Sure, it will reduce memory usage… but I really doubt it will give you the 2.33x perf gain you claim.

Note that in C++, data oriented design is an immediate perf win. The approach is really interesting, and it's worth a read or two.

EDIT: and btw I'd be glad to hear comments about "DoD in Java" if some of you have experimented with it. I couldn't find a lot of literature on the topic.

1 Like

Well, from what I know, Java's Arrays.sort() uses a (dual-pivot) quicksort for primitive arrays instead of a heapsort, despite the latter having the better worst-case guarantee in theory. Due to the locality you talked about, quicksort still beats it by a wide margin in most cases (heapsort does lots of scattered array accesses), so I wouldn't dismiss the fact that the JVM and its class libraries already take these things into account and optimize internally. Whatever you try to do yourself to override that will probably end up worse, as you've found out.

As the SonarQube code analyzer likes to say: "Don't try to be smarter than the JVM." :smile:

Interesting replies :smiley:

Well, obviously being memory bound is a system-dependent thing.
One could implement the same algorithm in C using structs. If the two got very different performance results, and the original Java algorithm does a lot of indirection through arrays of objects, it's probably safe to assume the indirection has something to do with it.
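
Another rough way to check, staying in Java, is to time the same pass over object references versus a flat primitive array (a back-of-the-envelope sketch, not a proper JMH benchmark, so the numbers will be noisy and warm-up sensitive):

//Naive comparison: sum the y component through object indirection
//versus through a flat float array.
Vector3f[] objs = new Vector3f[1000000];
float[] flat = new float[1000000];
for( int i = 0; i < objs.length; i++ ) {
    objs[i] = new Vector3f();
    objs[i].y = i;
    flat[i] = i;
}

long t0 = System.nanoTime();
float sum1 = 0;
for( Vector3f v : objs ) {
    sum1 += v.y;
}
long t1 = System.nanoTime();

float sum2 = 0;
for( float f : flat ) {
    sum2 += f;
}
long t2 = System.nanoTime();

System.out.println("objects: " + (t1 - t0) + " ns, flat: " + (t2 - t1) + " ns");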

I wasn’t aware that the JVM did optimizations like this so I did a quick search and this turned up:

When moving objects, the parallel collector tries to keep related objects together, resulting in improved memory locality and cache utilization, and leading to improved mutator performance. This is accomplished by copying objects in depth first order.

(The Java HotSpot Performance Engine Architecture - under the “Parallel Young Generation Collector” section)

I’d assume that the other copying/compacting collectors make similar optimizations.

So in practice I’m apparently wrong about performance boosts due to operating on primitive arrays - maybe Hotspot is smart enough about object layout to make manual array layout moot at best. That said, from what I’ve heard newer versions of Java (8, 9) have some (experimental, I think) support for auto-vectorization & SSE. I think you have to use some pretty specific for loops/array access patterns for the JIT to use SSE, however. Take all that with a major grain of salt though - I never found much definitive information on the subject and the last I looked I got the distinct impression that it was still pretty experimental (which may still be the case).
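
The loop shapes usually mentioned for that are simple counted loops over primitive arrays with no calls or branches in the body, something like this (just an illustration; whether HotSpot actually emits SSE/AVX for it depends on the JVM version and flags):

//A plain element-wise loop over primitive arrays: the kind of pattern
//the JIT's SuperWord optimization can potentially vectorize.
void add( float[] a, float[] b, float[] c ) {
    for( int i = 0; i < a.length; i++ ) {
        c[i] = a[i] + b[i];
    }
}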

1 Like

Well that definitely explains what I witnessed.
So basically… it all boils down to:

"Don't try to be smarter than the JVM."

It's funny because I've been trolled so many times at work about the VM being the worst hindrance of Java, because "overhead", because "not compiled", because "whatever is not C++ is slower". Those same guys tell me that data oriented design is great for performance, but that you need to forget everything you know about object orientation and re-learn how to code…
Well, to some extent, thanks to the VM, Java manages to get the best of both worlds… and that is something I'll strive to stick in their faces…

2 Likes

FYI: Entity systems are a direct response to “data oriented designs”. It was originally the whole point, actually.

Because you don’t actually have “pointers” in Java (at least not in the ‘raw memory pointer’ sense), Java can and absolutely will move your stuff around in RAM to suit its needs. It’s hard to predict what that means.

I still remember a piece of C code I wrote back in the late-ish 1990s that did radial signal propagation over terrain. The original guys who wrote it had it running in minutes. (yes, minutes) but they were signal engineers and not coders. I was able to get it to run so fast that the progress dialog was silly.

For fun, I copied that algorithm over to Java since we were writing a Java app at the time. Back then, the Java version ran in about double the time.

I had both algorithms wrapped in command line test programs and for fun every year or two I would take them out and run them. Within two years, the Java version (without recompiling or anything) was twice as fast. Hotspot got some really big improvements around then. By the time I stopped checking, the Java version was fast enough that all of the time was essentially in JVM startup.

Anyway, in a Vector3f[] situation where you just access .x, .y, etc… a significant amount of your time is going to be taken up by the array bounds checks. (This is why the for-each loop is better for arrays.) Beyond that, if you allocated the objects together then I'd expect them to BE together.

And as far as a struct implementation goes, we can only guess of course… but I would bet good money that it will just be syntactic sugar on regular objects. Otherwise reflection and a whole mess of other things would break. So that extra "pointer" has to be in there somewhere.

2 Likes

Yeah I’m just about convinced that the JVM is the 8th wonder of the world. It literally takes all that planning and clever tricks that an experienced C++ programmer needs to do and does it all for you on the fly. I don’t think it gets any better than that.

Wow, I didn't know those were actually a lot faster. One cannot use that many of them, though, because of the whole Iterator garbage collection bug (although it was reportedly fixed in J9 or J10, iirc).

Wasn't that about List? I don't think it has ever been an issue with arrays.

I feel left out :frowning: I’m stuck here on Android. I can’t even get a lambda expression or a Path. Java is moving at a ridiculous pace lately.

2 Likes

For each over arrays does not create an iterator. For each over arrays does not do per item index checks.

It’s the fastest thing, really.
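
For reference, the language spec desugars a for-each over an array into a plain counted loop over a cached copy of the array reference, roughly like this; since the loop bound is the array's own length, the JIT can also eliminate the per-access bounds checks:

//What "for( MyObject o : array )" effectively compiles to:
MyObject[] a = array;
for( int i = 0; i < a.length; i++ ) {
    MyObject o = a[i];
    //loop body
}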

Edit: and this is half the reason that JME’s SafeArrayList was written. (And I should know. :))

1 Like

FASTAR!!!™ and MOAR PAIN!!!™ are trademarks of C++. All rights reserved, including poor language design choices and a strong tendency to induce insanity and premature heart failure.

My favorite is the C++11 smart pointers… “GC’s are too slow and induce pauses. I know! Let’s do reference counted smart pointers!”… congratulations! You’ve now replaced GCs with a system that suffers from almost all of the downsides of a GC, is significantly less efficient, and is harder to use properly. Great work there!

Let me write that up… I don't know C++ well enough to give that kind of answer…

I actually love streams and lambdas. https://m.youtube.com/watch?v=ntWdmlrCheY I like this guy's style; a really enjoyable presentation.

I didn't know the for-each loop is faster now. I remember it being considerably slower, so I almost never ended up using it. But then again, that might have been for objects.

Regarding out of bounds checks, I hope the compiler is smart enough to hoist them outside the loop for trivial for loops. If not in interpreted mode, I think the JIT does.

for( int i = 0; i < array.length; i++ ) {
    array[i].foo();
}

Seems like it would be easy to optimize… and in reality it probably can be. The thing is that the compiler has to look really closely at everywhere that the ‘i’ value can be changed.

for( MyObject o : array ) {
}

…has never been slower. Not since day one. It is always faster.

for( Object o : myList ) {
}

…has never been slower than:

for( Iterator it = myList.iterator(); it.hasNext(); ) {
    Object o = it.next();
}

…because it will generate almost byte-for-byte compatible byte code.

Under some circumstances (even today), the following “might” be faster:

int listSize = myList.size();
for( int i = 0; i < listSize; i++ ) {
    Object o = myList.get(i);
}

But really only if you call it enough times that the Iterator creation would have caused issues.
…it also has strange behavior if the list is modified while iterating.

JME had sooooo many bugs related to that before we switched those places to SafeArrayList.

Waitwhat.

For an Iterable, those are equivalent.

The for-each construct is creating an Iterator in either case, calling next(), hasNext(), etc…

Hopefully this is no surprise.

For arrays, that’s where the magic happens. And that’s why SafeArrayList is so cool… it exposes the array.

for( Object o : safeArrayList.getArray() ) {
}