Project JUnion - Struct Types for Java

The_Leo · April 23, 2018, 9:44pm

The wait for Java struct types is over. They are here.
(along with NetBeans & Eclipse support.)

You can create a struct by using the @Struct annotation:

@Struct
public class Vec3 {
    public float x,y,z;
}

You can create a struct array as:

Vec3[] arr = new Vec3[10];

Or you can wrap a native direct ByteBuffer:

Vec3[] arr = Mem.wrap(a, Mem.sizeOf(Vec3.class));

Demo:

Check out the GitHub page for a detailed description and feature list: Project JUnion | junion

Enjoy

nehon · April 24, 2018, 6:39am

That’s interesting, and impressive. I looked a bit at the code and was wondering how you ensure data alignment.
It seems you use Unsafe in the end, but I can’t see how you would do it otherwise.
I might test this in my little data-oriented design benchmark.

Could you explain the pros and cons of this versus classic objects? I understand the pros, but I’m not sure about the cons. And you know… there is always a catch.

The_Leo · April 24, 2018, 9:51am

When you define a struct, the compiler checks the offsets of your fields.
E.g. @Struct class A { byte x; short y; }
In the above, it goes over the fields and sees that x is at offset 0 and y (a short) is at offset 1.
Since a short is 2 bytes long, its offset should be a multiple of 2 bytes.
It detects this and then realigns the fields (adding padding if necessary):
offset 0: short,
offset 2: byte

A byte has an align of 1, short 2, float 4, double 8, etc.
The align of a struct is the maximum align of its members. In this case short: 2
Finally, it calculates the struct size as the highest offset + the size of the field at that offset = 2 + 1 = 3.
Since 3 is not a multiple of the struct align (2), it adds one byte of end padding to make the size of the struct 4.
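
The layout rules above can be sketched in plain Java. This is only an illustration, not JUnion's actual code: the class and method names are made up, and it assumes that fields are reordered by descending size and that a primitive's alignment equals its size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StructLayoutSketch {
    /** Hypothetical description of a primitive field: name and size in bytes. */
    static final class Field {
        final String name;
        final int size; // assumption: alignment of a primitive equals its size
        Field(String name, int size) { this.name = name; this.size = size; }
    }

    /** Hypothetical result type: field offsets, struct alignment, padded size. */
    static final class Layout {
        final Map<String, Integer> offsets = new LinkedHashMap<>();
        int align = 1;
        int size;
    }

    static Layout layout(List<Field> fields) {
        // Assumption: widest fields go first, which avoids internal padding.
        List<Field> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparingInt((Field f) -> f.size).reversed());

        Layout result = new Layout();
        int offset = 0;
        for (Field f : sorted) {
            // Round the offset up to a multiple of the field's alignment.
            offset = roundUp(offset, f.size);
            result.offsets.put(f.name, offset);
            offset += f.size;
            // The struct's alignment is the maximum alignment of its members.
            result.align = Math.max(result.align, f.size);
        }
        // End padding: the struct size must be a multiple of the struct alignment.
        result.size = roundUp(offset, result.align);
        return result;
    }

    static int roundUp(int value, int multiple) {
        return (value + multiple - 1) / multiple * multiple;
    }

    public static void main(String[] args) {
        // The example from the post: @Struct class A { byte x; short y; }
        Layout l = layout(Arrays.asList(new Field("x", 1), new Field("y", 2)));
        System.out.println(l.offsets + " align=" + l.align + " size=" + l.size);
        // prints: {y=0, x=2} align=2 size=4
    }
}
```

For the byte/short example this gives the same numbers as the walk-through: the short at offset 0, the byte at offset 2, and one byte of end padding.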

Finally, the alignment depends on Unsafe.allocateMemory returning an address that is a multiple of 8, since 8 is the highest alignment requirement.
E.g. on 64-bit systems, it returns addresses aligned to 16 bytes.

Regarding pros and cons:
-pros:

less memory usage,
better performance than objects
very similar performance to primitives
can wrap native direct byte buffers
index checking, null reference checking

-cons:

not ideal to store Java object references (still have to test performance for this)
depends on Unsafe

There are plenty of other cons, but these are related to current version and are subject to change:

currently only 1D arrays are supported
struct constructors and methods are not supported
Cannot allocate single instance
No stack allocation

javasabr · April 24, 2018, 10:08am

It’s one of the reasons why I am waiting for the release of Project Valhalla.

The_Leo · April 24, 2018, 1:58pm

It seems I was worried over nothing. I’ve made a performance test which showed Java reference speed is just as fast within struct types as in object types.

Also added support for generics, so you can write:

ArrayList<Vec3> list

javasabr · April 24, 2018, 2:21pm

Do you create the byte buffer converters at runtime or at compile time?

The_Leo · April 24, 2018, 2:52pm

Are you asking about what the Mem.wrap does?

ByteBuffer a = ByteBuffer.allocateDirect(10*Mem.sizeOf(Vec3.class))
   .order(ByteOrder.nativeOrder());

Vec3[] arr = Mem.wrap(a, Mem.sizeOf(Vec3.class));

It reads the pointer of a direct native bytebuffer.
Changes to the returned array are reflected in the original buffer.
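
To illustrate the "changes are reflected in the original buffer" behaviour without the JUnion compiler plugin, here is a rough analogue using only the standard java.nio API. It is not how Mem.wrap is implemented: a FloatBuffer view stands in for the struct array, and the class name and the VEC3_BYTES constant are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BufferViewSketch {
    // Assumption for this sketch: a Vec3 is three floats (x, y, z) = 12 bytes.
    static final int VEC3_BYTES = 3 * Float.BYTES;

    /** Writes y of element 4 through a view, then reads it back from the raw buffer. */
    static float writeThroughView() {
        ByteBuffer raw = ByteBuffer.allocateDirect(10 * VEC3_BYTES)
                .order(ByteOrder.nativeOrder());

        // The view shares memory with 'raw' and inherits its byte order.
        FloatBuffer view = raw.asFloatBuffer();

        int element = 4;
        int yField = 1; // x = 0, y = 1, z = 2
        view.put(element * 3 + yField, 2.5f);

        // Read the same location through the original byte buffer (byte offset).
        return raw.getFloat(element * VEC3_BYTES + yField * Float.BYTES);
    }

    public static void main(String[] args) {
        System.out.println(writeThroughView()); // prints 2.5
    }
}
```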

The_Leo · May 6, 2018, 6:07pm

Released version 1.1.1

Added:
- stack allocation for single instances
- maven, gradle support

Other info:
I was also reading the Project Valhalla update slides (link here):

value types: not mutable, pass-by-value only (slide 9, 13)

Currently, my implementation uses pass-by-reference, which I think is more useful, but there is one catch: if you lose the reference to an allocated array (so that it gets freed), or free it yourself, while still holding references to its elements, those references are now wild pointers.
This is not an issue if you stick to accessing the data through the array, or ensure the array objects stay alive as long as needed.

There are plenty of fun things one can come up with in a project like this. (Btw. there is one design rule I decided upon: not to extend Java syntax. So every new feature has to use Java syntax.) Here are a few ideas that might come up as features later.

The obvious ones: true multi dimensional arrays & slices, struct constructors, methods, useful API methods, doc, tests, etc
Remember the problem above, about the wild pointers. I had an idea how to go about this one. Obviously, for performance reasons one cannot check every dereference. But performance matters for release builds; when hunting such a bug, one does not prioritize performance. Thus the idea is to allow a compile argument which enables wild pointer checks. You can enable it for debugging, and that should help a lot.
Structs are useful for math: eg Vec3, Quat, Matrix, etc. Especially on such objects we would like … … … Can you guess? (gnidaolrevo rotarepo [read in reverse]). Yes, exactly that. With that feature done, math in Java would look finally more like math should. I would however add this feature for structs only, because structs are not polymorphic. Since my rule is not to extend Java syntax, it becomes a little more tricky to implement this. In other words the syntax has to be as minimal as possible but existing IDEs should not report it as an error.

That’s it for this overly long post. If you have any comments or ideas, feel free to share. (Also post any benchmarks if you do some!)

iamcreasy · May 13, 2018, 12:36pm

@The_Leo I don’t quite get how you calculated the memory requirement of an array of 1000 Integers: (4 + ~12 + ~4)*1000 + ~16

I understand the ~16 part, since it’s the header of the array itself. But can you explain where the 4, ~12 and ~4 come from? Does one of the 4s represent the byte length of an int?

The_Leo · May 14, 2018, 1:29pm

@iamcreasy Yes one 4 represents byte length of int. A reference in Java can be 4 or 8 bytes long. 32bit Java 4 bytes, 64 bit 8 bytes. With compressed oops(default), it is 4 bytes on 64 bit systems.

new Integer[1000];

An Integer array stores references to Integer objects. Assuming 4 bytes per reference, that is 4000 bytes + ~16 for the array header, 4016 bytes in total.

Once the array is filled with numbers, there will be 1000 instances of Integer. Now how many bytes does Integer take?

To store the int we need 4 bytes + the object header (~12 bytes), that is (4+~12) = 16 bytes per Integer.
For 1000 instances that is 16000 bytes. Together with the array that is

16000+4016 = 20016 = (4+~12)*1000+~4*1000+~16 = (4+~12+~4)*1000+~16
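
The same estimate as a small Java sketch. The constants are the approximate figures from this post (64-bit JVM with compressed oops), not measured values, and the class and method names are made up. It also assumes every element is a distinct Integer instance.

```java
public class IntegerArrayFootprint {
    // Approximate sizes taken from the post, not measured.
    static final int INT_BYTES = 4;            // payload of one int
    static final int OBJECT_HEADER_BYTES = 12; // ~header of each Integer object
    static final int REFERENCE_BYTES = 4;      // ~one reference with compressed oops
    static final int ARRAY_HEADER_BYTES = 16;  // ~header of the array itself

    /** Estimated bytes for a filled Integer[n]: n objects plus n references plus array header. */
    static long boxedArrayBytes(int n) {
        return (long) (INT_BYTES + OBJECT_HEADER_BYTES + REFERENCE_BYTES) * n
                + ARRAY_HEADER_BYTES;
    }

    /** Estimated bytes for an int[n]: payload plus array header only. */
    static long primitiveArrayBytes(int n) {
        return (long) INT_BYTES * n + ARRAY_HEADER_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(boxedArrayBytes(1000));     // prints 20016
        System.out.println(primitiveArrayBytes(1000)); // prints 4016
    }
}
```

Under these assumptions the boxed array needs roughly five times the memory of the primitive one.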

iamcreasy · May 15, 2018, 7:42am

Learned a few new things from your post. Thank you sir.

In the Performance Test chart, why is the direct native buffer just as slow as the Java Object array?

The_Leo · May 15, 2018, 9:48am

If you check the source code of DirectByteBuffer, you will find, for example:

    private ByteBuffer putFloat(long a, float x) {
        if (unaligned) {
            int y = Float.floatToRawIntBits(x);
            unsafe.putInt(a, (nativeByteOrder ? y : Bits.swap(y)));
        } else {
            Bits.putFloat(a, x, bigEndian);
        }
        return this;
    }

    public ByteBuffer putFloat(float x) {
        putFloat(ix(nextPutIndex((1 << 2))), x);
        return this;
    }

As you can see, a lot of work is done inside the direct byte buffer. It checks whether the access is aligned or not, and then performs another check for endianness.

This makes DirectByteBuffer more of a generic class, as it can edit unaligned or aligned, big-endian or little-endian buffers.

My library only edits native-order DirectByteBuffers, and the data is also aligned. Thus it does not need to perform these checks and can modify the buffer with the performance of primitive types.