Project JUnion - Struct Types for Java

The wait for Java struct types is over. They are here.
(along with NetBeans and Eclipse support.)

You can create a struct by using the @Struct annotation:

@Struct
public class Vec3 {
    public float x,y,z;
}

You can create a struct array as:

Vec3[] arr = new Vec3[10];

Or you can wrap a native direct ByteBuffer:

Vec3[] arr = Mem.wrap(a, Mem.sizeOf(Vec3.class));

Demo:

Check out the GitHub page for a detailed description and feature list: Project JUnion | junion

Enjoy :slightly_smiling_face:

That’s interesting, and impressive. I looked at the code a bit and was wondering how you ensure data alignment.
It seems you use Unsafe in the end, but I can’t see how you would do it otherwise.
I might test this in my little data-oriented design benchmark.

Could you explain the pros and cons of this versus classic objects? I understand the pros, but I'm not sure about the cons. And you know… there is always a catch.

When you define a struct, the compiler checks your field offsets.
E.g. @Struct class A { byte x; short y; }
For the struct above, it goes over the fields and sees that x is at offset 0 and y is at offset 1.
Since a short is 2 bytes long, its offset should be a multiple of 2.
The compiler detects this and realigns the fields (adding padding if necessary):
offset 0: short,
offset 2: byte

A byte has an alignment of 1, a short 2, a float 4, a double 8, etc.
The alignment of a struct is the maximum alignment of its members. In this case that is the short's: 2.
Next it calculates the struct size as the highest offset + that field's size = 2 + 1 = 3.
Since 3 is not a multiple of the struct alignment (2), it adds one byte of end padding to make the size of the struct 4.
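
The layout rules above can be sketched in plain Java. This is only an illustration, not JUnion's actual code: the class name `StructLayoutSketch`, the `#align` and `#size` pseudo-entries, and the choice to order fields by descending alignment are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StructLayoutSketch {

    /** Rounds value up to the next multiple of align (align must be a power of two). */
    public static int alignUp(int value, int align) {
        return (value + align - 1) & -align;
    }

    /**
     * Takes field name -> primitive size in bytes (1, 2, 4 or 8); for primitives
     * the alignment equals the size. Returns field name -> offset, plus the
     * hypothetical pseudo-entries "#align" and "#size" for the whole struct.
     */
    public static Map<String, Integer> layout(Map<String, Integer> fieldSizes) {
        List<Map.Entry<String, Integer>> fields = new ArrayList<>(fieldSizes.entrySet());
        // Assumption: place the most strictly aligned fields first.
        fields.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        Map<String, Integer> result = new LinkedHashMap<>();
        int offset = 0;
        int structAlign = 1;
        for (Map.Entry<String, Integer> field : fields) {
            int size = field.getValue();
            offset = alignUp(offset, size); // pad up to a multiple of the field's alignment
            result.put(field.getKey(), offset);
            offset += size;
            structAlign = Math.max(structAlign, size);
        }
        result.put("#align", structAlign);
        result.put("#size", alignUp(offset, structAlign)); // adds end padding if needed
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new LinkedHashMap<>();
        a.put("x", 1); // byte
        a.put("y", 2); // short
        System.out.println(layout(a)); // prints {y=0, x=2, #align=2, #size=4}
    }
}
```

For three floats (as in Vec3) the same sketch gives offsets 0, 4, 8 and a size of 12 with no padding.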

Finally, the alignment depends on Unsafe.alloc returning an address that is a multiple of 8, since 8 is the highest alignment requirement.
E.g. on 64-bit systems, Unsafe.alloc returns addresses aligned to 16 bytes.

Regarding pros and cons:
-pros:

  • less memory usage,
  • better performance than objects
  • very similar performance to primitives
  • can wrap native direct byte buffers
  • index checking, null reference checking

-cons:

  • not ideal for storing Java object references (I still have to test the performance of this)
  • depends on Unsafe

There are plenty of other cons, but these relate to the current version and are subject to change:

  • currently only 1D arrays are supported
  • struct constructors and methods are not supported
  • cannot allocate a single instance
  • no stack allocation

It’s one of the reasons why I'm waiting for the release of Project Valhalla :slight_smile:

It seems I was worried over nothing. I’ve run a performance test which showed that accessing Java references is just as fast within struct types as within object types.

Also added support for generics, so you can write:

ArrayList<Vec3> list

Do you create the byte buffer converters at runtime or at compile time?

Are you asking about what Mem.wrap does?

ByteBuffer a = ByteBuffer.allocateDirect(10*Mem.sizeOf(Vec3.class))
   .order(ByteOrder.nativeOrder());

Vec3[] arr = Mem.wrap(a, Mem.sizeOf(Vec3.class));

It reads the pointer of a direct, native-order ByteBuffer.
Changes to the returned array are reflected in the original buffer.
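
For comparison, here is roughly what you would write by hand with only the standard java.nio API to treat such a buffer as an array of Vec3. This is a sketch, not what Mem.wrap does internally: the class and helper names are made up, and the 12-byte struct size is an assumption (three 4-byte floats, no padding).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class Vec3BufferSketch {

    // Assumed size of one Vec3: three floats, 12 bytes, no padding.
    public static final int VEC3_SIZE = 3 * Float.BYTES;

    /** Writes the y field of element `index` using an absolute byte offset. */
    public static void setY(ByteBuffer buf, int index, float y) {
        buf.putFloat(index * VEC3_SIZE + Float.BYTES, y);
    }

    /** Reads the y field of element `index`. */
    public static float getY(ByteBuffer buf, int index) {
        return buf.getFloat(index * VEC3_SIZE + Float.BYTES);
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocateDirect(10 * VEC3_SIZE)
                .order(ByteOrder.nativeOrder());
        // A second view over the same memory, indexed in floats instead of bytes.
        FloatBuffer view = a.asFloatBuffer();

        setY(a, 2, 5.0f);
        // Element 2, field y is float number 2 * 3 + 1 in the view.
        System.out.println(view.get(2 * 3 + 1)); // prints 5.0
    }
}
```

Both views share the same native memory, which is the same property the wrapped struct array relies on.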

Released version 1.1.1

Added:
- stack allocation for single instances
- maven, gradle support

Other info:
I was also reading the Project Valhalla update slides (link here):

  • value types: not mutable, pass-by-value only (slide 9, 13)

Currently, my implementation uses pass-by-reference, which I think is more useful, but there is one catch: if you lose the reference to an allocated array (so that it gets freed), or free it yourself, while still holding references to its elements, those references are now wild pointers.
This is not an issue if you stick to accessing the data through the array, or ensure the array objects stay alive as long as needed.

There are plenty of fun things one can come up with in a project like this. (Btw. there is one design rule I decided upon: not to extend Java syntax. So every new feature has to use existing Java syntax.) Here are a few ideas that might come up as features later.

  1. The obvious ones: true multi dimensional arrays & slices, struct constructors, methods, useful API methods, doc, tests, etc
  2. Remember the problem above, about the wild pointers. I had an idea how to go about this one. Obviously, for performance reasons one cannot check every dereference. But performance matters for release builds; when hunting down such a bug, one does not prioritize performance. So the idea is to allow a compiler argument which enables wild pointer checks. You can enable it for debugging, and that should help a lot.
  3. Structs are useful for math: e.g. Vec3, Quat, Matrix, etc. Especially on such objects we would like … … … Can you guess? (gnidaolrevo rotarepo [read in reverse]). Yes, exactly that. With that feature done, math in Java would finally look more like math should. I would however add this feature for structs only, because structs are not polymorphic. Since my rule is not to extend Java syntax, it becomes a little more tricky to implement this. In other words, the syntax has to be as minimal as possible, but existing IDEs should not report it as an error.
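
To make idea 2 a bit more concrete, here is a toy sketch of an opt-in wild pointer check. It is an assumption about how such a check could look, not a planned JUnion design: `Block` is a hypothetical class, a float[] stands in for native memory, and a constructor flag stands in for the compiler argument.

```java
public class CheckedBlockSketch {

    /** Hypothetical stand-in for one natively allocated struct array. */
    public static final class Block {
        private final float[] data;        // stand-in for raw native memory
        private final boolean debugChecks; // stand-in for the compiler argument
        private boolean freed;

        public Block(int length, boolean debugChecks) {
            this.data = new float[length];
            this.debugChecks = debugChecks;
        }

        public void free() {
            freed = true;
        }

        public float get(int i) {
            check(i);
            return data[i];
        }

        public void set(int i, float value) {
            check(i);
            data[i] = value;
        }

        // Only pays for the check when debugging is switched on.
        private void check(int i) {
            if (debugChecks && freed) {
                throw new IllegalStateException("access through wild pointer at index " + i);
            }
        }
    }

    /** Returns true if reading after free() is caught by the debug check. */
    public static boolean throwsAfterFree(boolean debugChecks) {
        Block b = new Block(4, debugChecks);
        b.set(0, 1.5f);
        b.free();
        try {
            b.get(0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsAfterFree(true));  // prints true
        System.out.println(throwsAfterFree(false)); // prints false
    }
}
```

With the flag off, the stale read goes through silently; with real native memory that read would be undefined rather than merely stale.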

That’s it for this overly long post. If you have any comments or ideas, feel free to share. (Also post any benchmarks if you do some!) :slightly_smiling_face:

@The_Leo I don’t quite get how you calculated the memory requirement of an array of 1000 Integers: (4 + ~12 + ~4)*1000 + ~16

I understand the ~16 part, since it’s the header information for the array instance. But can you explain where the 4, ~12 and ~4 come from? Does one of the 4s represent the byte length of an int?

@iamcreasy Yes, one 4 represents the byte length of an int. A reference in Java can be 4 or 8 bytes long: 4 bytes on 32-bit Java, 8 bytes on 64-bit. With compressed oops (the default), it is 4 bytes on 64-bit systems.

new Integer[1000];

An Integer array stores references to Integer objects. Assuming 4 bytes per reference, that is 4000 bytes + ~16 for the array header, 4016 bytes in total.

Once the array is filled with numbers, there will be 1000 instances of Integer. Now how many bytes does an Integer take?

To store the int we need 4 bytes + the object header (~12 bytes), that is (4+~12) = 16 bytes.
So that is 16000 bytes for the instances. Together with the array that is

16000+4016 = 20016 = (4+~12)*1000+~4*1000+~16 = (4+~12+~4)*1000+~16
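
The same arithmetic as a small Java helper, in case you want to plug in other header or reference sizes. The class and method names are made up for this example, and the header sizes are the approximate values used above, not measured ones.

```java
public class IntegerArrayFootprint {

    /**
     * Estimated bytes for an Integer[] of n elements, all filled with distinct
     * Integer instances: (4 + objectHeader + ref) * n + arrayHeader.
     */
    public static long boxedArrayBytes(int n, int refBytes, int objectHeaderBytes, int arrayHeaderBytes) {
        long perInteger = 4 + objectHeaderBytes; // int payload + object header
        return (long) n * (perInteger + refBytes) + arrayHeaderBytes;
    }

    /** Estimated bytes for an int[] of n elements: 4 * n + arrayHeader. */
    public static long primitiveArrayBytes(int n, int arrayHeaderBytes) {
        return 4L * n + arrayHeaderBytes;
    }

    public static void main(String[] args) {
        // Approximate values from the post: 4-byte references (compressed oops),
        // ~12-byte object header, ~16-byte array header.
        System.out.println(boxedArrayBytes(1000, 4, 12, 16)); // prints 20016
        System.out.println(primitiveArrayBytes(1000, 16));    // prints 4016
    }
}
```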

I learned a few new things from your post. Thank you, sir.

In the Performance Test chart, why is the direct native buffer just as slow as the Java object array?

If you check the source code of DirectByteBuffer, you will find, for example:

    private ByteBuffer putFloat(long a, float x) {
        if (unaligned) {
            int y = Float.floatToRawIntBits(x);
            unsafe.putInt(a, (nativeByteOrder ? y : Bits.swap(y)));
        } else {
            Bits.putFloat(a, x, bigEndian);
        }
        return this;
    }
    public ByteBuffer putFloat(float x) {
        putFloat(ix(nextPutIndex((1 << 2))), x);
        return this;
    }

As you can see, a lot of work is done inside the direct byte buffer. It checks whether the access is aligned or not, and then does another check for endianness.

This makes DirectByteBuffer more of a generic class, as it can edit unaligned or aligned, big-endian or little-endian buffers.

My library edits native-order DirectByteBuffers, and the data is also aligned, so it does not need to perform these checks and can modify the buffer with the performance of primitive types.
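
A small way to see the byte-order handling from the outside, using only the standard java.nio API (the class and method names are made up for the example). The same float lands in memory with its bytes in opposite order depending on the buffer's order, so on any given machine one of the two orders requires a swap.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderSketch {

    /** Writes 1.0f (bit pattern 0x3F800000) at offset 0 and returns the first byte in memory. */
    public static byte firstByteOfOne(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocateDirect(Float.BYTES).order(order);
        buf.putFloat(0, 1.0f);
        return buf.get(0);
    }

    public static void main(String[] args) {
        // Big-endian stores the most significant byte (0x3F = 63) first.
        System.out.println(firstByteOfOne(ByteOrder.BIG_ENDIAN));    // prints 63
        // Little-endian stores the least significant byte (0x00) first.
        System.out.println(firstByteOfOne(ByteOrder.LITTLE_ENDIAN)); // prints 0
        // Only one of the two matches this machine and needs no swap.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```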
