Procedural voxel terrain and vegetation generation

Hi!

This is JME3 - and yep, it’s definitely difficult to keep the performance up on a voxel engine.

I spent a lot of time optimizing the Java, the GLSL, and importantly, the traffic between the CPU and the GPU. I don’t render any low detail meshes - there is the capability to render point sprites for very distant voxels - though it’s not seen in any of these videos… Mostly because by the time you get out that far, you’re running out of RAM anyways, so it’s not that useful.

16x16x16 seems quite small for a chunk size - if you increase it to 32x32x32 you’ll cut the mesh count to an eighth, and unless your chunk regeneration code is relatively slow, it shouldn’t make much of a performance difference during digging/building.

Most importantly though, you want to be minimizing the amount of data you’re pushing into buffers to send to the graphics card as much as possible.

You’re using a cubed short array - which might be quite large. A short is a 16-bit signed two’s complement integer - so it’s small, but not as small as you can possibly go.

For example, I need to send all sorts of data to the shaders - to provide all the fancy effects that happen depending on voxel type, state (solid, liquid, or gas), temperature, humidity, etc - but I also want to minimize traffic.

So, I could send the voxel state as a short between 0 and 2 inclusive, and the voxel type as another short between (let’s say) 0 and 39 (assuming I have 40 possible voxel types). That would result in 32 bits on the line to the GPU.

On the other hand, I could take a single byte, which read as an unsigned value can express 256 possible values between 0 and 255. Then I take (voxel state x 40) and add (voxel type + 1). This gives me a value between 1 and 120, which I send to the GPU as a single byte. On the GPU I know that the values will be:

1-40: A solid
41-80: A liquid
81-120: A gas

For example, say I have a gas voxel of type 29. Gas is state 2, so I send the value (2x40 + 29 + 1 = 110) to the GPU. In the vertex shader I subtract 1 and integer-divide by 40 - floor(109/40) = 2 - which gives me the state back, a gas. Then (109 mod 40 = 29) gives me back the voxel type.

In this way I can avoid sending 2 shorts = 32 bits, and instead just send one byte = 8 bits, resulting in a 75% reduction in data sent to the GPU, and a real performance improvement.
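The key requirement is that the packing is cleanly invertible on the GPU side. Here’s a minimal sketch in plain Java of an offset-based encoding that produces exactly those 1-40/41-80/81-120 ranges (the class and constant names are just illustrative):

```java
// Sketch: pack voxel state and type into one byte using an offset scheme.
// Assumes 3 states (solid=0, liquid=1, gas=2) and up to 40 voxel types.
public class VoxelPacking {
    static final int NUM_TYPES = 40;

    // Produces a value in 1..120, which fits comfortably in one byte.
    static int pack(int state, int type) {
        return state * NUM_TYPES + type + 1;
    }

    // Inverse operations, as the vertex shader would perform them.
    static int unpackState(int packed) {
        return (packed - 1) / NUM_TYPES;   // 1-40 -> 0, 41-80 -> 1, 81-120 -> 2
    }

    static int unpackType(int packed) {
        return (packed - 1) % NUM_TYPES;
    }
}
```

So a gas voxel of type 29 packs to 110, which lands in the gas range and decodes back losslessly.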

I’d advise getting into packing the buffers yourself - reducing their size wherever possible.

You might also consider not sending big floaty texture indices, but instead just sending the voxel type, with constants in the vertex shader that tell you the texture coordinates for a given voxel type - in this way you can reduce traffic quite a bit.
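For instance - and this is only a sketch, the 8x8 atlas layout is an assumption - the shader-side lookup would amount to the equivalent of:

```java
// Sketch: derive texture-atlas tile coordinates from a voxel type ID,
// mirroring what constants in the vertex shader would compute.
// An 8x8 tile atlas is assumed purely for illustration.
public class AtlasCoords {
    static final int TILES_PER_ROW = 8;
    static final float TILE_SIZE = 1.0f / TILES_PER_ROW;

    // Returns {u, v} of the tile's origin for a given voxel type.
    static float[] tileOrigin(int voxelType) {
        int col = voxelType % TILES_PER_ROW;
        int row = voxelType / TILES_PER_ROW;
        return new float[] { col * TILE_SIZE, row * TILE_SIZE };
    }
}
```

With that, a single byte for the voxel type replaces two floats of texture coordinates per vertex.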

Obviously there’s a balance between reducing traffic and increasing complexity in the shaders - but I’d definitely recommend looking at every single byte sent down the line to see if you can pack more data in there - in some cases I’ve even managed to pack three values into a single byte!
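Packing three values into a single byte is just bit fields - a hedged sketch (the field widths here are made up; choose them to fit your own value ranges):

```java
// Sketch: three small values packed into one byte via bit fields.
// Example widths: 2 bits for state (0-3), 3 bits for light level (0-7),
// 3 bits for variant (0-7) - 8 bits total.
public class BitPack {
    static int pack(int state, int light, int variant) {
        return (state << 6) | (light << 3) | variant;
    }
    static int state(int packed)   { return (packed >> 6) & 0x3; }
    static int light(int packed)   { return (packed >> 3) & 0x7; }
    static int variant(int packed) { return packed & 0x7; }
}
```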


How do you keep the complexity of that kind of packing under control?
E.g. if you change the ranges, you’d have to consistently update the Java side (data source) and the shader (sink); I’d expect that to be a source of nasty subtle bugs but maybe I’m too pessimistic?

Well, I haven’t found that problem myself - maybe because in the end there aren’t actually that many values moving from the Java side to the shaders. Maybe nine or ten values per vertex, packed into a few bytes, and thoroughly commented on both sides.

Maybe also because the ranges being packed don’t generally vary. I’ve never introduced a new state beyond solid, liquid and gas since I first defined them - nor do I expect to (plasma voxels! Now I’ve said it!).

The address range for voxel types is actually 64 - and I only have 38 types currently, so there’s no plan at the moment to expand that - though if I did, yep, I’d certainly want to be careful. The same goes for the temperature and humidity ranges, and various booleans - the values vary, but their possible ranges are constant, so the packing functions very rarely change - in fact, they’re essentially constant themselves.


@roleary I wish I knew half as much as you do about the GPU pipeline! I don’t even know where in my code I am sending data to the GPU from the CPU. I think I am programming at a much higher level than you are. I simply have a class called ChunkRenderData that accepts a short[][][] and spits out:

- a float[] for colors
- a Vector3f[] for vertices
- a Vector3f[] for normals
- a Vector2f[] for texture coordinates
- an int[] for indices

Then, I call:

mymesh = new Mesh();

mymesh.setBuffer(Type.Color, 4, BufferUtils.createFloatBuffer(colors));
mymesh.setBuffer(Type.Position, 3, BufferUtils.createFloatBuffer(vertices));
mymesh.setBuffer(Type.Normal, 3, BufferUtils.createFloatBuffer(normals));
mymesh.setBuffer(Type.TexCoord, 2, BufferUtils.createFloatBuffer(texcoords));
mymesh.setBuffer(Type.Index, 3, BufferUtils.createIntBuffer(indices));

chunk.setMesh(mymesh);

So am I really sending that short[][][] to the GPU? It seems like my use of short[][][] is only making the build time longer, not the render time. Also, where would I even begin? Should I extend the Mesh class and make a bunch of modifications to optimize for voxels? Or, to save your breath, is there just some literature I should dig through?

Thanks,
Andy

Ohh, no - you’re good - you’re pretty much already there! Take a look at this for example:

[java]

mymesh.setBuffer(Type.Color, 4, BufferUtils.createFloatBuffer(colors));

[/java]

Here you’re saying that for each vertex you need 4 single-precision 32-bit floating point numbers to express the color at that vertex… But do you? Really?

If you could settle for 4 values between 0 and 255 (so, RGB and A), then you could instead send 4 bytes, reducing the size of that buffer to 25% of what it is now, which will certainly improve performance.

For example, you could iterate through each of your vertices in order, pushing four bytes onto an ArrayList for each vertex. These four bytes are what you previously had in a color float[] for each vertex. Let’s call this list vertexData. You need to do this in order to know the size you’ll need to allocate for your buffer. Then, you can create the color buffer for the vertex using something like this:

[java]

    final int bufferSize = vertexData.size();

    final ByteBuffer byteBuffer = BufferUtils.createByteBuffer(bufferSize);

    for (int i = 0; i < bufferSize; i++) { byteBuffer.put(vertexData.get(i)); }

    byteBuffer.flip();

    mesh.setBuffer(Type.Color, 4, VertexBuffer.Format.Byte, byteBuffer);

[/java]

Alternatively, if you can settle for a limited palette (and maybe you really can’t - but for example!) - then you might just push a single byte in there for a color ID, and a function or array in the shader that maps that ID to an actual color value before applying it… That way you’d reduce your palette, but also reduce your buffer down to a single byte per vertex - resulting in traffic for colors between the Java and the GPU of just 6.25% of what you currently have.


@roleary thanks, that is sorta what I was thinking! I was trying to change it from float to byte and it was crashing on some other error. Good to know that I am on the right path! I saw your engine on reddit JUST before seeing it on here, it looks stunning and really shows off the capabilities that Java has with JME3. Best of luck!

Hm. It would be nice if all that encoding and decoding that goes with these optimizations could be shoved off into a lib.
I wouldn’t want to rebuild the shaders just because the number of material types crossed a magical boundary.

Mmm… so… something like a library that does the bit packing and encoding, allocates a buffer with a large enough element size to hold all the bits, and sends that thing.
Plus something that generates GLSL code that decodes the stuff for the shader. Maybe put the decoder into a shader node - the shader node system should be able to do that, but the devil is in the details.

Just tossing around a rough idea.
Not sure if or when anybody wants or needs that.

It would be really nice to have some tools like that. I feel like voxel engines really flourish in JME3, but the existing libraries like Cubes and such are VERY lacking, especially because they force the developer to operate too high-level, which defeats many of the great advantages of going voxel in the first place! Voxel development is meant to be low level, and if we had a library that made low-level voxel mesh generation easier (by automatically using more bytes and fewer floats, as discussed) that would be AWESOME! Using floats for meshes is really only for loaded models, not voxel terrain or terrain in general. Really, it could just be a static Factory with functions for converting float arrays to byte arrays and so on, with some documentation - maybe tack it on to this:
https://wiki.jmonkeyengine.org/legacy/doku.php/jme3:advanced:custom_meshes

Wow… That seems like a really good idea to me! Certainly in voxel engines, where the vertex count is through the roof, and you tend to have a lot of data you want to send - automagically packing and unpacking the data would be really useful… Interesting!

I could probably modify the core of the buffer generator in my engine to be reusable as a library - and add something to actually generate the GLSL.

I also have functions to break down and free up off-heap buffer memory immediately, as soon as a mesh is dropped from the scene - since I found that when you’re tooling around the world with all the dials in the red (as per usual when you’re really pushing for performance), you want to be releasing resources as soon as possible to free up memory for new buffers.


First, to @admazzola, switching to 32x32x32 chunks will probably get you your best performance increase. It’s interesting that when starting Mythruna 2.5 years ago I experimentally came up with the same sweet spot. That will reduce your object count and draw dispatch considerably - to 12.5% of previous in the case of 3D chunks.

Second, I thought I would mention that in the case of mesh buffers like Type.Color, if you pass a byte array instead of a float array, the shader will still see the values as floats. You will have to divide them by 255, but in some cases it may be better than packing and unpacking, and it certainly seems more convenient and less error-prone to me.

In fact, I’m kicking myself for never having thought of this before. I may have to do some experiments to see if the extra divide has much impact but the memory savings should be huge in my case and I won’t even have to change much code.
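The CPU-side conversion for that might look something like this (a sketch; the method name is made up):

```java
// Sketch: convert 0..1 float color components to 0..255 unsigned bytes,
// suitable for an UnsignedByte color buffer that the shader divides by 255.
public class ColorBytes {
    static byte[] toUnsignedBytes(float[] colors01) {
        byte[] out = new byte[colors01.length];
        for (int i = 0; i < colors01.length; i++) {
            // The (byte) cast wraps values above 127 into Java's signed range;
            // read back as unsigned, they are still 0..255 on the GPU side.
            out[i] = (byte) Math.round(colors01[i] * 255f);
        }
        return out;
    }
}
```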

Edit: also @admazzola, where your chunks abut, are you rendering those faces or eliminating them too? If you are leaving those in then that’s another huge source of drain… so I thought I would ask. Eliminating hidden faces between chunks is harder and so sometimes skipped, but it makes a huge difference.

I wrote a function called getByteArray that converts the Vector4f arraylist into a Byte arraylist. The bytes in the list have values of 0 to 128. When I pass them into mesh.setBuffer, the color results I get are very strange! All of my terrain is white, and some areas fade into yellow. Do I have to change something in the shaders so it reads the color buffer differently? If so, where is that shader code located? I don’t think I have ever seen or touched it before.

[java]
public static List<Byte> getByteArray(ArrayList<Vector4f> colors){
    List<Byte> ret = new ArrayList<Byte>();

    for (int i = 0; i < colors.size(); i++) {
        ret.add((byte) (colors.get(i).getX()*128));
        ret.add((byte) (colors.get(i).getY()*128));
        ret.add((byte) (colors.get(i).getZ()*128));
        ret.add((byte) (colors.get(i).getW()*128));
        System.out.println(ret.get(ret.size()-3));
    }

    return ret;
}

[/java]
[java]
final int bufferSize = c5.size();
final ByteBuffer colorByteBuffer = BufferUtils.createByteBuffer(bufferSize);
for (int i = 0; i < bufferSize; i++) { colorByteBuffer.put(c5.get(i)); }
colorByteBuffer.flip();
mymesh.setBuffer(Type.Color, 4, VertexBuffer.Format.Byte, colorByteBuffer);

[/java]

Thanks,

Andy

@pspeed said: You will have to divide them by 255....

Meaning in the shader… divide the values by 255 or whatever so that they are again in 0-1 range.

Oh - you’re absolutely right - good point!

And yep - it’s not easy to filter out voxel faces between chunks (I called them blocks in my implementation - I don’t know why - but now it’s confusing to talk about chunks : ).

My approach is for each chunk to actually contain an overlap layer of one voxel all the way around. So while a chunk might (in theory) be meshed as a cube with 32 voxels on each side, the chunk held in memory contains 33 voxels on each side - with the outer layer actually composed of voxels belonging to neighbouring chunks (though not all voxel data is included in the overlap).

This means that updates need to update the overlaps too - but I found that many chunks don’t see much activity after meshing - and that doing it this way avoids an awful lot of lookups - especially during terrain generation, meshing, and lighting updates.

Ohhhh so I have to write a .frag and .vert that do the color value division and then attach them to the terrain material. That makes a lot of sense actually… Those should be wrapped into the library too, if one is made!

@roleary said: Oh - you're absolutely right - good point!

And yep - it’s not easy to filter out voxel faces between chunks (I called them blocks in my implementation - I don’t know why - but now it’s confusing to talk about chunks : ).

My approach is for each chunk to actually contain an overlap layer of one voxel all the way around. So while a chunk might (in theory) be meshed as a cube with 32 voxels on each side, the chunk held in memory contains 33 voxels on each side - with the outer layer actually composed of voxels belonging to neighbouring chunks (though not all voxel data is included in the overlap).

This means that updates need to update the overlaps too - but I found that many chunks don’t see much activity after meshing - and that doing it this way avoids an awful lot of lookups - especially during terrain generation, meshing, and lighting updates.

In the long run, I found that the lookups were cheaper than the memory consumption. Overlapping borders buys you more problems than it solves and costs a lot (calculate the memory costs yourself… and then the costs of retrieving up to 8 blocks to edit one conjoined corner [if you do it in 3D]). If it’s really an issue, perhaps you can cache some bits in with your cell data to show which sides are solid and avoid a lookup during mesh creation. 6 bits per cell is enough if you have them to spare.

I ended up doing essentially that; it streamlined my mesh generation, and I could use it for light propagation, A* searches, all kinds of stuff.
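The 6-bits-per-cell caching described above could be as simple as this sketch (the bit assignments are arbitrary):

```java
// Sketch: 6 bits per cell recording which neighbouring faces are solid,
// so the mesher can skip hidden faces without cross-chunk lookups.
public class FaceBits {
    static final int POS_X = 1 << 0, NEG_X = 1 << 1,
                     POS_Y = 1 << 2, NEG_Y = 1 << 3,
                     POS_Z = 1 << 4, NEG_Z = 1 << 5;

    static int setSolid(int bits, int face)    { return bits | face; }
    static int clearSolid(int bits, int face)  { return bits & ~face; }
    static boolean isSolid(int bits, int face) { return (bits & face) != 0; }
}
```

During meshing, a face is emitted only when its bit is clear - the neighbour lookup has already been paid for once, when the bit was set.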

@admazzola said: Ohhhh so I have to write a .frag and .vert that do the color value division and then attach them to the terrain material. That makes a lot of sense actually.. Those should be wrapped into the library too, if one is made!

Just a .vert, really.

I just made a test with Unshaded to see if there was a net performance drop by doing this and the results are encouraging. I can post some code if anyone is interested (and perhaps will see somewhere I screwed up the benchmark).

Oh, and also - since the values are 0-255 and you’re dividing by 255 in the shader - the buffer type should actually be:

[java]VertexBuffer.Format.UnsignedByte[/java]

Hehe - that’s more like it… Sorry about that!

Even without code, here is 50,000 batched boxes using regular Unshaded.j3md and random float-based vertex colors:

Here is the same image with a modified Unshaded.vert and sending a byte-based color buffer:

The performance difference seems within the margin for error.

@roleary said: Oh, and also - since the values are 0-255 and you're dividing by 255 in the shader - the buffer type should actually be:

[java]VertexBuffer.Format.UnsignedByte[/java]

Hehe - that’s more like it… Sorry about that!

If you are passing arrays directly then just setBuffer(… new byte[] { my values }) seems to do the right thing, also.