Hi!
This is JME3 - and yep, it’s definitely difficult to keep the performance up on a voxel engine.
I spent a lot of time optimizing the Java, the GLSL, and importantly, the traffic between the CPU and the GPU. I don’t render any low detail meshes - there is the capability to render point sprites for very distant voxels - though it’s not seen in any of these videos… Mostly because by the time you get out that far, you’re running out of RAM anyways, so it’s not that useful.
16x16x16 seems quite small for a chunk size - if you increase it to 32x32x32, each chunk covers eight times the volume, so you’ll cut the mesh count to roughly an eighth, and unless your chunk regeneration code is relatively slow, it shouldn’t make much of a performance difference during digging/building.
Most importantly though, you want to minimize the amount of data you’re pushing into buffers to send to the graphics card.
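To see how the mesh count changes with chunk size, here's a rough sketch that counts the chunks needed to cover a region. The class name and the 256x128x256 region in the note below are made up for illustration, not taken from either of our engines.

```java
// Hypothetical helper: counts how many chunks are needed to cover a region of voxels.
final class ChunkCount {
    // Ceiling division, so a partly filled chunk still counts as one chunk.
    static int chunksAlong(int voxels, int chunkSize) {
        return (voxels + chunkSize - 1) / chunkSize;
    }

    // Total chunk (and therefore mesh) count for a box-shaped region.
    static int chunksFor(int sizeX, int sizeY, int sizeZ, int chunkSize) {
        return chunksAlong(sizeX, chunkSize)
             * chunksAlong(sizeY, chunkSize)
             * chunksAlong(sizeZ, chunkSize);
    }
}
```

For a 256x128x256 region, `ChunkCount.chunksFor(256, 128, 256, 16)` gives 2048 chunks, while a chunk size of 32 gives 256.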
You’re using a cubed short array - which might be quite large. A short is a 16-bit signed two’s complement integer - so it’s small, but not as small as you can possibly go.
For example, I need to send all sorts of data to the shaders - to provide all the fancy effects that happen depending on voxel type, state (solid, liquid, or gas), temperature, humidity, etc - but I also want to minimize traffic.
So, I could send the voxel state as a short between 0 and 2 inclusive, and the voxel type as another short between (let’s say) 0 and 39 (assuming I have 40 possible voxel types). That would result in 32 bits on the line to the GPU.
On the other hand, I could take a single byte. A Java byte is signed, so it gives me 128 non-negative values between 0 and 127. Then I multiply the voxel state by 40 (the number of voxel types) and add (voxel type+1). This will give me a value between 1 and 120. I then send this value to the GPU as a single byte. On the GPU I know that the values will be:
1-40: A solid
41-80: A liquid
81-120: A gas
For example, I have a gas voxel of type 29. Gas is state 2, so I send the value (2x40 + 29 + 1 = 110) to the GPU. In the vertex shader I subtract 1 from the received value, divide by 40 and floor the result - so (109/40 = 2.725), floored to get a voxel state of 2, a gas. The remainder of that same division gives me the voxel type - (109 - 2x40 = 29).
In this way I can avoid sending 2 shorts = 32 bits, and instead just send one byte = 8 bits, resulting in a 75% reduction in the data sent to the GPU for that attribute, and a real performance improvement.
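One way to land in the ranges listed above (1-40 solid, 41-80 liquid, 81-120 gas) is an offset encoding: state x 40 + type + 1. Here's a minimal Java sketch of the CPU side, with the decode written the way a shader could do it. The class and method names are made up for illustration, and it assumes 3 states and 40 types.

```java
// Sketch of an offset encoding for (state, type) in one byte.
// Assumes 3 states (0 = solid, 1 = liquid, 2 = gas) and 40 types (0..39).
final class VoxelPacking {
    static final int TYPE_COUNT = 40;
    static final int STATE_COUNT = 3;

    // Packs state and type into a value in 1..120:
    // 1-40 solid, 41-80 liquid, 81-120 gas.
    static byte pack(int state, int type) {
        if (state < 0 || state >= STATE_COUNT || type < 0 || type >= TYPE_COUNT) {
            throw new IllegalArgumentException("state or type out of range");
        }
        return (byte) (state * TYPE_COUNT + type + 1);
    }

    // Integer division recovers the state (the shader would floor a float division).
    static int unpackState(byte packed) {
        return (packed - 1) / TYPE_COUNT;
    }

    // The remainder of the same division is the type.
    static int unpackType(byte packed) {
        return (packed - 1) % TYPE_COUNT;
    }
}
```

A gas voxel (state 2) of type 29 packs to 110, and unpacking 110 gives back state 2 and type 29. The largest value is 120, so it always fits in the non-negative half of a signed byte.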
I’d advise getting into packing the buffers yourself - reducing their size wherever possible.
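As a rough idea of what packing a buffer yourself can look like, here's a sketch using plain java.nio (not any JME3-specific helper). It writes one already-packed byte per element into a direct buffer and compares that against sending two shorts per element. The class and method names are hypothetical, and a real mesh would pack per-vertex attributes rather than one byte per voxel.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: one packed byte per element in a direct buffer.
final class PackedChunkBuffer {
    // Copies the packed values into a direct buffer, ready to hand to the graphics API.
    static ByteBuffer fill(byte[] packedValues) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(packedValues.length)
                                      .order(ByteOrder.nativeOrder());
        buffer.put(packedValues);
        buffer.flip(); // switch from writing to reading before the buffer is consumed
        return buffer;
    }

    // Size in bytes if each element were sent as two shorts instead.
    static int bytesAsShortPairs(int elementCount) {
        return elementCount * 2 * Short.BYTES;
    }
}
```

For a 16x16x16 chunk (4096 elements), the packed buffer holds 4096 bytes, where two shorts per element would come to 16384 bytes.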
You might also consider not sending big floaty texture indices, but instead just sending the voxel type and have constants in the vertex shader to tell you what the texture coordinates are for given voxel types - in this way you can reduce traffic quite a bit.
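To illustrate the idea, here's the kind of lookup the shader would do, written in Java so it's easy to test. It assumes a hypothetical 8x8 tile atlas where the voxel type is also the tile index. A constant array indexed by voxel type would work the same way if your tiles aren't laid out in order.

```java
// Sketch of deriving texture coordinates from the voxel type alone.
// Assumes a hypothetical square atlas of 8x8 tiles, tile index == voxel type.
final class AtlasLookup {
    static final int TILES_PER_ROW = 8;

    // Returns {u, v} for the origin corner of the tile belonging to this voxel type.
    static float[] tileOrigin(int voxelType) {
        int column = voxelType % TILES_PER_ROW;
        int row = voxelType / TILES_PER_ROW;
        float tileSize = 1.0f / TILES_PER_ROW;
        return new float[] { column * tileSize, row * tileSize };
    }
}
```

With this layout, voxel type 29 maps to column 5, row 3, so only the type has to travel to the GPU and the coordinates are worked out on the other side.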
Obviously there’s a balance between reducing traffic and increasing complexity in the shaders - but I’d definitely recommend looking at every single byte sent down the line to see if you can pack more data in there - in some cases I’ve even managed to pack three values into a single byte!
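For what it's worth, here's one way three values can share a byte, using bit fields. The layout is purely hypothetical (2 bits of state, 3 bits for a coarse temperature band, 3 bits for a coarse humidity band), not necessarily what my engine does.

```java
// Sketch: three small values in one byte via bit fields.
// Hypothetical layout: [state: 2 bits][tempBand: 3 bits][humidityBand: 3 bits].
final class TriplePack {
    static byte pack(int state, int tempBand, int humidityBand) {
        if (state < 0 || state > 3 || tempBand < 0 || tempBand > 7
                || humidityBand < 0 || humidityBand > 7) {
            throw new IllegalArgumentException("value does not fit its bit field");
        }
        return (byte) ((state << 6) | (tempBand << 3) | humidityBand);
    }

    // Mask to 8 bits first: Java bytes are signed, so the top bit would sign-extend.
    static int state(byte packed) {
        return (packed & 0xFF) >>> 6;
    }

    static int tempBand(byte packed) {
        return (packed >>> 3) & 0x7;
    }

    static int humidityBand(byte packed) {
        return packed & 0x7;
    }
}
```

The catch on the Java side is that the byte can come out negative once the top bit is used, so the receiving end needs to treat it as unsigned.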