Hey! I’m working on a game that uses voxel graphics to render the game world (sort of like the game Cube World, or Minecraft). The world is stored in chunks of blocks, which get converted to a Mesh and Geometry on a separate thread. After countless hours of making optimizations I managed to get at least a decent FPS with quite a few blocks on the screen, but there’s one issue that I haven’t been able to solve yet.
Since every chunk is turned into one big mesh and rendered at once (not all blocks are visible, so this is faster and easier), it attaches quite a large geometry to the root node. This causes a lag spike of about half a second for every chunk that is loaded into the game. The actual generation of the geometry is done on a separate thread and then passed to the main thread to be rendered.
Is there any way to either:
A. Reduce the time it takes to attach the geometry, or
B. Attach the geometry from a separate thread or somewhere it does not cause any lag?
It sounds to me like your approach may be the root issue:
How is this faster and easier? Is it not an extra step? Often large meshes are cut up into smaller pieces to feed to the GPU; it sounds like you are doing the opposite, which does make me wonder why you are doing it this way.
I tried it with smaller Geometries at first, per block. But I noticed a pretty massive drop in FPS if I didn’t combine it into one big mesh. I could try reversing it and seeing if it changes; I did make quite a few modifications since I made that change.
“per block” as in per cube or per voxel is wrong, collecting “cubes” up into “chunks” of say 64x64x64 (number obviously depends on implementation) is the way to go. It’s the middle ground between meshes that are too large to handle, and too many objects.
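Part of why chunking wins is that, when you build one mesh per chunk, you can skip every face shared by two solid blocks, which is where most of the savings over per-cube geometry come from. A rough sketch of counting only the exposed faces (the 8x8x8 size and the fill pattern are made up for illustration; a real implementation would emit a quad per exposed face instead of counting):

```java
public class FaceCulling {
    static final int N = 8;
    static boolean[][][] solid = new boolean[N][N][N];

    // Out-of-bounds neighbours count as air, so chunk borders are exposed.
    static boolean isSolid(int x, int y, int z) {
        if (x < 0 || y < 0 || z < 0 || x >= N || y >= N || z >= N) return false;
        return solid[x][y][z];
    }

    public static void main(String[] args) {
        // Fill the bottom half of the chunk with solid blocks.
        for (int x = 0; x < N; x++)
            for (int y = 0; y < N / 2; y++)
                for (int z = 0; z < N; z++)
                    solid[x][y][z] = true;

        int[][] dirs = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
        int faces = 0;
        for (int x = 0; x < N; x++)
            for (int y = 0; y < N; y++)
                for (int z = 0; z < N; z++) {
                    if (!solid[x][y][z]) continue;
                    for (int[] d : dirs)
                        if (!isSolid(x + d[0], y + d[1], z + d[2]))
                            faces++; // exposed face: this is where a quad would be emitted
                }
        int naive = (N * N * N / 2) * 6; // every solid block emitting all 6 faces
        System.out.println("visible faces: " + faces + " vs naive " + naive);
    }
}
```

For this half-full chunk the culled mesh needs 256 faces instead of 1536, and the ratio only improves for bigger chunks because interior volume grows faster than surface area.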
Moving forward, if you did figure out a way to get a large mesh to the gpu, and you want the user to change it, you will start to hit performance issues again.
It’s a crappy limitation of the engine and its (I suspect) underlying libraries and drivers.
Even if you make it all properly into chunks like @thetoucher said, you’ll still have to attach them slowly one after the other to avoid a bigass freeze (and will still possibly have noticeable lag spikes depending on how large you make the chunk geoms). Kind of like how Minecraft loads chunks gradually as well.
There’s absolutely no way around it to my knowledge. I really really fucking hate this issue if I’m completely honest.
I used to do it per block but it was too laggy; right now it stores each chunk as 16x256x16 geometries. The world currently never generates terrain above y=90 though, so anything above that is pretty much empty.
The system I’m using right now is pretty much the same as Minecraft’s, so I think either I’m doing something wrong somewhere or jMonkeyEngine is slow. It would be perfect if there were a way to attach objects on a separate thread so they would only actually render once finished, but I’m guessing that’s not possible.
Whoa whoa, I always thought it was faster to merge everything into one mesh whenever possible? I guess you are just talking about attaching, right?
It’s faster at rendering time because you reduce the total number of draw calls, but it’s slower at GPU upload time because you have more data to send across the bus in one go.
When you say upload time, I’m assuming that isn’t something occurring every frame, and is just in relation to adding new geometry to your scene? (Sorry for hijacking the thread, btw.)
Adding one 16x16x16 mesh won’t cause this slow upload to the card. At 16x256x16 it probably will. Find a middle ground. Then find out how many you can add per frame. Remove old chunks first… there are a lot of tricks you have to fiddle with. Don’t use too many threads, because the main thread will be bogged down with no CPU time left… so many things contribute to this, and so many things alleviate it…
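The “how many per frame” budget above can be as simple as a pending queue that the update loop drains a fixed number of items from. A minimal plain-Java sketch with simulated frames (the budget of 1 per frame is the knob you’d tune; in jME the attach would happen inside `simpleUpdate()`):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class AttachBudget {
    static final int PER_FRAME = 1; // tune: how many chunks to attach per frame

    public static void main(String[] args) {
        // Meshes that finished building on worker threads, waiting to be attached.
        Queue<String> pending = new ArrayDeque<>();
        for (int i = 0; i < 5; i++) pending.add("chunk-" + i);

        int frame = 0;
        while (!pending.isEmpty()) {
            frame++;
            // Attach at most PER_FRAME chunks this frame; the rest wait their turn.
            for (int n = 0; n < PER_FRAME && !pending.isEmpty(); n++) {
                String chunk = pending.poll();
                // in jME: rootNode.attachChild(chunkGeometry)
            }
        }
        System.out.println("drained in " + frame + " frames");
    }
}
```

Spreading five pending chunks over five frames trades a single half-second freeze for five barely noticeable hitches, which is usually the better deal.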
Like jayfella said, the bottleneck might be the transfer from main memory to the graphics card memory. If your DMA-bus is overloaded/slow you need to figure out what is least painful for your use case. Uploading to the card must happen on the OpenGL thread, which is the same thread that is issuing draw-calls. There’s just no easy way around that.
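One pattern that works within that constraint is to do all the expensive mesh building on worker threads and hand only the finished result to the GL thread (in jME you would pass the attach to the render thread, e.g. via `app.enqueue()`). A minimal plain-Java sketch of that hand-off, with no jME dependency; `buildChunkMesh` and its fake vertex data are stand-ins for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChunkHandoff {
    // Finished meshes land here; only the render thread drains it.
    static final ConcurrentLinkedQueue<float[]> finished = new ConcurrentLinkedQueue<>();

    // Stand-in for the expensive mesh build (fake vertex data).
    static float[] buildChunkMesh(int chunkId) {
        return new float[] { chunkId, chunkId + 1, chunkId + 2 };
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 4; i++) {
            final int id = i;
            workers.submit(() -> finished.add(buildChunkMesh(id)));
        }
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);

        // Simulated render-thread update: only the cheap attach runs here.
        List<float[]> scene = new ArrayList<>();
        float[] mesh;
        while ((mesh = finished.poll()) != null) {
            scene.add(mesh); // in jME: rootNode.attachChild(chunkGeometry)
        }
        System.out.println("attached " + scene.size() + " chunks");
    }
}
```

This doesn’t remove the upload cost, but it keeps everything except the upload itself off the GL thread, which is the best the threading model allows.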
However, there might be other things slowing it down too; maybe the engine needs to generate physics collision shapes etc., which is not related to the DMA transfer.
I used to have a system with 16x16x16 chunks, same issue
Currently there is nothing present aside from those couple of meshes and the basic fly camera that jMonkeyEngine provides by default. I haven’t gotten around to making a player yet.
Correct, this is in reference to optimizing GPU bandwidth.
How many vertices do you have in your chunk after it’s merged?
This is a video where I employ field-of-view culling to simplify the scene, which is another thing you can do to reduce the complexity. Each chunk cell is 16x16x16 and then merged into one mesh of 16 layers to make up a single chunk. I had to ensure in this instance that I wasn’t generating a ton of threads: I have a quad-core CPU, so I give the client 2 threads plus the GL thread, and the server has two threads also, which means I should be fine for CPU time. Generation time becomes the issue then. Only add one chunk to the scene per frame; it gives you some alleviation for time. Generate collision data while you’re building the mesh on the worker thread. Save generated data to file to speed up regeneration. And so on…
Doesn’t jMonkeyEngine already have a basic system for culling? I noticed that objects disappear once they’re out of view (especially visible if you don’t call the bounding-box update).
Yes. But my scene is probably complex or large enough that removing over half of it saves far more work than the simple dot product (which is all the FOV test is) costs.
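That dot product boils down to: normalize the vector from the camera to the chunk, dot it with the camera’s forward vector, and compare against the cosine of the half-FOV angle. A standalone sketch with no jME types (the 90° FOV, positions, and the assumption that `camDir` is already normalized are all illustrative):

```java
public class FovTest {
    // True if 'target' lies inside the camera's view cone.
    // camDir is assumed to be a unit vector.
    static boolean inFov(double[] camPos, double[] camDir, double[] target, double halfFovRad) {
        double dx = target[0] - camPos[0];
        double dy = target[1] - camPos[1];
        double dz = target[2] - camPos[2];
        double len = Math.sqrt(dx * dx + dy * dy + dz * dz);
        if (len == 0) return true; // target at the camera: always "visible"
        double dot = (dx * camDir[0] + dy * camDir[1] + dz * camDir[2]) / len;
        return dot >= Math.cos(halfFovRad);
    }

    public static void main(String[] args) {
        double[] cam = {0, 0, 0};
        double[] forward = {0, 0, 1};          // looking down +Z
        double half = Math.toRadians(45);      // 90-degree total FOV
        System.out.println(inFov(cam, forward, new double[]{0, 0, 10}, half));  // straight ahead
        System.out.println(inFov(cam, forward, new double[]{0, 0, -10}, half)); // directly behind
    }
}
```

One caveat of a cone test against chunk centers: a chunk whose center is just outside the cone can still have corners on screen, so in practice you’d widen the angle slightly or test against the chunk’s bounding sphere.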
Yes it does, but what it does is skip rendering scene-graph nodes that are out of the FOV. What jayfella does (I guess) is implement his own FOV culling that modifies the scene graph itself, removing/adding the nodes that should be in the scene.