Asynchronously Transfer Models to GPU

The game I am currently working on has a fairly large number (~64-256) of complex models. I am running into an issue where the game stutters when (I think) the models are transferred over the PCI bus to the GPU. As such, I would like some way to do this while the game continues rendering with the preexisting models and just shows the new models when they are ready. I am already building the dynamic meshes via an ExecutorService, so that part is fully asynchronous. I also have the minimum number of vertices and triangles in my models, and the only code that is run on the main thread is the following:

mesh.setBuffer(VertexBuffer.Type.Position, 3, positionBuffer);
mesh.setBuffer(VertexBuffer.Type.Normal, 3, normalBuffer);
mesh.setBuffer(VertexBuffer.Type.Index, 3, indexBuffer);
mesh.updateBound();
mesh.createCollisionData();
geometry.updateModelBound();

I realize that this may not be possible, so, assuming that making simpler or fewer meshes isn’t an option, what can I do to ensure that the game continues to run smoothly as new models are transferred to the GPU?

Information (it’s probably irrelevant, but it can’t hurt to include it):
GPU: MSI Nvidia GTX 770
OS: Ubuntu 14.04.2 LTS x64 (Linux Kernel 3.16.0-67-generic)

Add one model per tick of the game? I have to do this for infinite terrain to avoid the same thing.

I tried that (after writing the post) and the FPS drops to ~20 from a stable 60.

Do you have compressed (DDS) textures?

Right now, I have no textures – just a Lighting.j3md with an Ambient and Diffuse color set.

Put some textures in there, even empty ones, to see how it works when you need to upload more data.

It has no effect, although all of the models share the same material.

How big are the meshes?

Adding one mesh per frame should have virtually no effect unless you are doing some other odd synchronization.

For reference, the IsoSurfaceDemo pages in all kinds of geometry and rarely drops frames.

On average 34816 vertices and 17408 triangles. The calls to apply the mesh data are in a synchronized block, but that is only for the happens-before relationship rather than for any sort of actual locking (there is no way that any locking should occur with the way the code is set up now).

And you have up to 256 of these?

Yeah, that’s quite a lot of data. No matter what you do, you will have issues, I guess.

Still, adding one per frame shouldn’t drop your frame rate so badly. I again refer to the IsoSurfaceDemo that has no problems with this.

How are you transferring the mesh/geometry from the background thread to the render thread?

Issues occur with only 64 of them.

Each mesh is implemented via extending the Mesh class. The extended class has two methods, one that builds the mesh data (and is called asynchronously) and another that pushes it to JME. Both are synchronized and the task that calls the first method calls a custom enqueue() implementation that only runs one task per frame (I notice issues if it runs more than one task per half-second) to call the JME-pushing method. The actual mesh buffers are just stored as a member of the extended Mesh class.

Avoid synchronized. It’s the 50 lb sledge hammer of multithreading and it may be the root of your problem. It can be expensive even when there is no contention… and super-duper expensive if there is.

Really, extending Mesh to do this is dangerous as it opens you up to all kinds of problems… and necessitates really bad things like having to use the synchronized keyword on meshes.

Better is to build the mesh (a regular mesh) in the thread and then add it to a concurrent queue. The render thread can pull one off per frame and attach it.
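A minimal sketch of that hand-off, assuming nothing about the paging library mentioned in this thread. PendingMeshQueue and its method names are made up for illustration and are not part of jME. It is generic so it can be read without the engine on the classpath; in a game T would be a Geometry or Mesh. Objects put into a ConcurrentLinkedQueue are safely published to the thread that takes them out, so no synchronized block is needed for the happens-before relationship.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical helper (not a jME class): passes finished objects from
// worker threads to the render thread without any synchronized blocks.
final class PendingMeshQueue<T> {
    private final Queue<T> ready = new ConcurrentLinkedQueue<>();

    // Called from a background thread once the object is fully built.
    // After this call the worker must not touch the object again.
    void offer(T built) {
        ready.add(built);
    }

    // Called once per frame from the render thread. Hands at most
    // maxPerFrame items to the attach callback and returns how many.
    int drain(int maxPerFrame, Consumer<? super T> attach) {
        int attached = 0;
        while (attached < maxPerFrame) {
            T next = ready.poll();
            if (next == null) {
                break; // nothing more is ready this frame
            }
            attach.accept(next);
            attached++;
        }
        return attached;
    }
}
```

On the render thread this might be called from simpleUpdate() as pending.drain(1, rootNode::attachChild), with the worker calling pending.offer(geometry) as its last step.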

The IsoSurfaceDemo uses an open source paging library that does this very efficiently.

What parts of the mesh class can be used from non-JME threads? These meshes are updated as the game progresses and, as such, updateBound() and similar methods are used. Should I just add and remove the geometries as needed rather than worrying about asynchronous mesh updates?

Yes. You cannot update them from another thread without locking the main thread once they are used by the main thread. That’s very bad for performance and totally negates the reason for using background threads.

Only one thread at a time can use a mesh. This includes reading it. So if the background thread has it then the render thread should not touch it at all… unless you gate every single access with a synchronized call.

You are better off just creating new meshes on the other thread when you need to and freeing them again when you are done with them.

At the risk of repeating myself, this is what the IsoSurfaceDemo does and you can fly around with virtually no frame drops… and every single piece of geometry in there (except the trees) is generated at runtime on the fly.

All right, thanks. I’ll implement this next time I work on my game (I’m on mobile right now).

I ran into the same issue when I did a generated world project a while back. The way I kinda got around it was to load the chunks weighted by visibility (in front of the character first and close first …), and to do custom LOD loading, so chunks were initially loaded in super low res, and I incrementally added more LODs, using the same visibility weighting… all the while adding or changing 1 mesh per tick.
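One way such an ordering could look is sketched below. This is a guess at the idea, not the poster's actual code: ChunkRequest, ChunkLoadOrder, the 2D coordinates and the BEHIND_PENALTY factor are all assumptions made for illustration. Coarse detail levels are served first, and within a level chunks that are close and in front of the camera come out before distant ones or ones behind it.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical request for one chunk at one detail level (0 = coarsest).
final class ChunkRequest {
    final String name;
    final int detail;
    final float x, z;

    ChunkRequest(String name, int detail, float x, float z) {
        this.name = name;
        this.detail = detail;
        this.x = x;
        this.z = z;
    }
}

final class ChunkLoadOrder {
    // Assumed weighting: chunks behind the camera count as 4x farther away.
    static final float BEHIND_PENALTY = 4f;

    // Lower score loads earlier. (dirX, dirZ) is the camera's view direction.
    static float score(ChunkRequest c, float camX, float camZ, float dirX, float dirZ) {
        float dx = c.x - camX;
        float dz = c.z - camZ;
        float distance = (float) Math.sqrt(dx * dx + dz * dz);
        boolean behind = dx * dirX + dz * dirZ < 0f; // negative dot product
        return behind ? distance * BEHIND_PENALTY : distance;
    }

    // Orders requests by detail level first, then by visibility score.
    static PriorityQueue<ChunkRequest> prioritize(Collection<ChunkRequest> requests,
            float camX, float camZ, float dirX, float dirZ) {
        Comparator<ChunkRequest> order = Comparator
                .comparingInt((ChunkRequest c) -> c.detail)
                .thenComparingDouble(c -> score(c, camX, camZ, dirX, dirZ));
        PriorityQueue<ChunkRequest> queue = new PriorityQueue<>(order);
        queue.addAll(requests);
        return queue;
    }
}
```

The scores are computed against the camera position passed in, so the queue would need rebuilding when the camera moves far enough to change the ordering.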

I just moved the setBuffer() calls to a background thread (along with updateBound() and createCollisionData()) and the game runs smoothly even when I use plain enqueue() to attach the meshes so that as many as possible get added each frame.

createCollisionData() is probably really slow. You might be able to increase performance by a lot if you create an executor and submit tasks to generate collision data on all CPU cores.
Alternatively, use native bullet physics with the ray cast / collision detection features instead of jME3’s collision system.
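The executor suggestion could be sketched roughly as follows. ParallelPrep and runAll are hypothetical names, and the work is passed in as a plain function so the example stays independent of jME. In the game the function might be something like mesh -> { mesh.createCollisionData(); return mesh; }, run only on meshes that are not yet attached to the scene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one task per input on a pool sized to the
// number of CPU cores and returns the results in input order.
final class ParallelPrep {
    static <I, O> List<O> runAll(List<I> inputs, Function<I, O> work) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                Callable<O> task = () -> work.apply(input);
                futures.add(pool.submit(task));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : futures) {
                results.add(future.get()); // blocks until that task is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a pool per call keeps the sketch short; a game would more likely keep one pool alive for its whole lifetime and call runAll from a background thread, since future.get() blocks the caller.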

updateBound() is also not the fastest thing… If you’re generating geometry you might want to compute the bound at the same time to avoid doing twice the work.
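As a rough illustration of computing the bound during generation, the builder below tracks the minimum and maximum corner while vertices are appended. BoundTrackingBuilder is a made-up class, not part of jME. Handing the result to the engine (for example via a BoundingBox built from the two corners and passed to the mesh's setBound) is left out and should be checked against the jME version in use.

```java
import java.util.Arrays;

// Hypothetical builder (not a jME class): collects vertex positions and
// tracks the axis-aligned bounds in the same pass.
final class BoundTrackingBuilder {
    private float[] positions = new float[3 * 64];
    private int floats; // number of floats written so far
    private final float[] min = {
        Float.POSITIVE_INFINITY, Float.POSITIVE_INFINITY, Float.POSITIVE_INFINITY};
    private final float[] max = {
        Float.NEGATIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NEGATIVE_INFINITY};

    void addVertex(float x, float y, float z) {
        if (floats + 3 > positions.length) {
            positions = Arrays.copyOf(positions, positions.length * 2); // grow
        }
        positions[floats++] = x;
        positions[floats++] = y;
        positions[floats++] = z;
        // Update the bounds here instead of rescanning every vertex later.
        min[0] = Math.min(min[0], x);
        min[1] = Math.min(min[1], y);
        min[2] = Math.min(min[2], z);
        max[0] = Math.max(max[0], x);
        max[1] = Math.max(max[1], y);
        max[2] = Math.max(max[2], z);
    }

    int vertexCount() { return floats / 3; }

    // Returns only the floats actually written, as x, y, z triples.
    float[] positions() { return Arrays.copyOf(positions, floats); }

    float[] min() { return min.clone(); }

    float[] max() { return max.clone(); }
}
```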