Asynchronously Transfer Models to GPU

john01dav · March 19, 2016, 8:47pm

The game I am currently working on has a fairly large number (~64-256) of complex models. I am running into an issue in that the game stutters when (I think) the models are transferred over the PCI bus to the GPU. As such, I would like some method to do this whilst the game continues rendering with the preexisting models and just shows the new models when they are ready. I am already building the dynamic meshes via an ExecutorService, so that part is fully asynchronous. I also have the minimum number of vertices and triangles in my model, and the only code that is run on the main thread is the following:

mesh.setBuffer(VertexBuffer.Type.Position, 3, positionBuffer);
mesh.setBuffer(VertexBuffer.Type.Normal, 3, normalBuffer);
mesh.setBuffer(VertexBuffer.Type.Index, 3, indexBuffer);
mesh.updateBound();
mesh.createCollisionData();
geometry.updateModelBound();

I realize that this may not be possible, so, assuming that making simpler or fewer meshes isn’t an option, what can I do to ensure that the game continues to run smoothly as new models are transferred to the GPU?

Information (it’s probably irrelevant, but it can’t hurt to include it):
GPU: MSI Nvidia GTX 770
OS: Ubuntu 14.04.2 LTS x64 (Linux Kernel 3.16.0-67-generic)

jayfella · March 19, 2016, 8:57pm

Add one model per tick of the game? I have to do this for infinite terrain to avoid the same thing.

john01dav · March 19, 2016, 9:00pm

I tried that (after writing the post) and the FPS drops to ~20 from a stable 60.

FrozenShade · March 19, 2016, 9:12pm

Do you have compressed (DDS) textures?

john01dav · March 19, 2016, 9:12pm

Right now, I have no textures – just a Lighting.j3md with an Ambient and Diffuse color set.

FrozenShade · March 19, 2016, 9:14pm

Put some textures in there, even empty ones, to see how it works when you need to upload more data.

john01dav · March 19, 2016, 9:15pm

It has no effect, although all of the models share the same material.

pspeed · March 19, 2016, 9:20pm

How big are the meshes?

Adding one mesh per frame should have virtually no effect unless you are doing some other odd synchronization.

pspeed · March 19, 2016, 9:21pm

For reference, the IsoSurfaceDemo pages in all kinds of geometry and rarely drops frames.

john01dav · March 19, 2016, 9:22pm

On average 34816 vertices and 17408 triangles. The calls to apply the mesh data are in a synchronized block, but that is only for the happens-before relationship rather than for any sort of actual locking (there is no way that any locking should occur with the way the code is set up now).

pspeed · March 19, 2016, 9:23pm

And you have up to 256 of these?

Yeah, that’s quite a lot of data. No matter what you do, you will have issues, I guess.

Still, adding one per frame shouldn’t drop your frame rate so badly. I again refer to the IsoSurfaceDemo that has no problems with this.

How are you transferring the mesh/geometry from the background thread to the render thread?

john01dav · March 19, 2016, 9:27pm

Issues occur with only 64 of them.

Each mesh is implemented via extending the Mesh class. The extended class has two methods, one that builds the mesh data (and is called asynchronously) and another that pushes it to JME. Both are synchronized and the task that calls the first method calls a custom enqueue() implementation that only runs one task per frame (I notice issues if it runs more than one task per half-second) to call the JME-pushing method. The actual mesh buffers are just stored as a member of the extended Mesh class.

pspeed · March 19, 2016, 9:46pm

Avoid synchronized. It’s the 50 lb sledge hammer of multithreading and it may be the root of your problem. It can be expensive even when there is no contention… and super-duper expensive if there is.

Really, extending Mesh to do this is dangerous as it opens you up to all kinds of problems… and necessitates really bad things like having to use the synchronized keyword on meshes.

Better is to build the mesh (a regular mesh) in the thread and then add it to a concurrent queue. The render thread can pull one off per frame and attach it.

The IsoSurfaceDemo uses an open source paging library that does this very efficiently.
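
A minimal sketch of the queue handoff described above, written against the plain JDK so it runs on its own. BuiltMesh and MeshHandoff are hypothetical names used only for illustration; BuiltMesh stands in for a finished jME Mesh or Geometry, and this is not the code the IsoSurfaceDemo's paging library uses.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for a fully built mesh; not a jME class. */
final class BuiltMesh {
    final String name;
    final float[] positions;

    BuiltMesh(String name, float[] positions) {
        this.name = name;
        this.positions = positions;
    }
}

/** Hands finished meshes from worker threads to the render thread without synchronized. */
final class MeshHandoff {
    private final Queue<BuiltMesh> ready = new ConcurrentLinkedQueue<>();

    /** Called from a background thread, only after the mesh is completely built. */
    void publish(BuiltMesh mesh) {
        ready.add(mesh);
    }

    /** Called once per frame on the render thread; returns null when nothing is ready. */
    BuiltMesh pollOnePerFrame() {
        return ready.poll();
    }
}
```

In a jME application the poll would presumably happen in simpleUpdate() or an AppState's update(), with the returned mesh wrapped in a Geometry and attached there. The queue itself gives the happens-before guarantee: everything the worker wrote before add() is visible to the thread that receives the object from poll(), provided the worker never touches the mesh again after publishing it.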

john01dav · March 19, 2016, 10:18pm

What parts of the mesh class can be used from non-JME threads? These meshes are updated as the game progresses and, as such, updateBound() and such are used. Should I just add and remove the geometries as needed rather than worrying about asynchronous mesh updates?

pspeed · March 19, 2016, 10:58pm

Yes. You cannot update them from another thread without locking the main thread once they are used by the main thread. That’s very bad for performance and totally negates the reason for using background threads.

Only one thread at a time can use a mesh. This includes reading it. So if the background thread has it then the render thread should not touch it at all… unless you gate every single access with a synchronized call.

You are better off just creating new meshes on the other thread when you need to and freeing them again when you are done with them.

At the risk of repeating myself, this is what the IsoSurfaceDemo does and you can fly around with virtually no frame drops… and every single piece of geometry in there (except the trees) is generated at runtime on the fly.

john01dav · March 19, 2016, 11:11pm

All right, thanks. I’ll implement this next time I work on my game (I’m on mobile right now).

thetoucher · March 20, 2016, 6:09am

I ran into the same issue when I did a generated world project a while back. The way I kinda got around it was to load the chunks weighted by visibility first (in front of the character first and close first …), and to do custom LOD loading, so chunks were initially loaded in super low res and I incrementally added more LODs using the same visibility weighting… all the while adding or changing 1 mesh per tick.
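
A rough sketch of that kind of ordering, using only the JDK. ChunkRequest, ChunkScheduler, and the penalty value are all made up for illustration, and the camera is fixed at construction to keep it short; a real version would need to re-score pending requests as the player moves.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical load request for one chunk at one level of detail. */
final class ChunkRequest {
    final double x, z; // chunk centre in world units
    final int lod;     // 0 = coarsest, higher = more detailed

    ChunkRequest(double x, double z, int lod) {
        this.x = x;
        this.z = z;
        this.lod = lod;
    }
}

/** Orders requests coarse-LOD first, then by a visibility cost (in front and near first). */
final class ChunkScheduler {
    private final PriorityQueue<ChunkRequest> pending;

    ChunkScheduler(double camX, double camZ, double dirX, double dirZ) {
        Comparator<ChunkRequest> order = Comparator
                .comparingInt((ChunkRequest c) -> c.lod)
                .thenComparingDouble(c -> visibilityCost(c, camX, camZ, dirX, dirZ));
        this.pending = new PriorityQueue<>(order);
    }

    /** Lower cost loads sooner: distance, plus an arbitrary penalty when behind the camera. */
    static double visibilityCost(ChunkRequest c, double camX, double camZ,
                                 double dirX, double dirZ) {
        double dx = c.x - camX;
        double dz = c.z - camZ;
        double distance = Math.sqrt(dx * dx + dz * dz);
        double facing = dx * dirX + dz * dirZ; // negative when the chunk is behind
        return facing >= 0 ? distance : distance + 1000.0;
    }

    void request(ChunkRequest c) {
        pending.add(c);
    }

    /** Call once per tick; returns null when nothing is pending. */
    ChunkRequest nextForThisTick() {
        return pending.poll();
    }
}
```

Sorting by LOD before visibility means every chunk gets its low-res version before any chunk gets refined, which is one reading of the scheme above; swapping the two comparator stages would instead finish nearby chunks completely before touching distant ones.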

john01dav · March 20, 2016, 1:26pm

I just moved the setBuffer() calls to a background thread (along with updateBound() and createCollisionData()) and the game runs smoothly even when I use plain enqueue() to attach the meshes so that as many as possible get added each frame.

Momoko_Fan · March 20, 2016, 4:06pm

createCollisionData() is probably really slow. You might be able to increase performance by a lot if you create an executor and submit tasks to generate collision data on all CPU cores.
Alternatively, use native bullet physics with the ray cast / collision detection features instead of jME3’s collision system.
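
As a sketch of the executor idea, assuming each task wraps the expensive preparation for one mesh that is not yet attached to the scene (for example the createCollisionData() call): ParallelPrep and runOnAllCores are hypothetical names, and only JDK classes are used so the block runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPrep {
    /** Runs all tasks on a pool sized to the CPU count; results come back in submission order. */
    static <T> List<T> runOnAllCores(List<Callable<T>> tasks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has finished
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while preparing meshes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a preparation task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a game you would more likely keep one long-lived pool rather than create and shut one down per batch, and never call a blocking method like this from the render thread.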

updateBound() is also not the fastest thing… If you’re generating geometry you might want to compute the bound at the same time to avoid doing twice the work.
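
One way to compute the bound during generation is to track the min and max as each vertex is emitted. A small sketch, where BoundsAccumulator is a hypothetical helper and not part of jME:

```java
/** Tracks an axis-aligned bounding box while vertices are being generated. */
final class BoundsAccumulator {
    private float minX = Float.POSITIVE_INFINITY;
    private float minY = Float.POSITIVE_INFINITY;
    private float minZ = Float.POSITIVE_INFINITY;
    private float maxX = Float.NEGATIVE_INFINITY;
    private float maxY = Float.NEGATIVE_INFINITY;
    private float maxZ = Float.NEGATIVE_INFINITY;

    /** Call once for every vertex at the moment it is written to the position buffer. */
    void include(float x, float y, float z) {
        minX = Math.min(minX, x);
        minY = Math.min(minY, y);
        minZ = Math.min(minZ, z);
        maxX = Math.max(maxX, x);
        maxY = Math.max(maxY, y);
        maxZ = Math.max(maxZ, z);
    }

    /** Minimum corner as {x, y, z}; only meaningful after at least one include(). */
    float[] min() {
        return new float[] {minX, minY, minZ};
    }

    /** Maximum corner as {x, y, z}; only meaningful after at least one include(). */
    float[] max() {
        return new float[] {maxX, maxY, maxZ};
    }
}
```

The two corners could then be turned into a bounding box and set on the mesh directly (I believe Mesh.setBound() accepts a BoundingBox) in place of the updateBound() call, which would otherwise walk the position buffer a second time.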