How does one reduce the lag when attaching huge geometries to the rootNode?

Hi, my project runs smoothly except for the roughly one second during which it attaches geometries to the rootNode. I’m talking about geometries with hundreds of thousands of triangles. It runs fine after that one attachment second. The problem is that I cannot preload those models, because they’re loaded on the fly.

I’ve read that some people load the assets (j3o models, textures, etc.) in a thread and, when isDone(), attach them to the rootNode. I successfully implemented this threading model and found it was not faster at all in this case.
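For reference, the shape of what I implemented looks roughly like this (just a sketch; the class, field names, and model path are made up):

```java
import com.jme3.asset.AssetManager;
import com.jme3.scene.Spatial;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the load-in-a-thread pattern. The expensive loadModel() runs on
// a worker thread; the caller polls from simpleUpdate() (the render thread)
// and attaches the result there once it is ready.
public class BackgroundLoader {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Future<Spatial> pending;

    public void startLoading(final AssetManager assetManager) {
        pending = executor.submit(new Callable<Spatial>() {
            public Spatial call() {
                return assetManager.loadModel("Models/huge.j3o"); // placeholder path
            }
        });
    }

    // Call from simpleUpdate(); returns the model once loading is done, else null.
    public Spatial pollLoaded() throws Exception {
        if (pending != null && pending.isDone()) {
            Spatial result = pending.get();
            pending = null;
            return result;
        }
        return null;
    }
}
```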

I also read that some people got lucky by not generating mipmaps for textures, but it doesn’t change anything in my case (I only have 3 textures so far).

I know it’s impossible to attach geometries outside of the main loop, so I’m kind of stuck here. Of course, the assetManager already caches the very few j3o models and textures I use, so it’s definitely not loading or parsing time that causes such lags; it’s more likely the number of triangles it has to upload to OpenGL. I’ve read that OpenGL is single-threaded and thus cannot receive triangles and draw at the same time, which really sucks. Isn’t there a way I could reduce the lag when attaching big meshes to the rootNode?

Thx for reading :smiley:

Not really, no. The problem is most probably what you say: uploading the data to the GPU. Your only choice right now is to split the data into smaller chunks and upload them frame by frame.
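Something along these lines, as a sketch (the chunking itself depends on your data; names here are made up):

```java
import com.jme3.scene.Node;
import com.jme3.scene.Spatial;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: instead of attaching one giant geometry, queue up smaller chunks
// and attach at most one of them per frame from simpleUpdate(), so the GPU
// upload cost is spread across several frames.
public class ChunkedAttacher {
    private final Queue<Spatial> chunks = new ArrayDeque<Spatial>();
    private final Node rootNode;

    public ChunkedAttacher(Node rootNode) {
        this.rootNode = rootNode;
    }

    public void enqueue(Spatial chunk) {
        chunks.add(chunk);
    }

    // Call once per frame from the render thread.
    public void update() {
        Spatial next = chunks.poll();
        if (next != null) {
            rootNode.attachChild(next);
        }
    }
}
```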


Hi Normen :smiley:

…just for the record: I also made sure it was not GeometryBatchFactory taking those 1-2 seconds, and it turns out it is not. Even in the worst case, with EXTREMELY large amounts of geometry (a million triangles) to batch, GeometryBatchFactory only takes 100 ms or so. It’s not the issue.

OK, that’s an interesting avenue, making smaller chunks. Maybe I could also push them in different frames so that OpenGL can breathe between updates and at least give the player a few fps in between.

I’ll try this. I +1’ed you; thank you for sharing your knowledge, it’s very appreciated.

I’m frustrated to discover that OpenGL has such an absurd limitation. Can you believe that in 2014 it still can’t render and receive data at the same time? Especially with PCI-E: there’s so much bandwidth available. This limitation does help me understand why computers use so much RAM nowadays compared to before; basically, I’d have to preload gigabytes and gigabytes of mesh data in RAM if I wanted to completely avoid the hiccups. It makes no sense, especially with the open-world games everyone wants nowadays.

Technically you can with OpenGL too; the recommended way to multithread is to use multiple contexts, but jME doesn’t support this atm. However, depending on the actual hardware and driver implementation, even doing that might yield exactly the same result.

@.Ben. said: […] I’m frustrated to discover that OpenGL has such an absurd limitation. Can you believe that in 2014 it still can’t render and receive data at the same time? […]

This is not an OpenGL issue. Graphics pipelines frown on memory that can be written to and read from at the same time. Modern cards can support it to some degree, but at a great slowdown.

Everything else is the bus limit… which is what you might be hitting.

If you are adding one big geometry and get a one- to two-second slowdown, then you know it’s probably bus related. Do note that my nVidia card sometimes pauses for 2-30 (yes, 30) seconds when I’m maxing out performance and it’s running hot.

Hi Paul :smiley:

I must be missing something but… the BUS limit? Are you serious? With a couple million triangles? PCI-E 2.0 moves about 500 MB/s per lane, so a x16 slot is roughly 8 GB/s lol… how can a few dozen megabytes of vertex data choke that? At worst, if I blindly assume JME3’s stride is 32 bytes per vertex (not sure), we’re talking about roughly 60 MB of data sent to the GPU for 2 million vertices. That’s well under 10 ms of bus time… I mean, is it me or is it VERY unlikely to be the bottleneck here? :stuck_out_tongue:

@.Ben. said: […] the BUS limit? Are you serious? With a couple million triangles? […] is it me or is it VERY unlikely to be the bottleneck here?

Well, I guess if your model has no textures, or they don’t need mipmaps and are already in the form the graphics card needs… then it must be something else. Perhaps gremlins… or, as I said, your GPU is overheating, swapping things around, or whatever. I’m not kidding about that: my app has frozen for 30+ seconds often enough that I put a timer deep in the JME rendering code to dump a log when a frame takes more than 100 ms to render.

60 MB of data is a lot to copy, move across the bus, shuffle into VRAM, etc… never mind the other stuff. Still, 1-2 seconds… that seems like mipmap generation.

Note: with objects that large you may find that even culling and unculling them will take a time hit. Some people have this problem with Mythruna where even when the scene is fully loaded, just looking around can cause significant frame lag as things come into and out of view again.

I understand $30 graphics cards could overheat easily, but I mean… we’re not kidding ourselves here, are we? I have a GTX 650 Ti… all the computer case fans are STOPPED (not idling; completely stopped!) and the air blowing out of the power supply is COLDER than room temperature. My GPU couldn’t care less about 2 million vertices. I’ve run the same project with more than 70 million triangles and it would still give me 15 fps. It’s really the fact that OpenGL doesn’t draw and receive triangles at the same time (like you said, GPU RAM can’t be read and written at the same time, I guess). It’s either one or the other, but not both at once.

I’m currently trying Normen’s suggestion to attach big mesh models in SMALLER CHUNKS (e.g. instead of batching 50 models, do 5 batches of 10 models), and I’ll also try uploading them to the GPU across 5 different frames instead of all in the same frame to see if it makes a difference. I already know it will, because in some parts of my project I use the same models in fewer instances and it doesn’t hiccup like that.

I’ll post my results here in a few minutes.

Yeah… that’s the way it has to be done on OpenGL… that’s the way it has to be done on DirectX… or Xbox… or PS4… or PS3. (Actually on last gen consoles it’s even worse.)

They all have gremlins. :slight_smile: It will be nicer for culling anyway.

Here are the changes I made to my local copy of JME. This is in RenderManager:
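In spirit it’s just a timer around the frame render that logs any frame taking longer than 100 ms. A minimal sketch of the idea (the class and names are invented for illustration, not my actual diff):

```java
// Hypothetical reconstruction: time each frame and log when one runs long.
// My actual change lives inside RenderManager.render(); the same idea can
// be expressed as a tiny helper wrapped around the render call.
public class FrameTimer {
    private static final long THRESHOLD_MS = 100;
    private long start;

    public void frameStart() {
        start = System.nanoTime();
    }

    public void frameEnd() {
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        if (elapsedMs > THRESHOLD_MS) {
            System.err.println("Slow frame: " + elapsedMs + " ms");
        }
    }
}
```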

If you make that change and you don’t see the log message during slowdowns, then you can be sure it is not OpenGL, or bus related, or gremlins. Otherwise, that’s where the rubber hits the road, essentially. So if copying your 60 MB into the driver, shuffling it across the bus, and readying it for rendering is taking a long time, then you will see that output.

I assume you are also using textures. Do they have a NoMipMaps variant specified as their min filter? That could be another thing to try.
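In jME3 terms, something like this (a sketch; the helper and path are made up):

```java
import com.jme3.asset.AssetManager;
import com.jme3.texture.Texture;

public class TextureUtil {
    // Load a texture with mipmapping disabled by picking a NoMipMaps min filter.
    // (Sketch only; use whatever asset path you already have.)
    public static Texture loadWithoutMips(AssetManager assetManager, String path) {
        Texture tex = assetManager.loadTexture(path);
        tex.setMinFilter(Texture.MinFilter.BilinearNoMipMaps);
        return tex;
    }
}
```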


Hi again, thank you for continuing to provide additional information, Paul.

Concerning mipmaps, see OP:

@.Ben. said: I also read that some people got lucky by not generating mipmaps for textures, but it doesn't change anything in my case (I only have 3 textures so far).

I tried different approaches to the mipmap generation, but it doesn’t affect the issue at all; my video card must be too fast for it to make a difference. In my mind it’s absolutely not the bandwidth: it’s really the VRAM being written too heavily at once (too many triangles sent to it in the same frame).

I have successfully implemented what @normen said: make smaller chunks (e.g. 5 batches of 10 models each instead of 1 batch of 50 models). A Callable thread generates the vertices for the 10 models and returns the (batched, of course) Node to the update loop, and in the update loop, once any thread isDone(), I attach the resulting Node. I let the program breathe by only starting a new concurrent thread every 500 ms, so what happens is that twice a second you can see lines of 10 models progressively filling up the space until all ~50 models are attached.

No more hiccups, and 70+ fps. (Well… to be honest, it still drops to around 40 fps while attaching the models, but hey, it’s only for 2.5 seconds, so WTH… who cares, it’s playable now!)
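In sketch form, the throttling part looks something like this (names are mine; the real jobs each batch ~10 models with GeometryBatchFactory):

```java
import com.jme3.scene.Node;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the final approach: worker jobs each build and batch a chunk of
// models into a Node; the update loop starts a new job at most every 500 ms
// and attaches each finished batch from the render thread.
public class ThrottledBatcher {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Queue<Callable<Node>> jobs = new ArrayDeque<Callable<Node>>();
    private final Node rootNode;
    private Future<Node> pending;
    private float sinceLastSubmit = 0.5f; // allow an immediate first submit

    public ThrottledBatcher(Node rootNode) {
        this.rootNode = rootNode;
    }

    public void addJob(Callable<Node> job) {
        jobs.add(job);
    }

    // Call from simpleUpdate(); tpf is the duration of the last frame in seconds.
    public void update(float tpf) throws Exception {
        sinceLastSubmit += tpf;
        if (pending == null && sinceLastSubmit >= 0.5f && !jobs.isEmpty()) {
            pending = executor.submit(jobs.poll());
            sinceLastSubmit = 0;
        }
        if (pending != null && pending.isDone()) {
            rootNode.attachChild(pending.get()); // scene graph touched on render thread only
            pending = null;
        }
    }
}
```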

Thank you Paul and Normen, your support is very appreciated. I gave you +1 too Paul, because you still contributed.

THE MORAL is: do not attempt to attach more than ~100,000 triangles in a single update frame or you WILL get an fps hiccup, so try to be smart about it and split the attachments into smaller batches of geometries.

Thx.
