What counts as a scene graph update?

I’ve been trying to optimize things lately by using an executor thread pool to process certain parts of the game logic - so the eternal “Scene graph is not properly updated for rendering” may come into play quite often.

So far I’ve managed to escape from its infuriating grasp, but to stay ahead I would like a clarification as to what exactly counts as a scene graph update.

The current list would be this I guess:

  • attaching to and detaching from a root-attached node

  • setting any transformations on a root-attached node

I’ve recently realized that setting cull hints causes no problems although it seems like a thing that should be an issue. Is this bulletproof or was that just my luck?

I guess there is a misunderstanding.
ANY operation on the scene graph is unsafe from another thread.
The error “Scene graph is not properly updated for rendering” that you sometimes see is just the error the engine raises when you “obviously” modified the scene graph between the update phase and the render phase. But that’s just the one case the engine can detect with no doubt.

When I say any, even reading from the scene graph is unsafe, because you can’t know in which phase you will read data from it, and this can result in inconsistent states.
Every operation MUST be done on the render thread. So basically, if you are computing things on another thread, enqueue every scene graph manipulation to the app (app.enqueue(Callable / Runnable)).
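For what it’s worth, the enqueue pattern described here can be sketched in plain Java. This is not jME’s actual internals, just an illustration of the idea: workers post scene graph work to a concurrent queue, and the render thread drains it at a known-safe point in the frame, so all mutations happen on one thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of an app.enqueue-style task queue (names are made up,
// not jME's real classes).
public class RenderQueue {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();

    /** Safe to call from any thread. */
    public void enqueue(Runnable task) {
        tasks.add(task);
    }

    /** Called only by the render thread, once per frame at a safe point. */
    public void drainAll() {
        Runnable t;
        while ((t = tasks.poll()) != null) {
            t.run(); // scene graph mutations happen here, on the render thread
        }
    }
}
```

The point is that the worker never touches the scene graph itself; it only hands over a closure that the render thread executes later.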

Changing cull is definitely an operation on the scene graph.
Changing a material parameter is an operation on the scene graph (objects are ordered depending on the material’s parameters; a material change during the sort operation can crash the engine).
Anything that relates to spatials, materials, post processes, shadows is unsafe to use from another thread.

The one thing that is safe, is loading assets with the asset manager, BUT, loaded assets still need to be attached to the scene graph on the render thread.


Well that really depends on the situation, doesn’t it? If you’re looking up (for example) transformation info about a node that never moves and always stays attached you should be in the clear right?

Yeah that part I got clear, otherwise I’d have a complete nuthouse on my hands :smile:

I guess that makes sense. It does also however seem a bit like a design flaw as far as the future of the engine goes: with single cores being about as fast as physics allows, we’ll more than likely only see much higher core-count CPUs in the future. Well, unless light-based circuitry turns out to be backwards compatible.

Attaching/detaching in general seems extremely problematic performance wise with large amounts of nodes at a time for Lightspeed, and I’ve recently heard the same problem being faced by Spoxel as well for chunk updates.

Unity for example, is developing some sort of Jobs System thing that basically lets you run major parts of games in parallel. I don’t know the details though but it sure seems like a step in the right direction. Perhaps we could follow suit?

Oh hey I actually have an odd, completely unrelated architectural question about this, if you don’t mind sticking it in this thread that has just met a dead end otherwise.

Let’s say you have multiple calls to the said manager like so:

CPU.game.getAssetManager().loadStuff(stuffpath);
CPU.game.getAssetManager().loadStuff(stuffpath);
CPU.game.getAssetManager().loadStuff(stuffpath);

Now from what I’ve recently learnt at university, deep down the CPU would go to a DMA controller and ask if it can be arsed to fetch that stuff from the HDD.

Then the CPU switches threads or processes until it gets the data back, having the loading thread stuck at call #1 until the data is loaded… So. What if we went along it this (approximate) way instead?

Future<?> stuff1, stuff2, stuff3;

stuff1 = executor.submit(() -> CPU.game.getAssetManager().loadStuff(stuffpath));
stuff2 = executor.submit(() -> CPU.game.getAssetManager().loadStuff(stuffpath));
stuff3 = executor.submit(() -> CPU.game.getAssetManager().loadStuff(stuffpath));

while (!stuff1.isDone() || !stuff2.isDone() || !stuff3.isDone());

That would in turn force the CPU to continue along the thread and send multiple requests in short order to the DMA, then wait at the end of the thread.

The question would be, would then the HDD serve the data faster, knowing what it has to do before the old data has to go there and back again for the 1st request? I’m mostly just asking this because Lightspeed currently takes about two years and a half to fully load from a cold start.

I know it’s probably really really context dependent but I would need more than a hunch to take the time to go out and test it on a large scale.
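As an aside, a self-contained sketch of that idea (loadStuff here is a stand-in for the real AssetManager call, not its actual API): submit all requests up front, then block on get() rather than spinning on isDone(), since the spin loop burns a whole core while it waits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoad {
    // Stand-in for the real asset loading call; just simulates some work.
    static String loadStuff(String path) {
        return "loaded:" + path;
    }

    public static List<String> loadAll(List<String> paths) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(paths.size());
        try {
            // Fire off every request before waiting on any of them.
            List<Future<String>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(executor.submit(() -> loadStuff(p)));
            }
            // get() parks the calling thread until each result is ready -
            // no busy-waiting, and results come back in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

Whether the disk actually serves the batched requests faster is a separate question, but at least the CPU side stops wasting cycles.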

You can pretty much make a blanket rule that you should never modify or read anything not created in the thread you are in, regardless of what you are doing. Very similar to static methods, but with an added restriction on reading inputs. As far as I’m aware jme doesn’t deviate from the standard threading model and its rules.

Well sure if you don’t happen to need anything done at all or have neatly encapsulated calculations for something. That’s really not very helpful.

The standard threading model also describes volatile and synchronized objects, mind you.

I get what you mean I think. It has the chance to reduce its travel distance by giving it all instructions at once.

On an SSD the change would be less than nothing. On a mechanical HDD it might be very marginally faster, but the problem here is the HDD’s limits. They are “single threaded” in any event, and reducing the travel of the head by a few microseconds really isn’t the issue you need to solve.

Right, I felt like the savings would be minimal. I guess just buying a faster HDD is the only option here :smile:

Ok. Well I guess I would have to ask for a specific situation where you would find this a problem. If I need data from the main thread, I can’t give it the data directly unless it’s a primitive, so I have no choice except to make my own copy.

https://i.imgur.com/rPNCqww.jpg

If I had to copy every single thing I use on other threads I might as well buy another few sticks of ram for my entire userbase.

References will do fine.

As you wish.

Actually this is a bit wrong. I should say “lives in” rather than “created on”.

Until another thread moves it :p. A spatial that never moves/changes or w/e is just a very very very particular case. And even in this case I wouldn’t assume it’s safe to read the transforms from another thread, because if in 6 months you decide to change that behavior (like… in the end it’s a flying mountain), and you totally forgot that you were reading from another thread… it’s gonna be a problem. As a rule of thumb, perform scene graph operations on the render thread.

About the HDD and DMA stuff, it goes a bit beyond my knowledge, I’m afraid. What I know is that in practice, reading chunks of data from the same file from multiple threads will be slower than reading it from a single thread, as the HDD is a lot faster at reading data sequentially than at switching between random positions.
Note that the HDD reading is not the most time-consuming part of the loading process; data is most often read in one chunk and loaded into memory instead of being streamed in small chunks.
In theory anyway, asset loading from multiple threads is not necessarily faster, BUT, it’s totally thread safe. Also, it loads the data asynchronously, letting the render thread update without an fps drop.
However, in practice I have always noted better performance when multi-threading the loading.

Well that’s true, but as far as I know those chunks are usually set up to be 4kb in size. That’s just about literally nothing when loading game assets.

Hmm interesting. I also have it set up to load async in a single thread to stop the game from not responding at all for a minute… but it’s definitely a lot slower than it would be in the main thread, in my experience.

Ha, I think you guys have a slightly wrong idea about the typical thread I run. These are for the most part really short utility functions that run 10-100 ms tops. If I can determine with sufficient reliability that something won’t be changed in that time I think I can live with it. Especially in cases where two or three of those functions will be launched a fraction of a second after the last one completes to fix any lingering change problems due to ongoing update requests.

But yeah I’ll make sure all scene graph stuff is in enqueues. I’m just not going to copy anything much since that will just give me more overhead than I would’ve gained by threading the thing anyway.

If you are checking a position, just pass the primitives and not the object. That is to say, vector.x/y/z and not the vector itself.
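That suggestion can be sketched like this (Vec3 here is a stand-in for jME’s Vector3f, and the executor is illustrative): copy the x/y/z fields on the owning thread, then let the worker compute on those immutable local copies instead of holding a reference to the live vector, which the render thread may keep mutating underneath it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class PrimitiveHandoff {
    // Stand-in for a mutable math type like Vector3f.
    public static final class Vec3 {
        public float x, y, z;
        public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    public static double distanceFromOrigin(Vec3 v, ExecutorService pool) throws Exception {
        // Snapshot the primitives on the calling (owning) thread. These
        // effectively-final locals can never change after capture.
        final float x = v.x, y = v.y, z = v.z;
        Future<Double> f = pool.submit(() -> Math.sqrt(x * x + y * y + z * z));
        return f.get();
    }
}
```

The worker sees a consistent triple no matter what happens to the original vector afterwards, which is exactly what passing the object reference would not guarantee.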

Well sure I do that sort of thing in a case when I need one or a handful of them. Not really when I need 500 of them.

I don’t make the rules :slight_smile:

Perhaps…the rules make you? :wink:

@jayfella is right though - if you’re going to do this safely you have to use volatile/synchronized fields/methods, and it’s very, very, very, very easy to introduce subtle and very nasty deadlocks and other subtle bugs. Accessing data that isn’t protected by volatile/synchronized means there are no guarantees about memory visibility and you may read data that’s in an inconsistent state. Using classes from java.util.concurrent and “multi-threaded” design patterns (pub/sub, work queues, one-off tasks, etc.), and minimizing the number of “access points” between threads greatly reduces the likelihood of setting yourself up with some very nasty threading bugs (and your code will be much cleaner and probably will be much more performant anyway).
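A minimal sketch of the “minimize access points” idea, using only java.util.concurrent (the names here are illustrative): a single BlockingQueue is the only piece of shared state between the two threads, and its put/take calls provide the happens-before memory-visibility guarantees you would otherwise have to build yourself with volatile or synchronized.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SingleAccessPoint {
    public static int sumViaQueue(int[] values) throws InterruptedException {
        // The queue is the one and only access point between the threads.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(values.length);

        Thread producer = new Thread(() -> {
            for (int v : values) {
                try {
                    queue.put(v); // publishes v safely to the consumer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        producer.start();

        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += queue.take(); // take() happens-after the matching put()
        }
        producer.join();
        return sum;
    }
}
```

No field here is volatile and nothing is synchronized by hand, yet there is no visibility bug, because all cross-thread traffic funnels through one well-defined concurrent structure.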

Is the file structure something you architected yourself? Because reading one single large file is a lot faster than reading many small files one after another. I’m just thinking of other ways to circumvent the issue.

True on the CPU side, but OpenGL is single threaded: all OpenGL calls must be made on the thread that owns the OpenGL context. It is a hard problem to make that multi-threaded, especially if the framework should be backwards compatible with OpenGL 2 or GLES. Hand-crafted memory barriers and such are of course possible, but not very easy, and they only make sense if you set the minimum requirement to OpenGL 4.2 core or higher, IMO. I think the most common problem (at least for me) is that uploading meshes/textures to the GPU takes time and stutters the rendering. Sparse texture arrays and those tricks would be nice to have in the engine, but I don’t know how I would go about including that.