Multithreaded JME Core

I have been holding off on attempting to multithread the jME core, but the urge is now getting too great.



The idea of this thread is to evaluate WHAT could be multithreaded and HOW.



The mechanism is simple and uses two threads.

The first thread is the master and just delegates work to the other thread. The master thread will be the only thread making OpenGL calls. If the second thread is busy when the master thread discovers work, the master must either put the work in a queue ( design idea ) or process it itself.



The second thread MUST NOT make OpenGL calls. This limits it to frustum intersection checks, collision detection, event processing ( delegating the callback to the master thread if OpenGL calls would be made ) and obstruction detection ( is this occlusion? i.e. when one geometry completely obscures another geometry ).



The threads would need to cooperate with each other. Design patterns do exist for this ( junctions, gates ).

For a new update cycle to start, both threads need to have completed processing the last update cycle.
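The hand-off described above could look roughly like the following. This is only a sketch under assumptions: the class `TwoThreadCycle` and its counters are made up for illustration and are not part of jME, and a single-thread `ExecutorService` stands in for the second thread. The master hands a task over when the worker is idle, runs the task itself otherwise, and does not end the cycle until the worker has finished.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical two-thread update cycle; not part of jME. */
final class TwoThreadCycle {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private Future<?> pending; // task currently owned by the worker, if any
    final AtomicInteger doneOnWorker = new AtomicInteger();
    final AtomicInteger doneOnMaster = new AtomicInteger();

    /** Runs one update cycle on the calling (master) thread. */
    void runCycle(List<Runnable> glFreeTasks) {
        for (Runnable task : glFreeTasks) {
            if (pending == null || pending.isDone()) {
                // Worker is idle: hand the task over.
                pending = worker.submit(() -> {
                    task.run();
                    doneOnWorker.incrementAndGet();
                });
            } else {
                // Worker is busy: the master processes the task itself.
                task.run();
                doneOnMaster.incrementAndGet();
            }
        }
        awaitWorker(); // the cycle ends only when both threads are finished
        // The master would make its OpenGL calls here.
    }

    private void awaitWorker() {
        if (pending == null) return;
        try {
            pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() { worker.shutdown(); }
}
```

The queue variant ( design idea ) would replace the `else` branch with an add to a pending list that the worker drains.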



Keeping OpenGL calls on a single thread may not be the best way, but it is the simplest way to avoid OpenGL grumbling and to avoid setting locks such as isWritingToOpenGL.



Any thoughts on this approach ?

What can be done and when on the second thread ?



Edit: LWJGL is probably a better term to reference here than OpenGL.






Technically, culling can be done using multiple threads. Collision checking and the like, which is very similar to culling, can also be done using multiple threads.

In my view, queues are not a good solution to threading. The good solution is to separate the threads and make them independent of each other. What I favour is multiple threads all doing all kinds of work, but in separable stages of rendering frames and on different data sets. While one thread renders the previous frame, the other is already updating the next one. Updating, physics and culling can all be delegated to threads, if the work to be done is big enough.



The first step to a multithreaded core is to make the engine thread-safe. There are lots of static variables, and some data structures (the collision tree, for example) are changed in the process of being queried. What is the current state of jME 2.0? Has this work already begun?

Problem with games is that the game loop depends on a lot of tasks executing in a certain order which makes it difficult to multi-task.

Simply separating the rendering and updating won't do anything: the render thread still has to wait until the update thread gives results before it can render, which means you're not really gaining anything.

The idea is to execute multiple independent tasks in multiple threads. For example, when you need to calculate pathfinding for mobs, all worker threads start working on that, and each gets a list of mobs to calculate pathfinding for.
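As a rough illustration of that idea, here is a sketch that gives each worker its own slice of the mob list. Everything in it is assumed for the example: `MobBatches` is a made-up class, and `pathCost` is a trivial stand-in for a real path search. It only works this simply because the per-mob jobs are independent and each worker writes to its own part of the result array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: splits an independent per-mob job over a pool. */
final class MobBatches {
    /** Stand-in for a real path search: Manhattan distance to the origin. */
    static int pathCost(int[] mobPosition) {
        return Math.abs(mobPosition[0]) + Math.abs(mobPosition[1]);
    }

    /** Gives each worker one contiguous slice of the mob list. */
    static int[] computeAll(List<int[]> mobs, int threads) {
        int[] costs = new int[mobs.size()];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (mobs.size() + threads - 1) / threads;
            List<Callable<Void>> jobs = new ArrayList<>();
            for (int start = 0; start < mobs.size(); start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, mobs.size());
                jobs.add(() -> {
                    for (int i = from; i < to; i++) {
                        costs[i] = pathCost(mobs.get(i));
                    }
                    return null;
                });
            }
            // invokeAll blocks until every slice is finished.
            for (Future<Void> f : pool.invokeAll(jobs)) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return costs;
    }
}
```

A real game would keep the pool alive between frames rather than creating one per call.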

Momoko_Fan said:

Problem with games is that the game loop depends on a lot of tasks executing in a certain order which makes it difficult to multi-task.
Simply separating the rendering and updating won't do anything: the render thread still has to wait until the update thread gives results before it can render, which means you're not really gaining anything.
The idea is to execute multiple independent tasks in multiple threads. For example, when you need to calculate pathfinding for mobs, all worker threads start working on that, and each gets a list of mobs to calculate pathfinding for.


Actually, one thread updates, culls and gets the render queues ready. It then waits for the other thread to finish rendering and renders its own queues, while the other thread updates, culls and prepares its own queues.

Of course, heavy-duty pathfinding can always be run on multiple threads, but it's not core functionality.

Culling and updating can be executed on multiple threads, if the branches of the scene are big enough to warrant the overhead of managing the threads.
vear said:

Actually, one thread updates, culls and gets the render queues ready. It then waits for the other thread to finish rendering and renders its own queues, while the other thread updates, culls and prepares its own queues.


As you say yourself, in that case each thread just waits for the other thread to finish its work. That would not improve performance; it would make it worse once you take synchronization and other threading overheads into account. As for heavy mob calculations and the like, you can already multithread those using StandardGame and the Java ThreadPoolExecutor. I have done things like that already.

Multithreading the jME core is an overall bad idea, I guess. It will only cause problems, because synchronization is not an easy thing to do; many projects I have seen failed at it and had to fall back to single threading.

All in all, multithreading is something that should be done with StandardGame and a ThreadPoolExecutor at the application level, not in the framework itself.

just my 2 cents.

All my game logic (pathfinding, AI processing, movement calculations) runs in separate thread(s) and updates the jME scenegraph.



It gives me higher performance on my quad-core :wink:

The easiest to do (in jME itself) would probably be splitting up update/cull/etc over multiple threads, to cut down their time in the render loop.



For that you would only need some changes in Node (or you could even make your own extended Node), and a threadsafe math framework. For that, this is a must-read thread: http://www.jmonkeyengine.com/jmeforum/index.php?topic=6184.0
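One common way to make shared scratch variables safe, and the reason ThreadLocal comes up in this context, is to give every thread its own temporary instead of one static field. The sketch below only illustrates that pattern; `ScratchMath` is a made-up class using plain `float[]` arrays, not jME's math classes.

```java
/** Hypothetical math helper; jME's own vector classes are not used here. */
final class ScratchMath {
    // One scratch array per thread instead of one shared static field.
    private static final ThreadLocal<float[]> TMP =
            ThreadLocal.withInitial(() -> new float[3]);

    /** Length of (a - b), computed in the calling thread's own scratch space. */
    static float distance(float[] a, float[] b) {
        float[] t = TMP.get();
        for (int i = 0; i < 3; i++) {
            t[i] = a[i] - b[i];
        }
        return (float) Math.sqrt(t[0] * t[0] + t[1] * t[1] + t[2] * t[2]);
    }

    /** Exposed only so the per-thread behaviour can be observed. */
    static float[] scratchOfCurrentThread() {
        return TMP.get();
    }
}
```

Two threads calling `distance` at the same time never see each other's intermediate values, and no lock is needed.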

vear said:

Actually, one thread updates, culls and gets the render queues ready. It then waits for the other thread to finish rendering and renders its own queues, while the other thread updates, culls and prepares its own queues.

All in all, multithreading is something that should be done with StandardGame and a ThreadPoolExecutor at the application level, not in the framework itself.

This is not multithreading the core.

Multithreading the jME core is an overall bad idea I guess, for it will only make problems because synchronization is not an easy thing to do and many projects I have seen failed to do so and had to perform a fallback to singlethreading.

Multithreading the core will boost FPS, something much needed. What about the projects that worked?
You do not require synchronisation in the sense of the Java synchronized keyword; there are better ways to do it.


vear said:

In my view, queues are not a good solution to threading. The good solution is to separate the threads and make them independent of each other. What I favour is multiple threads all doing all kinds of work, but in separable stages of rendering frames and on different data sets. While one thread renders the previous frame, the other is already updating the next one. Updating, physics and culling can all be delegated to threads, if the work to be done is big enough.

I originally liked this approach when the analysis began, but after analysing it I concluded that too many threads can actually slow down a system. As some games already have a few threads ( networking, spatial preparation, file loading, AI calculations, pathfinding ), it is prudent to use as few as possible so that the developer isn't restricted in their own use of threads.

Momoko_Fan said:

Simply separating the rendering and updating won't do anything, the render thread still has to wait till the update thread gives results so that it can render

Both threads can dual process on similar tasks ( e.g. culling ). I agree that there are several stages in an update, and that one of these is Render, which probably only one thread can do. Can the other thread do anything whilst the master thread is rendering? This might even be preparing work for the next update cycle.

llama said:

The easiest to do (in jME itself) would probably be splitting up update/cull/etc over multiple threads, to cut down their time in the render loop.

For that you would only need some changes in Node (or you could even make your own extended Node), and a threadsafe math framework. For that, this is a must-read thread: http://www.jmonkeyengine.com/jmeforum/index.php?topic=6184.0

Just beat me to posting a reply, gonna need to take a moment to digest ThreadLocal impacts ..

There have been similar discussions by the developers about multi-threading jME. The best way of accomplishing this, in my opinion, is to simply abstract the OpenGL thread away from direct reference in the game itself. For example, instead of making direct calls via a render loop to draw something in OpenGL, you use Object representations of any OpenGL states in your game. This allows you to make thread-safe calls to a Queue that can be modified at any time, even while the OpenGL thread is currently parsing it. The way I think it should be done also changes some fundamental ways we think about "rendering", by changing to a state system that is modified rather than redrawn per update. This way you are not thinking about FPS anymore but rather updates per second. If nothing changes then nothing needs to be redrawn. Further, when one thing changes it simply modifies the state that is redrawn by the renderer instead of re-processing everything. This would very likely trim back the work done significantly.
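To make the queue idea a bit more concrete, here is a minimal sketch. It assumes a made-up `RenderCommandQueue` class in which state changes are posted as `Runnable` objects from any thread and applied only on the OpenGL thread; the "redraw" is just a counter, since no real renderer is involved. It is not how jME works today.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical render-side command queue; names are illustrative only. */
final class RenderCommandQueue {
    private final Queue<Runnable> commands = new ConcurrentLinkedQueue<>();
    private int redraws; // touched only by the OpenGL thread

    /** Callable from any thread: records a state change for the renderer. */
    void post(Runnable stateChange) {
        commands.add(stateChange);
    }

    /** Called only on the OpenGL thread. Returns true if a redraw happened. */
    boolean renderIfChanged() {
        boolean changed = false;
        Runnable next;
        while ((next = commands.poll()) != null) {
            next.run(); // apply the state change on the OpenGL thread
            changed = true;
        }
        if (changed) {
            redraws++; // a real renderer would draw the frame here
        }
        return changed;
    }

    int redrawCount() {
        return redraws;
    }
}
```

If nothing was posted since the last call, `renderIfChanged` does no work at all, which is the "updates per second" behaviour described above.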

darkfrog said:

There have been similar discussions by the developers about multi-threading jME. The best way of accomplishing this, in my opinion, is to simply abstract the OpenGL thread away from direct reference in the game itself.

Do we not need to keep the hooks in place so that people can do direct effects with OpenGL? Would this approach impact GLSL shaders?

darkfrog said:

This allows you to make thread-safe calls to a Queue that can be modified at any time, even while the OpenGL thread is currently parsing it.

I feel uncomfortable about this. What if there is a change to a geometry and the GPU has already been sent the bytes? Is the change delegated to the next cycle?

darkfrog said:

The way I think it should be done also changes some fundamental ways we think about "rendering", by changing to a state system that is modified rather than redrawn per update. This way you are not thinking about FPS anymore but rather updates per second. If nothing changes then nothing needs to be redrawn. Further, when one thing changes it simply modifies the state that is redrawn by the renderer instead of re-processing everything. This would very likely trim back the work done significantly.

Perhaps we should do some profile analysis to substantiate this: take a complex scene and analyse what proportion of time is spent in and out of OpenGL.
Would there be any advantage to this approach in a fast-moving scene with lots of camera rotation?

One more try to convince you. I rewrote jME to my own liking, so it's a branch version. I'll explain its structure with regard to threading.



Frame

This is the central object with regard to constructing a single on-screen image. It controls the process of updating, culling, rendering and calling the methods of plugged-in gamestates. In a multithreaded environment there can be multiple (2 makes sense) Frame objects, in different stages of processing. Each Frame object restarts its processing after it has finished with the previous frame. Frame objects are independent of each other: each one has its own thread, its own processing contexts, render queues, etc. Frame objects are synchronized on using the renderer, and each one has exclusive access to the renderer in the rendering stage. No access to OpenGL is made outside the rendering phase whatsoever.
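The branch code itself is not shown in this thread, so the following is only a guess at the shape of such a Frame loop. `FrameLoop` and its counters are made up, and a `ReentrantLock` stands in for "synchronized on using the renderer". Each Frame thread updates and culls without any lock and takes the renderer lock only for its rendering stage.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical Frame loop; the real branch code is not shown in the thread. */
final class FrameLoop implements Runnable {
    static final ReentrantLock RENDERER = new ReentrantLock();
    static final AtomicInteger inRenderer = new AtomicInteger();
    static final AtomicInteger maxInRenderer = new AtomicInteger();
    static final AtomicInteger framesRendered = new AtomicInteger();

    private final int frames;

    FrameLoop(int frames) {
        this.frames = frames;
    }

    @Override
    public void run() {
        for (int i = 0; i < frames; i++) {
            updateAndCull();  // no lock: touches only this Frame's own data
            RENDERER.lock();  // may pause until the other Frame has finished
            try {
                int now = inRenderer.incrementAndGet();
                maxInRenderer.accumulateAndGet(now, Math::max);
                framesRendered.incrementAndGet(); // OpenGL calls would go here
                inRenderer.decrementAndGet();
            } finally {
                RENDERER.unlock();
            }
        }
    }

    private void updateAndCull() {
        Thread.yield(); // placeholder for real update and cull work
    }

    /** Runs two Frame threads to completion. */
    static void runTwo(int framesEach) {
        Thread a = new Thread(new FrameLoop(framesEach));
        Thread b = new Thread(new FrameLoop(framesEach));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The `maxInRenderer` counter exists only to check that the two Frames are never inside the rendering stage at the same time.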



Model objects

These objects are immutable during the normal run of the game. There are no SharedMesh objects; all Geometry is a model and can be reused multiple times in the scene. This also means that queries on model geometry are thread-safe.



Scene objects

Subclassed from Spatial, the scene calls no OpenGL modifying code. The scene state is separated from rendering state by duplication of data.



Renderable objects

Everything that needs to be fed into rendering is represented by Renderable objects, even lights and even sounds. In old jME these objects would be called Batch. These objects are the link between the scene state and the rendering state. Gathering of Renderable objects is done into Frame specific queues, and the processing is later done from these queues, with no access to the scene. The Renderable objects hold two sets of transforms, one for the updating Frame, and another for rendering Frame. Data held in RenderState objects is also duplicated this way.
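A much simplified sketch of the "two sets of transforms" idea follows. It is an assumption about how such duplication might look, not the branch's actual code: `BufferedTransform` is a made-up class that holds only a translation, with one slot written by the update stage and the other read by the render stage.

```java
/** Hypothetical double-buffered transform holder. */
final class BufferedTransform {
    private final float[][] slots = { new float[3], new float[3] };
    private volatile int renderSlot = 0; // index the render stage reads

    /** Update stage: writes only the slot the renderer is not reading. */
    void writeTranslation(float x, float y, float z) {
        float[] w = slots[1 - renderSlot];
        w[0] = x;
        w[1] = y;
        w[2] = z;
    }

    /** Called at the hand-over point, when neither stage is mid-frame. */
    void swap() {
        renderSlot = 1 - renderSlot;
    }

    /** Render stage: returns a copy of the last published translation. */
    float[] readTranslation() {
        return slots[renderSlot].clone();
    }
}
```

The renderer keeps seeing the old values until `swap()` is called, so the update stage can run ahead without the scene and the rendering state ever sharing mutable data.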



AppContext, UpdateContext, CullContext, RenderContext

Methods called in the framework receive one of these. They hold references to the classes needed to get a job done, and they also control access. For example, access to OpenGL is only possible if the RenderContext is accessible. Each context has a thread associated with it; when only a single thread is used, all the contexts reference the same thread.

There can be more than one UpdateContext or CullContext object per Frame object, depending on whether the update or culling decides that a sub-node of the scene is big enough to be worth processing in an additional thread. The total number of children of the nodes in the scene is tracked. New threads are used for culling only if the total number of children is above some threshold, there are idle culler threads, and the delegated work is more than 1/3 but less than 2/3 of the work the parent thread has to do. This prevents delegating small chunks of work to new threads, or delegating so much work that the parent thread is left with nothing to do.

The results of the cull threads are collected into local queues and merged with the queues in the Frame object when a given cull thread finishes all its work. That is the only time synchronization is needed during culling. With this scenario there is no possibility of rendering directly, only through the queues. Queues are used to gather not only Geometry to be rendered, but also lights and sounds. These are later processed with special renderpasses. Note that multiple cameras are supported for culling, each one filling the queues specified in the renderpass that uses that camera. A big part of the work needed to render to textures is thus moved from the rendering stage to the culling stage, so the rendering stage becomes tighter.
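The delegation rule for culling can be written down as a small predicate. This is a sketch only: `CullDelegation` is a made-up class, the threshold of 64 is an arbitrary assumption (the post only says "some threshold"), and the child counts stand in for "work".

```java
/** Hypothetical delegation rule; the threshold value is an assumption. */
final class CullDelegation {
    static final int MIN_CHILDREN = 64; // assumed threshold, not from the post

    /**
     * @param subtreeChildren total children under the candidate sub-node
     * @param parentRemaining total children the parent thread still has to process
     * @param idleCullers     number of culler threads currently idle
     */
    static boolean shouldDelegate(int subtreeChildren, int parentRemaining, int idleCullers) {
        if (idleCullers <= 0 || subtreeChildren <= MIN_CHILDREN) {
            return false;
        }
        // Strictly between 1/3 and 2/3, using integer math to avoid rounding.
        return 3L * subtreeChildren > parentRemaining
            && 3L * subtreeChildren < 2L * parentRemaining;
    }
}
```

For example, with 300 children left for the parent, a sub-node with 150 children is handed off, while one with 90 (too small a share) or 250 (too large a share) is not.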



Preprocessing

Once the queues are filled, preprocessing the queues is done. For example lights are sorted. Renderpasses are sorted based on their dependency on each other. GameState methods are called in this phase, so custom preprocessing of passes and queues is also possible.



Rendering

This begins with locking the renderer, which potentially pauses the thread. Once the renderer is locked, all the materials which need an update are updated, the renderpasses are called with their associated queues, and the actual rendering is done from the queues, without access to the scene. Note that while pausing is a possibility, OpenGL does queue the commands sent to it, so the end of this stage is not necessarily the moment when the frame is shown on screen.



This may all read as complicated, and I may be wrong with this design, but my idea was to get all the work gathered, sorted and processed, each in its proper stage. Creating Callable objects and making OpenGL calls from wherever the programmer feels like is not what a performant game engine is about.


Interesting ideas, vear; a completely different approach to the problem. Do you have any benchmarks comparing standard jME to your own branch? If so, please list them.



I was really trying a simpler approach at just dual processing the scenegraph, one thread creates display lists whilst the other does xyz…






You wrote your own branch of jME? That's nice. I've always wanted to do that. It's really annoying how jME barely uses any interfaces.



But anyway, I would love to see some numbers here.

theprism said:

Do we not need to keep the hooks in place so that people can do direct effects with OpenGl, would this approach impact GLSL shaders ?


The problem with this is that we would have to create Object representations to support every aspect of feeding of information into OpenGL...including shaders.

theprism said:

Feel uncomfortable about this - what if there is a change to a geometry and the GPU has already been sent the bytes. Is this delegated to the next cycle ??


We could always use a CopyOnWrite structure or double queues, so that a snapshot is taken at the beginning of rendering, etc. We have a lot of options here. One is to record a nanotime at the beginning of the render and give each instruction its own update time; an instruction would be ignored until the next cycle if it was created or modified after that rendering cycle had already begun.
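The timestamp option could be sketched as below. `StampedInstructions` is a made-up class and the instructions are plain strings; stamps are passed in explicitly so the behaviour is easy to follow, where a real version would presumably take them from `System.nanoTime()`. Anything stamped after the start of the current render simply stays in the list for the next cycle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical timestamped instruction list. */
final class StampedInstructions {
    private static final class Entry {
        final long stamp;
        final String name;

        Entry(long stamp, String name) {
            this.stamp = stamp;
            this.name = name;
        }
    }

    private final List<Entry> pending = new ArrayList<>();

    /** Callable from any thread. */
    synchronized void post(String name, long stamp) {
        pending.add(new Entry(stamp, name));
    }

    /** Removes and returns the instructions stamped at or before renderStart. */
    synchronized List<String> takeUpTo(long renderStart) {
        List<String> ready = new ArrayList<>();
        for (Iterator<Entry> it = pending.iterator(); it.hasNext();) {
            Entry e = it.next();
            // Difference-based comparison, as recommended for nanoTime values.
            if (e.stamp - renderStart <= 0) {
                ready.add(e.name);
                it.remove();
            }
        }
        return ready;
    }
}
```

This sketch uses a plain lock around a list; the CopyOnWrite or double-queue variants mentioned above would trade that lock for extra copying.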

theprism said:

Perhaps we should do some profile analysis to substantiate this. A complex scene perhaps and analyse what proportion of time spent in/out of OpenGl.
Would there be any advantage in this approach in a fast moving scene with lots of camera rotation.


In a fast-moving scene you would end up setting a maximum number of updates per second (peak FPS, if you will), which would put about the same load on the system as a typical game today. But do you really think things are changing consistently at 60 times per second or higher, even in a fast-paced game? It seems pretty wasteful to be constantly updating the screen if nothing has changed. I find it concerning, because I would rather have that CPU power to do AI, networking, or run something else in the background than spend it processing something that hasn't changed and gives me zero benefit other than keeping my processor toasty warm. :)

Thanks DF.



I think there is mileage in some flag to signify hasSceneGraphChanged.



To elaborate, you may have various nodes in the scene graph. One node is close up and contains players erratically running around and doing erratic things ( anyone like the dance emote? ). Some nodes are background and hold more static world data ( terrain, trees, rocks ). Trees might be animated, so maybe they belong in a different structure, but terrain and rocks are mostly static.



Does the above sound like a new type of node, something with state added, eg if you attach a child, it stores a flag to say so - or is it already like this…
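A node type with such a flag might look roughly like this. `DirtyNode` is a made-up class for illustration and does not extend or modify jME's Node; whether jME already tracks something equivalent is exactly the open question above. Attaching a child flags the node and all of its ancestors, so the root answers "has anything changed?" with a single check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node type; jME's Node class is not used or modified here. */
class DirtyNode {
    private final List<DirtyNode> children = new ArrayList<>();
    private DirtyNode parent;
    private boolean changed;

    void attachChild(DirtyNode child) {
        children.add(child);
        child.parent = this;
        markChanged();
    }

    /** Flags this node and every ancestor, so the root can be tested cheaply. */
    void markChanged() {
        for (DirtyNode n = this; n != null; n = n.parent) {
            n.changed = true;
        }
    }

    boolean hasChanged() {
        return changed;
    }

    /** Called after a frame has been drawn. */
    void clearChanged() {
        changed = false;
        for (DirtyNode c : children) {
            c.clearChanged();
        }
    }
}
```

With a players node and a terrain node under one root, a change under players flags players and the root, while terrain stays clean and could be skipped.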





An Object representation of shaders doesn't exist yet, and I am not sure if it ever will. If not, can we discount this?







pondering on your other aspects …

looks like this is split in two directions now



First is how to multithread -  what, when, how and why



Second is to prepare for efficiency - what and how


We also have to realise that we are now at jME version 2.0. Drastic changes may be appropriate for 3.0 but not for 2.0. I guess that means that if a change breaks existing code for 2.0, then it should be a consideration for 3.0.



What can we do with 2.0 to get the multithreaded system going without breaking existing code?

theprism said:

We also have to realise that we are now at jME version 2.0. Drastic changes may be appropriate for 3.0 but not for 2.0. I guess that means that if a change breaks existing code for 2.0, then it should be a consideration for 3.0.

What can we do with 2.0 to get the multithreaded system going without breaking existing code?


2.0 is not final at all, so if some changes are needed to accommodate multithreading I wouldn't rule that out. You'll need to prove there is a benefit to it though (not just argue about theoretical queue systems and the like).
llama said:

2.0 is not final at all, so if some changes are needed to accommodate multithreading I wouldn't rule that out. You'll need to prove there is a benefit to it though (not just argue about theoretical queue systems and the like).


I agree...I think 2.0 is perfect for something like this if we can prove that it has enough advantages.