Multithreading Revisited

@Nyphur, the task stuff in my terrain rendering can do "load balancing" using several methods. That doesn't stop stuttering completely though, especially not with disk I/O. The terrain demo still stutters a lot when something is loaded, because of several things (some stuff doesn't use the task framework, the task framework is not really finished yet, disk I/O, SimpleGame's timer mechanism, etc.).

@Darkfrog, priorities help (I use them), but it's still better to do IO-intensive tasks when you're not rendering (on a single core). This is because of the way CPUs work (L1 and L2 caches). File I/O could be OK, because it's interrupt driven and leaves threads blocked while waiting, reducing context switches. But decompressing or parsing something, or anything else memory/CPU intensive, will really bring down your FPS.

@Badmi, look at the topic I linked. It would already solve the problem of making sure textures using the same buffer get the same texture id.

@vear, I don't think creating texture ids is the problem. The problem is loading textures from disk, and to a lesser degree uploading them to videocard (which always has to be done from the thread that holds the OpenGL context).

As for your considerations on threading the scenegraph, that would really only make sense on a multi-core system. For a single-core system it's always best to just do one thing at a time. On multi-core systems your suggestions are more interesting, but there are some flaws in your system: updates in jME tend to be hierarchical (from Node to Node), while the draw order is usually sorted (based on the TextureState). And you can't update objects until they've been drawn (unless you want to "double buffer" the update results for Spatials).

Right now the draw method only does culling and adding to the RenderQueue. This queue is then sorted at the end (the most efficient method at hand for general-purpose sorting, much easier than insertion sorting), and only after it's sorted can you begin drawing (you have to know which object is drawn first, after all).
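The collect/sort/draw split described above can be sketched in plain Java. This is only an illustrative stand-in, not jME's actual RenderQueue; the `Item` type and the integer texture-id sort key are assumptions standing in for Spatials sorted by TextureState:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal stand-in for a render queue: the "draw" traversal only enqueues,
// the queue is sorted once at the end of the frame, then everything is
// rendered in sorted order.
class RenderQueue {
    static class Item {
        final String name;
        final int textureId; // sort key, analogous to sorting by TextureState

        Item(String name, int textureId) {
            this.name = name;
            this.textureId = textureId;
        }
    }

    private final List<Item> items = new ArrayList<Item>();

    // called during scene traversal instead of drawing immediately
    void enqueue(Item item) {
        items.add(item);
    }

    // called once per frame, after traversal: sort, then hand back the
    // ordered list so drawing can start
    List<Item> sortAndFlush() {
        items.sort(Comparator.comparingInt((Item i) -> i.textureId));
        List<Item> ordered = new ArrayList<Item>(items);
        items.clear();
        return ordered;
    }
}
```

Sorting the whole queue once per frame avoids the cost of keeping it ordered on every insert, which is the "much easier than insertion sorting" point above.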

So this would result mostly in a single thread doing work while the rest of the threads wait for it to be done. Add to that the need for inter-thread communication, locking, etc. on which objects can be touched and which can't (e.g. the update thread waiting for a Spatial to be drawn so it can be updated), and so on.

I think it would be much easier, and possibly more efficient, to keep the current rendering model and split the tasks at hand between different threads.

Take updating, for example: you can easily split that over several threads. When a Node has two children (especially if they're two Nodes), just use two threads. There is no need for locking or synchronization.
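A minimal sketch of that idea, assuming a plain-Java `Child` stand-in rather than actual jME Spatials: two independent subtrees are updated concurrently, with a join before drawing begins.

```java
// Sketch: a parent with two independent children can update them on two
// threads without locking, because neither child touches the other's data.
class ParallelUpdate {
    interface Child {
        void update(float tpf);
    }

    static void updateChildren(final Child left, final Child right, final float tpf) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                left.update(tpf);  // first subtree on a worker thread
            }
        });
        worker.start();
        right.update(tpf);         // second subtree on the calling thread
        try {
            worker.join();         // both subtrees done before drawing starts
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

No locks are needed precisely because the two subtrees share no mutable state; the `join()` is the only synchronization point.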

Same goes for culling: no need for a lot of locking if you're clever about how you add objects to the render queue (not all at once).

Collision detection should be splittable too (though if you want to try your hand at doing things while OpenGL drawing is in progress, this would be a candidate).

Sorting the renderqueue: there are many reasonably simple algorithms for doing multithreaded sorting.
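One of the simplest such algorithms is a two-thread merge sort: sort each half concurrently, then merge. This is only a sketch of the technique on an `int[]`; a real version would recurse and use a thread pool:

```java
import java.util.Arrays;

// Two-thread merge sort sketch: each half is sorted concurrently,
// then the sorted halves are merged on the calling thread.
class ParallelSort {
    static int[] sort(int[] data) {
        final int mid = data.length / 2;
        final int[] left = Arrays.copyOfRange(data, 0, mid);
        final int[] right = Arrays.copyOfRange(data, mid, data.length);

        Thread worker = new Thread(new Runnable() {
            public void run() {
                Arrays.sort(left);   // first half on a worker thread
            }
        });
        worker.start();
        Arrays.sort(right);          // second half on the calling thread
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // merge the two sorted halves
        int[] out = new int[data.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = left[i] <= right[j] ? left[i++] : right[j++];
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```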

That leaves the actual drawing (of the renderqueue objects).

In theory your idea sounds better (at least that's what you're inclined to think when you read it), and if it all worked like it should (having a core doing OpenGL calls all the time), maybe it would be. But getting that to work well enough… that's no easy task. I think you'll get faster and more reliable results if you add some threads for tasks that are easy to split and don't get in each other's way, and concurrently run tasks that don't affect more than one object in the scene (such as animation or collision detection) or other tasks (like the texture loading discussed here).

It all depends a lot on the architecture you're on, too. E.g. AMD has more memory bandwidth and cache per core than Intel (so you could do more IO-intensive tasks while rendering on another core), the Xbox 360 has a special "streaming" cache which is good for IO, while on a PS3 the SPE cores have their own (limited) local memory space.

Well, someone buy me a nice 4 way Opteron and we might find out what's best :slight_smile:

Yes, I should first finish that book on threading, then work out how it applies to graphics card IO and the whole system, before making assumptions. I guess LWJGL/OpenGL does a whole lot of stuff in software too, besides pushing data to the graphics card, which would not benefit from threading but absolutely suffer from it. I know a top FPS game runs 10 threads, one at high priority. Now what are the other 9 doing, if that one most probably does the rendering?

The problem is this line in TextureManager:

 if (state != null) {
     m_tCache.put(tkey, texture);
 }
and this in Texture:


What I suggest is something similar to the following:

In Texture:

public Texture createSimpleClone() {
    Texture rVal = new Texture();
    // ... copy the shared fields into the clone ...
    return rVal;
}

public void load(int unit) {
    Texture tex = texture.getNext();
    // ...
}

llama said:

@Nyphur, the task stuff in my terrain rendering can do "load balancing" using several methods. That doesn't stop stuttering completely though, especially not with disk I/O. The terrain demo still stutters a lot when something is loaded, because of several things (some stuff doesn't use the task framework, the task framework is not really finished yet, disk I/O, SimpleGame's timer mechanism, etc.).

Hmm. I don't know anything about the internal workings of OpenGL or how different operating systems handle disk IO, but is there any time when the computer stalling is unavoidable, and if so, why? It seems to me that any time a piece of rendering code specifically causes a program to wait for something before continuing, it's a mistake. However, I know there are often hardware reasons why it occurs, and I know that with multi-core processors and multi-SLI graphics card arrays, these problems are avoided to an extent rather than being addressed.

My idea would be to move Input, Loading and Rendering into separate threads and have them communicate via common object queues. For example, you would have Input in a separate thread which acts as a buffer for input to the rendering thread by putting all input actions into a queue which the rendering thread has access to. Then you would have the rendering thread do all the dirty work of moving the camera, rendering, culling etc. Objects in the world would be represented by an object containing all the data about the object (the model, texture, name, relevant attributes etc.) and also a flag to determine if it has loaded yet. These objects would be placed into a custom renderqueue that checks the flag before rendering, and if that flag is set to false, it culls the entire object before rendering. As objects come into range, they are added to the renderqueue if in your FOV as normal, but are also added to a loading queue. The loader thread would have access to this queue and would load the model and texture into memory and onto the graphics card. Once done, it would set the flag to true.
This way, rendering and loading can be run independently and pretty much entirely unsynched through the use of a few public object queues. When multicore programming takes off, they could even be assigned to separate processors.
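The flag-based scheme above can be sketched roughly as follows. The class names (`WorldObject`, `FlaggedRenderQueue`) are illustrative assumptions, not jME API; the point is the `volatile` ready flag read by the render thread and written by the loader thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// A world object with a "loaded" flag. volatile because it is written by
// the loader thread and read by the render thread.
class WorldObject {
    final String modelPath;
    volatile boolean loaded = false;

    WorldObject(String modelPath) {
        this.modelPath = modelPath;
    }
}

class FlaggedRenderQueue {
    // thread-safe hand-off queue between render thread and loader thread
    final ConcurrentLinkedQueue<WorldObject> loadQueue =
            new ConcurrentLinkedQueue<WorldObject>();

    // render thread: collect only objects that are ready; unready objects
    // are culled this frame and handed to the loader
    List<WorldObject> renderables(List<WorldObject> inView) {
        List<WorldObject> out = new ArrayList<WorldObject>();
        for (WorldObject o : inView) {
            if (o.loaded) out.add(o);
            else loadQueue.add(o);
        }
        return out;
    }

    // loader thread: take one job, load its resources, then flip the flag
    void loaderStep() {
        WorldObject o = loadQueue.poll();
        if (o != null) {
            // ... load model/texture for o.modelPath here ...
            o.loaded = true;
        }
    }
}
```

A real version would also have to avoid re-enqueuing an object that is already in the load queue, but the two threads never block each other.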

Why is it that file IO causes stalling anyway?

Could any of the Sound stuff be multithreaded?

llama said:

And as said before, loading things onto the graphics card has to happen from the thread which holds the OpenGL context.

The key is in what you said: from a thread which holds *a* context. Multiple threads can have multiple contexts. And here comes the fact which makes my theory work: textures can be shared between contexts. Here is a quick & dirty background texture loader class:

package threaded;

import com.jme.image.Texture;
import com.jme.system.JmeException;
import com.jme.util.LoggingSystem;
import com.jme.util.TextureKey;
import com.jme.util.TextureManager;
import java.net.URL;
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import org.lwjgl.LWJGLException;
import org.lwjgl.opengl.Pbuffer;
import org.lwjgl.opengl.PixelFormat;

/**
 * @author vear
 */
public class TextureLoader implements Runnable {
    static Vector loadQueue = new Vector();
    static ConcurrentHashMap finishedQueue = new ConcurrentHashMap();
    static boolean running = false;
    static boolean working = false;
    private Pbuffer headlessDisplay;

    /** Creates a new instance of TextureLoader */
    public TextureLoader() {
        PixelFormat format = new PixelFormat(32, 0, 0, 0, 1);
        try {
            // a Pbuffer gives this thread its own context, sharing textures
            // with the main display context
            headlessDisplay = new Pbuffer(160, 100, format, null, null);
        } catch (LWJGLException ex) {
            LoggingSystem.getLogger().log(Level.SEVERE,
                    "Could not create OpenGL context", ex);
            throw new JmeException("Could not create new OpenGL context");
        }
    }

    public void run() {
        try {
            // make the Pbuffer's GL context current on this thread
            headlessDisplay.makeCurrent();
        } catch (Exception e) {
            LoggingSystem.getLogger().log(Level.SEVERE,
                    "Could not attach OpenGL context", e);
            throw new JmeException("Could not attach OpenGL context");
        }
        LoggingSystem.getLogger().log(Level.INFO, "Texture loader starting");
        // create a new renderer
        //renderer = new LWJGLRenderer(0,0);
        running = true;
        while (running) {
            while (!loadQueue.isEmpty()) {
                working = true;
                TextureKey tKey = (TextureKey) loadQueue.remove(0);
                LoggingSystem.getLogger().log(Level.INFO, "Background loading texture");
                Texture t = TextureManager.loadTexture(tKey);
                if (t != null) {
                    synchronized (finishedQueue) {
                        finishedQueue.put(tKey, t);
                    }
                    LoggingSystem.getLogger().log(Level.INFO, "Background loaded texture");
                }
            }
            working = false;
            try {
                Thread.sleep(50);
            } catch (InterruptedException ex) {
                // ignore and re-check the queue
            }
        }
        LoggingSystem.getLogger().log(Level.INFO, "Texture loader finished");
    }

    public static boolean isRunning() {
        return running;
    }

    public static void finish() {
        running = false;
    }

    public static boolean isWorking() {
        return working;
    }

    public static TextureKey loadTexture(URL file, int minFilter,
            int magFilter, int imageType, float anisoLevel, boolean flipped) {
        if (null == file) {
            System.err.println("Could not load image...  URL was null.");
            return null;
        }
        String fileName = file.getFile();
        if (fileName == null)
            return null;
        TextureKey tkey = new TextureKey(file, minFilter, magFilter,
                anisoLevel, flipped, imageType);
        // add it to the job queue
        loadQueue.add(tkey);
        return tkey;
    }

    public static Texture getTexture(TextureKey tKey) {
        if (!running) {
            // loader thread not active: load synchronously as a fallback
            return TextureManager.loadTexture(tKey);
        }
        if (finishedQueue.containsKey(tKey)) {
            Texture tx;
            synchronized (finishedQueue) {
                tx = (Texture) finishedQueue.remove(tKey);
            }
            return tx;
        }
        return null;
    }
}
And a simple test class:

package threaded;

import com.jme.app.SimpleGame;
import com.jme.image.Image;
import com.jme.image.Texture;
import com.jme.math.Vector3f;
import com.jme.scene.TriMesh;
import com.jme.scene.shape.Box;
import com.jme.scene.state.CullState;
import com.jme.scene.state.TextureState;
import com.jme.util.TextureKey;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;

/**
 * @author vear
 */
public class TestAsyncTexLoading extends SimpleGame {
    HashMap<TextureKey, TextureState> neededStates = new HashMap<TextureKey, TextureState>();
    // 0 - load all at start, 1 - load one texture every update cycle,
    // 2 - load in another thread
    int loadmode = 2;

    public static void main(String[] args) {
        TestAsyncTexLoading app = new TestAsyncTexLoading();
        app.start();
    }

    /** Creates a new instance of TestAsyncTexLoading */
    public TestAsyncTexLoading() {
    }

    protected void simpleInitGame() {
        // start the texture loader
        if (loadmode == 2) {
            new Thread(new TextureLoader()).start();
        }
        display.setTitle("Async Texture loading");
        cam.setLocation(new Vector3f(0, 0, 60));

        Vector3f max = new Vector3f(2, 2, 2);
        Vector3f min = new Vector3f(-2, -2, -2);

        for (int i = 0; i < 10; i++) {
            TriMesh t = new Box("Box", min, max);
            CullState cs = display.getRenderer().createCullState();
            cs.setCullMode(CullState.CS_BACK);
            t.setRenderState(cs);

            // call init once
            TextureState ts = display.getRenderer().createTextureState();
            t.setRenderState(ts);
            t.setLocalTranslation(new Vector3f(i * 4, 0, 0));
            rootNode.attachChild(t);

            TextureKey tk = null;
            try {
                tk = TextureLoader.loadTexture(new URL("file:c:/temp/8/tex" + i + ".tga"),
                        Texture.MM_LINEAR, Texture.FM_LINEAR, Image.GUESS_FORMAT, 1.0f, false);
            } catch (MalformedURLException ex) {
                // skip this texture
            }
            if (tk != null) {
                neededStates.put(tk, ts);
            }
        }
    }

    public void simpleUpdate() {
        if (!neededStates.isEmpty()) {
            boolean loadedone = false;
            // there are still textures to be loaded
            HashSet<TextureKey> removable = new HashSet<TextureKey>();
            Iterator<TextureKey> tki = neededStates.keySet().iterator();
            while (tki.hasNext() && !(loadmode == 1 && loadedone)) {
                TextureKey tKey = tki.next();
                Texture t = TextureLoader.getTexture(tKey);
                if (t != null) {
                    TextureState ts = neededStates.get(tKey);
                    ts.setTexture(t);
                    removable.add(tKey);
                    loadedone = true;
                }
            }
            Iterator<TextureKey> rki = removable.iterator();
            while (rki.hasNext()) {
                TextureKey tKey = rki.next();
                TextureState ts = neededStates.remove(tKey);
                if (ts != null && !neededStates.containsValue(ts)) {
                    // finished with this TextureState
                }
            }
            if (neededStates.isEmpty()) {
                // finished loading textures, release the texture loader
                TextureLoader.finish();
            }
        }
    }
}
Use "loadmode" to control how textures are loaded, and see the difference. There is still a hiccup when loading the textures in the background, for which I have a guess: the draw() method does not spend time on uploading the texture, but when the other thread does the upload, OpenGL probably has to pause other drawing activity anyway.

The whole "trick" is in creating a dummy context and assigning it to the other thread. LWJGL does context handling per thread. From the LWJGL javadoc of org.lwjgl.opengl.GLContext:

The class is also thread-aware in the sense that it tracks a per-thread current context (including capabilities and function pointers). That way, multiple threads can have multiple contexts current and render to them concurrently.

This approach could be incorporated into TextureManager, TextureState and Texture. The textures would be created as before, but as "empty" textures, with their loading deferred by TextureManager. The Texture could have a "ready" attribute, and TextureState would ignore all textures which are not yet ready. The texture loader thread would set this flag when it finishes loading. That way the application code would not need the queue-juggling I put into simpleUpdate().
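The proposed "ready" attribute might look roughly like this. The class names here (`DeferredTexture`, `DeferredTextureState`) are hypothetical stand-ins, not actual jME API; they only illustrate the skip-until-ready behavior:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical deferred texture: created "empty", marked ready by the
// loader thread once its image data has been loaded and uploaded.
class DeferredTexture {
    private volatile boolean ready = false; // written by loader, read by renderer

    void markReady() { ready = true; }
    boolean isReady() { return ready; }
}

// Hypothetical texture state: when applied, it silently skips any texture
// that has not finished loading, so the object renders untextured at first
// and picks up its textures as they become ready.
class DeferredTextureState {
    private final List<DeferredTexture> textures = new ArrayList<DeferredTexture>();

    void setTexture(DeferredTexture t) { textures.add(t); }

    // returns how many textures were bound this frame
    int apply() {
        int bound = 0;
        for (DeferredTexture t : textures) {
            if (t.isReady()) bound++; // real code would bind the GL texture here
        }
        return bound;
    }
}
```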

So, is it working for you? Is it a sound concept?

Yes, LWJGL can create a separate context per thread (since 0.97 or so, I think). That's a clever trick, vear; makes me wish I had a multi-core CPU :slight_smile:

Whenever I read such discussions about multithreading and its limited benefit, I remember my old BeBox. It's a two-way PowerPC 133MHz with a PCI graphics card. It could render a 3D cube smoothly, and you could just drag and drop a bitmap or a QuickTime movie onto any of the cube's sides with immediate effect. Even with three movies, rendering was still smooth.

Nowadays we have AGP and PCI Express, and CPUs running at clocks about 20 times higher.

Either we expect too much or we just have some software problems we try to work around.

Well, no one said you can't do that. You can use JFjr to test it  XD

jME-Networking fully supports multithreading by the way (because so does JGN). :wink:


Interesting; sorry to speak before the frog. Does it just have one thread for receiving and sending? How does it make use of it?

Actually, it doesn't presuppose anything about threading.  It supports multithreading, but can be run in a single thread or multiple threads.  Each MessageServer has an update() method that can be called.  That update() calls updateIncoming() and updateEvents().  You can break it down further by specifically calling updateIncoming() and updateEvents() in separate threads.  The NetworkingClient expands on this with updateIncoming(), updateEvents(), and updateNoop().  NetworkingServer has updateIncoming(), updateEvents(), and updateExpired().  This allows every MessageServer to be broken up into at least two threads if necessary.  There are some further breakdowns that can occur if desired, but that is a little bit deeper discussion. :o
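The split described can be sketched like this. `MessageServerLike` is a stand-in interface, not actual JGN code; only the method names updateIncoming() and updateEvents() come from the post:

```java
// Stand-in for the described API: update() == updateIncoming() + updateEvents(),
// and the two halves can also be driven from separate threads.
interface MessageServerLike {
    void updateIncoming();
    void updateEvents();
}

class MessageServerDriver {
    // single-threaded use: one combined update per tick
    static void updateOnce(MessageServerLike s) {
        s.updateIncoming();
        s.updateEvents();
    }

    // multi-threaded use: each half on its own thread
    static void updateInTwoThreads(final MessageServerLike s) {
        Thread in = new Thread(new Runnable() {
            public void run() { s.updateIncoming(); }
        });
        Thread ev = new Thread(new Runnable() {
            public void run() { s.updateEvents(); }
        });
        in.start();
        ev.start();
        try {
            in.join();
            ev.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point is that the API itself stays thread-agnostic; the caller decides how many threads to spend on it.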


That would only be useful on the server side, really. When you have 12 clients to provide with smooth gameplay, don't you almost have to break up incoming and outgoing?

Not necessarily… it seems to work perfectly fine in a single thread, actually, for the most part.

On a server that has multiple cores and/or multiple processors it would likely give you a performance boost to do so though.


Anyway, in a couple of months when I get a new CPU with multiple threads, I am going to try putting the renderer on its own thread.  It won't speed up rendering of a frame (if it takes 20ms, it takes 20ms), but it will speed up the time in between renders if you use it right.

But even more on target, darkfrog: you weren't even talking about putting stuff on separate threads, you were talking about showing the progress of game states.  Would you like your hijacked forum thread back?

Nah, I don't care…I hijack enough threads myself. :-p


Gaheris said:

Depending on how heavy the impact would be, we could throw an exception if a method using OpenGL isn't called from the renderer thread.

This would help a *lot*.  Ideally the mechanism would use 'assert's so that the checking overhead could be completely eliminated at run-time.  But currently it's really hard to know what is "safe" to do in another thread.  Throwing a big ol' exception would make it really clear.

As I mentioned in another thread, a good start would be for all the Game.start()-type main-loop methods to save the current thread someplace global, so that writing a 'boolean inGlThread()' method would be trivial.
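A minimal sketch of that suggestion (the class and method names are hypothetical, not existing jME API):

```java
// The main loop registers its thread once; any code can then cheaply
// check whether it is running on the GL thread, and the check can sit
// behind 'assert' so it vanishes when assertions are disabled.
class GLThreadGuard {
    private static volatile Thread glThread;

    // called once from the Game.start()-style main loop
    static void registerGlThread() {
        glThread = Thread.currentThread();
    }

    static boolean inGlThread() {
        return Thread.currentThread() == glThread;
    }

    // usable as: assert GLThreadGuard.assertInGlThread();
    static boolean assertInGlThread() {
        if (!inGlThread())
            throw new IllegalStateException("OpenGL call outside the GL thread");
        return true;
    }
}
```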

Personally, I would expect this to be a suggestion to the LWJGL folks for their JNI wrapper code, so it will throw an exception if a call that shouldn't be made from another thread is attempted there.

By the way: is there a special reason why a TriMesh can be created outside the GL thread but not a SwitchNode? If a SwitchNode is created in another thread, it simply stays invisible, without an error message or exception.

Arg! It's so hard to guess what's safe and what's not!

Is Spatial.updateRenderState() likely to be safe?