Thread efficiency vs time per frame deficiency - opinion welcomed :)

Hi all!
(Hope I posted in the right section).

I am developing a project in which several geometries are placed in the scene in simpleInit. Those geometries can be moved by pressing KEY_LEFT or KEY_RIGHT; when such a key is pressed, a different object moves them around (call this object “Mover”).
The movement is done bit by bit using the updateLoop (I imagine this is standard). On each iteration, the updateLoop checks a field in Mover which says whether to move the geometries or not, and if that field says “move them”, a method in Mover is called, moving all geometries. When the movement is finished, the field is set to “don’t move them”, and the updateLoop therefore doesn’t call the moving method.
(I hope I was clear :slight_smile: )

However,
I have a somewhat bizarre need to create multiple other geometries in the background and attach those to the rootNode when they are done. I create many of those almost all the time. These geometries are separate from the ones we move.

After the scene has been initialized, all updates must be done from the update loop. But making the update loop go over creation of multiple geometries at every iteration seems inefficient, so I have created a “Geometry Factory” which creates geometries via threading, i.e., it keeps creating geometries, each on a different thread, in order not to “interfere” with the updateLoop’s operation.
This “Factory” is set and run from the simpleInit method and therefore runs in parallel with the update loop.
I used Future objects (from java.util.concurrent) inside this Factory to tell when a geometry is done. On each iteration, the update loop checks a fixed-size array in that Factory for “ready” objects. If a geometry is ready, it is then attached to the rootNode.

However, this seems to kill the update loop anyways. The Fps drops tremendously from about 350 to 20 or even less. I am running about 10 threads in parallel.
The Fps seems to increase again if I use Thread.sleep(1000) for each geometry creation thread.

The entire thread creating process was done similarly to the advanced tutorials, using an Executor.

It seems plausible that creating many threads somehow impacts performance, but is it that extreme? With constant creation of geometries “in the background”, the Fps sometimes drops to 0.

Is it possible to create things in the background (perhaps many things), but somehow telling jmonkey that the Fps must be kept high? even at the cost of the other threads getting “hurt”?

I would attach the code but it is quite long and I think the question is more “idea-based” than “code-based”.
If I’m wrong I would be happy to post the code.

Thanks so much in advance,

Amir.

Only glancing over… first of all, you’ve turned something that could have been completely asynchronous into almost synchronous by having to do a (potentially thread-blocking) call every frame.

Send some kind of request object to a ThreadPoolExecutor to build the geometry, and have the worker add the finished geometry to a ConcurrentLinkedQueue when done. Then in your update just pull one or two off the queue at a time (if there are any):

Geometry geom = myResultQueue.poll();
if( geom != null ) {
    scene.attachChild(geom);
}
…or whatever.

…because your other problem is that if lots of geometry is ready at once then you will still kill the rendering thread. Better to add one or two only per frame.
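
A minimal sketch of that pattern, using only JDK classes so it runs without jME. `BackgroundBuilder`, `request` and `drain` are hypothetical names invented here, and the `String` items are stand-ins for `Geometry` objects; this is one possible shape under those assumptions, not the engine's own API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: builds items off-thread and hands over a few per frame. */
final class BackgroundBuilder<T> {
    private final Queue<T> ready = new ConcurrentLinkedQueue<>();
    private final ExecutorService workers;

    BackgroundBuilder(int workerThreads) {
        workers = Executors.newFixedThreadPool(workerThreads);
    }

    /** Queues a build request; the supplier must not return null. */
    void request(Supplier<T> builder) {
        workers.execute(() -> ready.add(builder.get()));
    }

    /** Called once per frame from the update loop; never blocks. */
    List<T> drain(int maxPerFrame) {
        List<T> out = new ArrayList<>();
        while (out.size() < maxPerFrame) {
            T item = ready.poll();   // returns null immediately if nothing is ready
            if (item == null) {
                break;
            }
            out.add(item);
        }
        return out;
    }

    /** Stops accepting requests and waits for the queued builds to finish. */
    boolean shutdownAndWait(long timeoutMillis) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BackgroundBuilder<String> builder = new BackgroundBuilder<>(1);
        for (int i = 0; i < 5; i++) {
            final int n = i;
            builder.request(() -> "geometry-" + n);
        }
        builder.shutdownAndWait(5000);
        // Simulated frames: at most two items are handed over per frame.
        List<String> frame;
        while (!(frame = builder.drain(2)).isEmpty()) {
            System.out.println("attach this frame: " + frame);
        }
    }
}
```

In a jME application the body of the simulated frame loop would live in `simpleUpdate`, with each drained item passed to `attachChild`. The cap per frame is the important part: the queue absorbs bursts, and the render thread only pays for a bounded amount of work each frame.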


Thanks a lot (as always) for your answer! I will try your solution. :slight_smile:
I’m not sure what calls might be thread-blocking in my code (if any) but I will look into it.

Amir.

I did an initial implementation of the Queue you suggested, and it indeed boosted my Fps tremendously. I still don’t exactly understand where the difference lies (I mean, I am still quite unsure where I was blocking threads by checking the Future objects every time from the update loop), but I will try and look further into it. As you all know, Jmonkey is awesome.

Thanks again!!

OK, here’s the thing:
Oddly (?) when I load a simple colour into the Geometries, they load very quickly in the thread-based code.
I can load them “in stock”, I mean, dozens of them every second, and it’s still OK.
But when I put “heavy weighted pictures” (say, jpgs of over 1MB) as textures to the Geometries, the game starts to be sluggish (this shouldn’t happen, since I’m doing all the loading work from another thread, with a newly instantiated AssetManager!)
The Fps is above 80 (which is good) but the movement doesn’t happen smoothly, if at all (it’s like the game is non responsive to movement sometimes).

You’ve already been of great assistance so I’d be totally OK if you don’t spend more time on this issue, but if you have any idea as to why this happens (why does the “heavyweightness” of the image even affect the issue? It’s on a totally different thread! :slight_smile: )
it’d be great.

Thanks again (even if no new replies come),

Amir.

Are you only adding one per frame?

The frame hit will be from transferring the data over to the GPU for the first time. Note that the file size of the JPG in megabytes doesn’t really matter; it’s the X by Y pixel size that matters, as the JPG will already have been decompressed by then. How big is your texture? It sounds pretty huge if it’s a 1 meg jpg but it’s hard to assume.
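
To put rough numbers on that, here is a small sketch that estimates the decoded size of an image from its pixel dimensions. It assumes 4 bytes per pixel (RGBA, 8 bits per channel), which is an assumption for illustration only; the engine or driver may use 3 bytes per pixel or a compressed format. `TextureFootprint` is a made-up helper, not part of jME.

```java
/** Rough, illustrative estimate of decoded texture memory; not a jME API. */
final class TextureFootprint {
    /** Bytes for one uncompressed image at the given bytes per pixel. */
    static long decodedBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Bytes including a full mipmap chain (each level halves both dimensions). */
    static long withMipmaps(int width, int height, int bytesPerPixel) {
        long total = 0;
        int w = width;
        int h = height;
        while (true) {
            total += decodedBytes(w, h, bytesPerPixel);
            if (w == 1 && h == 1) {
                break;
            }
            w = Math.max(1, w / 2);
            h = Math.max(1, h / 2);
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] sizes = { {32, 32}, {600, 600}, {1920, 1200} };
        for (int[] s : sizes) {
            System.out.printf("%dx%d: %d bytes decoded, %d with mipmaps%n",
                    s[0], s[1], decodedBytes(s[0], s[1], 4), withMipmaps(s[0], s[1], 4));
        }
    }
}
```

Under that assumption a 1920x1200 image is 9,216,000 bytes (about 8.8 MiB) once decoded, regardless of how small the JPG file is on disk, and a full mipmap chain adds roughly another third on top.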


I followed your suggestion, this is what happens in the updateLoop:
[java]
if(_cr.shouldMove())
{
	_cr.move(tpf);
}

Geometry g;

if((g = _queue.poll()) != null)
{
	_rcl._handle.attachChild(g);
	_rcl.howManyCreated++;
}
[/java]

where _cr is a CubeRow object (holds all the cubes and handles the movement) and _rcl is an object which holds the “ThreadCubes”, i.e., the Geometries that were created via threading. I’m only adding one per frame.

the texture is simply a jpg, the one that gets the thing stuck is with dimensions 1920x1200. This is really a lot (I know) but I thought this shouldn’t matter because the thread handles it all.

Thank you!

The thread will load it into RAM but when you attach it to the scene it still has to be handed off to OpenGL which will transfer it over the bus to the GPU… this will take time. I guess especially if it has had mipmaps added or whatever.

Still, I’m not exactly sure how much extra time we are talking about. It shouldn’t take “seconds” for example.

Also if you have added and removed a bunch of objects then the GC may pick that time to reclaim those objects and the native handles associated with them. That native reclamation loop only does 100 at a time or so but if there are thousands ready then it could bog you down for a few frames, I guess. (This is all assuming you are running recent JME).


I limited the creation to 8 Geometries at a time, but I’m trying to create them constantly.

The situation is as follows: I instantiate 8 different threads almost at the same time, but I’m having a thread pool size of 1 so only one of those is carried at a time. When they are all done (all Geometries created) I wipe everything out (detach the geometries) and start over - create 8 new threads with the executor… etc (this process goes on indefinitely).
If I’m not putting a “Thread.sleep(SOME_TIME)” command when call() is done, this is created super rapidly.
If I’m not making the threads sleep at all (pushing it to the extreme), here is what I’m getting:

No images (set the material to a random color): Fps after 3 seconds at around 95, movement of movable geometries smooth.
Very small images: 32x32 pixels, image size 1.67KB - Fps after 3 seconds at around 80, movement of movable geometries smooth almost constantly.
Small images: 170x150 pixels, image size 3.9KB - Fps after 3 seconds at around 53, movement of movable geometries usually smooth.
Medium images: 422x317 pixels, image size 17.4KB - Fps after 3 seconds at around 25, movement of movable geometries sluggish but works.
Large images: 600x600 pixels, image size 106KB - Fps after 3 seconds at around 12, movement of movable geometries extremely sluggish, sometimes non responsive.
Extremely large images: 1920x1200 pixels, image size 245KB - Fps after 3 seconds at around 2, movement of movable geometries completely non responsive.
Also note: with the previous cases (especially no image or small image) the geometries were created “super fast”. In this scenario, it takes obvious and visible time to load the texture, which causes the geometries to appear slower than before. I’m still not quite sure why this should impact the Fps of the main update loop.

It seems that larger images decrease performance.
This is my first attempt at threading (with jmonkey). So anything you have said or might say is very helpful; even if the problem is not solved, it helps me focus.

Can I ask why you are even doing this? Why create and destroy so many objects?

Sure, that’s a legitimate question :slight_smile:
This is a “test case program” I’m using to study the limitations of threading.
In my “real” game, which has far lengthier code (this is why I stripped the issue to the core), I sometimes have a situation in which some objects are moved and in response ~20 geometries (with jpg picture textures) need to be created and placed in the scene as soon as possible.
I cannot predict when the player will do so. He/She might not move at all for a minute, but then suddenly want to move 7 times in a row. Then, a set of ~20 geometries needs to be created. I cannot reuse the previous ones.
I “threaded” my main program and it froze in the update loop so I created this “trial” to see why.
So far it seems closely related to the size or dimensions (as you said) of the images.
I hope this helps to clarify my odd code :slight_smile:

Things I can think of that might have been a problem (sorry if I’m missing something, I have read just the initial description):

  1. Calling future.get() from the update loop without checking future.isDone() before. Defeats the whole purpose.
  2. Starting the worker threads at the same priority. You MUST put them at a lower priority, else they’ll compete with the main thread and slow it down in a round-robin fashion. If the update loop is slowing down proportionally to the number of background threads, that’s likely your problem.
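
Both points can be sketched with JDK classes alone. `LowPriorityWorkers`, `lowPriorityFactory` and `pollResult` are hypothetical names made up for this example, and the sleeping task is only a stand-in for slow asset loading. Note that thread priorities are a hint to the scheduler, and some platforms largely ignore them, so the effect of the second point is not guaranteed.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

final class LowPriorityWorkers {
    /** Thread factory producing daemon threads at minimum priority. */
    static ThreadFactory lowPriorityFactory() {
        return runnable -> {
            Thread t = new Thread(runnable, "geometry-worker");
            t.setPriority(Thread.MIN_PRIORITY);
            t.setDaemon(true);
            return t;
        };
    }

    /** Returns the result if the task has finished, or null without blocking. */
    static <T> T pollResult(Future<T> future) {
        if (!future.isDone()) {
            return null;
        }
        try {
            return future.get(); // does not wait: the task is already done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("background task failed", e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1, lowPriorityFactory());
        Future<String> future = pool.submit(() -> {
            Thread.sleep(50); // stand-in for slow asset loading
            return "geometry";
        });
        String result;
        int frames = 0;
        while ((result = pollResult(future)) == null) {
            frames++;         // the update loop keeps running meanwhile
            Thread.sleep(1);
        }
        System.out.println("ready after " + frames + " simulated frames: " + result);
        pool.shutdown();
    }
}
```

Calling `get()` only after `isDone()` returns true keeps the per-frame check from stalling the update loop while a worker is still busy.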

Thanks toolforger for your suggestions.
I tried setting the priority of the worker threads to minimum and the render thread to maximum (via Thread.currentThread().setPriority(Thread.MAX_PRIORITY)) but it didn’t have any effect. Also, I’m not using Future objects anymore; I’m simply working with a concurrent linked queue which is shared between threads.

I feel there is no way around this, so I have stripped the code to the bare minimum in a way which still leaves the problem. I would be very grateful if you could illuminate any blatant errors I’m making. The current code has the behaviour I mentioned - small images work quickly, heavy images kill the render thread resulting in Fps of 1 or 2. Which is… not enough.
The coding conventions are not fully met here so the code would be more readable.

This is as simple as it gets. So… a final attempt at salvaging my sanity :slight_smile:

The main thread, with SimpleInit etc, creating 10 cubes at a time:

[java]
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import com.jme3.app.SimpleApplication;
import com.jme3.scene.Geometry;

public class Main extends SimpleApplication
{
private ConcurrentLinkedQueue<Geometry> queue;

private ScheduledThreadPoolExecutor exec;

private int howmany = 0;

public static void main(String[] args)
{
	(new Main()).start();
}


public void simpleInitApp() 
{
	// Init the queue and executor with a pool of 1.
	queue = new ConcurrentLinkedQueue<Geometry>();
	exec = new ScheduledThreadPoolExecutor(1);
	
	createCubes();
}

public void simpleUpdate(float tpf)
{
	Geometry g;
	
	if((g = queue.poll()) != null)
	{
		rootNode.attachChild(g);
		howmany++;
	}
	
	if(howmany == 10)
	{
		rootNode.detachAllChildren();
		howmany = 0;
		createCubes();
	}
}

private void createCubes()
{
	for(int i = 0; i < 10; i++)
	{
		new ThreadCube(exec, queue);
	}
}

}
[/java]

The ThreadCube object, which creates the cube in a different thread:

[java]
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ScheduledThreadPoolExecutor;

import com.jme3.asset.AssetManager;
import com.jme3.asset.plugins.FileLocator;
import com.jme3.material.Material;
import com.jme3.math.Vector3f;
import com.jme3.scene.Geometry;
import com.jme3.scene.shape.Box;
import com.jme3.system.JmeSystem;

public class ThreadCube implements Callable<Object>
{
private ConcurrentLinkedQueue<Geometry> queue;

public ThreadCube(ScheduledThreadPoolExecutor exec,
				  ConcurrentLinkedQueue<Geometry> queue)
{
	this.queue = queue;
	exec.submit(this);
}

public Object call()
{
	// This is a code for getting a new AssetManager object (cannot instantiate it).
	AssetManager assetManager = JmeSystem.newAssetManager(Thread.currentThread().
														  getContextClassLoader().
														  getResource("com/jme3/asset/Desktop.cfg"));
	Box b = new Box(Vector3f.ZERO, 1, 1, 1);
	Geometry g = new Geometry("Box", b);
	Material mat = new Material(assetManager, "Common/MatDefs/Misc/Unshaded.j3md");
	assetManager.registerLocator("C:\\void\\", FileLocator.class);
	mat.setTexture("ColorMap", assetManager.loadTexture("b.jpg"));
	g.setMaterial(mat);
	
	queue.add(g);
	
	return null;
}

}
[/java]

Am I doing something inherently wrong?

Hmm… is AssetManager threadsafe?
If it’s using static variables, it most likely is not, unless it’s taking special care - such as using ThreadLocal types and/or the synchronized keyword.

Putting the submit() call into the constructor can be confusing to reviewers: constructors are usually just used to set parameters, they don’t “do” anything. That’s just a stylistic gotcha, and not relevant to the problem we’re having here.

I’m not sure why you’re using a ScheduledThreadPoolExecutor; I’m not seeing the threads needing to repeat themselves. (Maybe that’s because the program was slashed down to its minimum.)
Anyway, I’m not 100% sure about ScheduledThreadPoolExecutor’s behaviour, in particular what it does when more threads than the allocated 1 pool slot are waiting for execution.

One thing that I notice is that you’re constantly creating more threads.
I think (suspect) that ScheduledThreadPoolExecutor repeats its tasks automatically, so the submitted threads never die and the total thread count increases until some bottleneck kicks in.
I tend to single-step into the JDK classes and see what they’re actually doing - I found that while the docs are usually accurate, my understanding of them sometimes isn’t, and a quick debugging session can flush such misunderstandings quickly.
A breakpoint inside the Callable and looking at the higher frames in the stack and see how they were called can help, too.
But that’s just shots in the dark; normally, I’d fire up a debugger myself, but I’m a bit too sick for that (but fit enough to blabber incoherent stuff in the hopes that it will help you with a new perspective and find out new things, maybe even the cause of your troubles).
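
The suspicion about repeated tasks can be checked in isolation without a debugger. A minimal sketch using only the JDK (`SubmitOnceCheck` and `runOnce` are names invented for this example): according to the `ScheduledThreadPoolExecutor` Javadoc, tasks passed to `submit()` are scheduled once with a requested delay of zero, and tasks beyond the pool size wait in the executor's queue rather than getting extra threads. The check counts how many times ten submitted tasks actually run on a pool of one.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SubmitOnceCheck {
    /** Submits one-shot tasks and returns {total runs, largest pool size}. */
    static int[] runOnce(int tasks) {
        ScheduledThreadPoolExecutor exec = new ScheduledThreadPoolExecutor(1);
        AtomicInteger runs = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            // Cast picks the Runnable overload of submit() explicitly.
            exec.submit((Runnable) runs::incrementAndGet);
        }
        exec.shutdown(); // by default, already queued zero-delay tasks still run
        try {
            exec.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { runs.get(), exec.getLargestPoolSize() };
    }

    public static void main(String[] args) {
        int[] result = runOnce(10);
        System.out.println("runs = " + result[0] + ", largest pool size = " + result[1]);
    }
}
```

Ten runs and a largest pool size of one mean the tasks are neither repeated nor given additional threads, which points away from the executor as the cause of the slowdown.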


toolforger, really, your help is much appreciated!
If you’re sick then get some rest! And drink tea. Or beer :wink: Unless the gratitude of a fellow programmer is making you feel better :slight_smile:

I did fire up the debugger several times but haven’t done so with the “trimmed” code, so I’ll get right into it. The issues you suggested were plausible, yet I was able to eliminate everything related to the Executor by instantiating the threads myself, without the executor. The problem still persists.

I checked and AssetManager is thread safe. To avoid instantiating new asset managers, I passed the asset manager from the main thread - still slow…

In any case, here is the code that still causes the same problem.

Main thread:

[java]
import java.util.concurrent.ConcurrentLinkedQueue;

import com.jme3.app.SimpleApplication;
import com.jme3.scene.Geometry;

public class Main extends SimpleApplication
{
private ConcurrentLinkedQueue<Geometry> queue;

private int howmany = 0;

public static void main(String[] args)
{
	(new Main()).start();
}


public void simpleInitApp() 
{
	queue = new ConcurrentLinkedQueue<Geometry>();
	
	createCubes();
}

public void simpleUpdate(float tpf)
{
	Geometry g;
	
	if((g = queue.poll()) != null)
	{
		rootNode.attachChild(g);
		howmany++;
	}
	
	if(howmany == 10)
	{
		rootNode.detachAllChildren();
		howmany = 0;
		createCubes();
	}
}

private void createCubes()
{
	for(int i = 0; i < 10; i++)
	{
		new ThreadCube(queue, assetManager);
	}
}

}
[/java]

The cube creation via threads (this time with “Runnable” and hand made threads):

[java]
import java.util.concurrent.ConcurrentLinkedQueue;

import com.jme3.asset.AssetManager;
import com.jme3.asset.plugins.FileLocator;
import com.jme3.material.Material;
import com.jme3.math.Vector3f;
import com.jme3.scene.Geometry;
import com.jme3.scene.shape.Box;
import com.jme3.system.JmeSystem;

public class ThreadCube implements Runnable
{
private ConcurrentLinkedQueue<Geometry> queue;
AssetManager assetManager;

public ThreadCube(ConcurrentLinkedQueue<Geometry> queue, AssetManager assetManager)
{
	this.queue = queue;
	this.assetManager = assetManager;
	Thread t = new Thread(this);
	t.start();
}

public void run()
{
	Box b = new Box(Vector3f.ZERO, 1, 1, 1);
	Geometry g = new Geometry("Box", b);
	Material mat = new Material(assetManager, "Common/MatDefs/Misc/Unshaded.j3md");
	assetManager.registerLocator("C:\\void\\", FileLocator.class);
	mat.setTexture("ColorMap", assetManager.loadTexture("b.jpg"));
	g.setMaterial(mat);
	
	queue.add(g);
}

}
[/java]

Of course asset manager is thread safe. How else would you possibly be able to load things from a different thread?

Also, regarding ScheduledThreadPoolExecutor … just don’t use the scheduled version. You want a regular ThreadPoolExecutor. Also, it’s weird to have them self-submit themselves… but it’s not a big deal.

Those aren’t really issues but they will clean up your code.

Adding new Mesh+texture data to a scene will always take time. The data has to travel from the CPU to the GPU. The more data that there is, the longer it will take. Writing well-performing 3D games is all about reducing the activity on the bus.


OK so this is an important input for me, you’re basically saying:

  1. I’m not doing anything “wrong” (the update loop’s work is minimal, thread execution is plausible).
  2. I should fine tune the attaching and threadding operations to achieve better performance.

Am I correct?

Yes. The problem with these sorts of tests is that you get too focused on “how bad is the wrong way to do things?” instead of “maybe there is a right way to do this”.

I don’t know exactly what you are trying to do. I could guess maybe you are receiving computer screens over the network or something but I have no idea. It could be a big help to know that stuff. (For example, if you are just updating a screen then you can update an image in place instead of throwing them away and recreating them all the time, which causes full GC and native reclamation to happen very often = bad)


Thanks for your reply.

I see what you mean about testing the “wrong way”. My problem is that I’ve been given a task to program something with jmonkey without proper knowledge and I have to “learn on the fly”, so naturally I’ll be making some wrong judgement calls and I really appreciate the educational effort on your part guys :slight_smile:

The entire game is kind of difficult to explain but I’ll try and explain what the scenario looks like in a more detailed way.
A module (which I do not program, that’s someone else’s job :slight_smile: ), retrieves images from a dedicated server of ours and saves them on the player’s computer. Those images are of various sizes. We do not choose images based on their size or dimensions, we just retrieve what’s there - this is because many other players have uploaded these images, but that is irrelevant for this discussion.

Imagine a setting where I place the player inside a “gallery”, like a museum (in fact it’s not that far away from reality, but no, we’re not developing a museum game :stuck_out_tongue: ). The player can look around and enjoy the images. But then he can move to another “room” (in a smooth transitional phase, which makes it important for the “previous” images to still be partially visible until the new “room” is completely visible). The wall of the “room” is full of placeholders for images (mainly something like a solid-coloured “white” geometry, blank), and the images are then “filled in” in a thread-based way, which still enables the player to move around inside the room (look left, right).

I created all of that setting, but the threads which created the images (mainly: created a new Quad, filled it with a texture via a FileLocator, then attached it to the root node) caused the movement of the main character to stop. Completely. That is, at least, until all images were loaded.
“This is odd”, I thought, since I threaded the entire thing, why does this happen?

And that’s when I created this demo which constantly creates geometries via threads, attaches and detaches them to check the limitations of the issue.

If you have a better suggestion which could speed up the thing I’m all ears :slight_smile: but if not, at least I hope this is a better description of the problem and shows why I chose this particular solution.

Amir.

@amirkr said: then attached it to the root node) caused the movement of the main character to stop. Completely. That is, at least, until all images were loaded. "This is odd", I thought, since I threaded the entire thing, why does this happen?

With your original code, way back when… the “stop completely” part was a certainty. With the new code using queues, etc. this should not be true anymore or something else is wrong.

Anyway, you will have quite a time managing memory consumption, I think. If this is for “any random player” then you will be exceeding the average players’ graphics card memory limits several times over, I think. Some cards will only load power-of-two texture sizes anyway. So consider your minimum hardware requirements carefully if you have not done so already.

You may need to consider different techniques to minimize the impacts of these things (thumbnail textures for farther away, atlasing the thumbnails, etc.). Enforcing a maximum size is probably a necessity, too.
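
As a rough illustration of enforcing a maximum size, the sketch below computes the dimensions an image would be scaled to before it is turned into a texture, and optionally rounds a side up to a power of two for cards that require it. `TextureLimits`, `fitWithin` and `nextPowerOfTwo` are hypothetical helpers written for this example, and the 512-pixel cap is an arbitrary assumption, not a recommendation from the engine.

```java
/** Hypothetical helpers for capping texture dimensions; not part of jME. */
final class TextureLimits {
    /** Scales (width, height) down to fit within maxSide, keeping the aspect ratio. */
    static int[] fitWithin(int width, int height, int maxSide) {
        int longest = Math.max(width, height);
        if (longest <= maxSide) {
            return new int[] { width, height }; // already small enough
        }
        double scale = (double) maxSide / longest;
        return new int[] {
            Math.max(1, (int) Math.round(width * scale)),
            Math.max(1, (int) Math.round(height * scale))
        };
    }

    /** Smallest power of two that is greater than or equal to n (n >= 1). */
    static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        int[] capped = fitWithin(1920, 1200, 512);
        System.out.println("capped size: " + capped[0] + "x" + capped[1]);
        System.out.println("power-of-two size: "
                + nextPowerOfTwo(capped[0]) + "x" + nextPowerOfTwo(capped[1]));
    }
}
```

The resize itself would be done once on the worker thread (for example when the image is downloaded), so the render thread only ever uploads the capped version. Thumbnails for distant images follow the same idea with a smaller cap.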

Ultimately, loading time will not be your biggest issue.