Multithreading speed

pspeed · October 27, 2012, 3:54am

@seann999 said:
@pspeed Goddamnit! I think I've tried what you've said, I'm not fully sure yet, but can you tell me how you did it in Mythruna?

@pspeed said:
Anyway, your architecture would be better served by having an update queue (ConcurrentLinkedQueue is best). Submit your callables like normal but when they are done then add the result to the queue. Every update frame on the render thread, pull one item off (if the queue is not empty) and update it.

What you submit to your own queue could even be a callable that you just invoke yourself... but only one per frame or whatever.

seann999 · October 27, 2012, 6:41pm

@pspeed Does something like this seem okay?

[java]

public void simpleUpdate(float tpf){

executor.submit(updateCommandsCallable); //updates ConcurrentLinkedQueue chunkCommands, including by building a new node for chunks needing replacement

if (chunkCommands.size() > 0) {

final Object o = chunkCommands.poll();

if (o instanceof RemoveChunkCommand) {

final RemoveChunkCommand c = (RemoveChunkCommand) o;

executor.submit(new Callable<Void>() {

public Void call() throws Exception {

game.enqueue(new Callable<Void>() {

public Void call() throws Exception {

removeChunk(c.chunk); //this method basically removes the chunk’s physics and removes itself from its parent node

return null;

}

});

return null;

}

});

} else if (o instanceof ReplaceChunkCommand) {

final ReplaceChunkCommand c = (ReplaceChunkCommand) o;

executor.submit(new Callable<Void>() {

public Void call() throws Exception {

game.enqueue(new Callable<Void>() {

public Void call() throws Exception {

removeChunk(c.oldChunk);

attachChunk(c.newChunk); //this method basically attaches the chunk’s physics and attaches itself to its parent node

return null;

}

});

return null;

}

});

} else if (o instanceof Chunk) {

executor.submit(new Callable<Void>() {

public Void call() throws Exception {

game.enqueue(new Callable<Void>() {

public Void call() throws Exception {

Chunk c = (Chunk) o;

attachChunk©;

return null;

}

});

return null;

}

});

}

}

//move player, other stuff

}

[/java]

pspeed · October 27, 2012, 7:38pm

You don’t need to enqueue them since you are already on the render thread at that point. Just run them. By using a callable I meant that your ConcurrentLinkedQueue could be full of Callables then all you’d have to do is grab the first one and call() it.

As an aside: when checking for if a collection is empty it is always better to use isEmpty(). size() may require some amount of calculation where isEmpty() will be potentially more optimized.

seann999 · October 27, 2012, 8:44pm

@pspeed Wow. That sped things up a lot. For the executor.submit(updateCommandsCallable) which happens every frame, should I do this instead, or does it not matter?

[java]

if (future == null) {

future = executor.submit(chunkUpdateCallable);

} else if (future2.isDone() || future2.isCancelled()) {

future = null;

}

[/java]

pspeed · October 27, 2012, 10:16pm

@seann999 said:
@pspeed Wow. That sped things up a lot. For the executor.submit(updateCommandsCallable) which happens every frame, should I do this instead, or does it not matter?
[java]
if (future == null) {
future = executor.submit(chunkUpdateCallable);
} else if (future2.isDone() || future2.isCancelled()) {
future = null;
}
[/java]

I don't think that's necessary. Why do you submit chunkUpdateCallable every frame? Why not only submit it when chunks need updating?

seann999 · October 28, 2012, 12:10am

@pspeed

Whoops, “future2” was supposed to be “future” too…

Here’s part of how chunks update in my code:

In simpleUpdate, check which chunks got edited by the player. Add those chunks to a list (list1).
In simpleUpdate, submit a callable that rebuilds the chunks from list1. Basically add those newly built chunks to list2 (ConcurrentLinkedQueue)
In simpleUpdate, add/remove/replace those chunks from the list2 (ConcurrentLinkedQueue) once per frame (as you suggested).

Yeah, I should check if list1 isn’t empty first before doing #2.

When you said that it’s probably not necessary, do you mean the “future checking”? So you think it’s okay if I submit every frame (or at least when list1 is not empty), like this:

[java]

if (!list1.isEmpty()) { //check if list1 isn’t empty (check if chunks needs updating)

executor.submit(chunkUpdateCallable);

}

[/java]

instead of this:

[java]

if (future == null && !list1.isEmpty()) {

future = executor.submit(chunkUpdateCallable);

} else if (future.isDone() || future.isCancelled()) {

future = null;

}

[/java]

where I add “future checking”? I don’t fully understand how ScheduledThreadPoolExecutor works, but won’t submitting every frame (or even only when updates are required) be inefficient? Wouldn’t doing something like that cause submitting even though the one submitted from the previous frame isn’t done? Kind of like people talking over each other? I thought that if I check the futures, I can tell if the one submitted from the previous frame is done, and only if it is, submit again (if chunks need updating). Then, there wouldn’t be excessive updating by submitting?

Again, sorry for asking so much. Thanks for staying with me so far.

pspeed · October 28, 2012, 1:54am

But you may have new ones to submit. I wouldn’t use a scheduled executor. Just queue the ones that need updates in regular thread pool executor… mark them as pending somehow. Don’t submit them again until they aren’t pending. You have to make sure that your workers mark them as “not pending” as soon as they start working on them.

I don’t know how you determine when stuff needs to be submitted or not. This is actually a tricky bit of stuff to get right. The last thing you want to do is miss submitting a change request because you thought one was already submitted but in fact it was being processed but was far enough along it wouldn’t take the new change into account. It’s almost always better to queue something for work and then discover you don’t need to do anything with it. For example, Mythruna keeps a version number in its chunk-equivalent (leafs). Every time a leaf changes then the version is incremented. If I go to build geometry for a leaf and it’s already the right version then I skip it. Better to do that accidentally 100 times than to miss even 1 update. So just be careful.

seann999 · October 28, 2012, 5:02am

@pspeed Okay, I changed ScheduledThreadPoolExecutor to ThreadPoolExecutor, but why? I read the documentation, I’m guessing it’s because I don’t even use its features?

For executor.submit(chunkUpdateCallable), I think you might have misunderstood something. It doesn’t update just one chunk, it iterates through a list and rebuilds them all, and passes them to another list to be used in attaching (#2 in quote).

@seann999 said:
@pspeed
Whoops, "future2" was supposed to be "future" too...
Here's part of how chunks update in my code:
1. In simpleUpdate, check which chunks got edited by the player. Add those chunks to a list (list1).
2. In simpleUpdate, submit a callable that rebuilds the chunks from list1. Basically add those newly built chunks to list2 (ConcurrentLinkedQueue)
3. In simpleUpdate, add/remove/replace those chunks from the list2 (ConcurrentLinkedQueue) once per frame (as you suggested).
Yeah, I should check if list1 isn't empty first before doing #2.
When you said that it's probably not necessary, do you mean the "future checking"? So you think it's okay if I submit every frame (or at least when list1 is not empty), like this:
[java]
if (!list1.isEmpty()) { //check if list1 isn't empty (check if chunks needs updating)
executor.submit(chunkUpdateCallable);
}
[/java]
instead of this:
[java]
if (future == null && !list1.isEmpty()) {
future = executor.submit(chunkUpdateCallable);
} else if (future.isDone() || future.isCancelled()) {
future = null;
}
[/java]
where I add "future checking"? I don't fully understand how ScheduledThreadPoolExecutor works, but won't submitting every frame (or even only when updates are required) be inefficient? Wouldn't doing something like that cause submitting even though the one submitted from the previous frame isn't done? Kind of like people talking over each other? I thought that if I check the futures, I can tell if the one submitted from the previous frame is done, and only if it is, submit again (if chunks need updating). Then, there wouldn't be excessive updating by submitting?
Again, sorry for asking so much. Thanks for staying with me so far.

Er--well actually, as of now, it's just one, since I added a return statement right after a chunk is generated procedurally (which is computationally expensive). Doing this seemed to reduce lag. Also, it used to be a for loop with an ArrayList, but now it's a while loop with a ConcurrentLinkedQueue. Basically like this (this is basically within the submitted callable):
[java]
while (!updates.isEmpty()) { //"updates" is a ConcurrentLinkedQueue with the class "UpdateCommand" (list1)
UpdateCommand command = updates.poll(); //UpdateCommand holds x and z coordinates for the replacement-needing chunk and other variables
Chunk chunk = getChunk(command.point); //getChunk(Point) gets the chunk with the right point coordinates

updateChunk(chunk); //this method replaces (updates) chunks; it adds the old chunk and the rebuilt chunk to a different list (list2) to be actually attached or detached from the world.
return;
}
[/java]
I guess the while loop there can be replaced since it returns after one...
I thought that you thought that the executor.submit(callable) submits only one chunk to be rebuilt.

pspeed · October 28, 2012, 6:17am

This is getting to complicated for me to really follow.

It just seems like this:

In simpleUpdate, check which chunks got edited by the player. Add those chunks to a list (list1).
In simpleUpdate, submit a callable that rebuilds the chunks from list1. Basically add those newly built chunks to list2 (ConcurrentLinkedQueue)

Could be:

In simpleUpdate, check which chunks got edited by the player and post them to a ThreadPoolExecutor… which then has workers that rebuild them and put them on the ConcurrentLinkedQueue.

One runnable versus many… hardly seems to matter.

In my case, I don’t use ThreadPoolExecutor and rolled my own… so I can pass whatever thing I want to it… in my case it’s a GeometryBuilder object but it essentially acts like a Runnable.

(I love the Java util.concurrent threading stuff and will evangelize that stuff up and down… but the executor services have what I think is a critical bug in how they handle uncaught exceptions and it bugs the heck out of me. It’s not at all possible to fix in user code, either.)

zarch · October 28, 2012, 10:59am

You mean the fact that they silently eat all uncaught exceptions without allowing you to see that it has happened? It’s damn annoying that. I even considered writing my own AbstractRunnable that would wrap an internal abstract doStuff() type method with a try/catch/log sequence. Never went beyond idly musing though.

Empire_Phoenix · October 28, 2012, 11:15am

I aboslutly recommend doing so, catch Throwable and you should not miss anything anymore. Cause later this can be really annoying to determine what the error was with hidden errors.

zarch · October 28, 2012, 1:25pm

Yeah, for the most part I just put the try catch inside my implementation of runnable as then I can have the error handling be more specific to that case…but it’s a far from ideal solution.

pspeed · October 28, 2012, 7:04pm

@zarch said:
You mean the fact that they silently eat all uncaught exceptions without allowing you to see that it has happened? It's damn annoying that. I even considered writing my own AbstractRunnable that would wrap an internal abstract doStuff() type method with a try/catch/log sequence. Never went beyond idly musing though.

Yeah, you can force all of your Runnables to do it but there is no way to do it in the executor itself. You can plug in your own worker factory, etc... but the exception swallowing happens in a wrapper for which you have no control over. It's the thing that handles the future stuff, as I recall. The worst part about it is that it ignores the thread's exception handler, too. For all of the beautiful work in the concurrent package, the fact that you can't used any of the standard Java approached for dealing with a thread's unhandled exceptions is pretty boneheaded.

I have the "benefit" of having written dozens of my own thread pool stuff over the years that Java didn't have such a thing... so I just went back to using my own in most cases. The nice thing is that I can submit objects directly and don't have to wrap them in Runnables/Callables first. Java generic typing even makes this nice and clean.

seann999 · November 3, 2012, 5:38pm

@pspeed Most of the lag is now gone, but there’s still an occasional fraction-of-a-second freeze. It usually happens when I keep traveling straight, which is when there’s a lot of procedural generation. I’m guessing it’s garbage collection, since I heard they freeze execution when there’s too much garbage or something, and that’s one of the drawbacks for games in Java (if you have inefficient or expensive code). If I’m right, all I can do now is optimize and clean the code. Does that make sense?

pspeed · November 3, 2012, 5:59pm

@seann999 said:
@pspeed Most of the lag is now gone, but there's still an occasional fraction-of-a-second freeze. It usually happens when I keep traveling straight, which is when there's a lot of procedural generation. I'm guessing it's garbage collection, since I heard they freeze execution when there's too much garbage or something, and that's one of the drawbacks for games in Java (if you have inefficient or expensive code). If I'm right, all I can do now is optimize and clean the code. Does that make sense?

If your heap settings are properly tuned then you can get rid of a lot of this... unless it's direct memory churn. If you are creating and releasing a lot of meshes then you might get some benefit from manually freeing them when done so that the JVM doesn't have to do it when GC is run. JME has a utility method for doing this.

I started using it a while back to avoid out of memory errors when there was still plenty of memory. I think it also affected performance but it's hard to tell because right around that time we also fixed the native buffer freeing to not free everything in one frame (different than the direct memory issue but related).

seann999 · November 3, 2012, 8:26pm

@pspeed Heap settings are things like -Xms, -Xmx in the command line (or the conf file in JME), right?

And “manually freeing” is setting unneeded variables to null? And where is the utility method?

pspeed · November 3, 2012, 9:01pm

@seann999 said:
@pspeed Heap settings are things like -Xms, -Xmx in the command line (or the conf file in JME), right?
And "manually freeing" is setting unneeded variables to null? And where is the utility method?

No, manually freeing is using BufferUtils to free the direct memory. Direct memory is only freed when the garbage collector is run but the amount of direct memory does not factor into the garbage collector running. So if you've set your heap really big then you can actually cause more problems because the GC will run less often and so direct memory doesn't get freed.

There is a command line setting for the direct memory size. If you churn through a lot of meshes my guess is that max heap should be no more than 512 M and direct heap should be 1024 M or more... and use BufferUtils to free the buffers (it's a little bit of manual work but it's worth it for lots of reasons).

http://hub.jmonkeyengine.org/javadoc/com/jme3/util/BufferUtils.html#destroyDirectBuffer(java.nio.Buffer)

seann999 · November 3, 2012, 10:13pm

@pspeed Okay, so I did 3 things:

After removing the chunk from the world and my chunk list, I basically did this:

[java]

for (int i = 0; i < node.getQuantity(); i++) {

SafeArrayList<VertexBuffer> buffers = ((Geometry) node.getChild(i)).getMesh().getBufferList();

for (int j = 0; j < buffers.size(); j++) {

VertexBuffer vBuffer = buffers.get(j);

BufferUtils.destroyDirectBuffer(vBuffer.getData());

}

}

[/java]
I went to jmonkeyplatform.conf and changed Xmx to 512m
There was already -J-XX:MaxDirectMemorySize=2048m, which I assume is what you were talking about.

Is this right?

pspeed · November 3, 2012, 11:34pm

@seann999 said:
@pspeed Okay, so I did 3 things:
1. After removing the chunk from the world and my chunk list, I basically did this:
[java]
for (int i = 0; i < node.getQuantity(); i++) {
SafeArrayList<VertexBuffer> buffers = ((Geometry) node.getChild(i)).getMesh().getBufferList();
for (int j = 0; j < buffers.size(); j++) {
VertexBuffer vBuffer = buffers.get(j);
BufferUtils.destroyDirectBuffer(vBuffer.getData());
}
}
[/java]
2. I went to jmonkeyplatform.conf and changed Xmx to 512m
3. There was already -J-XX:MaxDirectMemorySize=2048m, which I assume is what you were talking about.
Is this right?

Yeah... don't know how much that will help but it shouldn't hurt. And it's worth a try anyway. :)

seann999 · November 4, 2012, 7:51am

@pspeed Woot, I think I solved it. I read this (Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning) and set the project VM options to

[java]

-Xms32m -Xmx512m -XX:MaxDirectMemorySize=2048m -verbose:gc -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

[/java]

and it completely removed any noticeable lag (GC takes ~0.04 secs).

…Although that doesn’t mean I shouldn’t improve the code