Is it possible that a FOR loop is split across threads?... just wondering

pspeed · June 6, 2014, 4:30pm

And at the risk of beating a dead horse by now, I want to talk a second about the importance of the stack trace.

When debugging an issue, stack traces are glorious. They are the one (and often only) bit of ground truth data we will get to try to diagnose a problem. Java doesn’t lie about it’s stack traces. You can often infer huge amounts of stuff or rule out entire trees of possibilities just from the stack trace.

Every single other piece of information a developer provides is tainted by perception. Even the debug printlns are subject to preconceived notions of what was expected to be found. (This is why the output of debug printlns is nearly useless without also showing the exact code producing those lines in full context.)

The presumption is that most developers who post about their problems like this, it will be because they have reached the end of their rope and exhausted their own tricks and procedures. So, by definition, the bug has surpassed their own ability to diagnose. Thus, every single assumption on their part has become suspect. Consequently, any information that developer provides on what they think is going on is the least reliable information. Useful anecdotally, sure, but chances are if that dev had a good idea what was actually happening then they’d have also solved the problem.

This particular stack trace was an excellent example. If you’d posted nothing else I would have been able to tell you what your problem was. Let’s take a look at how:

If you had told me nothing else about your problem…

Clue #1: indicates that a collection is being modified while it is being iterated. This could be because of ONLY two reasons:

the loop is modifying the collection while iterating
multiple threads are modifying the same collection

Given the location I can already 100% rule out number 1. (so technically there is a clue 1.5)

Conclusion: multiple threads are modifying this data structure.

Clue #2: with just a few clicks through the online SVN (still relevant in this case and github’s interface is horrible) I can see that this is the collision results internal to BIHTree. But actually, before I even looked, I had my suspicions because why would BIHTree be calling getNearestCollsion() on the supplied results? It wouldn’t. So I checked the code to confirm. In theory, I could have diagnosed this even without looking.

Conclusion: multiple threads are performing collisions on the same Mesh instance.

Clue #3: now we get a little speculative. One conclusion I can make 100% is that threads are being used (but I already knew that 100%, see above). A probable but less sure conclusion is that it’s quite likely that multiple threads in the pool itself could be performing collisions on the same mesh instance.

But even if the render thread were also performing collisions on this same scene then the issue would happen.

Those are the 100% true facts and then all that would be left is to apply them to your code. Working with ground truth is wonderful.

pspeed · June 6, 2014, 4:43pm

As to the original problem, we agree that it is kind of ugly that BIHTree is doing this. At the moment it’s a “fact of life” until something better comes up… at any rate, it wouldn’t be available in a stable release for some time.

Ben1 · June 6, 2014, 5:09pm

@t0neg0d said: @.Ben. What's your end goal with this? (If I missed that in the posts sorry) Might be easier to help if people knew what you were trying to accomplish.
Specifically, what’s the original list for? (Did you ever specify the data type used, how you add to and how you are retrieving the info… or is this a moot point now?)
What’s the ray casting and then collideWith all about?

Hi. I am doing collideWith() on all spatials in the list, never adding to it. Just modifying their localTranslation() and they’re not attached to the screneGraph when I do that, as a matter of fact, nothing in the whole thread is attached to the rootNode.

@abies said: Regarding the actual subject - don't swallow exceptions. Don't handle them. They are there for a reason. You might want to do some serious defensive coding when you deploy into embedded space without possibility of easy updating, but I don't think it is a good approach during development. Trace exceptions, understand them and fix code so they are not thrown. Having "this exception is thrown once per 10000 calls, let's ignore it for now" is road to madness.

I know they’re there for a reason and no, I’m not ignoring or muting them, I’m using libs and sometimes they do swallow exceptions, I’m not happy about it either. I’m definitely not happy when I see random exceptions and I always fixed them before, but this one is tricky to me because it involves threads and I’m not used to that yet. That’s why I’m here to ask some hints.

@pspeed said: To me it looks like you are setting up your task to run over and over again? (Otherwise why use the scheduled thread pool executor)

Nope. At least that’s not what I thought it was for. What I want to do is launch a thread task that will create new geometries and sort them, set their translation, etc… by using collideWith() between each other, etc… and once this task is completed, obtain the resulting node (that contains thousands of geometries) from the future thread task and attach it to the scenegraph. AND ALL OF THIS works except the collideWith() node that randomly crashes with 4 different and random exceptions, always pointing to its “results” variable being sorted. It’s like it’s being reused or something, which should really not be, but I have not developped anything in the JME3 core, so that’s why I turn to you guys if somebody knows.

@pspeed said: Whatever the case, more than one thread is hitting the same CollisionResults which is a huge no-no. Actually, given the teeny little code snippets provided I can't see how that happens even if the same instance is being rerun.

I don’t see how this is possible beause each thread instanciate its own geometries, its own ray, its own collision results… I was reading on the Java threading topic yesterday and it looks like it’s not even supposed to be possible that variables instanciated inside a function called concurrently by 2 threads can share their values, so how the heck is this happening? You saw yesterday in my code snippets that every god damn variable is created from within the threads themselves, not from outside. And nothing, NOTHING is attached to the rootNode or anything sceneGraph related. It’s all (supposed) to be independant. Could it be that collideWith() is not meant at all to be called from within 2 threads at the same time? Does it use an inner MEMBER-VARIABLE that is static maybe? You guys know better that’s why I’m asking you.

@pspeed said: As stated in a previous thread, collisions against nodes are not thread safe. It's not just the collision results but the internal collision data. BIHTree keeps an internal CollisionResults and that is the one getting the concurrent mod exceptions. So two different threads cannot call collideWith() on the same Geometry. Period.

VERY CRUCIAL piece of information, but the problem is: I’m not doing the collideWith() on the same geometries from the different threads. The threads themselves create their OWN geometries and their own ray, collision results and such. They’re all unique and proper to themselves.

@pspeed said: If you want a pool of threads to hit against the scene graph then you will have to clone the scene graph and make sure the Geometry clones have their own copy of the internal collision data. I don't think it's worth it, personally. There are a dozen better ways to speed up collisions.

Nothing is (supposed to be) attached to the rootNode. Does that invalidate your statement? Or are you implying that simply doing this: Node n = new Node() is enough to make it crash? Because I repeat, there is nothing attached to the sceneGraph inside the threads.

@pspeed said: Useful anecdotally, sure, but chances are if that dev had a good idea what was actually happening then they’d have also solved the problem.

Well… that was kind of the same conclusion I told you yesterday tough: http://hub.jmonkeyengine.org/forum/topic/is-it-possible-that-a-for-loop-is-split-across-threads-just-wondering/#post-288447

@pspeed said: A probable but less sure conclusion is that it’s quite likely that multiple threads in the pool itself could be performing collisions on the same mesh instance.

Impossible. Explain to me how a function that creates its own geometries can end up reusing another thread’s own geoemtries? From all the Java Threading forums I’m reading, it’s impossible. It would only be possible if the variables were declared outside of the function or passed as parameters.

We’re definitely not out of the woods here. I’m already (and have been for 24 hours) blaming the collideWith() function for not being able to perform its operations concurrently. I’m pretty darn sure about that assumption. Can you guys confirm this? I’m not reusing geometries, so please stop mentionning it, I’m not that ignorant and would not have come here to ask help if this was the case. This never ever happens. EVERYTHING is declared from within the function scope. EVERYTHING. Nothing is attached to the rootNode or sceneGraph.

pspeed · June 6, 2014, 5:21pm

IT"S NOT YOUR COLLISION RESULTS THAT IS BEING SHARED. Repeat as needed.

It’s the collision results that is kept internally to the BIHTree… which is kept internally to the Mesh.

This is an irrefutable fact. You are doing a collideWith on the same mesh from multiple threads. It is FACT. 100% absolute guaranteed.

It is only in question how it is happening in your code. But I can tell you with 100% certainty that two threads are calling collideWith() on the same mesh instance. Like, how did you create these other instances just to do collisions against them? Did you construct them manually or load them from the cache… err… I mean AssetManager.

I do wonder… if you do not want to repeat tasks then why are you using the thread pool implementation that was designed specifically to repeat tasks.

pspeed · June 6, 2014, 5:22pm

@.Ben. said: We're definitely not out of the woods here. I'm already (and have been for 24 hours) blaming the collideWith() function for not being able to perform its operations concurrently. I'm pretty darn sure about that assumption. Can you guys confirm this? I'm not reusing geometries, so please stop mentionning it,

You may not be reusing geometries but you ARE reusing meshes. 100000000% guaranteed.

Ben1 · June 6, 2014, 5:38pm

@pspeed said: IT"S NOT YOUR COLLISION RESULTS THAT IS BEING SHARED. Repeat as needed.

Yes yes lol… sorry for the wrong term ^^; that’s what I meant.

OK… so if I get you right, you mean that collideWith() will store information in the meshes and not in the passed spatials, right? I’m not reusing the meshes either, but OUT OF CURIOSITY, are you implying that if I were to declare ONE mesh and reuse it as a model to construct thousands of geometries split across different threads, even if the nodes and geometries themselves are created from within the function scope, it would make the collideWith() crash?

If that’s the case, that’s some gigantic design flaw to me. I don’t understand the logic in how this architecture is done. What has the mesh anything to do with the collideWith() results? Each spatial has its own location/rotation, the mesh is just a wireframe “model” if you will, it’s just used to construct multiple geometries that are identical in terms of vertices, so it doesn’t make sense to me that I have to duplicate 1000 identical meshes just to be able to test collideWith() on their geometries! Are you sure about this? [again, I’m NOT reusing the meshes in this scenario]

The only thing in this whole class that is reused is the material and no, I’m not going to make 1000 different materials. I would not seriously consider anyone telling me this is the solution.

pspeed · June 6, 2014, 6:03pm

How would you collide with the triangles without colliding with the mesh? I’m not sure you’ve thought that through. You know, two minutes looking at code would answer all of your questions. The only design flaw here is that BIHTree keeps its own collision results object in a paranoid attempt to avoid creating garbage (even though the CollisionResult objects it contains will be created every time anyway and there will only ever by two of them.)

The BIHTree is (part of) what a Mesh caches so that it doesn’t have to check every triangle every time a collision check is done.

So, if I understand things correctly, you are manually constructing your Mesh for each Geometry? Like constructive geometry? Or are you loading it from the AssetManager? (hint cached)

Anyway, I still think your overall approach may be flawed. Hard to say but I’ve never need to have a pool of threads performing collision checks before.

Ascaria · June 6, 2014, 6:09pm

@.Ben. said: ...[again, I'm NOT reusing the meshes in this scenario] ...

i try to relieve pspeed’s suffering: assetManager will load only one mesh, then he is giving pointers to it everytime you want to load model. so you dont reuse meshes, but assetManager does it for you automatically. it is written in the tutorial. conclusion to this may be fact, that you can do collidewith on one thread at one time, no parallel collisions

its not design flaw, it is huge memory saving… lets imagine 10000 space ships in memory… do you want 10000 meshes of same ship, or 1 mesh? what can save memory? if you want to duplicate mesh, give it animation

im not happy with that collideWith is single-threaded, but it is fact now, you dont need to understand because you are not skilled enough to change it, so get it as it is, try to think how to improve your game that you dont have to do milions of collideWith

Ben1 · June 6, 2014, 6:32pm

@pspeed said: So, if I understand things correctly, you are manually constructing your Mesh for each Geometry? Like constructive geometry? Or are you loading it from the AssetManager? (hint cached)

I’m manually creating a mesh and a geometry, nothing is reused in this scenario because those boxes are later used for other operations. Normally, I only make 1 mesh and reuse it but not in this particular application. The material is reused tough, and it comes from the assetManager. Here is the function that is called 1000 times to create 1000 geometries that will (in the same threaded function call) be tested against the faulty collideWith() function.

[java]
private Geometry create_box(s, location){
Box faceShape = new Box(1, s, 1);

    Geometry face1 = new Geometry("Box", faceShape);
        
    face1.setUserData("S", s);
        
    face1.setMaterial(unshadedMaterial); // This comes from the assetManager and is reused on all geometries
    
    face1.setLocalTranslation(new Vector3f(location.x + (s * 5f), location.y, location.z + (s * 5f)));
    
    return face1;

}
[/java]

Before the collideWith() crash I create like 1000 of these by calling the function 1000 times and the returned Geometry is added to a list. This list is iterated and collideWith() is used to check if and at what point exactly they collide with a big spatial. Again, for people that just joined in, commenting out the collideWith() function makes it work PERFECTLY except that the IF branch when those 1000 geometries would collide is not executed, but everything else works perfectly and it’s quite efficient. As a matter of fact, before I started implementing this, the game would have huge fps drops when arriving on a new terrain chunk and now it’s as smooth as nothing would never happen, I’m quite happy with the result, except that collideWith() is not thread-safe and that really sucks.

@Ascaria >> “conclusion to this may be fact, that you can do collidewith on one thread at one time, no parallel collisions”

Yeah, but I don’t agree with your explanation… what more memory would it actually use if it stored in the spatials themselves instead of in the meshes (which to me seems obvious but what do I know) collideWith says it: COLLIDES WITH. spatial vs spatial… not mesh vs mesh. A mesh does not have a location/rotation. Therefore it logically can’t collide with anything. It’s ABSTRACT.

But what do I know…

Ascaria · June 6, 2014, 6:50pm

@.Ben. said:collideWith says it: COLLIDES WITH. spatial vs spatial... not mesh vs mesh. A mesh does not have a location/rotation. Therefore it logically can't collide with anything. It's ABSTRACT.
But what do I know…

i have to disappoint you, collide with does following

ray is a line, so not spatial vs spatial but line vs spatial
first traversing spatials (nodes), if spatial is geometry, then its mesh is taken, then mesh’s bounding box which have position and rotation, if hit, then continues to mesh’s triangles which have pos and rot, every triangle is collided with

im sorry but you should read tutorials thoroughly

i can agree with you that engine should be able to do collidewith on multiple threads because it is read-only process i think

Ben1 · June 6, 2014, 6:58pm

Yep, it’s counter-intuitive alright. Thanks for this valuable information.

So basically, back to the drawing board. Let’s rewrite the Node class… not.

Well, as I thought yesterday and after turning the problem in all directions, it’s indeed quite risky to use collideWith() from threads.

I’ll do something else for this part. Also, ALL OF THIS for a result that players would barely even notice at all

I +1’ed Ascaria for 2 valuable pieces of information and Paul also for some information, altough you do not trust what I do, we come to the somewhat same conclusion 1 day later. I’m not that stupid you know, I’m trying to learn JME3, but sometimes it’s not as intuitive as it first looks

Thank you all.

t0neg0d · June 6, 2014, 8:46pm

@.Ben. said: Yeah, but I don't agree with your explanation... what more memory would it actually use if it stored in the spatials themselves instead of in the meshes (which to me seems obvious but what do I know) collideWith says it: COLLIDES WITH. spatial vs spatial... not mesh vs mesh. A mesh does not have a location/rotation. Therefore it logically can't collide with anything. It's ABSTRACT.

Couple things:

A mesh is comprised of direct memory buffers, some of which describe where in space each vertice resides. This is local space… but it still has a physical local.
The reason JME loads a mesh once and shares it’s backing buffers should be fairly obvious from point 1. As well as, it mirrors how they are handled on the GPU.
The local translation + the spatial coordinates + parent coords etc combine to give you world coordinates of the instance of the mesh (this is likely used in geometry collision, as local coords would do no good).
The reason collideWith uses Spatial to Spatial is obvious from point 3. It still checks geometry collision… it’s just geometry in world space.

Hope this helps a little.

Ben1 · June 6, 2014, 9:09pm

It helps to understand how JME3 is designed, but it doesn’t help resolving the actual issue. I would have figured JME3 would have made a different (concurrent safe) non static list for each collideWith() test but it clearly isn’t the case.

Empire_Phoenix · June 6, 2014, 9:21pm

Nope thread savety costs much performance, so thats why.

pspeed · June 6, 2014, 9:47pm

@.Ben. said: I'm manually creating a mesh and a geometry, nothing is reused in this scenario because those boxes are later used for other operations. Normally, I only make 1 mesh and reuse it but not in this particular application. The material is reused tough, and it comes from the assetManager. Here is the function that is called 1000 times to create 1000 geometries that will (in the same threaded function call) be tested against the faulty collideWith() function.
[java]
private Geometry create_box(s, location){
Box faceShape = new Box(1, s, 1);
    Geometry face1 = new Geometry("Box", faceShape);
        
    face1.setUserData("S", s);
        
    face1.setMaterial(unshadedMaterial); // This comes from the assetManager and is reused on all geometries
    
    face1.setLocalTranslation(new Vector3f(location.x + (s * 5f), location.y, location.z + (s * 5f)));
    
    return face1;
}
[/java]

Before the collideWith() crash I create like 1000 of these by calling the function 1000 times and the returned Geometry is added to a list. This list is iterated and collideWith() is used to check if and at what point exactly they collide with a big spatial. Again, for people that just joined in, commenting out the collideWith() function makes it work PERFECTLY except that the IF branch when those 1000 geometries would collide is not executed, but everything else works perfectly and it’s quite efficient. As a matter of fact, before I started implementing this, the game would have huge fps drops when arriving on a new terrain chunk and now it’s as smooth as nothing would never happen, I’m quite happy with the result, except that collideWith() is not thread-safe and that really sucks.

@Ascaria >> “conclusion to this may be fact, that you can do collidewith on one thread at one time, no parallel collisions”

Yeah, but I don’t agree with your explanation… what more memory would it actually use if it stored in the spatials themselves instead of in the meshes (which to me seems obvious but what do I know) collideWith says it: COLLIDES WITH. spatial vs spatial… not mesh vs mesh. A mesh does not have a location/rotation. Therefore it logically can’t collide with anything. It’s ABSTRACT.

But what do I know…

If you truly have separate meshes then there is no problem. Every mesh instance gets its own collision data.

Your other assertion feels pretty ridiculous.

If you’d actually looked at the code (would have taken two minutes and saved me time explaining) then you’d see that if you do something like rootNode.collideWith() then it passes that down to the children and the children’s children, etc. until it gets to a Geometry. And what would you be colliding with if not for the mesh? There is nothing but the mesh that represents the object.

Geometry translates the ray (or whatever) into Geometry space (actually it passes the transformation info to Mesh, which passes it on the BIHTree, with the ray and BIHTree translates it)

How could it be done any differently, really? You show me different ways to do it and I will show you much slower code.

The issue here is that BIHTree tries to save some memory. During it’s collision test it collides with the bounding shape of the mesh and uses a CollisionResults to do it. Instead of recreating it every time, it clears it and reuses it. This means that FOR A SPECIFIC MESH INSTANCE you cannot access it from more than one thread.

So, somewhere you have more than one mesh instance and the two threads are hitting it. It’s up to you to find out why but I guarantee you it’s true.

pspeed · June 6, 2014, 9:48pm

@.Ben. said: It helps to understand how JME3 is designed, but it doesn't help resolving the actual issue. I would have figured JME3 would have made a different (concurrent safe) non static list for each collideWith() test but it clearly isn't the case.

All of the other less relevant stuff aside, you ARE sharing meshes somewhere.

Ben1 · June 6, 2014, 11:04pm

@pspeed said: And what would you be colliding with if not for the mesh? There is nothing but the mesh that represents the object.

…a copy of the mesh in a non concurrent safe array list?

@pspeed said: So, somewhere you have more than one mesh instance and the two threads are hitting it. It’s up to you to find out why but I guarantee you it’s true.

If you say so, but I’m unable to find where these shared meshes are, because I’m creating them right before calling collideWith() in the scope of the threaded function. If I’m unsuccesful at finding those meshes I’ll rewrite this part of the code to something else, maybe just do collideWith() batches in the simpleUpdate() loop. It’s not that big of a deal in the end, I’m just totally amazed at this problematic. I always end up finding a solution, but this puzzles me a really hard time.

Thank you so much Paul for your time, you can consider this issue cleared, altough it really is not, but you clearly explained why it happens, so I’ll deal with this or rewrite it another way.

pspeed · June 7, 2014, 12:34am

@.Ben. said: ...a copy of the mesh in a non concurrent safe array list?

A) 10000000% irrelevant since that’s not where the exception is.

B) 10000000% irrelevant since you’d only be reading those lists.

@.Ben. said:
If you say so, but I’m unable to find where these shared meshes are, because I’m creating them right before calling collideWith() in the scope of the threaded function. If I’m unsuccesful at finding those meshes I’ll rewrite this part of the code to something else, maybe just do collideWith() batches in the simpleUpdate() loop. It’s not that big of a deal in the end, I’m just totally amazed at this problematic. I always end up finding a solution, but this puzzles me a really hard time.

Thank you so much Paul for your time, you can consider this issue cleared, altough it really is not, but you clearly explained why it happens, so I’ll deal with this or rewrite it another way.

Everywhere you call collideWith() do a scene graph visitor on the node that prints all of the meshes and their System.identityHashCode() (and thread while you are at it).

Are you also doing collision checks from the render thread?

At any rate, we can’t see enough of the code to find the issue so at some you’ll end up doing the digging, yeah.

Why do you perform so many collision tests? It could be the sign of a problem anyway.

Ben1 · June 7, 2014, 12:51am

@pspeed said: Are you also doing collision checks from the render thread?

Yes.

@pspeed said: Why do you perform so many collision tests? It could be the sign of a problem anyway.

Because I’m positionning THOUSANDS of geometries on the surface of terrain chunks and I need a selection of those thousands of geometries to never overlap with another group of geometries. I have succesfully fixed my code. The problem never occurs anymore. Thank you

pspeed · June 7, 2014, 2:31am

@.Ben. said: Yes.
Because I’m positionning THOUSANDS of geometries on the surface of terrain chunks and I need a selection of those thousands of geometries to never overlap with another group of geometries. I have succesfully fixed my code. The problem never occurs anymore. Thank you

Just FYI: there are ways that are exponentially more efficient than what you are doing. In fact, if I had to think of the slowest way to do it… this is probably the way I’d propose. Not trying to pick on you but your optimization strategy took a wrong turn at the very beginning and it’s sometimes hard to get off later.

So as I suspected before, you may want to work on the meta-problem and then you can probably just do all of this in real time.

BTW: what turned out to be the issue? Did you find the shared mesh?