Of course the title already spoils the end but I’ll take you along on my journey anyway. (I’ll be filing an official problem report shortly and it will include much less color.)
This is partially a heads up for folks who might be in a similar situation and partially an “i-search” essay.
Some months ago when working out how animation rigs would work client/server through the SpiderMonkey/SimEthereal stack, I added animation to one of my MOSS demos… the one with the chickens and the barn. In that demo, I had nehon’s metal robot model walking around. However, occasionally I would notice little single-frame glitches where the model would morph funny or twist funny.
At the time, I suspected that since I’m manually feeding time to AnimComposer, maybe I was hitting some floating point error in the interpolators or some “close to 1 but not 1” style interp problems. I spent a little time looking into it, sanitizing time inputs, etc… but ultimately decided that a small visual glitch was a problem for another day.
Fast forward to this past weekend: I’ve been working on getting the old “Mythruna girl” upgraded with a new higher-poly mesh, proper UVs for my material system, etc… and compatible with the network rigging.
She’s a work in progress but far enough along to test so I tried to bring her into the demo with the robot… now instead of a few glitches every now and then, they happened quite often. Even worse, the animation code would occasionally throw IndexOutOfBoundsExceptions deep in the interpolation code.
SEVERE: Uncaught exception thrown in Thread[jME3 Main,5,main]
java.lang.ArrayIndexOutOfBoundsException: 58
at com.jme3.animation.CompactArray.getCompactIndex(CompactArray.java:221)
at com.jme3.animation.CompactArray.get(CompactArray.java:141)
at com.jme3.anim.interpolator.FrameInterpolator$TrackDataReader.getEntryClamp(FrameInterpolator.java:138)
at com.jme3.anim.interpolator.AnimInterpolators$1.interpolate(AnimInterpolators.java:51)
at com.jme3.anim.interpolator.AnimInterpolators$1.interpolate(AnimInterpolators.java:46)
at com.jme3.anim.interpolator.FrameInterpolator.interpolate(FrameInterpolator.java:68)
at com.jme3.anim.TransformTrack.getDataAtTime(TransformTrack.java:284)
at com.jme3.anim.tween.action.ClipAction.interpolateTransformTrack(ClipAction.java:41)
at com.jme3.anim.tween.action.ClipAction.doInterpolate(ClipAction.java:31)
at com.jme3.anim.tween.action.BlendableAction.interpolate(BlendableAction.java:52)
at com.jme3.anim.AnimLayer.update(AnimLayer.java:208)
at com.jme3.anim.AnimComposer.controlUpdate(AnimComposer.java:391)
at com.jme3.scene.control.AbstractControl.update(AbstractControl.java:118)
at com.jme3.scene.Spatial.runControlUpdate(Spatial.java:743)
at com.jme3.scene.Spatial.updateLogicalState(Spatial.java:890)
at com.jme3.scene.Node.updateLogicalState(Node.java:228)
at com.jme3.scene.Node.updateLogicalState(Node.java:239)
at com.jme3.app.SimpleApplication.update(SimpleApplication.java:262)
at com.jme3.system.lwjgl.LwjglAbstractDisplay.runLoop(LwjglAbstractDisplay.java:160)
at com.jme3.system.lwjgl.LwjglDisplay.runLoop(LwjglDisplay.java:201)
at com.jme3.system.lwjgl.LwjglAbstractDisplay.run(LwjglAbstractDisplay.java:242)
at java.lang.Thread.run(Thread.java:745)
Uh, oh… looks bad.
I added some more debug code in and then removed everything from my demo but Mythruna girl. The crashes stopped happening but the glitches were still there.
Here is a short video with some examples: (Mythruna girl without textures.)
Everything pointed to a threading bug so I started looking at how I load the back-end versus front-end models.
Architecturally speaking, the back-end loads a simplified rig that’s just supposed to be enough to keep track of the bones for physics collision shapes. In practice, the back-end loads a stripped down j3o… using a completely separate BinaryImporter, completely separate buffers, etc… it doesn’t even have an AssetManager. So no sharing there. I also confirmed that the instances of AnimComposer/SkinningControl on the “back end” were actually different than the ones loaded for visuals.
By “back end” here I mean that the gaming thread runs separately behind a network server and the client connects through the networking and uses SimEthereal children to keep track of the animations, animation times, etc… but “back end” and “front end” are running in the same Java process.
I also rechecked everything to do with timing and still had issues. If I turned off the timing pump on the client then the problem went away. If I turned off the timing pump on the server then the problem went away. Furthermore, the back-end model was also glitching. If I turned on my own physics debug shapes I could occasionally see them wigging out also.
Definitely threads sharing state that they shouldn’t be.
So I looked through the JME anim code… it’s rife with static and stateful objects. Like, I don’t even know exactly the right way to fix it yet, rampant.
At the lowest level are the AnimInterpolators. Several of those are global static constants that internally keep state objects:
All animations will share the same NLerp instance, for example.
Even worse, FrameInterpolator (by default) is also shared across all morph and transform tracks. They refer to the default instance:
…and if you scroll down you can see all kinds of state that FrameInterpolator holds onto. This was the source of the IndexOutOfBounds exceptions as one thread tried to update a back-end instance of the robot rig while the front end tried to update a completely separate Mythruna-girl model.
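As a rough illustration of how that kind of shared state can blow up, here is a simplified model. It is not the jME code: `ScratchReader`, `SHARED`, and the array sizes are made up for the example. It replays on a single thread the interleaving that two threads could produce, so the failure is repeatable instead of random.

```java
// Simplified, hypothetical model of a stateful reader shared through a static field.
// This is NOT jME code; every name here is made up for illustration.
final class SharedReaderDemo {

    /** Holds a reference to whichever track data was set most recently. */
    static final class ScratchReader {
        private float[] data;

        void setData(float[] data) { this.data = data; }

        float get(int index) { return data[index]; }
    }

    /** One global instance, standing in for a static default shared by every track. */
    static final ScratchReader SHARED = new ScratchReader();

    /**
     * Replays, on one thread, an interleaving two threads could produce:
     * A sets its data, B sets its data, then A reads with an index that is
     * only valid for A's array.
     */
    static boolean interleavedReadFails() {
        float[] trackA = new float[60]; // a longer track on one model
        float[] trackB = new float[10]; // a shorter track on another model

        SHARED.setData(trackA);         // "thread A" begins its update
        SHARED.setData(trackB);         // "thread B" sneaks in and swaps the data
        try {
            SHARED.get(58);             // "thread A" resumes with its own index
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;                // same kind of failure as in the stack trace
        }
    }
}
```

When the stale index happens to be in range for the other array, nothing throws and the caller just gets a value from the wrong track, which would look like a single-frame glitch.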
I’ve temporarily hacked my JME so that AnimInterpolators no longer keeps static references but provides static factory methods. I then modified FrameInterpolator to create its own instances instead of using the static constants for those interpolators… and then modified MorphTrack and TransformTrack to create their own FrameInterpolator.
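The general shape of that change, sketched with made-up names (`Interp`, `StatefulLerp`, `newLerp()`, and `Track` are hypothetical and not the actual jME classes or my patch): a static factory method hands out a fresh stateful object, and each track keeps its own.

```java
// Hypothetical sketch of "static factory method instead of static constant".
// None of these names are jME API.
final class FactoryDemo {

    interface Interp {
        float interpolate(float t, float a, float b);
    }

    /** Keeps scratch state, so an instance must not be shared across threads. */
    static final class StatefulLerp implements Interp {
        private float scratch;

        @Override
        public float interpolate(float t, float a, float b) {
            scratch = b - a;          // per-instance scratch value
            return a + scratch * t;
        }
    }

    /** Factory method: every caller gets its own instance. */
    static Interp newLerp() {
        return new StatefulLerp();
    }

    /** Each track owns a private interpolator instead of pointing at a global one. */
    static final class Track {
        private final Interp interp = newLerp();

        Interp interp() { return interp; }

        float sample(float t, float from, float to) {
            return interp.interpolate(t, from, to);
        }
    }
}
```

Two `Track` objects built this way never touch the same scratch state, so it no longer matters which thread updates which one.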
That fixes the problem but creates a couple thousand FrameInterpolator instances. Really, one per “rig” would be enough to avoid all of the problems.
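A sketch of what “one per rig” could look like, again with hypothetical names (`RigInterpolator`, `Rig`, and `Track` are not jME classes): the rig creates a single interpolator and hands it to every track it owns. This assumes a given rig is only ever updated by one thread at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not jME code: one interpolator per rig, passed to each track.
final class PerRigDemo {

    /** Stand-in for a stateful interpolator; counts how many get created. */
    static final class RigInterpolator {
        static int created = 0;
        private float scratch;

        RigInterpolator() { created++; }

        float lerp(float t, float a, float b) {
            scratch = b - a;
            return a + scratch * t;
        }
    }

    static final class Track {
        private final RigInterpolator interp;

        Track(RigInterpolator interp) { this.interp = interp; }

        float sample(float t) { return interp.lerp(t, 0f, 10f); }
    }

    static final class Rig {
        // Shared by all tracks of this rig, but never by another rig.
        private final RigInterpolator interp = new RigInterpolator();
        private final List<Track> tracks = new ArrayList<>();

        Rig(int trackCount) {
            for (int i = 0; i < trackCount; i++) {
                tracks.add(new Track(interp));
            }
        }

        Track track(int i) { return tracks.get(i); }
    }
}
```

With two rigs of 500 tracks each, this creates two interpolators instead of a thousand.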
To be clear, this is kind of a serious issue and could even happen to folks loading spatials on a separate thread to later add to the scene. If you decide that you want to preset the pose of that model in any way then you potentially create visual glitches and/or crash your application.
That’s a big surprise.
I’m still trying to figure out the best way to fix this “for real”. I’m not sure that it can be done without a few breaking changes… but we’ll see. The current code did something convenient-but-lazy and now we pay the price.