Major thread state leakage in the new animation system

Of course the title already spoils the end but I’ll take you along on my journey anyway. (I’ll be filing an official problem report shortly and it will include much less color.)

This is partially a heads up for folks who might be in a similar situation and partially an “i-search” essay.

Some months ago when working out how animation rigs would work client/server through the SpiderMonkey/SimEtheral stack, I added animation to one of my MOSS demos… the one with the chickens and the barn. In that demo, I had nehon’s metal robot model walking around. However, occasionally I would notice little single-frame glitches where the model would morph funny or twist funny.

At the time, I suspected that maybe since I’m manually feeding time to AnimComposer that maybe I was hitting some floating point error in the interpolators or some “close to 1 but not 1” style interp problems. I spent a little time looking into it, sanitizing time inputs, etc… but ultimately decided that a small visual glitch was a problem for another day.

Fast forward to this past weekend: I’ve been working on getting the old “Mythruna girl” upgraded with new higher poly mesh, proper UVs for my material system, etc… and compatible with the network rigging.

She’s a work in progress but far enough along to test so I tried to bring her into the demo with the robot… now instead of a few glitched every now and then they happened quite often. Even worse, the animation code would occasionally throw IndexOutOfBoundsExceptions deep in the interpolation code.

SEVERE: Uncaught exception thrown in Thread[jME3 Main,5,main]
java.lang.ArrayIndexOutOfBoundsException: 58
        at com.jme3.animation.CompactArray.getCompactIndex(CompactArray.java:221)
        at com.jme3.animation.CompactArray.get(CompactArray.java:141)
        at com.jme3.anim.interpolator.FrameInterpolator$TrackDataReader.getEntryClamp(FrameInterpolator.java:138)
        at com.jme3.anim.interpolator.AnimInterpolators$1.interpolate(AnimInterpolators.java:51)
        at com.jme3.anim.interpolator.AnimInterpolators$1.interpolate(AnimInterpolators.java:46)
        at com.jme3.anim.interpolator.FrameInterpolator.interpolate(FrameInterpolator.java:68)
        at com.jme3.anim.TransformTrack.getDataAtTime(TransformTrack.java:284)
        at com.jme3.anim.tween.action.ClipAction.interpolateTransformTrack(ClipAction.java:41)
        at com.jme3.anim.tween.action.ClipAction.doInterpolate(ClipAction.java:31)
        at com.jme3.anim.tween.action.BlendableAction.interpolate(BlendableAction.java:52)
        at com.jme3.anim.AnimLayer.update(AnimLayer.java:208)
        at com.jme3.anim.AnimComposer.controlUpdate(AnimComposer.java:391)
        at com.jme3.scene.control.AbstractControl.update(AbstractControl.java:118)
        at com.jme3.scene.Spatial.runControlUpdate(Spatial.java:743)
        at com.jme3.scene.Spatial.updateLogicalState(Spatial.java:890)
        at com.jme3.scene.Node.updateLogicalState(Node.java:228)
        at com.jme3.scene.Node.updateLogicalState(Node.java:239)
        at com.jme3.app.SimpleApplication.update(SimpleApplication.java:262)
        at com.jme3.system.lwjgl.LwjglAbstractDisplay.runLoop(LwjglAbstractDisplay.java:160)
        at com.jme3.system.lwjgl.LwjglDisplay.runLoop(LwjglDisplay.java:201)
        at com.jme3.system.lwjgl.LwjglAbstractDisplay.run(LwjglAbstractDisplay.java:242)
        at java.lang.Thread.run(Thread.java:745)

Uh, oh… looks bad.

I added some more debug code in and then removed everything from my demo but Mythruna girl. The crashes stopped happening but the glitches were still there.

Here is a short video with some examples: (Mythruna girl without textures.)

Everything pointed to a threading bug so I started looking at how I load the back-end versus front-end models.

Architecturally speaking, the back-end loads a simplified rig that’s just supposed to be enough to keep track of the bones for physics collision shapes. In practice, the back-end loads a stripped down j3o… using a completely separate BinaryImporter, completely separate buffers, etc… it doesn’t even have an AssetManager. So no sharing there. I also confirmed that the instances of AnimComposer/SkinningControl on the “back end” were actually different than the ones loaded for visuals.

By “back end” here I mean that the gaming thread runs separately behind a network server and the client connects through the networking and uses SimEthereal children to keep track of the animations, animation times, etc… but “back end” and “front end” are running in the same Java process.

I also rechecked everything to do with timing and still had issues. If I turned off the timing pump on the client then the problem went away. If I turned off the timing pump on the server then the problem went away. Furthermore, the back-end model was also glitching. If I turned on my own physics debug shapes I could occasionally see them wigging out also.

Definitely threads sharing state that they shouldn’t be.

So I looked through the JME anim code… it’s rife with static and stateful objects. Like, don’t even know exactly the right way to fix it yet, rampant.

At the lowest level are the AnimInterpolators. Several of those are global static constants that internally keep state objects:

All animations will share the same NLerp instance, for example.

Even worse, FrameInterpolators (by default) is also shared across all morph and transform tracks. They refer to the default instance:

…and if you scroll down you can see all kinds of state that FrameInterpolator holds onto. This was the source of the IndexOutOfBounds exceptions as one thread tried to update a back-end instance of the robot rig while the front end tried to update a completely separate Mythruna-girl model.

I’ve temporarily hacked my JME so that AnimInterpolators no longer keeps static reference but provides static factory methods. I then modified FrameInterpolator to create its own instances instead of using the static constants for those interpolators… and then modified MorphTrack and TransformTrack to create their own FrameInterpolator.

That fixes the problem but creates a couple thousand FrameInterpolator instances. Really, one per “rig” would be enough to avoid all of the problems.

To be clear, this is kind of a serious issue and could even happen to folks loading spatials on a separate thread to later add to the scene. If you decide that you want to preset the pose of that model in any way then you potentially create visual glitches and/or crash your application.

That’s a big surprise.

I’m still trying to figure out the best way to fix this “for real”. I’m not sure that it can be done without a few breaking changes… but we’ll see. The current code did something convenient-but-lazy and now we pay the price.

11 Likes

Problem report: New animation system: shared stateful static objects cause unexpected threading issues · Issue #1806 · jMonkeyEngine/jmonkeyengine · GitHub

And a few example screen shots of glitching:

Versus non-glitching:

5 Likes

reminds me of the liquid robot from the terminator movies.

i did load models in other Threads and attach animations to them, but i didnt seen issue.

But it was in some earlier JME version. Hard to tell now, would need to check this myself.

AS i understand its about issues when load model in Thread? or is it related to networking?

How could i replicate it?

As it stands, if you load Jaime.j3o into a scene and let him run around while meanwhile loading a totally separate model in another thread... posing or modifying that animation from that other thread will cause Jaime to glitch or crash. Which breaks the mental threading model under which JME normally operates.

so just load Model A in main thread and run its animation and later load Model B in multi-thread and run its animation and moment later modify animation from Model B(what you mean modify? change anim?) i should replicate?

1 Like

Load a Jaime model and make him walk.

Load a totally separate animated model from another thread and run its animation from that thread (updateLogicalState() in a loop or whatever).

As long as they are both using AnimComposer, there will be glitches and possible crashes… especially if the animation sizes are very different.

4 Likes

here i made small TestCase(i know its not thread safe, but generally works), but im unsure how to replicate, can you help edit it a little to be able to replicate?

search in code for “// here use updateLogicalState() ?!” - because i dont understand what you mean by “(updateLogicalState() in a loop or whatever).” .

TestCase load model in main thread and other models in another thread, same about animations start to play.

Test.Java:

package test;

import com.jme3.anim.*;
import com.jme3.anim.util.AnimMigrationUtils;
import com.jme3.app.ChaseCameraAppState;
import com.jme3.app.SimpleApplication;
import com.jme3.input.KeyInput;
import com.jme3.input.controls.ActionListener;
import com.jme3.input.controls.KeyTrigger;
import com.jme3.light.AmbientLight;
import com.jme3.light.DirectionalLight;
import com.jme3.math.*;
import com.jme3.scene.Node;
import com.jme3.scene.Spatial;
import com.jme3.scene.control.Control;
import com.jme3.scene.debug.custom.ArmatureDebugAppState;
import java.util.LinkedList;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Test For animation Issue
 */
public class Test extends SimpleApplication {

    private ModelAPI modelAPI;
    private LinkedList<ModelAPI> modelAPIList = new LinkedList<>();
    public static Test app;

    public static void main(String... argv) {
        app = new Test();
        app.start();
    }

    public static <T extends Control> T findControl(Spatial s, Class<T> type) {
        T result = s.getControl(type);
        if (result != null) {
            return result;
        }
        if (s instanceof Node) {
            for (Spatial child : ((Node) s).getChildren()) {
                result = findControl(child, type);
                if (result != null) {
                    return result;
                }
            }
        }
        return null;
    }

    public static class ModelAPI {

        protected AnimComposer composer;
        protected ArmatureDebugAppState debugAppState;
        protected LinkedList<String> anims = new LinkedList<>();
        protected Spatial model;
        protected String modelPath;
        protected boolean migrate;
        protected float scale;
        protected Vector3f offset;

        public ModelAPI(String modelPath, boolean migrate, float scale, Vector3f offset) {
            this.modelPath = modelPath;
            this.migrate = migrate;
            this.scale = scale;
            this.offset = offset;
            loadModel();
        }

        public void loadModelWork() {
            model = Test.app.assetManager.loadModel(modelPath).scale(scale).move(offset);
            if (migrate) {
                AnimMigrationUtils.migrate(model);
            }
            Test.app.rootNode.attachChild(model);
            setupModel();
        }

        public void nextAnimWork() {
            String anim = anims.poll();
            anims.add(anim);
            composer.setCurrentAction(anim);
            System.err.println(anim);
        }

        public void nextAnim() {
            nextAnimWork();
        }

        public void loadModel() {
            loadModelWork();
        }

        public void setupModel() {
            debugAppState = new ArmatureDebugAppState();
            Test.app.stateManager.attach(debugAppState);
            debugAppState.setEnabled(false);
            composer = Test.findControl(model, AnimComposer.class);
            SkinningControl sc = Test.findControl(model, SkinningControl.class);
            if (sc != null) {
                debugAppState.addArmatureFrom(sc);
            }
            anims.clear();
            for (String name : composer.getAnimClipsNames()) {
                anims.add(name);
            }
        }
    }

    public static class ModelAPIThreaded extends ModelAPI {

        public class WorkThread extends Thread {

            public boolean load = false;
            public boolean nextAnim = false;

            public void run() {
                while (true) {
                    if (load) {
                        System.out.println("Thread load model: " + (model != null ? model.getName() : "no model"));
                        loadModelWork();
                        load = false;
                    }
                    if (nextAnim) {
                        System.out.println("Thread nextAnim model: " + (model != null ? model.getName() : "no model"));
                        nextAnimWork();
                        nextAnim = false;
                    }
                    if(model != null) {
                        model.updateLogicalState(0.01f);
                    }
                    // here use updateLogicalState()  ?!
                    
                    try {
                        sleep(10);
                    } catch (InterruptedException ex) {
                        Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
                    }
                }
            }
        }

        public WorkThread updateThread = new WorkThread();

        public ModelAPIThreaded(String modelPath, boolean migrate, float scale, Vector3f offset) {
            super(modelPath, migrate, scale, offset);
            updateThread.start();
        }

        @Override
        public void nextAnim() {
            updateThread.nextAnim = true;
        }

        @Override
        public void loadModel() {
            Test.app.enqueue(() -> {
                updateThread.load = true;
            });
        }

    }

    @Override
    public void simpleInitApp() {
        rootNode.addLight(new DirectionalLight(new Vector3f(-1, -1, -1).normalizeLocal()));
        rootNode.addLight(new AmbientLight(ColorRGBA.DarkGray));
        //JME Testdata models copy from TestData package Models folder
        modelAPIList.add(new ModelAPI("Models/Oto/Oto.mesh.xml", true, 0.12f, new Vector3f(-2, 0.7f, 0)));
        modelAPIList.add(new ModelAPIThreaded("Models/Sinbad/Sinbad.mesh.xml", false, 0.12f, new Vector3f(-1, 0.7f, 0)));
        modelAPIList.add(new ModelAPIThreaded("Models/Elephant/Elephant.mesh.xml", false, 0.01f, new Vector3f(0, 0, 0)));
        modelAPIList.add(new ModelAPI("Models/Jaime/Jaime.j3o", true, 0.7f, new Vector3f(1, 0, 0)));
        modelAPI = modelAPIList.poll();
        modelAPIList.add(modelAPI);
        flyCam.setEnabled(false);
        Node target = new Node("CamTarget");
        target.move(0, 1, 0);
        ChaseCameraAppState chaseCam = new ChaseCameraAppState();
        chaseCam.setTarget(target);
        getStateManager().attach(chaseCam);
        chaseCam.setInvertHorizontalAxis(true);
        chaseCam.setInvertVerticalAxis(true);
        chaseCam.setZoomSpeed(0.5f);
        chaseCam.setMinVerticalRotation(-FastMath.HALF_PI);
        chaseCam.setRotationSpeed(3);
        chaseCam.setMinDistance(0.01f);
        chaseCam.setZoomSpeed(0.01f);
        chaseCam.setDefaultVerticalRotation(0.3f);
        chaseCam.setDefaultDistance(5);
        initInputs();
    }

    public void initInputs() {
        inputManager.addMapping("nextModel", new KeyTrigger(KeyInput.KEY_RIGHT));
        inputManager.addListener(new ActionListener() {
            @Override
            public void onAction(String name, boolean isPressed, float tpf) {
                if (isPressed) {
                    modelAPI = modelAPIList.poll();
                    modelAPIList.add(modelAPI);
                }
            }
        }, "nextModel");
        inputManager.addMapping("nextAnim", new KeyTrigger(KeyInput.KEY_UP));
        inputManager.addListener(new ActionListener() {
            @Override
            public void onAction(String name, boolean isPressed, float tpf) {
                if (isPressed) {
                    modelAPI.nextAnim();
                }
            }
        }, "nextAnim");
        inputManager.addMapping("toggleArmature", new KeyTrigger(KeyInput.KEY_SPACE));
        inputManager.addListener(new ActionListener() {
            @Override
            public void onAction(String name, boolean isPressed, float tpf) {
                if (isPressed) {
                    modelAPI.debugAppState.setEnabled(!modelAPI.debugAppState.isEnabled());
                }
            }
        }, "toggleArmature");
    }

}

build.gradle:

apply plugin: 'java'
apply plugin: 'application'

mainClassName='test.Test'

ext {
   jmeVersion = '[3.3,)'
}

repositories {
    jcenter()
    mavenCentral()
}

dependencies {
    implementation "org.jmonkeyengine:jme3-core:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-desktop:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-lwjgl3:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-bullet-native:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-bullet:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-effects:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-jogg:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-jogl:$jmeVersion"
    implementation "org.jmonkeyengine:jme3-plugins:$jmeVersion"
}

Arrow-Right → change current model
Arrow-Up → play anim / next anim
Space → add skeleton debug

1 Like

You never update the model on the other thread so the animation never actually interpolates anything.

By “call updateLogicalState() in a loop or whatever”, I meant exactly that. In your while true loop call model.updateLogicalState(0.01f) or whatever.

ie: make the animation actually do something. FrameInterpolator isn’t used until it actually tries to interpolate something.

3 Likes

yes, i replicated when add

                    if(model != null) {
                        model.updateLogicalState(0.01f);
                    }

into there. (updated above code)

Right, there is issue.

But why would thread call updateLogicalState? should not all updateLogicalState happen in main thread anyway?

Even Physics multi-threaded was about sync with main thread for update logic?

It did happen for you when you mix Animation + Minie physics somehow right?

here just quick video from TestCase(everyone can now replicate themself - issue start to be visible since 6 second of video):

1 Like

So that whatever pose/animation you’ve added actually gets applied. Maybe you need to run the animation forward until the feet are in a certain position or just want to check bone positions at some frame.

Calling updateLogicalState() over and over is just so that you are more likely to see it since it’s ultimately a timing issue. But to me, calling AnimComposer.update() from some other thread on an AnimComposer not associated with any scene anywhere… shouldn’t cause the application to crash.

Can you think of another JME class that works this way?

I have one thread animating an AnimComposer directly and then looking at the bones to position collision shapes in my own physics engine. This rig doesn’t even have any mesh data. I just set time and update the anim composer so the bones get interpolated.

Meanwhile, on the render thread, totally different models will glitch or cause the application to crash with IndexOutOfBounds exceptions.

3 Likes

Ah ok. I belive noone still report this, because Minie + animations somehow did not have this issue themselfs.

But yes, issue is there. Just wanted to help by provide TestCase so anyone can just copy paste and replicate issue. Myself i feel too stupid to mess arround animation thread-unsafe code.

1 Like

May be Mesh#updateBound(), it actually leads to monitor issues when it gets updated from another thread unless you apply a semaphore (sometimes not a good practice).

1 Like

False.

Mesh A not connected to the scene in any way, ie: loaded directly from a file. Call Mesh.updateBound() a million times and it will never affect any other totally separate Mesh.

If you have Mesh A that is SHARED_ by the render thread and the main thread, then yes you will obviously have an issue.

But I think you are still misunderstanding the gravity of what I’m saying.

If you load Jaime.j3o from the render thread from the regular asset manager. And you load Robot.j3o from a totally separate thread, totally separate AssetManager, never ever attach it to the main scene or otherwise give it to the render thread. ie: the Render thread has no knowledge of this robot model and has never touched it in any way…

It’s possible to do things to the robot model that crash the render thread.

That’s bad.

3 Likes

Yeah, and thanks for that.

I was trying to preempt a bunch of other folks from asking “Why would you even do that?!?” Didn’t mean to sound defensive to your post.

JME has a particular threading model in mind and animation breaks that.

2 Likes