Thread-Based TaskManager System

Pavl_G · May 31, 2022, 11:59pm

Hello fellows,

I am currently learning advanced software architectures and design patterns, so I thought: what’s the best place to apply what I learn? I searched the old jME issues and found 2 similar issues that could be solved with one feature: an AppTaskManager, which would have a binding similar to an app state so it can interwork with the jME application update (mainly using AppTasks with java.util.concurrent and Futures, I think). So what do you think? Do you have any input on this before I lay out my work on this feature and start implementing it?

Initial Approach:

Like any jMonkeyEngine application manager, the AppTaskManager or TaskManager (we will decide on the final name later) should be accessible via the SimpleApplication using getTaskManager(), so users can asynchronously run heavy, time-consuming tasks on other threads (using the CPU cores); the TaskManager then incorporates the results of those FutureTasks into the jMonkeyEngine update.
Then, we could make the AssetManager use the TaskManager to load assets internally too, and this will fix our second issue…
So, what’s the plan? I will try to implement the TaskManager and run it locally on my machine as user code, then I will test it with loading assets, and after that I will use the TaskManager inside AssetManager#loadAsset(), generally in separate PRs.
There is still a lot more; I haven’t decided on a final architecture yet, but you can always help by replying here or DMing me (I really appreciate those replies, even critical ones).

The issues:

github.com/jMonkeyEngine/jmonkeyengine

Multithreading in AssetManager

opened 07:34PM - 24 Mar 14 UTC

ghost

enhancement

Multithreading is an important issue right now. Taking advantage of it in jME3 is critical. Some areas where multithreading can be used:

- Skeleton animation, mesh skinning
- Particle update
- Physics
- Networking
- Input polling
- OGG/Audio streaming
- Resource loading
- Terrain streaming

The details on how to implement these areas using multithreading will be specified later.

---

Some parts of jME3 are already multi-threaded: Physics runs in a parallel thread. Networking runs in a parallel thread. Audio and OGG streaming is done on a parallel thread. Terrain streaming/LOD is done in a parallel thread. jME3 is still missing asset loading on a separate thread, which can be critical if the application loads assets dynamically.

---

The AssetManager is the last part of jME3 that needs to take advantage of multithreading. Reference on googlecode: https://code.google.com/p/jmonkeyengine/issues/detail?id=94

See the google code archive link for the feature request advantages:

https://code.google.com/archive/p/jmonkeyengine/issues/535

pspeed · June 1, 2022, 2:02am

Here is SiO2’s solution:

It uses a priority-based thread pool and job tasks that have a “run on background” and “run on update” pass. It can even limit how much it runs on the update thread because often it’s not the asset loading that’s the problem but dumping too many things into the scene in one update pass.

The above is the cornerstone of how Mythruna generates its world which requires multiple layers of CPU-intensive generation of different priorities, etc…

These things should not really be incorporated directly into AssetManager because that is too low-level. 9 times out of 10, there is more than just loading the asset that needs to be done and that prep can still be done on the separate thread. Plus, sometimes when you load an asset then you want it right now… you need to wait for it because you will do some manipulation. Without a continuation architecture (Java does not have one) you can’t continue the render thread just to do that… so either you block anyway (defeating the purpose of the background thread) or you must kick it off and be called back. Futures are not a good idiom for something you already know you want to just be called back later… since that future will already know when it’s done. It’s wasteful to poll it every frame just to see.

As such, an app state makes the most sense. It already has all of the init/cleanup/update hooks that it needs and can easily be looked up from anywhere. (In fact, most of the things that LegacyApplication hardcodes in directly probably should have been app states.) I’m kind of against hard-coding any more new fields into LegacyApplication or SimpleApplication… too many already.

If you do choose to integrate something like this right into the low-level asset manager caching then you should really really really (like be totally stupid not to) look into Guava’s loading cache. It’s a thing that’s very hard to get right and LoadingCache does it perfectly already.

…but then to meet the loadAsset() contract you would have to use Futures and a thread pool anyway and then block on the result. In which case, it might as well have been loaded right on the main thread.

When asynchronous threading meets a constantly polling architecture like JME, it gives some interesting benefits and some weird constraints that make a lot of “academic” task architectures less useful.

(In case it wasn’t obvious, I disagree with the PR you commented on… loading assets on a background thread is already pretty trivial on its own. An architecture should bring additional purpose over new Thread().run { do it, enqueue it} which can already be done in three lines of code now and is more powerful than anything we’d embed.)
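For readers who want to see the "do it, enqueue it" idea spelled out, here is a minimal sketch. It does not use jME: the `updateQueue` is a hypothetical stand-in for the engine's `Application.enqueue()`, `loadSlowly` is a placeholder for a real asset load, and the class name is invented for illustration. Note that the thread is launched with `start()`, since calling `run()` directly would execute on the calling thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for an engine update loop; none of this is jME code.
class EnqueueSketch {

    // Placeholder for an expensive call such as assetManager.loadModel(name).
    static String loadSlowly(String name) {
        return "model:" + name;
    }

    static String runDemo() {
        // Work handed to the update thread, similar in spirit to Application.enqueue().
        Queue<Runnable> updateQueue = new ConcurrentLinkedQueue<>();
        StringBuilder scene = new StringBuilder(); // only touched by the "update thread"

        Thread loader = new Thread(() -> {
            String model = loadSlowly("MyModel.j3o");   // do it (background thread)
            updateQueue.add(() -> scene.append(model)); // enqueue it (runs later, on update)
        });
        loader.setDaemon(true);
        loader.start();
        try {
            loader.join(); // only to keep the demo deterministic; a game would not block
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        // One simulated frame: the update thread drains whatever was enqueued.
        Runnable task;
        while ((task = updateQueue.poll()) != null) {
            task.run();
        }
        return scene.toString();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In a real application the `join()` would disappear and the drain loop would be whatever the engine already does each frame.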

Aufricer · June 1, 2022, 7:40am

Not sure what is the best approach if we talk about architecture.
But I like the effort @Pavl_G is willing to put.

Could a detailed investigation of both approaches be something that Pavl_G is interested in?
A ready-made AppState in UserCode/Softwarestore could be a thing for others to use? But would it be similar/the same as what pspeed has shared?
I can’t judge the changes to “AssetManager/AppTaskManager”, but I lean a bit towards following pspeed’s assumptions.

Whatever the outcome is, thanks for working on those issues.

Pavl_G · June 1, 2022, 8:26am

Thank you guys for your support ! I really appreciate that.

@pspeed Thanks for these insightful resources, and yeah, I completely agree with you that the architecture should bring more value than plain Java threading, and that it should be implemented as an app state rather than reinvent the state/update combo. I will study the SiO2 and Guava approaches, though there is a lot of similarity in logic between Guava and Android’s internal thread management; as you might know, Android uses 2 main threads on the same Looper, a choreographer (UI thread) and a GLES thread…

There is also a lot of useful, very low-level stuff like IPC (inter-process communication) and message queues that I am already studying while learning GNU/Linux C, and I think it may help us dramatically.

The Android OS core uses a Handler to do the IPC with the Activities/Screens from other threads (the GL thread, asset-loading threads, etc.), and it has an adapter for Java executors too… I am thinking of similar things, but to be honest I will only decide after studying both SiO2 and Guava…

I will let you know soon about my final layout for this feature.

pspeed · June 1, 2022, 9:06am

Just note that for a long time C’s version of threading and IPC was like banging rocks together to make sparks where Java already had hi tech computers since it built threading in from the beginning.

My reference to Guava was about the LoadingCache specifically. This is like a Map that makes sure that threaded access to a key’s value is safe even if it has to load it.

So if 50 threads call cache.get(myId) and the value hasn’t been loaded yet, it is guaranteed to be serviced by only one thread. (If it has already been loaded then it will be returned right away.) Then they put a bunch of expiry logic in there. I actually think that DesktopAssetManager should probably already be using LoadingCache. My memory says it cobbled together its own version of threaded protection and I have about 50/50 confidence that it did it without odd edge-case bugs or more likely just unnecessary inefficiencies. But so far so good, eh? My guess is that users don’t hammer the asset manager’s thread safety much but if we were to integrate a futures model, we definitely would want to bullet-proof all of that.
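To make the "50 threads, one load" guarantee concrete without pulling in Guava, the sketch below swaps in the JDK's `ConcurrentHashMap.computeIfAbsent`, which also applies the loader at most once per key. It is not LoadingCache and has none of its expiry logic; the class name, the `"asset:"` value, and the counter are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only approximation of "load once per key"; not Guava's LoadingCache.
class LoadOnceSketch {
    final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // counts how often the loader ran

    String get(String id) {
        // computeIfAbsent applies the loader at most once per key;
        // concurrent callers for the same key wait for that single load.
        return cache.computeIfAbsent(id, key -> {
            loads.incrementAndGet();
            return "asset:" + key; // placeholder for a real, expensive load
        });
    }

    static int loadsFor50Threads() {
        LoadOnceSketch sketch = new LoadOnceSketch();
        ExecutorService pool = Executors.newFixedThreadPool(50);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < 50; i++) {
            pool.execute(() -> {
                try {
                    start.await();       // release all threads at roughly the same time
                    sketch.get("myId");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sketch.loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loader ran " + loadsFor50Threads() + " time(s)");
    }
}
```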

Threading architectures that don’t support continuations (something I really miss in JVM-based scripting languages but not in Java) can follow only a few different patterns. (Kind of making up some of these terms for simplicity and to be unambiguous)

Simple async: fire and forget (either with a pool or not)

Polling future: run the task, constantly check it for completion, do a thing on completion

Callback: run the task, task will call us back when done. (Not really any better than chaining runnables, really)

Multiphase: run the task, called once on a background thread, called again on the main thread (sort of a managed callback situation with the callback built into the task)

Lockstep: multiple threads run in parallel, they all gather together on a latch/semaphore when done. JME Bullet’s parallel threading does something like this. I could probably write a small whitepaper on why this is my least favorite form of threading but the short answer is: everyone pays for the slowest thread and in Java the memory barriers are left a little ambiguous. Just because everybody syncs together at the end (or every frame) doesn’t necessarily mean that they have a consistent view of shared memory. (Memory barriers and thread memory models is a way deeper topic.)
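As a rough illustration of two of the patterns listed above, the sketch below contrasts a polling future with a callback using only `java.util.concurrent`. The class and method names are invented, and `Thread.sleep(1)` is just a stand-in for one frame of an update loop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PollVsCallbackSketch {

    // Polling future: the caller keeps asking "are you done yet?" every frame.
    static int pollUntilDone(ExecutorService pool) throws Exception {
        Future<Integer> future = pool.submit(() -> 6 * 7);
        while (!future.isDone()) {
            Thread.sleep(1); // stand-in for one frame of the update loop
        }
        return future.get();
    }

    // Callback: the follow-up is attached to the task and fires on completion.
    // It runs on whichever thread completes the future (or on the caller if the
    // task already finished), not on an update thread, so a game would still
    // have to hand the result back to the update loop itself.
    static int callbackWhenDone(ExecutorService pool) {
        AtomicInteger received = new AtomicInteger();
        CompletableFuture.supplyAsync(() -> 6 * 7, pool)
                .thenAccept(received::set)
                .join(); // only so the demo can return the value
        return received.get();
    }

    static int[] runBoth() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return new int[] { pollUntilDone(pool), callbackWhenDone(pool) };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] results = runBoth();
        System.out.println(results[0] + " " + results[1]);
    }
}
```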

A game, to me, has some important constraints:
The app is already polling 60 times a second… and we want to do everything in our power to never ever slow that down. = minimize update/render thread impact.

We do not want 100 threads to suddenly swamp the CPU randomly. (If you only have ‘n’ cores and now ‘n’ threads are running 100% then no one else can run at all.)

Also, thread creation can be memory-expensive and so randomly creating them on the fly can do interesting things to several levels of the heap leading to extra GC.

We would like to avoid sending 1000s of new data objects to the GPU in one frame if possible.

Sometimes you have things that should run “right now” and some things that can wait until later. For example, you want the tree that just popped into view to load “right now” while the path finding calculation could maybe wait a little bit.

To me, the above favors prioritized thread pools and a metered multiphase approach.
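One way a prioritized pool can be sketched with the JDK alone is a `ThreadPoolExecutor` backed by a `PriorityBlockingQueue`. This is an assumption-laden illustration, not SiO2's implementation: the `PrioritizedTask` class, the "lower number runs first" convention, and the task names are all invented here. Tasks are passed with `execute()` rather than `submit()`, because `submit()` wraps them in a `FutureTask` that is not `Comparable`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PriorityPoolSketch {

    // Hypothetical task wrapper; lower priority numbers run first in this sketch.
    static class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;
        final Runnable body;

        PrioritizedTask(int priority, Runnable body) {
            this.priority = priority;
            this.body = body;
        }

        @Override
        public void run() {
            body.run();
        }

        @Override
        public int compareTo(PrioritizedTask other) {
            return Integer.compare(priority, other.priority);
        }
    }

    static List<String> runDemo() {
        // A single worker keeps the execution order easy to observe.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
        List<String> order = new CopyOnWriteArrayList<>();
        CountDownLatch gate = new CountDownLatch(1);

        // Occupy the worker so the following tasks pile up in the priority queue.
        pool.execute(new PrioritizedTask(0, () -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        pool.execute(new PrioritizedTask(5, () -> order.add("path finding")));
        pool.execute(new PrioritizedTask(1, () -> order.add("visible tree")));
        pool.execute(new PrioritizedTask(3, () -> order.add("distant terrain")));

        gate.countDown(); // free the worker; queued tasks now run in priority order
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The queue is a heap, so each insertion and removal costs O(log n) and the pending work is never sorted as a whole.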

So some kind of Job/Task interface like:

public interface Job {
    public void runOnWorker();
    public double runOnUpdate();
}

Where runOnWorker() is called on one of the pool threads. When it’s complete, it’s added to a done queue that is drained on the update thread… which then calls runOnUpdate().

runOnUpdate() in this case can return a ‘load factor’ which is an estimate of how much impact it thinks it will have on the frame. For example, a Job that will not modify the scene graph at all can return 0 while one that is about to attach some big scene graph objects can return a higher number.

The process draining the done queue can then decide to stop early and wait for the next frame if too much ‘impact’ has gone by.

Super simple example:

class LoadModel implements Job {
    private Spatial model;
    @Override
    public void runOnWorker() {
        this.model = assetManager.loadModel("MyModel.j3o");
        // Do some other stuff if needed
    }
    @Override
    public double runOnUpdate() {
        rootNode.attachChild(model);
        return 1.0;
    }
}

getState(JobState.class).execute(new LoadModel());
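The worker/done-queue/metered-drain flow described above could look roughly like the following. This is a self-contained sketch, not the SiO2 code: the `MeteredJobsSketch` class, its `maxLoadPerFrame` budget, and the fixed pool of two workers are assumptions for illustration, and `update()` stands in for an app state's per-frame update.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MeteredJobsSketch {

    interface Job {
        void runOnWorker();   // called on a pool thread
        double runOnUpdate(); // called on the update thread; returns a load factor
    }

    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Queue<Job> done = new ConcurrentLinkedQueue<>();
    private final double maxLoadPerFrame; // hypothetical per-frame budget

    MeteredJobsSketch(double maxLoadPerFrame) {
        this.maxLoadPerFrame = maxLoadPerFrame;
    }

    void execute(Job job) {
        workers.execute(() -> {
            job.runOnWorker();
            done.add(job); // hand the finished job over to the update thread
        });
    }

    // Called once per frame from the update thread; returns how many jobs it finished.
    int update() {
        double load = 0;
        int finished = 0;
        Job job;
        // Stop early once this frame's budget is used up; the rest waits for the next frame.
        while (load < maxLoadPerFrame && (job = done.poll()) != null) {
            load += job.runOnUpdate();
            finished++;
        }
        return finished;
    }

    void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    static int[] runDemo() {
        MeteredJobsSketch jobs = new MeteredJobsSketch(2.0);
        for (int i = 0; i < 5; i++) {
            jobs.execute(new Job() {
                @Override public void runOnWorker() { /* background work goes here */ }
                @Override public double runOnUpdate() { return 1.0; }
            });
        }
        jobs.shutdown(); // wait for the workers so the frame counts below are deterministic
        return new int[] { jobs.update(), jobs.update(), jobs.update() };
    }

    public static void main(String[] args) {
        int[] perFrame = runDemo();
        System.out.println(perFrame[0] + ", " + perFrame[1] + ", " + perFrame[2]);
    }
}
```

With five jobs of load 1.0 and a budget of 2.0, the three simulated frames finish 2, 2, and 1 jobs.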

And I’m not ashamed to say that I’ve arrived at this approach after failing to do this right at least 5 times… it’s also the approach that SiO2’s job stuff uses. I got tired of rewriting thread pooling wrong a bunch of times for old Mythruna and my other tech demos and finally got something I can use everywhere.

I also have some negative experiences with enterprise level software whose threading architecture is very toxic to games. (Accumulo being the worst of these.) Pitfalls abound.

Pavl_G · June 1, 2022, 9:28am

Brilliant, thanks again for sharing the SiO2/Mythruna approaches and your experiences, as this will help a lot in the process of building this feature. I can relate your approach to IPC at a higher level, since you communicate back to the jme3 update thread… really useful stuff.

Pavl_G · June 5, 2022, 12:14am

These are some test cases I did today, based on what I already know (IPC and Java multi-threading) and on a quick glance at SiO2. The idea is very simple:

A factory class (MonkeyBinder) that creates a Daemon thread; each Daemon binds to a newly attached BaseAppState, so each Daemon has its own binder (the current factory) and its own AppState (for publishing final results to the update).
A Work<T> class with T async() and void start(T asyncReturn).
The MonkeyBinder class can add some queued Work to its Daemon, and the results get published to the app state in the form of a method execution.
Works run sequentially within their Daemon, but asynchronously with respect to Works from other Daemons.

In this current pattern, users can launch their own independent Daemons, so that each Daemon works on its own (without knowledge of the others) and results are published on the update using the Daemon’s binder…

Here is the core code:

Here is an example:

github.com

Scrappers-glitch/JectorTests/blob/48be05d17ede90f1119db8db558d60e2c93eb0ba/src/main/java/test/monkeythreads/TestMonkeyAssetLoader.java#L20-L38

      
        
@RunOn(thread = Threads.DAEMON)
@Override
public Object async() {
    Box b = new Box(1, 1, 1);
    Geometry geom = new Geometry("Box", b);

    Material mat = new Material(application.getAssetManager(), "Common/MatDefs/Misc/Unshaded.j3md");
    mat.setColor("Color", ColorRGBA.Blue);
    geom.setMaterial(mat);

    return geom;
}

@Override
@RunOn(thread = Threads.LOOPER)
public void start(Object asyncReturn) {
    application.getRootNode().attachChild((Spatial) asyncReturn);
    System.out.println("Finished loading my asset " + ((Spatial) asyncReturn).getName());
}

github.com

Scrappers-glitch/JectorTests/blob/590641469a15f764d97bc385cb53d12fa55601ea/src/main/java/test/monkeythreads/TestMonkeyThreading.java#L32-L38

      
        
@Override
public void simpleInitApp() {
    heavyDutyBinder = MonkeyBinder.createMonkeyIPC(stateManager, "Heavy Duty Stuff");
    heavyDutyBinder.addDaemonWork(new TestMonkeyDaemonBinder());

    assetLoaderBinder = MonkeyBinder.createMonkeyIPC(stateManager, "Load Assets");
    assetLoaderBinder.addDaemonWork(new TestMonkeyAssetLoader(this));

Of note, I am currently using annotations in these examples to mark a DAEMON method vs. a LOOPER (jMonkey state) method, but if we plan to do this in jME, it will use normal Java interfaces (as reflection calls are now blacklisted on Android).

EDIT:
There are still a lot of open questions, like: if a Work fails to publish a result, what should happen? Should we add BackOff or Retry criteria logic, treat it as a dead end, or just throw an error? A network call, for instance, can fail while the app still runs normally. We could also add Work priorities and sort the works array accordingly. Those are still WIP…

pspeed · June 5, 2022, 3:09am

Things your approach doesn’t seem to cover:

Priority ordering.
This is critical for terrain generation style apps that will want to render the terrain closest to the player soonest.
Metered updates.
This is critical for anything that might load more than one thing a frame and wants to never drop frames. And from experience, metered updates requires some way for the update-loop part to indicate whether it did anything or not. (Think of something like minecraft with 3D chunks where a good deal of them will have no geometry at all… but you don’t know that until you generate it on the background thread.)
Separate thread pools.
At least it seems like this is missing from your example.

The SiO2 code has all of these things and so far it feels like your approach doesn’t add anything… but maybe I’m missing something from your implementation?

It also may be that you build your apps with less desirable JME patterns/idioms that lead to these sorts of “pass a state manager to a thing” rather than just attaching an app state. Might be worth discussing that, too, perhaps.

Pavl_G · June 5, 2022, 7:17am

I guess if this is just a priorityLevel field and we sort an ArrayList using this field, then that’s okay, and I am thinking of other good uses for this, like displaying multiple things in a chain, for instance (could be useful in GUI too)…

Well, I am not sure I fully understand you here, but the Binder locks the current Daemon thread until the task results arrive, at which point the update loop hasn’t attached a binding task yet. If you have a direct example for this, I would appreciate it.

The initial approach (or the test cases above) works independently: Works inside a single thread are queued, but threads aren’t; this requires another utility to drive the pools.

Well, I have just learned the basics; of course it will take some time to reach the level of the good old SiO2 code, but I am seeking a very simple model (and a powerful one at the same time).

pspeed · June 5, 2022, 7:21am

Oh, god, no. No reason to sort a whole list just to order things by priority.

I don’t understand what this response means.

Imagine you want to load 4000 chunks so you queue them up. In one update pass, 200 of those are now available for running on the update thread. Do you run them all and drop frames?

If you only run one or two per frame, what if 100 of those will do nothing because after loading the chunk they found it is empty space and now will not add anything to the scene graph. So do you waste 100 frames before you let the others update?
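The arithmetic behind these two questions can be checked with a small simulation. It assumes the 200 finished jobs from the example, with the 100 empty chunks happening to sit at the front of the done queue (the ordering is an assumption), an empty chunk reporting a load of 0.0 and a non-empty one 1.0. It compares a fixed "two jobs per frame" rule with a per-frame load budget of 2.0; all names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class EmptyChunkSketch {

    // Load factors of finished jobs: 0.0 = chunk turned out empty, 1.0 = chunk adds geometry.
    static Deque<Double> doneQueue() {
        Deque<Double> queue = new ArrayDeque<>();
        for (int i = 0; i < 100; i++) queue.add(0.0); // 100 empty chunks
        for (int i = 0; i < 100; i++) queue.add(1.0); // 100 chunks with geometry
        return queue;
    }

    // Fixed count: every job costs one slot, even if it does nothing on update.
    static int framesWithFixedCount(int jobsPerFrame) {
        Deque<Double> queue = doneQueue();
        int frames = 0;
        while (!queue.isEmpty()) {
            for (int i = 0; i < jobsPerFrame && !queue.isEmpty(); i++) {
                queue.poll();
            }
            frames++;
        }
        return frames;
    }

    // Load budget: jobs that report 0.0 are free, so the drain keeps going past them.
    static int framesWithLoadBudget(double budget) {
        Deque<Double> queue = doneQueue();
        int frames = 0;
        while (!queue.isEmpty()) {
            double load = 0;
            while (load < budget && !queue.isEmpty()) {
                load += queue.poll();
            }
            frames++;
        }
        return frames;
    }

    public static void main(String[] args) {
        System.out.println("fixed count of 2: " + framesWithFixedCount(2) + " frames");
        System.out.println("load budget of 2.0: " + framesWithLoadBudget(2.0) + " frames");
    }
}
```

Under these assumptions the fixed-count rule needs 100 frames, while the load budget needs 50, because the first frame skips through all the empty chunks for free.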

Well, when it looks exactly like what’s in SiO2, I guess you will finally be there.

You could start from the other way and explain what the problems are with the SiO2 job stuff and I could tell you the reasons they are that way.

Pavl_G · June 5, 2022, 7:33am

The current approach above isn’t frame-based: Works in the same thread are queued, each Work is marked finished when it returns, and then the results are posted to the update. So yeah, I guess this should be synchronized with the frames, so we don’t push a 4000-chunk load into a single frame…

EDIT:
Thanks for your input, it’s great, and I think these things can only be elicited when testing a huge project (like Mythruna). I may try the Blocks library later on for some testing (and of course for learning new stuff).

Pavl_G · June 5, 2022, 11:07am

I guess the right term for this is a frame pacing system, correct? (That’s a very large topic under the hood, but worth trying.)

EDIT (this is SurfaceFlinger: a frame pacing lib for android, but they have the logical background for the pattern):

pspeed · June 5, 2022, 11:10am

I don’t know what to call it… but I’ve been doing it since the original version of the Mythruna engine.

…pretty sure it’s even in the IsoSurfaceDemos.

Critical for maintaining frame rates without dropping frames every time you spam the worker threads.