My August project is to add multithreaded physics spaces to Minie. Currently, I’m stuck. This post provides background. My next post will be a status report and a plea for help/advice.
As the complexity of a Minie physics simulation increases, it tends to become CPU-bound. Modern computers provide multiple CPU threads, allowing software threads (lightweight processes) to execute multiple tasks in parallel. The Bullet software that underlies Minie already includes code to exploit task-level parallelism on multiple CPU threads. By exposing this feature in Minie, I hoped to enable real-time simulation of more complex worlds, with greater accuracy.
Multithreaded physics spaces are not the only parallelism available, nor are they a panacea for all Minie performance issues.
From jme3-bullet, Minie inherited ThreadingType.PARALLEL, the capability to dedicate a single Java thread to physics. This allows physics simulation to proceed while the rendering thread is blocked, or (potentially) in parallel with rendering. Bullet’s multithreading occurs at a lower level and a finer grain. It is orthogonal to this feature.
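To illustrate the general idea of dedicating one Java thread to physics, here is a minimal sketch using only java.util.concurrent. It is not Minie's implementation: ToySpace, DedicatedPhysicsThread, and simulate() are hypothetical names invented for this example, and the "physics" is just a counter of simulated time.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a physics space; NOT a Minie class.
class ToySpace {
    private double simulatedSeconds = 0.0;

    // Advance the toy simulation by one time step.
    void update(double timeStep) {
        simulatedSeconds += timeStep;
    }

    double simulatedSeconds() {
        return simulatedSeconds;
    }
}

class DedicatedPhysicsThread {
    // Run numFrames frames, stepping physics on a single dedicated thread.
    static double simulate(int numFrames, double timeStep) {
        ToySpace space = new ToySpace();
        ExecutorService physicsThread = Executors.newSingleThreadExecutor();
        try {
            for (int frame = 0; frame < numFrames; ++frame) {
                // Start the physics step on the dedicated thread.
                Future<?> step = physicsThread.submit(() -> space.update(timeStep));
                // ... the calling (render) thread could do other work here ...
                // Wait for the step to finish before touching physics state.
                step.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            physicsThread.shutdown();
        }
        return space.simulatedSeconds();
    }

    public static void main(String[] args) {
        System.out.println(simulate(60, 1.0 / 60.0));
    }
}
```

The call to step.get() is the synchronization point: everything between submit() and get() can overlap with the physics step, and nothing outside that window touches the space concurrently.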
Most physics games use a single physics space. If a game used multiple physics spaces, one could dedicate a Java thread to each physics space. Again, that’s orthogonal to the feature I’m pursuing.
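A thread-per-space arrangement might look roughly like the following sketch. Again, this is plain Java with hypothetical names (IndependentSpace, ThreadPerSpace, stepAll), not Minie code, and it assumes the spaces share no state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for an independent physics space; NOT a Minie class.
class IndependentSpace {
    private int stepCount = 0;

    void update() {
        ++stepCount;
    }

    int stepCount() {
        return stepCount;
    }
}

class ThreadPerSpace {
    // Step each space numSteps times, each space on its own dedicated thread.
    static int[] stepAll(int numSpaces, int numSteps) {
        List<IndependentSpace> spaces = new ArrayList<>();
        List<ExecutorService> threads = new ArrayList<>();
        for (int i = 0; i < numSpaces; ++i) {
            spaces.add(new IndependentSpace());
            threads.add(Executors.newSingleThreadExecutor());
        }
        try {
            for (int step = 0; step < numSteps; ++step) {
                // Launch one step per space; the steps run in parallel.
                List<Future<?>> pending = new ArrayList<>();
                for (int i = 0; i < numSpaces; ++i) {
                    IndependentSpace space = spaces.get(i);
                    pending.add(threads.get(i).submit(space::update));
                }
                // Wait for every space to finish before the next step.
                for (Future<?> future : pending) {
                    future.get();
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            for (ExecutorService thread : threads) {
                thread.shutdown();
            }
        }
        int[] counts = new int[numSpaces];
        for (int i = 0; i < numSpaces; ++i) {
            counts[i] = spaces.get(i).stepCount();
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(stepAll(3, 10)));
    }
}
```

Because each space is only ever touched by its own thread, no locking is needed inside a space; the only coordination is the wait at the end of each step.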
Bullet v3 reportedly has the capability to exploit the SIMD parallelism provided by graphics adapters. Minie is still based on Bullet v2, so it doesn’t yet have access to this feature.
AFAIK, Bullet’s multithreading support is limited to btDiscreteDynamicsWorld::stepSimulation(), which corresponds to Minie’s PhysicsSpace.update(). If there’s a performance benefit, I expect it will be most pronounced for a PhysicsSpace with a large number of dynamic rigid bodies. I don’t expect any speedup for soft bodies, multibody objects, sweep tests, ray tests, or contact tests. I’m unsure how much multithreading will benefit kinematic rigid bodies, ghost objects, or characters.
Not all of stepSimulation() is parallelized, and threads will doubtless conflict somewhat over shared resources such as locks and memory bandwidth. So even in ideal cases, I don’t expect stepSimulation() to execute 12x faster on hardware with 12 CPU threads.
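One way to put rough numbers on that expectation is Amdahl's law, which bounds the speedup when only a fraction of the work can run in parallel. The sketch below is an illustration only: the parallel fractions are made-up values, not measurements of stepSimulation(), and the class and method names are hypothetical.

```java
// Amdahl's law: speedup = 1 / ((1 - p) + p / n),
// where p is the parallelizable fraction of the work and n is the thread count.
class AmdahlEstimate {
    // Upper bound on speedup; ignores contention for locks and memory bandwidth.
    static double speedup(double parallelFraction, int numThreads) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / numThreads);
    }

    public static void main(String[] args) {
        // Hypothetical parallel fractions, chosen only for illustration.
        for (double p : new double[] {0.5, 0.75, 0.9}) {
            System.out.printf(
                    "parallel fraction %.2f -> at most %.2fx on 12 threads%n",
                    p, speedup(p, 12));
        }
    }
}
```

For instance, if 75% of a step were parallelizable, the bound on 12 threads would be 3.2x, and contention over shared resources would push the real figure lower still.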
Bullet exploits task-level parallelism using an abstraction layer. The layer interfaces to 3 different thread-management APIs: OpenMP, Microsoft’s Parallel Patterns Library, and Intel’s Threading Building Blocks. To exploit task-level parallelism, the BT_THREADSAFE macro must be defined, and a thread-management API must be selected at compile time.
Meanwhile, open-source development of Bullet appears to have stopped. There have been no commits to the main repo since May, and no official releases since November. Most of the user documentation is in a PDF that hasn’t been updated in 6 years.