KK Physics and Jolt JNI

pspeed · December 31, 2024, 9:03pm

My C-fu is very rusty but I wonder if you can extend the physics system class to have some custom destruction behavior just for your JNI layer. JNI to C++ was always way more “fun” than straight C.

…but I think you really start to get to the point where you are extending every class to wire up some things like that. (And maybe that becomes true anyway.)

Yeah, a concurrent hash map from ragdoll to physics system might work. Generic enough, it could be used for any ordered dependencies. (Just have to be careful of cycles in the general case.)
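A minimal sketch of the map idea, under stated assumptions: `OwnerRegistry` and its method names are hypothetical and not part of jolt-jni. It keys the map by the dependent's native address rather than by its Java wrapper, so the map itself never keeps the ragdoll wrapper reachable.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical registry: maps the native address of a dependent object
 * (for example a ragdoll) to the Java object that must outlive it
 * (for example its physics system).
 */
final class OwnerRegistry {
    // Strong references to owners; an owner stays reachable while any
    // of its dependents is still registered.
    private static final ConcurrentHashMap<Long, Object> OWNERS
            = new ConcurrentHashMap<>();

    private OwnerRegistry() {
    }

    /** Records that the object at dependentVa needs owner (non-null). */
    static void retain(long dependentVa, Object owner) {
        OWNERS.put(dependentVa, owner);
    }

    /** Call this after the dependent's native memory has been freed. */
    static void release(long dependentVa) {
        OWNERS.remove(dependentVa);
    }

    /** Returns the registered owner, or null if there is none. */
    static Object ownerOf(long dependentVa) {
        return OWNERS.get(dependentVa);
    }
}
```

The cleanup action for the dependent would call `release()` as its last step. Since owners can themselves be dependents, chains work, but a cycle would keep everything in it alive forever.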

sgold · December 31, 2024, 10:10pm

If you pass a strong reference to the cleaner runnable, for example, it should keep the physics space alive until all cleaner actions have been executed.

I like this idea. It’s simpler than the static map, very elegant.
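Roughly, the idea could look like the sketch below. `PhysicsSystem`, `Ragdoll`, `Freer`, and `LOG` are hypothetical stand-ins, not jolt-jni's actual classes; only `java.lang.ref.Cleaner` is real API. The log entry stands in for the native free call.

```java
import java.lang.ref.Cleaner;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Stand-in for a JNI wrapper around a native physics system. */
final class PhysicsSystem {
    final String name;

    PhysicsSystem(String name) {
        this.name = name;
    }
}

/** Stand-in for a JNI wrapper around a native ragdoll. */
final class Ragdoll {
    static final Cleaner CLEANER = Cleaner.create();
    /** Records cleanup actions; a real action would free native memory. */
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    /**
     * Cleanup action. A static nested class, so it cannot accidentally
     * capture the Ragdoll (which would prevent it from ever being cleaned).
     */
    private static final class Freer implements Runnable {
        private final long ragdollVa;
        // Strong reference: while this action is registered and has not
        // run, the physics system wrapper stays reachable.
        private final PhysicsSystem system;

        Freer(long ragdollVa, PhysicsSystem system) {
            this.ragdollVa = ragdollVa;
            this.system = system;
        }

        @Override
        public void run() {
            LOG.add("ragdoll " + ragdollVa + " freed; owner=" + system.name);
        }
    }

    private final Cleaner.Cleanable cleanable;

    Ragdoll(long ragdollVa, PhysicsSystem system) {
        cleanable = CLEANER.register(this, new Freer(ragdollVa, system));
    }

    /** Explicit cleanup; the action runs at most once. */
    void close() {
        cleanable.clean();
    }
}
```

The cleaner holds the registered action until it has run, so the physics system wrapper cannot become unreachable before the ragdoll has been freed.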

zzuegg · January 1, 2025, 9:26am

I kind of have a bad feeling regarding the JIT. It might at some point optimize the reference away.

sgold · January 1, 2025, 5:30pm

I don’t know much about the JIT. Is there some way to prevent optimization in this particular case? For instance, would passing the reference to native code be sufficient?

zzuegg · January 1, 2025, 6:36pm

Tbh the JIT is a black box for me too. Calling anything native sounds quite safe, since I cannot see how the JIT could decide it is unnecessary.
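For reference, the JDK (9 and later) has a pure-Java way to say "treat this object as reachable up to here": `Reference.reachabilityFence()`. As far as I understand, it targets local variables and `this` inside a running method, which is a somewhat different case from a field of a registered cleanup action. The sketch below uses a hypothetical `NativeVec` class, with a float array standing in for native memory.

```java
import java.lang.ref.Reference;

/** Hypothetical wrapper whose "native" data must outlive each call. */
final class NativeVec {
    // Stands in for native memory that a cleaner would free.
    private final float[] fakeNative;

    NativeVec(float x, float y) {
        fakeNative = new float[] {x, y};
    }

    float lengthSquared() {
        try {
            // Imagine these reads go through a raw native address.
            float x = fakeNative[0];
            float y = fakeNative[1];
            return x * x + y * y;
        } finally {
            // Keeps `this` strongly reachable until this point, so it
            // cannot be cleaned while the reads above are in progress.
            Reference.reachabilityFence(this);
        }
    }
}
```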

I have never seen a log disappear, but then I cannot say for sure what is happening when I have no logger attached.

JMH uses its Blackhole.

github.com

openjdk/jmh/blob/master/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java


pspeed · January 1, 2025, 7:02pm

Most of the time folks find cases of “hotspot broke my code”… it’s actually something else.

I give the JMH guys the benefit of the doubt because I don’t have time to look deeper… but even that’s a little sus. However, what they are trying to do is going to provoke the beast no matter what. There is also the assumption that “they know what they are doing”. (But in my life I found bugs in the JDK itself so…)

Regardless, strong references should be strong. JIT/Hotspot should not remove them because it can’t possibly know “everything that is about to come”. Hotspot has more information on actual use but it’s also supposed to fall back to the original code if something triggers a different use than expected. (It’s been probably 10 years since I poked around in the JVM and hotspot code, though.)

In this case, don’t worry about it until there is cause to worry about it… then assume that it’s literally anything else until you can prove otherwise.

zzuegg · January 1, 2025, 7:32pm

Maybe I am concerned because it looks like too simple an optimisation: you have a runnable, you pass it a reference, the reference is never used, so drop it.

pspeed · January 1, 2025, 9:44pm

I haven’t seen the implementation yet so I can’t comment… but if it’s somehow retrievable (ie: not private final and never used or accessed through getters) then I could “maybe” see it.

…but something is going to clear it later, I imagine. So it IS used on some level.

zzuegg · January 1, 2025, 10:17pm

Afaik we are talking about a Runnable passed to a java.lang.Cleaner. (At least that is what Stephen is currently using.) It keeps the reference to the physics space alive as long as the object exists, so that the physics space does not get GC’ed first. There is no need to clear it, and it could be a lambda or whatever.

sgold · January 7, 2025, 8:33am

Jolt-jni v0.9.4 was released today.

sgold · January 9, 2025, 4:19am

For reasons of convenience and portability, jolt-jni provides single-precision methods to calculate transcendental functions (like sqrt() and acos()). These methods invoke the standard C++ libraries via JNI.

For many weeks I’ve been curious how these methods compare with the corresponding methods in JME’s FastMath class. Today I went off on a wild tangent, creating a small benchmark to satisfy my curiosity. For fun, I included the Apache Commons Mathematics Library and ran the benchmark on 3 different operating systems.

Here are the highlights:

Jolt-jni’s single-precision transcendentals were fastest in most cases, for all functions except sqrt(). The exceptions were the acos(), cos(), and atan() functions on macOS, for which jme3-core was slightly faster.
On sqrt, Jolt-jni was slowest. There jme3-core was fastest on Windows and Linux. It tied with Apache on macOS.

I was very surprised by these results. I expected Apache’s fancy library to outperform jme3-core, which basically invokes the double-precision methods in java.lang.Math and rounds off the results.
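To illustrate what "invokes the double-precision methods and rounds off" means, here is a minimal sketch. `FloatMath` is a hypothetical class written for this post, not jme3-core's actual source, which may handle corner cases differently.

```java
/** Single-precision wrappers around java.lang.Math (illustrative only). */
final class FloatMath {
    private FloatMath() {
    }

    /** Widens to double, computes, then rounds the result to float. */
    static float sqrt(float x) {
        return (float) Math.sqrt(x);
    }

    static float acos(float x) {
        return (float) Math.acos(x);
    }

    static float cos(float x) {
        return (float) Math.cos(x);
    }
}
```

Wrappers like this stay entirely in Java, so the JIT is free to inline them and there is no JNI call on the path.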

I realize JVM microbenchmarking is tricky, and I may have screwed up. Feel free to look at my source code and tell me what I might’ve done wrong.
I realize transcendentals shouldn’t be a major performance factor in most Java games. (If you use trig functions in a JME game, you’re probably doing something wrong.)
I haven’t tested Jolt-jni transcendentals for accuracy and corner cases, so there might be non-performance reasons to avoid them.

But still … if you really care about transcendental performance, you ought to consider Jolt-jni … even if you don’t need physics!

zzuegg · January 9, 2025, 11:15am

jmh

On my machine (Windows, AMD Threadripper gen 1, JDK 23) JME is fastest everywhere. I kind of expected JNI to be slower, so I was surprised when I saw your results.

Benchmark      Mode  Cnt          Score         Error  Units
Apache.acos   thrpt    6    4712482.986 ±   26453.946  ops/s
Apache.atan   thrpt    6    9727186.835 ±   44088.644  ops/s
Apache.cos    thrpt    6   13640811.639 ±  153369.150  ops/s
Apache.exp    thrpt    6   25338849.555 ±  379859.253  ops/s
Apache.sin    thrpt    6   16111396.862 ±   20074.039  ops/s
Apache.sqrt   thrpt    6  245998014.461 ± 2817529.494  ops/s
Jme.acos      thrpt    6   45592065.582 ±  652217.077  ops/s
Jme.atan      thrpt    6   33546400.454 ±  265087.561  ops/s
Jme.cos       thrpt    6   21541761.276 ±  240875.152  ops/s
Jme.exp       thrpt    6   36710736.971 ±  486189.029  ops/s
Jme.pow       thrpt    6   18450687.360 ±  197482.305  ops/s
Jme.sin       thrpt    6   21620147.202 ±   17265.555  ops/s
Jme.sqrt      thrpt    6  245862369.841 ± 1596236.355  ops/s
JoltJni.acos  thrpt    6   23845358.371 ±   95630.345  ops/s
JoltJni.atan  thrpt    6   19062854.434 ±   78862.304  ops/s
JoltJni.cos   thrpt    6   19582181.390 ±   31024.843  ops/s
JoltJni.exp   thrpt    6   20726750.109 ±  276036.796  ops/s
JoltJni.pow   thrpt    6   15145657.697 ±   70791.096  ops/s
JoltJni.sin   thrpt    6   19584326.794 ±  517179.050  ops/s
JoltJni.sqrt  thrpt    6   41073810.499 ±  819028.772  ops/s

sgold · January 9, 2025, 5:37pm

That’s good news for us.

I suspect jolt-jni performance is strongly influenced by JNI call overhead, which is likely the bottleneck on quick functions (like sqrt()).
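A rough way to put a number on that, using only the sqrt throughputs posted above for the Windows/Threadripper machine: convert ops/s to nanoseconds per call and compare. `CallCost` is a hypothetical helper written for this post.

```java
/** Converts JMH throughput scores to mean time per call. */
final class CallCost {
    private CallCost() {
    }

    /** Mean nanoseconds per operation for a throughput in ops/s. */
    static double nanosPerOp(double opsPerSecond) {
        return 1e9 / opsPerSecond;
    }

    public static void main(String[] args) {
        double jolt = nanosPerOp(41_073_810.499);  // JoltJni.sqrt score
        double jme = nanosPerOp(245_862_369.841);  // Jme.sqrt score
        System.out.printf("JoltJni.sqrt %.1f ns, Jme.sqrt %.1f ns, gap %.1f ns%n",
                jolt, jme, jolt - jme);
    }
}
```

That works out to roughly 24 ns per call versus 4 ns, a gap of about 20 ns. The gap is only a loose estimate of JNI overhead, since the two sqrt implementations are not identical and the JIT likely compiles `Math.sqrt()` down to very little code.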

zzuegg · January 9, 2025, 8:14pm

Well, microbenchmarking. Considering the difference in results, all we can say is that results might differ.

sgold · January 10, 2025, 5:32pm

If it were important, we could measure the JNI call overhead on our respective machines.

zzuegg · January 15, 2025, 8:09pm

I got curious what we can expect when Valhalla gets released, so I ran a very quick test on the current Valhalla preview JDK.

VectorTest.addClass        thrpt    4   327472761.766 ±  19906481.246  ops/s
VectorTest.addClass2       thrpt    4   117189506.826 ±   5525130.025  ops/s
VectorTest.addRecord       thrpt    4   325457786.741 ±   6277689.559  ops/s
VectorTest.addRecordValue  thrpt    4  2449418532.944 ± 112292055.585  ops/s

addClass is basically JME’s addLocal, addRecord is a record-based addition, and addRecordValue uses the Valhalla value record type.

//Added addClass2 which is using Float instead of float, so the autoboxing variant.
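For readers who want to picture the variants, here is a sketch of what the three non-Valhalla types might look like. All names are hypothetical; the actual benchmark source was not posted. The value variant is left out because it only compiles on a Valhalla preview JDK, where it would presumably be the same record declared as a value type.

```java
/** Mutable class with primitive fields, in the style of addLocal(). */
final class Vec3Class {
    float x, y, z;

    Vec3Class(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Adds in place and returns this; no allocation. */
    Vec3Class addLocal(Vec3Class o) {
        x += o.x;
        y += o.y;
        z += o.z;
        return this;
    }
}

/** Same shape with boxed fields; each += unboxes, adds, and re-boxes. */
final class Vec3Boxed {
    Float x, y, z;

    Vec3Boxed(Float x, Float y, Float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    Vec3Boxed addLocal(Vec3Boxed o) {
        x += o.x;
        y += o.y;
        z += o.z;
        return this;
    }
}

/** Immutable record; each add creates a new instance. */
record Vec3Record(float x, float y, float z) {
    Vec3Record add(Vec3Record o) {
        return new Vec3Record(x + o.x, y + o.y, z + o.z);
    }
}
```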

sgold · January 21, 2025, 2:45am

Jolt-jni v0.9.5 and KK Physics 0.3.1 were released today.

The motivation for jolt-jni v0.9.5 was to add Android support. Nobody asked for this. I wanted to figure out how to automate building/publishing Android native libraries without relying on Travis CI. I used jolt-jni as my “guinea pig”.

Now that I’ve got a workable automation scheme, I can apply it to Libbulletjme. For 5 months, I’ve been trying and failing to release new versions of Libbulletjme, and that has impacted progress on Minie. For me, this is a big deal!

The motivation for KK Physics v0.3.1 is simply to have a release that works with the latest jolt-jni. It’s still very incomplete and not intended for serious use. I plan to use it primarily for performance testing.

sgold · January 21, 2025, 5:10am

I re-ran these tests with latest Linux Mint (22.1), KK Physics, Minie, and JME (3.8.0-beta2):

(a) Minie single-threaded: 2733 - 2786 boxes
(b) KK Physics with 10 worker threads: 3674 - 3687 boxes (32% - 35% more boxes)

KK Physics with 1 worker thread: 2336 - 2382 boxes

The performance of KK Physics appears to have improved by 5-7%, while that of Minie declined slightly. I suspect the improvement came from changing the compiler options used to build the jolt-jni natives, possibly here.

Again, 33% more boxes doesn’t imply that the physics engine is only 33% more efficient. In the worst case, simulation work increases as the square of the number of bodies.
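To make the worst case concrete, here is a small sketch that assumes work grows exactly as the square of the body count, which is a pessimistic bound rather than a measurement. `WorkEstimate` is a hypothetical helper; the box counts are the ones reported above.

```java
/** Estimates relative simulation work under a quadratic cost model. */
final class WorkEstimate {
    private WorkEstimate() {
    }

    /** Ratio of work for boxesB bodies versus boxesA bodies, if cost ~ n^2. */
    static double quadraticWorkRatio(int boxesA, int boxesB) {
        double ratio = (double) boxesB / boxesA;
        return ratio * ratio;
    }

    public static void main(String[] args) {
        // Minie (a) versus KK Physics with 10 worker threads (b).
        System.out.println(quadraticWorkRatio(2786, 3674));
        System.out.println(quadraticWorkRatio(2733, 3687));
    }
}
```

Under that assumption, 32% to 35% more boxes corresponds to roughly 74% to 82% more simulation work in the same time budget.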

JavaFor3dGames · January 27, 2025, 7:47pm

Thank you. Android support should be there, and you are already working on it. Thanks for that.

Pavl_G · January 27, 2025, 11:02pm

I wonder, how do you debug and measure these performance impacts? Do you have a library that plots the impact over time for real-time analysis?