~4x Faster Quaternion, Vector multiplication

@RatKod line 91. change it to nanoTime as well

^ ^ so dumb sorry
New run coming, and it looks better :wink:

ok new run result :

run:
=== Test ===
items : 100000
test  : 1000
reps  : 100 * 100000000
CSV output style :
rep, mult15 nanoS, mult60 nanoS
1, 724612068, 1555013378
2, 710515371, 1547878487
3, 711448898, 1555385242
4, 708061526, 1552634435
5, 725157728, 1569078242
6, 718875124, 1613458912
7, 714806605, 1551767006
8, 715319710, 1556905116
9, 712479363, 1556681051
10, 713129049, 1558393127
11, 710778299, 1559117659
12, 715254067, 1560933029
13, 711952796, 1561666063
14, 709205810, 1553457310
15, 735802876, 1596130721
16, 710239324, 1553952062
17, 708019777, 1550200630
18, 712203019, 1551217828
19, 716553787, 1570533106
20, 723695723, 1561282150
21, 706624223, 1547947514
22, 708147581, 1550213891
23, 709215435, 1546528313
24, 707530058, 1550292111
25, 712918278, 1594512662
26, 756190764, 1629085940
27, 734426152, 1618565137
28, 740558702, 1607697062
29, 714468976, 1572796974
30, 711820317, 1556108603
31, 720027029, 1577202782
32, 717625584, 1559548689
33, 715686346, 1561483816
34, 717599133, 1562630506
35, 712350071, 1559539992
36, 712989054, 1561252100
37, 739008641, 1577544069
38, 713394570, 1565814183
39, 716312404, 1578842599
40, 732173396, 1564449313
41, 720617708, 1560537534
42, 709401998, 1554668742
43, 707651092, 1548269719
44, 708375637, 1547603727
45, 706951666, 1549352610
46, 709957008, 1549799054
47, 709369523, 1548072234
48, 716349088, 1554100034
49, 706614030, 1565856315
50, 714492884, 1558841623
51, 710150951, 1549568279
52, 709101121, 1546217569
53, 705999652, 1547716558
54, 709682281, 1548642821
55, 709130724, 1555519903
56, 715600902, 1573633620
57, 718943836, 1566159184
58, 724058530, 1561047291
59, 727686041, 1581849352
60, 721830146, 1563853408
61, 739580770, 1585389674
62, 711904114, 1552641625
63, 711051657, 1554466473
64, 707959976, 1548032034
65, 709652500, 1547140629
66, 707305095, 1549540929
67, 708055159, 1546039200
68, 709480397, 1546213182
69, 708351257, 1548147493
70, 708538787, 1547079479
71, 710245914, 1547847603
72, 709265319, 1550824578
73, 707270533, 1547591503
74, 714855755, 1548550681
75, 708931959, 1567324887
76, 716817469, 1547081838
77, 708353547, 1547563410
78, 712712444, 1584462172
79, 765241087, 1554846318
80, 725005467, 1604357749
81, 743530381, 1571592219
82, 709220812, 1562932882
83, 714530586, 1561175390
84, 737761913, 1600846530
85, 728907030, 1585862834
86, 726554519, 1602484587
87, 734713944, 1559811383
88, 718891826, 1565425997
89, 708251237, 1558680661
90, 714781572, 1572505234
91, 716717564, 1562609618
92, 709638051, 1555472898
93, 718050102, 1568485069
94, 712343849, 1552767890
95, 712398907, 1564784559
96, 725041901, 1608108233
97, 714240450, 1563428771
98, 718952495, 1564417840
99, 724354805, 1573377373
100, 735245137, 1555706153
BUILD SUCCESSFUL (total time: 3 minutes 48 seconds)

and a nice chart :

To clarify, the following two methods would be affected by the change; I copied the code for each one for convenience. I don’t have a lot of git experience, so I’d like one of you who does to apply the changes. Thanks.

Quaternion.mult(Vector3f v, Vector3f store);

float vx = y*v.z-z*v.y;
float vy = z*v.x-x*v.z;
float vz = x*v.y-y*v.x;
vx += vx; vy += vy; vz += vz;
store.x = v.x + w*vx + y*vz-z*vy;
store.y = v.y + w*vy + z*vx-x*vz;
store.z = v.z + w*vz + x*vy-y*vx;

Quaternion.multLocal(Vector3f v);

float vx = y*v.z-z*v.y;
float vy = z*v.x-x*v.z;
float vz = x*v.y-y*v.x;
vx += vx; vy += vy; vz += vz;
v.x += w*vx + y*vz-z*vy;
v.y += w*vy + z*vx-x*vz;
v.z += w*vz + x*vy-y*vx;
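
For anyone who wants to try the formula outside the engine, here is a minimal stand-alone sketch. The class name `QuatVecSketch` and the use of plain `float[]` arrays are my own assumptions for illustration, not jME API; the arithmetic is the same as in the two methods above, and it assumes a unit quaternion.

```java
// Hypothetical stand-alone sketch; not the jME Quaternion class itself.
final class QuatVecSketch {

    /** Rotates v = {vx, vy, vz} by the unit quaternion (x, y, z, w) and returns a new array. */
    static float[] mult(float x, float y, float z, float w, float[] v) {
        // t = 2 * (q.xyz cross v)
        float tx = y * v[2] - z * v[1];
        float ty = z * v[0] - x * v[2];
        float tz = x * v[1] - y * v[0];
        tx += tx; ty += ty; tz += tz;
        // v' = v + w * t + (q.xyz cross t)
        return new float[] {
            v[0] + w * tx + y * tz - z * ty,
            v[1] + w * ty + z * tx - x * tz,
            v[2] + w * tz + x * ty - y * tx
        };
    }

    public static void main(String[] args) {
        // 90 degrees about the Z axis: (x, y, z, w) = (0, 0, sin 45, cos 45)
        float h = (float) Math.sqrt(0.5);
        float[] r = mult(0f, 0f, h, h, new float[] {1f, 0f, 0f});
        // The X axis should end up (approximately) on the Y axis.
        System.out.println(java.util.Arrays.toString(r));
    }
}
```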

This is largely irrelevant now, but I’d caution against treating this as a hard and fast rule, as it’s very case dependent. Multiplication can lead to more error because it multiplies whatever error you already have… but still, one multiply is likely to introduce less error than two or three adds of the same value, i.e. 5 * x is much better than x + x + x + x + x. This is why in loops it’s better to multiply by the loop iterator than to add a delta each time.

for( int i = 0; i < 100; i++ ) {
    float x = i * 0.1f;
}

Versus:

float x = 0;
for( int i = 0; i < 100; i++ ) {
    x += 0.1f;
}

In the first case every value is exactly what a single rounded multiply gives (at step 100 you get exactly 10), and in the second case I’d be very much surprised if you ended on exactly 10. This can happen even for relatively few iterations. And while it makes a lot of sense if you think about it (the second one is adding 100 times, while the first is really only multiplying once per result), it’s not immediately intuitive if one is already thinking “multiply is less accurate”.
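
A small sketch to try this yourself. The class and method names are made up for illustration. With `float`, `100 * 0.1f` rounds to exactly `10.0f`, while the running sum picks up a rounding error on every step, so it will typically drift away from 10.

```java
// Hypothetical sketch comparing "multiply by the step count" with "add a delta each step".
final class StepAccumulationSketch {

    /** Value after n steps, computed with a single multiply. */
    static float byMultiply(int n, float delta) {
        return n * delta;
    }

    /** Value after n steps, computed by adding delta n times. */
    static float byRepeatedAdd(int n, float delta) {
        float x = 0f;
        for (int i = 0; i < n; i++) {
            x += delta; // each add rounds, and the rounding errors accumulate
        }
        return x;
    }

    public static void main(String[] args) {
        float m = byMultiply(100, 0.1f);
        float a = byRepeatedAdd(100, 0.1f);
        System.out.println("multiply:     " + m + "  error: " + Math.abs(m - 10.0));
        System.out.println("repeated add: " + a + "  error: " + Math.abs(a - 10.0));
    }
}
```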

Furthermore, adding numbers with a large separation in ‘size’ can introduce HUGE errors. Add 100000000f + 0.01f and the 0.01f might completely disappear. Addition only keeps accuracy if the magnitudes of the added numbers are close to the same. Otherwise, the smaller one gets truncated.
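
A quick way to see it, as a sketch with made-up names: near 100000000 adjacent `float` values are 8 apart, so an addend of 0.01 is far below half that spacing and is rounded away entirely.

```java
// Hypothetical sketch: a small addend vanishing next to a large float.
final class AbsorptionSketch {

    /** Plain float addition, wrapped so the result is easy to inspect. */
    static float addSmall(float big, float small) {
        return big + small;
    }

    public static void main(String[] args) {
        float big = 100000000f;
        float sum = addSmall(big, 0.01f);
        // Math.ulp gives the gap between big and the next larger float.
        System.out.println("spacing near big: " + Math.ulp(big));
        System.out.println("sum == big?       " + (sum == big));
    }
}
```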

http://i.imgur.com/IW8simF.gif

@pspeed yes, it is true that one mult is more precise than many additions.
Though generally, a single mult introduces less error than a single add.
E.g., x^2 - y^2 can be evaluated more precisely as (x - y)(x + y). The first form has 2 mults and 1 add/sub; the second has 1 mult and 2 add/subs.
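
A sketch of that example in `float`; the class name and the values 4097 and 4096 are picked for illustration. Here 4097² = 16785409 does not fit in a float’s 24-bit significand and gets rounded before the subtraction, while in the factored form every intermediate value is exactly representable.

```java
// Hypothetical sketch: x^2 - y^2 versus (x - y)(x + y) in single precision.
final class DiffOfSquaresSketch {

    /** Naive form: two multiplies, one subtract. */
    static float naive(float x, float y) {
        return x * x - y * y;
    }

    /** Factored form: one multiply, one add, one subtract. */
    static float factored(float x, float y) {
        return (x - y) * (x + y);
    }

    public static void main(String[] args) {
        float x = 4097f, y = 4096f;
        double exact = (double) x * x - (double) y * y; // exact in double for these values
        System.out.println("exact:    " + exact);
        System.out.println("naive:    " + naive(x, y));
        System.out.println("factored: " + factored(x, y));
    }
}
```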

Since 3.1 is unstable, I would say let’s just try it and see if it works well for everyone.
If not, posts will pop up pretty quickly, I guess XD

It’s not my habit to revive old threads, but it has been around three quarters of a year.

I have also made a GitHub pull request, “Fast, precise mult with Vector3f” by TehLeo (Pull Request #372, jMonkeyEngine/jmonkeyengine), along with the requested tests. It somehow got forgotten, so I am bringing it to your attention.

I’ll check it out this evening with some of my own tests and then apply it if it’s good.

Your performance benchmarks aren’t correct. You need to use OpenJDK’s JMH for tests like this.

Why are they wrong? I get that using some kind of helper might improve this, but what is the actual flaw in the tests performed?

Your tests do not take into account many factors in how the JVM works.
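
Two of the commonly cited factors are JIT warm-up (early iterations run in the interpreter or in less optimised code) and dead-code elimination (a result that is never used may not be computed at all). The sketch below only illustrates those two points with a hand-rolled timer; the class and method names are made up, and it is not a substitute for a JMH benchmark.

```java
// Hypothetical sketch of a hand-rolled timing loop with warm-up and a "sink".
final class WarmupTimingSketch {

    /** Some float work; returning the sum keeps the loop from being trivially removable. */
    static float work(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += i * 0.1f;
        }
        return sum;
    }

    /** Runs a number of untimed warm-up passes, then times one pass in nanoseconds. */
    static long timeNanos(int warmups, int n) {
        float sink = 0f;
        for (int i = 0; i < warmups; i++) {
            sink += work(n); // warm-up: give the JIT a chance to compile work()
        }
        long start = System.nanoTime();
        sink += work(n);
        long elapsed = System.nanoTime() - start;
        // Use the accumulated result so the calls above cannot be discarded as dead code.
        if (Float.isNaN(sink)) {
            throw new IllegalStateException("unexpected NaN");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("cold:   " + timeNanos(0, 1_000_000) + " ns");
        System.out.println("warmed: " + timeNanos(50, 1_000_000) + " ns");
    }
}
```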

Are you referring to the tests in this PR: “Fast, precise mult with Vector3f” by TehLeo (Pull Request #372, jMonkeyEngine/jmonkeyengine)?
I kind of agree that the test methodology is flawed anyway.

What I would prefer to see is the difference not in a microbenchmark, but in something like the particle effect test, or a smaller jme game like the stroid panic one. The JVM can apply multiple optimizations that look good in microbenchmarks and turn out badly in actual use.

Note: to those wondering “why hasn’t this been applied yet? Didn’t Paul say he’d do something?”

I have done some more testing and the conclusions left me on the fence about whether to apply this patch or not. I have a bunch of comments added to the issue:
https://github.com/jMonkeyEngine/jmonkeyengine/pull/372

Is it this hard to contribute to this project?? the_leo has done so much more than I’d have done; I am much less motivated.
Come on guys, let’s show some flexibility here.

This is like the corest core code there is. Would you rather we accept every patch without examination?

I wrote up a very detailed case of why this is not so easy to apply. It may be that the rate of error of this new math is just not worth the extremely tiny gain it would mean to a real application.

We have to scrutinize patches to core. After all, we blithely accepted an optimization to collision detection recently that ultimately broke at least half the applications out there and had to be reverted.

Note: if you want to run a version of jme where “every patch goes in” you are welcome to make a fork… but you might call it jMonkeyPoo instead as jMonkeyEngine is already taken.

So have the core devs. Read the comments in the link above. Of course they cannot just replace something; that needs investigation, especially this deep in the core.
Maybe the core devs could provide it alongside the existing method instead of replacing it, but I do not understand much of this quaternion stuff.

nVidia SDK quaternion vector multiplication:

Vector3D<T> Quaternion<T>::operator * (Vector3D<T> V)
{
	Vector3D<T> uv, uuv;
	uv = Axis.Cross(V);
	uuv = Axis.Cross(uv);
	uv *= (2.0f * Angle);
	uuv *= 2.0f;
	return V + uv + uuv;
}

Which is the same thing once you expand the cross products.
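
To check that claim numerically, here is a small sketch that evaluates both forms. The names are made up, `u` stands for the quaternion’s vector part (`Axis` in the snippet above) and `w` for its scalar part (`Angle` above), and a unit quaternion is assumed.

```java
// Hypothetical sketch: cross-product form versus the expanded form from this thread.
final class CrossFormSketch {

    static float[] cross(float[] a, float[] b) {
        return new float[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    /** v + 2w (u x v) + 2 (u x (u x v)), written with explicit cross products. */
    static float[] rotateCrossForm(float[] u, float w, float[] v) {
        float[] uv = cross(u, v);
        float[] uuv = cross(u, uv);
        return new float[] {
            v[0] + 2f * w * uv[0] + 2f * uuv[0],
            v[1] + 2f * w * uv[1] + 2f * uuv[1],
            v[2] + 2f * w * uv[2] + 2f * uuv[2]
        };
    }

    /** The same rotation with the cross products written out by hand. */
    static float[] rotateExpanded(float[] u, float w, float[] v) {
        float x = u[0], y = u[1], z = u[2];
        float vx = y * v[2] - z * v[1];
        float vy = z * v[0] - x * v[2];
        float vz = x * v[1] - y * v[0];
        vx += vx; vy += vy; vz += vz;
        return new float[] {
            v[0] + w * vx + y * vz - z * vy,
            v[1] + w * vy + z * vx - x * vz,
            v[2] + w * vz + x * vy - y * vx
        };
    }

    public static void main(String[] args) {
        // Unit quaternion (0.5, 0.5, 0.5, 0.5): a 120 degree turn about the (1, 1, 1) axis.
        float[] u = {0.5f, 0.5f, 0.5f};
        float[] v = {1f, 0f, 0f};
        System.out.println(java.util.Arrays.toString(rotateCrossForm(u, 0.5f, v)));
        System.out.println(java.util.Arrays.toString(rotateExpanded(u, 0.5f, v)));
    }
}
```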

Something that may sound dumb, but… it would be interesting to have some people test this on a very complex and rich game (on a test version, of course). If it works, well… that would be nice.

(But I agree with mythruna: we must be very, very careful with this, as it is the core of the core of the engine.)

Great discussion! Can’t wait to see what you come up with. I’ve been using JME’s Quaternion class in my programs so much (even the ones that are just math-related, without using much other JME stuff) that this is of great relevance to me.

And yes, in any self-respecting project, if you hear something about “getting into the core”, then be ready for long and elaborate research to prove validity. Despite how things may seem, only science will work here. Too much is at stake.