Hey monkey,
This is gonna be a long post with nerd talks about sorting algorithms. So if math, graphs and micro optimization bore you to death, I won’t blame you if you just walk away from it.
This is an echo to this post http://hub.jmonkeyengine.org/groups/development-discussion-jme3/forum/topic/very-bad-sorting-algorithm-in-geometrylist-sort/
originally posted by Phate666.
He deleted his original post because he went berserk and left the community after some other story but anyway, here is an abstract of what it was about.
His first claim was that JME merge sort for sorting geometries was terribly slow and that Arrays.sort (java6 and java7) was faster.
He pointed out that java 7 had a sorting algorithm called TimSort that was performing very well and consumed only n/2 temp memory for sorting (n being the number of elements to sort in the array). (our merge sort consumes n temp memory)
His claim was based on the SortUtil.test() method
http://code.google.com/p/jmonkeyengine/source/browse/trunk/engine/src/core/com/jme3/util/SortUtil.java#119
After investigations, I realized he was not completely right (about the merge sort being slow).
- First this test is kind of flawed, because it does not reflect the reality of what we are sorting in JME (small arrays of geometries once for every frame). There is no magical sorting algorithm that performs the best in any situation, it’s very dependent to how the data you feed it with are distributed.
- Second and most, the test has a bug. The merge sort actually partially sorts the array passed as the first parameter. So the “original” array comes out of this sort partially sorted and is fed to the Array.sort, and TimSort is very good at sorting partially ordered array (ascending or descending).
Timsort though has performance just a little better than merge sort with randomly distributed arrays.
So basically his first numbers posted (now deleted unfortunately) were showing that Timsort was like 15 times faster than merge sort, but that’s not the case in real life.
But…he was not completely wrong neither, because Timsort IS indeed faster in many situations.
So I went and grabbed the Java7 Timsort and made some tests. I tried to find a proper test that would represent well the general JME case, and tested all our sorting algorithms against several types of arrays.
Gnome sort is terrible against reverted arrays so i ruled it out.
Quick sort is not that quick and not stable, so i rule it out too.
TimSort was always one head ahead of Merge sort. and blew it off with already ordered arrays, ascending or descending.
I ended up using a real scene and sorting real geometries instead of benching on arrays of floats.
For a matter of copyright, I couldn’t keep Java 7 Timsort as is, in JME source, so I had to re-implement it from the original paper by Tim Peters (list sort for Python) that can be found here :
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
and here is the C code of Python’s listsort
http://svn.python.org/projects/python/trunk/Objects/listobject.c
Also i made sure the temp arrays needed for the merge were not allocated on each sort, but kept with the geometry list ( as we do right now for the merge sort).
What is TimSort?
I won’t get into the details because, I would bore the bravest among you, but in short, TimSort is a sort that takes advantage of natural order of things.
The assumption is that there are parts ,of an array to sort, that are already ordered. Those parts, called “runs”, can be merged together (merge sort style).
But the subtlety of it is that TimSort only consider runs of a certain size. If a run length is below a certain threshold, the remaining elements to the threshold are sorted with a binary insertion sort (
http://en.wikipedia.org/wiki/Insertion_sort A sort that almost can’t be outperformed on small arrays sorting).
In original C implementation the Threshold is hard coded to 64 elements. in Java 7 implementation it’s called MIN_MERGE and hard coded to 32 (read Timsort javadoc to understand why Josh Blosh chose 32).
I called it MIN_SIZE because I found MIN_MERGE was somehow misleading into being the minimum number of merge, which is not.
I used 32 as it was the value chosen for Java 7.
I used a scene with 200 objects with 3 different materials. all objects were moving around a bit.
Here are the results :
here is the google spread sheets
https://docs.google.com/spreadsheet/ccc?key=0AsE_ydpoOkOhdFM1eW5aQnUxLVFMdlFaUVhycXZ5Y1E#gid=6
So …except that java 7 beats the hell out off java 6, this graphs shows that TimSort is faster than merge sort for this kind of scene : around 15% in java 6 and 25% in java 7.
And all of this with half temp memory consumption.
I was thrilled and I thought testing it on android would confirm this. I was wrong.
For android I used a similar scene but i dropped the number of geometries to 100 to have a descent frame rate (22 to 23) so the test does not last too long.
Here are the results. I also tested with dalvik’s VM Arrays.sort that is also a TimSort.
here is the spread sheet
https://docs.google.com/spreadsheet/ccc?key=0AsE_ydpoOkOhdFM1eW5aQnUxLVFMdlFaUVhycXZ5Y1E#gid=7
Even if i was happy to beat android’s Arrays.sort, I was very disappointed that current merge sort was faster.
So I went and read the paper again.
I found out that the MIN_SIZE parameter could have strong influence on the sort performance. so I decided to try with different values : 32 (java7 timsort), 64 (original c implementation), 128, (said to work in the paper), 256 (said to be slower in the paper).
Here I used a scene with 1000 objects on desktop :
and the spread sheet
https://docs.google.com/spreadsheet/ccc?key=0AsE_ydpoOkOhdFM1eW5aQnUxLVFMdlFaUVhycXZ5Y1E#gid=5
As you can see, the sort are very close to each others, but 256 is definitely the slowest. 64 being the best, 128 being just between 64 and 32.
I then made this test on android too using the same 100 objects scene as before.
the spread sheet
https://docs.google.com/spreadsheet/ccc?key=0AsE_ydpoOkOhdFM1eW5aQnUxLVFMdlFaUVhycXZ5Y1E#gid=3
So as you can see,Dalvik’s Arrays.sort is too slow. Timsort 32 and 64 are slower than merge sort.TimSort 128 is par to merge sort (very very very slightly better when you look at the numbers). TimSort 256 is slightly better.
This means basically that on android Binary insertion sort performs better than other sorts.
Setting the MinMerge to 128 or higher means that a 100 elements arrays will never have a merge performed on it… Only binary insertion sort is used. And the more naturally order is the array the faster the sort will be.
From that graph and previous results I concluded that 128 was the best value to use to fit both desktop and android cases. 64 is too slow on android and 256 is too slow on desktop.
This way we’d have a faster sorting for desktop and about the same speed sorting on android with half memory consumption.
I performed a last test on desktop, with a scene with 200 objects.
spread sheet
https://docs.google.com/spreadsheet/ccc?key=0AsE_ydpoOkOhdFM1eW5aQnUxLVFMdlFaUVhycXZ5Y1E#gid=1
So as you can see with this kind of scene, timsort 128 is a bit slower with java 6 than timsort 32 and on par on java 7.
Anyway, don’t get too wet about all this. With all the tests I did none of the algorithm made the frame rate change. That means that the sort operation is very unlikely to be the bottleneck in a scene rendering process.
When you think of it a 0.5 second gain over 100000 frame at 60 frame seconds is a 0.5 second gain over about 28 minutes…
This is micro micro micro optimization.
But still…faster is faster.
So guys…what do you say? do I put that in?