MD5Importer Performance tweaks

Hey guys, I'm working on an RTS-style game, so I was hoping to have 100+ units on the map without any issues, but using the md5importer (concurrent version from svn) I'm barely able to get 50 on there.



Currently I have my code set up so the model and its animations are loaded once into a prototyper; from there I create copies as I need them. I managed to get a bit of a performance increase by sharing the animations. Of course this means everyone walks in sync, but I might be able to set up some kind of small pool of active animations and mix it up a little without wasting too many cycles.
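To make the "small pool of active animations" idea concrete, here is a minimal sketch. It does not use the md5importer API: `SharedAnimationSlot` and `AnimationPool` are hypothetical names, and the slot only tracks playback time as a stand-in for a real shared animation/controller. The point is that the update cost scales with the number of slots rather than the number of units, while the staggered start offsets keep units from all walking in lockstep.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one shared animation; it only tracks playback time. */
final class SharedAnimationSlot {
    private final float length; // animation length in seconds
    private float time;         // current playback position in seconds

    SharedAnimationSlot(float length, float startOffset) {
        this.length = length;
        this.time = startOffset % length;
    }

    /** Advanced once per frame, no matter how many units share this slot. */
    void update(float tpf) {
        time = (time + tpf) % length;
    }

    float getTime() {
        return time;
    }
}

/** A small fixed pool of slots whose start offsets are spread evenly over the cycle. */
final class AnimationPool {
    private final List<SharedAnimationSlot> slots = new ArrayList<>();
    private int next; // round-robin cursor

    AnimationPool(int slotCount, float length) {
        if (slotCount <= 0 || length <= 0f) {
            throw new IllegalArgumentException("slotCount and length must be positive");
        }
        for (int i = 0; i < slotCount; i++) {
            slots.add(new SharedAnimationSlot(length, length * i / slotCount));
        }
    }

    /** Hands each new unit the next slot, cycling through the pool. */
    SharedAnimationSlot acquire() {
        SharedAnimationSlot slot = slots.get(next);
        next = (next + 1) % slots.size();
        return slot;
    }

    /** Cost is proportional to the slot count, not the unit count. */
    void update(float tpf) {
        for (SharedAnimationSlot slot : slots) {
            slot.update(tpf);
        }
    }
}
```

With something like `new AnimationPool(4, 2.0f)`, 100 units would share four animation updates per frame, and neighbouring units would be a quarter of a cycle apart.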



Looking at the profiler in NetBeans, there is an awful lot of time spent in the animating thread (the one calling update on the animations), though NetBeans won't show classes from libraries so I can't see what's going on 'under the hood'. It seems to be working way too hard for 1 animation, barely maintaining 15-20 fps (the game loop is still up around 60, so the dual cores must be working hard).



Is the logic right that I need to call swapBuffers for every frame of my renderer on every object using the shared animations? It seems that if I don't, the animating thread gets stuck waiting for a semaphore.



Are there any other tweaks people can suggest? Or are we going to have to drop md5/skeletal animation?

It looks like even though the animation is shared between all the nodes, the meshes aren't, so there is a small benefit in calculating the bone positions but still a huge bottleneck with the mesh data.



Perhaps someone knows if it's possible to share the raw TriMesh between nodes? Looking at Node, multi-parenting won't work, short of creating a mesh that doesn't have a parent?


Have you tried using SharedMesh? Multi-parenting is possible; I implemented something that requires this property, but it is a bit complicated :(

I don't know what's going on under the hood, but I read that if you export the animation to binary format you should get some performance boost, though that probably applies mainly to the load times.

Yeah, the md5 text format is ugly but I can live with that. If there are speed bonuses at runtime, though, I'll definitely have to check it out. I've merged the md5importer source into my tree so I can do some more in-depth profiling. Looks like a good chunk of the time is being spent updating in processVertex and processNormal.



Going to spend some more time tweaking…

Thanks, I'll give it a look tomorrow (it's after midnight here, and I've got work early). I realise that this is going to get messy, but I don't see any way to squeeze any significant speed boosts out of it (short of some JNI bindings and some raw C code perhaps).



Has anyone had any experience trying to draw more than 100 animated models on the screen with jMonkey? Is it going to be far less of a headache to leave md5 well enough alone and use simple vertex animation?

No, I don't think JNI will help.

It's all about how optimized the skinning code is. You want to avoid having method calls in your skinning code and reduce everything to basic math operations; that will hopefully get you the best speed. Another thing that can help is hardware skinning, though jME doesn't support that well due to its lack of mainstream shader support.
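As a rough illustration of "no method calls, just basic math", here is a minimal sketch of software skinning over flat arrays. It is not the md5importer's processVertex code; the class name `FlatSkinning`, the fixed four influences per vertex, and the array layouts (3x4 row-major bone matrices, packed positions) are all assumptions made for the example.

```java
/** Linear blend skinning on flat arrays; the layout is assumed for this sketch. */
final class FlatSkinning {
    static final int INFLUENCES = 4; // fixed number of bone influences per vertex (assumption)

    /**
     * @param bones   12 floats per bone: a row-major 3x4 matrix (rotation plus translation)
     * @param bindPos 3 floats per vertex: bind-pose position
     * @param boneIdx INFLUENCES bone indices per vertex
     * @param weights INFLUENCES weights per vertex, expected to sum to 1
     * @param out     3 floats per vertex: receives the skinned position
     */
    static void skin(float[] bones, float[] bindPos, int[] boneIdx, float[] weights, float[] out) {
        int vertexCount = bindPos.length / 3;
        for (int v = 0; v < vertexCount; v++) {
            float x = bindPos[3 * v];
            float y = bindPos[3 * v + 1];
            float z = bindPos[3 * v + 2];
            float ox = 0f, oy = 0f, oz = 0f;
            int base = v * INFLUENCES;
            for (int i = 0; i < INFLUENCES; i++) {
                float w = weights[base + i];
                if (w == 0f) {
                    continue; // unused influence slot
                }
                int m = boneIdx[base + i] * 12;
                // Inline matrix * point; no Vector3f or Matrix objects are allocated or called.
                ox += w * (bones[m] * x + bones[m + 1] * y + bones[m + 2] * z + bones[m + 3]);
                oy += w * (bones[m + 4] * x + bones[m + 5] * y + bones[m + 6] * z + bones[m + 7]);
                oz += w * (bones[m + 8] * x + bones[m + 9] * y + bones[m + 10] * z + bones[m + 11]);
            }
            out[3 * v] = ox;
            out[3 * v + 1] = oy;
            out[3 * v + 2] = oz;
        }
    }
}
```

Normals could be handled the same way using only the 3x3 rotation part of each matrix. Whether this actually beats an object-based version would need to be confirmed with a profiler on the target JVM.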



In my Ogre3D importer, I was able to get 1,000 animated models to render at 30 fps. I used some LOD techniques though: the Ogre3D mesh format provides you with indexed LOD, and I also included animation "frameskip" LOD, which would skip frames for animated models far away from the camera.
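For anyone wondering what a "frameskip" LOD could look like, here is a minimal sketch. It is not the code from the Ogre3D importer; `FrameSkipLod`, `FrameSkipper`, and the distance thresholds are hypothetical. The idea is that distant models update their animation only every N-th frame, and the skipped time is accumulated so playback speed stays correct.

```java
/** Maps camera distance to an update interval in frames (thresholds are assumptions). */
final class FrameSkipLod {
    private final float nearDistance; // within this distance, animate every frame
    private final float stepDistance; // each further step adds one skipped frame
    private final int maxSkip;        // upper bound on skipped frames

    FrameSkipLod(float nearDistance, float stepDistance, int maxSkip) {
        this.nearDistance = nearDistance;
        this.stepDistance = stepDistance;
        this.maxSkip = maxSkip;
    }

    /** Returns 1 for "update every frame", 2 for "every second frame", and so on. */
    int intervalFor(float distance) {
        int steps = (int) Math.max(0f, (distance - nearDistance) / stepDistance);
        return 1 + Math.min(maxSkip, steps);
    }
}

/** Per-model state that decides whether to animate this frame. */
final class FrameSkipper {
    private float pendingTime;     // time accumulated over skipped frames
    private int framesSinceUpdate; // frames seen since the last animation update

    /**
     * Call once per frame with a positive tpf. Returns the time to advance the
     * animation by, or 0 if this frame should be skipped.
     */
    float tick(float tpf, int interval) {
        pendingTime += tpf;
        framesSinceUpdate++;
        if (framesSinceUpdate < interval) {
            return 0f;
        }
        float dt = pendingTime;
        pendingTime = 0f;
        framesSinceUpdate = 0;
        return dt;
    }
}
```

In the game loop you would compute `intervalFor(distanceToCamera)` per model, call `tick`, and only run the skeleton and skinning update when the returned value is greater than zero.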

CraftyHermit said:

Thanks, I'll give it a look tomorrow (it's after midnight here, and I've got work early). I realise that this is going to get messy, but I don't see any way to squeeze any significant speed boosts out of it (short of some JNI bindings and some raw C code perhaps).

Has anyone had any experience trying to draw more than 100 animated models on the screen with jMonkey? Is it going to be far less of a headache to leave md5 well enough alone and use simple vertex animation?


I found that Microsoft .x file animation also has this problem, but I think the way to solve it is an LOD or BSP mechanism.