Performance with a large number of objects in the scene

Thanks for doing this work. We’ve profiled the heck out of jME in the past, but with recent “upgrades” in lwjgl, things like this were bound to come up.



We do know that jME uses more memory than we’d like, but in terms of performance I think caching getCaps (or simply keeping our own flags, set once from combinations of the getCaps results) would be a smart idea if LWJGL has changed things enough to make it a performance drag.



It looks like most of the drain from that method comes from our own calls to it. Calling a method so often when the results are always the same is a bad idea, of course… But, as you probably know, getCapabilities was not always there: we originally referenced a static boolean flag in an LWJGL class. During conversion to a newer LWJGL release, we… (I should say “I”, since I did most of the conversions) were not careful when switching to the new syntax.
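The "our own flags, set initially" idea could look roughly like the minimal sketch below. It assumes a single render thread and a context whose capabilities do not change after startup. `CachedCaps`, `CountingQuery` and the two flag names are hypothetical, and each `BooleanSupplier` stands in for whatever `getCapabilities()` lookup the renderer actually performs.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: run each capability query once and keep plain boolean flags.
 * The suppliers stand in for an expensive capabilities lookup; names are illustrative.
 */
final class CachedCaps {
    final boolean vertexBufferObjects;
    final boolean multitexture;

    CachedCaps(BooleanSupplier vboQuery, BooleanSupplier multitexQuery) {
        // Each query runs exactly once, e.g. during renderer initialisation.
        this.vertexBufferObjects = vboQuery.getAsBoolean();
        this.multitexture = multitexQuery.getAsBoolean();
    }
}

/** Counts how often the "expensive" lookup runs, to show that the cache is used. */
final class CountingQuery implements BooleanSupplier {
    int calls;
    private final boolean result;

    CountingQuery(boolean result) {
        this.result = result;
    }

    @Override
    public boolean getAsBoolean() {
        calls++;
        return result;
    }
}
```

Per-frame code then reads `caps.vertexBufferObjects` instead of asking LWJGL again. The trade-off is that the flags go stale if the context changes, which matters only for code that switches contexts.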



Anyhow, I will correct this mistake sometime this week. Thanks for calling attention to the issue. Give us an update if/when you continue your performance testing.

Hm… well, I got LWJGL to compile (upgraded to .97.1; I had no problems with it, but I hardly ran any tests). I changed it to avoid the downcast + ThreadLocal stuff, and my framerate went up noticeably: using 25x25 in the example, it went from about 43-46 FPS to 48-52 FPS!



I seem to have some freak problem with profiling now though. It doesn’t seem to be jME related but if anyone knows what could cause this:



edit: fixed by excluding AWT from profiling (it seemed to work without that at first, though).

getCapabilities should not be called all the time, nor should makeCurrent. That should keep performance fine in single-threaded mode.


yes, I'm a bloody optimist.


You're almost sounding British there, renanse... :P

As far as I know, the getCapabilities calls are made as part of the internal sanity checks. These could easily be removed. I will look into providing a debug and a release build (if feasible).

"DarkProphet" wrote:
You're almost sounding British there, renanse... :P

Uh oh, I'll go get the soap... ;)

Note that the native files are being generated too, so to get anything useful out of it, you’ll need to recompile the native libs also.


  • elias

Note2: I did a quick test with the flag commented out and it seemed to compile fine :slight_smile:




  • elias
"elias" wrote:
I guess you can't please everyone. I did the original thread-local Capabilities to make LWJGL conformant with the (braindead win32) spec, which states that function pointers can vary per context, hence the need to look up pointers all the time. Moreover, as someone noted, multiple threads are handled through ThreadLocals. So this is not simply a debugging aid, as Matzon thought.

Luckily, I did anticipate this kind of performance issue, so I added support for the old (non-conformant and non-threadsafe) behaviour in LWJGL. I don't have time to test it out, but try commenting out the '-Acontextspecific' flag in build.xml and running 'ant generate-all' to regenerate the sources. You can read about how the generator works in docs/generator.txt, btw (note that the sources are not automatically rebuilt from the templates, since we want to compile on Java 1.4 platforms, mainly Mac OS X).

- elias

Well looks like our answer came while I was typing my post. jME uses a single thread / single context so it could use this build (if it works) by default, no?

It would probably work fine.


  • elias

Small follow-up on this. The current behaviour also messes up multithreading, specifically creating Geometry in a separate thread. I don't know if this was supposed to work in the first place. I made a topic on this a long while back, unfortunately with no replies:



(edit: new forum link) http://www.jmonkeyengine.com/jmeforum/index.php?topic=1265.0



All I know is that it did work without problems. Now I need my patches, since getCapabilities() is called when, for example, creating a TextureState.

Can't there be single-threaded versions of LWJGL packaged with jME? Or could anybody who has compiled the current version of LWJGL with the described option upload it somewhere?

I'd rather see my two-line patch in LWJGL. Apparently they already use flags, so that shouldn't be too big a shock for them. Maybe I should take it to their forums some time before 0.9…

Hi,



I think I have a solution that should be fast and still keep the correct multithreaded and multi-context behaviour. I've committed the fix to CVS, so you can compile from that, or alternatively get an (unofficial) updated lwjgl.jar from here:



http://odense.kollegienet.dk/~naur/lwjgl.jar



If it works out, we'll release 0.98 with the fix in it.


  • elias



For the interested, the fast path consists of:



    public static ContextCapabilities getCapabilities() {
        CapabilitiesCacheEntry recent_cache_entry = fast_path_cache;
        // Check owner of cache entry
        if (recent_cache_entry.owner == Thread.currentThread()) {
            /* The ownership test succeeded, so the cache must contain the current ContextCapabilities instance
             * assert recent_cache_entry.capabilities == getThreadLocalCapabilities();
             */
            return recent_cache_entry.capabilities;
        } else {
            // Some other thread has written to the cache since, so fall back to the slower path
            return getThreadLocalCapabilities();
        }
    }
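To see how the fast path behaves without needing LWJGL or a GL context, here is a self-contained model of the same pattern. `Caps`, `CapsRegistry` and the slow-path counter are illustrative stand-ins, not LWJGL classes; the assumption, as in the snippet above, is that `setCapabilities()` is the only writer of the shared one-entry cache. Unlike the snippet, this model publishes an immutable cache entry through a volatile field, which is one way (not necessarily LWJGL's) to keep a reader from pairing a new owner with stale capabilities.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for ContextCapabilities: one instance per GL context. */
final class Caps {
    final String contextName;

    Caps(String contextName) {
        this.contextName = contextName;
    }
}

/** Models a one-entry fast-path cache in front of a ThreadLocal. */
final class CapsRegistry {
    /** Immutable (owner, capabilities) pair, published through a volatile field. */
    private static final class CacheEntry {
        final Thread owner;
        final Caps capabilities;

        CacheEntry(Thread owner, Caps capabilities) {
            this.owner = owner;
            this.capabilities = capabilities;
        }
    }

    private static final ThreadLocal<Caps> THREAD_CAPS = new ThreadLocal<>();
    private static volatile CacheEntry fastPathCache = new CacheEntry(null, null);

    /** Counts slow-path lookups, for demonstration only. */
    static final AtomicInteger SLOW_PATH_HITS = new AtomicInteger();

    /** Called when a context becomes current in the calling thread. */
    static void setCapabilities(Caps caps) {
        THREAD_CAPS.set(caps);
        // The most recent caller takes over the shared cache entry.
        fastPathCache = new CacheEntry(Thread.currentThread(), caps);
    }

    static Caps getCapabilities() {
        CacheEntry recent = fastPathCache; // read the volatile field once
        if (recent.owner == Thread.currentThread()) {
            return recent.capabilities; // fast path: one field read and a comparison
        }
        // Another thread has written the cache since: fall back to the ThreadLocal.
        SLOW_PATH_HITS.incrementAndGet();
        return THREAD_CAPS.get();
    }
}
```

The slow path still returns the right capabilities; it is only slower. Calling `setCapabilities()` again from the main thread puts that thread back on the fast path, which is the trick used at the start of each frame in the last post of this thread.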


Didn't show up in the CVS browser on SF yet, but it looks like it should work just fine for jME.

Well, OK, I was a bit quick. That will work if you use jME from a single thread, which solves the performance problem raised in this thread.



However, some people on the jME forum (myself included, using my patch) use, or have at least tried to use, a single jME/LWJGL context from multiple threads. This can probably also be solved by passing the ContextCapabilities from the first thread to any newly created threads and then calling setCapabilities() in each of them. The performance problem will creep back in a little at this point (though not as badly as before), even though there's still only a single context in use. And as far as I can see, there's currently no way to remove a thread's capabilities once they are set, making it a small memory leak if you create and destroy a lot of threads.
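The hand-off proposed above might look like this minimal sketch. `CapsHandoff`, `CURRENT_CAPS` and `withCapabilities` are hypothetical: in the LWJGL version discussed here `setCapabilities()` is not public, so this only models the idea with a plain `ThreadLocal`. Clearing the slot in `finally` addresses the leak concern for pooled or long-lived threads. It does not make GL calls from the helper thread safe; it only lets code that merely reads capabilities run there.

```java
/**
 * Hypothetical sketch: install one context's capabilities in a helper thread.
 * CURRENT_CAPS stands in for LWJGL's internal per-thread capabilities slot.
 */
final class CapsHandoff {
    static final ThreadLocal<Object> CURRENT_CAPS = new ThreadLocal<>();

    private CapsHandoff() {}

    /** Wraps a task so that it sees the given capabilities in whatever thread runs it. */
    static Runnable withCapabilities(Object caps, Runnable task) {
        return () -> {
            CURRENT_CAPS.set(caps);
            try {
                task.run();
            } finally {
                // Clear the slot so a pooled or long-lived thread keeps no stale entry.
                CURRENT_CAPS.remove();
            }
        };
    }
}
```

Usage: capture the main thread's capabilities once, then start helpers with `new Thread(CapsHandoff.withCapabilities(caps, loaderTask))`.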



A flag for a single context (rather than for a single thread, as I currently named it) would still make things a little easier, though this change does fix the original problem (performance) in any case.

I don't understand why multiple threads shouldn't work as it is now. Does makeCurrent() from the other thread not work?


  • elias

Then I can't use it from the other thread anymore, as I understand it; in fact, I have to call releaseCurrentContext(). This can be useful, but not for simultaneous thread access.



Most people want to do this to create Geometry in a separate thread. About the only LWJGL thing used there is ContextCapabilities. And there are plenty of workarounds: for example, just creating the buffers/textures/etc. in the separate thread and doing all the jME/LWJGL-related work in the main thread, or after grabbing the current context (kind of tricky when you're using jME's or someone else's predefined objects). Or setting the ContextCapabilities manually, as I suggested (maybe grabbing the current context and releasing it afterwards will clean the thread up).
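The first workaround (build the data off-thread, do the jME/LWJGL work on the main thread) can be sketched with plain `java.nio` buffers and a concurrent queue. `GeometryHandoff` and the `upload` callback are hypothetical; in jME the callback would be wherever the Geometry and its render states get created on the render thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical hand-off: loader threads build buffers, the render thread consumes them. */
final class GeometryHandoff {
    private final ConcurrentLinkedQueue<FloatBuffer> ready = new ConcurrentLinkedQueue<>();

    /** Runs on a loader thread: plain Java/NIO work, no GL context or capabilities needed. */
    void buildOnWorker(int vertexCount) {
        FloatBuffer verts = ByteBuffer.allocateDirect(vertexCount * 3 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int i = 0; i < vertexCount; i++) {
            verts.put((float) i).put(0f).put((float) -i); // placeholder x, y, z
        }
        verts.flip(); // prepare the buffer for reading
        ready.add(verts); // thread-safe publication to the render thread
    }

    /** Runs on the render thread once per frame; returns how many buffers were handed over. */
    int drainOnRenderThread(Consumer<FloatBuffer> upload) {
        int count = 0;
        FloatBuffer buf;
        while ((buf = ready.poll()) != null) {
            upload.accept(buf); // the only step that would touch jME/LWJGL
            count++;
        }
        return count;
    }
}
```

Because only `drainOnRenderThread` touches the renderer, no capabilities lookup ever happens off the main thread; the cost is up to one frame of latency before new geometry appears.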



So I'm more than happy to have seen just this fix go in. The rest is just food for thought. I know LWJGL's strict policy of not including any unneeded code :slight_smile:



edit:

Well, I've actually gone ahead and implemented setting the ContextCapabilities manually. setCapabilities isn't public, so I created a class in the org.lwjgl.opengl package to access it. It looks hacky, but it works. Since ThreadLocal uses weak references, there's no memory leak either.

Well, OpenGL does not generally work from multiple threads if the context is not current. So to stay portable you need to release the context and make it current in the other thread. There's no way around it.


  • elias
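The rule above, that a context is current in at most one thread and must be released before another thread can make it current, can be modelled as an ownership token. This toy `ContextOwnership` class is hypothetical and does not call OpenGL or LWJGL; it only tracks which thread would hold the context.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model: a context may be current in at most one thread at a time. */
final class ContextOwnership {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Makes the context current in the calling thread; false if another thread still holds it. */
    boolean makeCurrent() {
        Thread me = Thread.currentThread();
        // Either take a free context, or confirm we already own it.
        return owner.compareAndSet(null, me) || owner.get() == me;
    }

    /** Releases the context, but only if the calling thread is the one holding it. */
    boolean release() {
        return owner.compareAndSet(Thread.currentThread(), null);
    }
}
```

In a real program a loader thread would block or retry until the render thread releases the context, which is why this hand-off costs throughput compared with keeping all GL work on one thread.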

It's not synchronized, but as long as you're not using the same memory locations from multiple threads it should work fine, and that is the case when you create geometry. LWJGL is mostly used in creating the render states for Geometry, and in this process getCapabilities() is used both directly by jME and by LWJGL itself (in methods called from jME, like glGetFloat).



As I said, my "hack" works. The fast path can only serve one thread at a time, but in my case (lots of calls from the main thread, very few from the others) I try to make sure the main thread has the fast path by calling setCapabilities at the start of each frame. So it works, with decent performance. It doesn't look pretty, though.