SceneProcessorCopyToImageView performance problem: JME inside a JFX ImageView hogs the CPU, framerate is unstable and stutters!

I have been using DavidB's (@david_bernard_31) solution for displaying an offscreen JME application in an ImageView inside a normal JFX application. It has been working great, but when we ran it on a much slower client PC we noticed it lagging horribly. On inspection, the main culprit seems to be the method copyFrameBufferToImage. On the strong development machine I got ~400 fps when the app was still JME with JFX HUDs; now that it's the other way around, it's about 60 fps. The main problem is that the process slows down JavaFX too. I don't care much about the JME frames: as long as I get about 10 frames per second I am good. But JFX culling sometimes doesn't happen for a few seconds. I believe the culprit is in all the ColorBuffers. I am using BGRA8, but I think a conversion is still going on and causing the slowdown. And yes, I am using the method readFrameBufferWithFormat. But the PixelWriter creates a converter that calls getSetter(), which gets a ByteBgraPre format. Now, I have little knowledge of ColorBuffers and 3D in general, so I don't know what the difference really is.
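
On the BGRA8 vs ByteBgraPre question: "Pre" stands for premultiplied alpha, meaning each color channel has already been multiplied by alpha. If the frame bytes are straight (non-premultiplied) BGRA and the target format is premultiplied, then every pixel of every frame needs roughly the work below. This is a minimal pure-Java sketch (BgraPremultiply is a hypothetical name), not JavaFX's actual converter code. Note the fast path: for fully opaque pixels (alpha 255) the two formats are byte-for-byte identical, so if the JME frame is opaque, handing the bytes to the PixelWriter declared as PixelFormat.getByteBgraPreInstance() may let JavaFX skip the conversion. That is an assumption worth verifying in the profiler.

```java
/**
 * Hypothetical helper showing the per-pixel cost of turning straight BGRA
 * (bytes in B, G, R, A order) into premultiplied BGRA (color * alpha / 255).
 */
final class BgraPremultiply {
    private BgraPremultiply() {}

    /** Converts src (non-premultiplied BGRA) into dst (premultiplied BGRA). */
    static void toPremultiplied(byte[] src, byte[] dst) {
        if (src.length != dst.length || src.length % 4 != 0) {
            throw new IllegalArgumentException("buffers must be equal length and a multiple of 4");
        }
        for (int i = 0; i < src.length; i += 4) {
            int a = src[i + 3] & 0xFF;
            if (a == 0xFF) {
                // Fully opaque: premultiplying changes nothing, so just copy.
                dst[i] = src[i];
                dst[i + 1] = src[i + 1];
                dst[i + 2] = src[i + 2];
            } else {
                // Scale each color channel by alpha / 255, rounding to nearest.
                dst[i]     = (byte) (((src[i]     & 0xFF) * a + 127) / 255);
                dst[i + 1] = (byte) (((src[i + 1] & 0xFF) * a + 127) / 255);
                dst[i + 2] = (byte) (((src[i + 2] & 0xFF) * a + 127) / 255);
            }
            dst[i + 3] = (byte) a;
        }
    }
}
```

At 1920x1080 that loop body runs about two million times per frame, which is why avoiding the conversion entirely matters more than optimizing it.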

I have been trying to solve it for multiple days now. JProfiler just shows that the JFX and JME3 threads are blocking each other. I tried different JME versions (3.1, 3.1.alpha1, 3.1.beta1), LWJGL versions 2.9.3 and 2.9.1, and even different JREs (8u66, 8u77, 8u111).

I am at my wits' end, and any hint or idea would be appreciated.

Edit: There seem to be spikes in CPU usage, but I couldn't find out why. Also, even on the developer PC, when no frame limit is set the JFX UI is very unresponsive. I get about 60-90 frames, but only very few of them in JFX, which makes it useless.

Edit 1.5: I just installed Afterburner, and it shows my GPU usage jumping between 0 and 100% with some in-between steps. The CPU is not bottlenecking. If it were constantly using my GPU, I could imagine at least double the fps.

Edit 2: The main 2 classes I am talking about:

A lot of the work in this case is probably transferring data over the PCI bus. Since the image is rendered on the GPU, it has to be transferred to CPU memory for each frame, so memory and PCI bus speed might be the biggest bottleneck. Maybe try lowering the rendering resolution?
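
To put a rough number on that transfer: assuming 4 bytes per pixel (BGRA8) and a 60 fps target, the readback volume is simply width x height x 4 x fps. A tiny sketch (ReadbackCost is a hypothetical name) makes the effect of lowering the resolution concrete.

```java
/** Hypothetical back-of-the-envelope calculator for GPU-to-CPU frame readback volume. */
final class ReadbackCost {
    private ReadbackCost() {}

    /** Bytes moved per second when reading back width x height frames at the given fps. */
    static long bytesPerSecond(int width, int height, int bytesPerPixel, int fps) {
        // Use long arithmetic: large resolutions at high fps overflow int.
        return (long) width * height * bytesPerPixel * fps;
    }
}
```

For 1920x1080 at 60 fps this gives 497,664,000 bytes/s (about 498 MB/s); halving both dimensions to 960x540 cuts it by four, to 124,416,000 bytes/s. Every byte read back is then likely copied again into the JavaFX image, so the CPU-side memory traffic is at least that much once more.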

That doesn't help, I'm afraid. It really is a CPU bottleneck, although now I am no longer sure what causes it. The copyFrameBuffer method isn't the main culprit anymore since I removed the synchronized blocks. Risky and probably stupid, but desperate situations call for desperate measures. The CPU usage is still high, though; only now LWJGL/JME uses all of it.

Hm, if it uses synchronized, it's probably not a great design anyway. It would probably be better to use a queue for the images and drop images if/when the queue gets too full. That way the threads wouldn't have to block each other either.
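
A minimal sketch of that idea, using java.util.concurrent.ArrayBlockingQueue. DroppingFrameQueue is a hypothetical name and T stands for whatever frame object the app passes around. The render thread never waits for the FX thread: when the queue is full, the oldest frame is discarded.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical bounded frame queue that drops the oldest frame instead of blocking the producer. */
final class DroppingFrameQueue<T> {
    private final ArrayBlockingQueue<T> queue;

    DroppingFrameQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Render thread: never blocks. Returns true if an older frame had to be dropped. */
    boolean publish(T frame) {
        boolean dropped = false;
        while (!queue.offer(frame)) {
            // Queue is full: discard the oldest frame and try again.
            if (queue.poll() != null) {
                dropped = true;
            }
        }
        return dropped;
    }

    /** FX thread: returns the oldest waiting frame, or null if there is none. */
    T poll() {
        return queue.poll();
    }
}
```

If T is a freshly allocated pixel buffer per frame, this trades blocking for allocation churn, so it is best combined with some way of reusing buffers.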

The synchronized block is required to avoid reading or writing a partially updated image. Using a queue has the same constraint unless you create an image buffer for each entry, which means a lot of memory allocation/deallocation, since existing buffers would no longer be reused.

Having the synchronized only on the block that should run in FX seems to have alleviated the problem a little for me. The framerate increased a bit, and it's also the most stable it has been so far. I don't know why, but for now it's some improvement until I can find the real problem.
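
Narrowing the lock like this can be taken one step further with double buffering: the render thread fills a private back buffer with no lock held (the expensive readback), and the lock covers only a reference swap on the JME side and the copy-out on the FX side. A minimal sketch under those assumptions: DoubleBufferedFrame is hypothetical, and the two Consumer callbacks stand in for the readFrameBufferWithFormat call and the PixelWriter call, whose exact arguments are not shown here.

```java
import java.util.function.Consumer;

/** Hypothetical double buffer: the slow fill runs unlocked; only the swap and the FX read are locked. */
final class DoubleBufferedFrame {
    private final Object lock = new Object();
    private byte[] front;   // read by the FX thread, only while holding the lock
    private byte[] back;    // written by the render thread, never touched by the FX thread
    private boolean fresh;  // true when front holds a frame not yet consumed

    DoubleBufferedFrame(int sizeInBytes) {
        front = new byte[sizeInBytes];
        back = new byte[sizeInBytes];
    }

    /** Render thread: fill the back buffer without the lock, then swap buffers under it. */
    void produce(Consumer<byte[]> filler) {
        filler.accept(back); // e.g. the framebuffer readback
        synchronized (lock) {
            byte[] tmp = front;
            front = back;
            back = tmp;
            fresh = true;
        }
    }

    /** FX thread: pass the newest frame to sink (e.g. a setPixels call) if one is waiting. */
    boolean consume(Consumer<byte[]> sink) {
        synchronized (lock) {
            if (!fresh) {
                return false;
            }
            sink.accept(front);
            fresh = false;
            return true;
        }
    }
}
```

consume would typically be called once per pulse on the FX thread, for example from a JavaFX AnimationTimer. The render thread then only ever waits for the duration of one FX-side copy, never for the readback.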

Can you share a sample project that reproduces the issue? Then we could experiment/work on it.

Did you remove the 30 fps/60 fps limit on the JME app? (That is a bad idea, because it will generate a lot of useless copyBuffer,… calls and will overload the GPU->CPU transfer.)
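
Besides capping the JME app itself (for example via AppSettings.setFrameRate), the copy to JavaFX could be rate-limited on its own, so the scene may render faster while frames are pushed to the ImageView only as often as needed (around 10 per second was mentioned as enough). A minimal sketch: CopyThrottle is hypothetical, and the caller is assumed to pass System.nanoTime() from wherever the readback happens.

```java
/** Hypothetical throttle: allows a copy only if at least 1/maxFps seconds have passed since the last one. */
final class CopyThrottle {
    private final long minIntervalNanos;
    private long lastCopyNanos;
    private boolean first = true;

    CopyThrottle(int maxFps) {
        if (maxFps <= 0) {
            throw new IllegalArgumentException("maxFps must be positive");
        }
        this.minIntervalNanos = 1_000_000_000L / maxFps;
    }

    /** Returns true if a copy should happen at nowNanos (e.g. System.nanoTime()). */
    boolean shouldCopy(long nowNanos) {
        // Subtracting nanoTime values (rather than comparing them) is overflow-safe.
        if (first || nowNanos - lastCopyNanos >= minIntervalNanos) {
            first = false;
            lastCopyNanos = nowNanos;
            return true;
        }
        return false;
    }
}
```

Taking the timestamp as a parameter keeps the class testable and lets the caller reuse the value it already measured for the frame.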

You can always use a ring buffer to avoid the alloc/dealloc issue. The problem with synchronized in this case is that a) synchronized isn't very optimized and puts a lot of strain on the JVM; it's basically the sledgehammer of thread safety, which is why Sun added the concurrency classes; and b) it causes the threads to block each other, which isn't necessary.

Sorry, I have been away the past few days. Anyway, I just read up a bit on ring buffers, but I don't quite know how to implement one in this example. Do I make a queue of whole images, or a queue of pixels that is total pixels * 4 (RGBA) long? If it's not too much of a bother, can you show me an example?
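
One way to answer the "whole images or pixels" question: make each slot a whole frame of width * height * 4 bytes, allocated once up front, and reuse the slots forever. Below is a minimal single-producer/single-consumer sketch using AtomicLong indices instead of synchronized. FrameRing and its method names are hypothetical, and two policies are assumptions: when the ring is full, the render thread skips that frame instead of blocking, and the FX thread jumps straight to the newest frame, releasing older ones.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical SPSC ring of preallocated whole-frame buffers; nothing is allocated per frame. */
final class FrameRing {
    private final byte[][] slots;
    private final AtomicLong head = new AtomicLong(); // first slot still owned by the consumer side
    private final AtomicLong tail = new AtomicLong(); // next slot the producer will write

    FrameRing(int capacity, int frameBytes) {
        slots = new byte[capacity][frameBytes];
    }

    /** Render thread: the slot to fill, or null if the ring is full (skip this frame). */
    byte[] claimWrite() {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return null;
        }
        return slots[(int) (t % slots.length)];
    }

    /** Render thread: publish the slot from claimWrite(); the volatile write makes its bytes visible. */
    void commitWrite() {
        tail.incrementAndGet();
    }

    /** FX thread: the newest filled slot (older ones are released), or null if none is ready. */
    byte[] claimLatest() {
        long t = tail.get();
        if (head.get() == t) {
            return null;
        }
        head.set(t - 1); // hand all older slots back to the producer, keep the newest
        return slots[(int) ((t - 1) % slots.length)];
    }

    /** FX thread: give the slot from claimLatest() back; call before the next claimLatest(). */
    void release() {
        head.incrementAndGet();
    }
}
```

For 1080p, each slot is 8,294,400 bytes, so a ring of three costs about 24.9 MB, allocated once. The producer never writes into the slot the FX thread is currently reading, so no lock is needed around the copy.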

The test application (TestDisplayInImageView) in the repo actually works as an example. If I crank it up to fullscreen on my 1080p monitor and start it with JProfiler (which slows it down), the framerate drops to about 40 and FX gets slow too. I increased the camera movement speed to 50, which makes it all the more obvious. In JProfiler you can see the JME and FX threads blocking each other all the time. According to JProfiler, the setPixels() method uses about 45% of the CPU, and that share slowly rises.

It seems to me that any kind of JavaFX thread activity throws it off completely. I was panning the camera around like crazy when I opened the Background combobox in the test application, and it completely froze the application (which had been running fairly smoothly until then). It never quite recovered: it stayed laggy for a few minutes afterwards until it eventually became smooth again.