Thread efficiency vs time per frame deficiency - opinion welcomed :)

amirkr · November 30, 2013, 11:22pm

I see.
We do create “safeguards” from time to time, and I described only the skeleton of what happens. For example, not all the images are loaded from other players, some are generated automatically, etc. In the end, all I have is a situation where a user has:
A) a folder on his/her computer with images (of various sizes).
B) A need to “load” these images into the scene.
It seems weird the loading part disturbs the GC so much. After all, if I can run “heavy” games from 1998 on this computer, games which are now very obsolete but still use a lot of data and the GC abilities (sometimes! not always, I know), then it seems strange a simple loading of images is so difficult.
(I know this is a “weak” argument, since many games use sophisticated ways to optimize performance, not all games are the same, etc. But still, a random “heavy” game from 1998 must be heavier on graphics than my meager texture-loading example. I just load textures.)
I stripped it to the very core: start a thread, attach an image, do it again. Nothing else, and still the problem persists. Sometimes, after a cycle of loading, I get an Fps boost, probably because the statistics show I do not create any other objects in memory, which means (to my understanding) that the textures are already in memory and are therefore loaded much quicker. But sometimes this does not happen.

By the way, you said “thumbnail textures”. This is only possible if I create the thumbnails outside of jmonkey and then load that smaller image as a texture, right? There is no way to reduce the quality of the image (say, make it smaller and compress it) via jmonkey (or am I wrong?). This is a plausible solution; it would require more work “offline” (outside the game), but it’s doable.

pspeed · December 1, 2013, 12:03am

You could do it in code, too. JME doesn’t do it but you can do it. java.awt.image has all kinds of support for that. JME has some support for converting java Images to JME images.
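
A minimal sketch of the in-code downscaling mentioned above, using only `java.awt.image` and `Graphics2D` from the standard library. The class name `ThumbnailSketch`, the method `scaleToFit`, and the 256-pixel limit are illustrative assumptions, not anything from JME or from the poster's code; converting the result into a JME image is left out here.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class ThumbnailSketch {

    /** Scales src down so neither side exceeds maxSide, keeping the aspect ratio. */
    static BufferedImage scaleToFit(BufferedImage src, int maxSide) {
        int w = src.getWidth();
        int h = src.getHeight();
        int longest = Math.max(w, h);
        if (longest <= maxSide) {
            return src; // already small enough, nothing to do
        }
        double factor = (double) maxSide / longest;
        int newW = Math.max(1, (int) Math.round(w * factor));
        int newH = Math.max(1, (int) Math.round(h * factor));

        BufferedImage dst = new BufferedImage(newW, newH, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bilinear interpolation: a reasonable quality/speed trade-off for thumbnails.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, newW, newH, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // Hypothetical input: a blank image the size discussed in this thread.
        BufferedImage big = new BufferedImage(1920, 1200, BufferedImage.TYPE_INT_ARGB);
        BufferedImage small = scaleToFit(big, 256);
        System.out.println(small.getWidth() + "x" + small.getHeight());
    }
}
```

The scaling itself is CPU work, so it belongs on a background thread rather than in the update loop.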

re: GC, your test created hundreds and hundreds of new images. Obviously GC is going to be an issue. Even worse, you allocated native memory and OpenGL objects for every one of those images. As rapidly as you were creating stuff, you are probably even blowing past incremental GC and causing full GC to happen (can pause the whole JVM for upwards of 2 seconds).

And you mentioned a 1920x1200 image… dude, that’s 9216000 bytes just for that image. And then only if you aren’t generating mipmaps. The old games you talked about never dreamed of loading even one image so large.
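
The arithmetic behind that figure, as a small sketch. It assumes 4 bytes per pixel (uncompressed RGBA), which is what yields 9216000 bytes for 1920x1200; the class and method names are made up for illustration. The second method adds a full mipmap chain, which costs roughly one third extra.

```java
class TextureMemorySketch {

    /** Bytes for an uncompressed image: width * height * bytes per pixel. */
    static long baseBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Bytes including a full mipmap chain, halving each side down to 1x1. */
    static long bytesWithMipmaps(int width, int height, int bytesPerPixel) {
        long total = 0;
        int w = width;
        int h = height;
        while (true) {
            total += baseBytes(w, h, bytesPerPixel);
            if (w == 1 && h == 1) {
                break; // smallest mip level reached
            }
            w = Math.max(1, w / 2);
            h = Math.max(1, h / 2);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(baseBytes(1920, 1200, 4));        // 9216000
        System.out.println(bytesWithMipmaps(1920, 1200, 4)); // about 4/3 of the above
    }
}
```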

amirkr · December 1, 2013, 12:30am

Several things:

1. Thanks. It is indeed possible to use Java’s BufferedImage for those purposes. It might help!
2. When I wrote GC I thought we were talking about “Graphics Card”. I guess that’s just me, right? (Because I don’t see what “full GC” means in that context.)
3. What is the “native memory”?
4. In my example I created “hundreds and hundreds” of images, true. But the speed “bump” was there right from the start and was constant throughout the entire process, regardless of how many I had already created, attached or detached. This is important, since in my “real” game everything was slow right from the start, even when I only loaded 15 geometries with images, and even now, after I made the “queue” modification.
5. About the 1920x1200… yeah, I dream big, I’ll give myself that, but maybe it needs to be scaled down. I can’t generate mipmaps since I never use the same texture twice.

So to sum things up, I think you are suggesting the following:
A. Scale down the images.
B. Try to create them not “all at once” in different threads, but in some timed manner that reduces the load.

Anything else that comes to mind before I drench myself in the “Real” code again?

And THANKS!

pspeed · December 1, 2013, 1:18am

GC = garbage collection. Welcome to Java. Meet your friend the Garbage Collector. He’s nice but don’t turn around… he’s not to be trusted.

toolforger · December 1, 2013, 9:19am

“Native memory” is memory outside the JVM. All data that goes to OpenGL needs to go through native memory.
Garbage collection for native memory has issues - not for the timing, but you may run out of memory due to fragmentation.

I’m not sure that the image sizes are a problem. PCIe can transfer hundreds of megabytes per second at the minimum; it’s not that easy to saturate that. Unless you’re sending tens of 10-MB images, I’d say that that’s not the issue (but you can benchmark a collection of large and a collection of small images).
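
A back-of-the-envelope way to check this argument, as a sketch. The bandwidth passed in is purely an assumed number (500 MB/s is a placeholder, not a measurement of any bus or driver), and the class and method names are invented for illustration. Comparing the result against the frame budget shows whether a single upload can fit inside one frame under that assumption.

```java
class TransferEstimateSketch {

    /** Estimated transfer time in milliseconds for the given size and bandwidth. */
    static double transferMillis(long bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond * 1000.0;
    }

    /** Time available per frame in milliseconds at the given frame rate. */
    static double frameBudgetMillis(double framesPerSecond) {
        return 1000.0 / framesPerSecond;
    }

    public static void main(String[] args) {
        long imageBytes = 1920L * 1200L * 4L; // one uncompressed RGBA image
        double assumedBandwidth = 500e6;       // ASSUMPTION: 500 MB/s effective throughput
        System.out.println("transfer ms: " + transferMillis(imageBytes, assumedBandwidth));
        System.out.println("frame budget ms at 60 fps: " + frameBudgetMillis(60.0));
    }
}
```

Real numbers depend on the hardware and driver, so measuring with a set of large and a set of small images remains the more reliable test.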

Scaling down means you’re going to do very CPU-heavy work. If the bottleneck is on the CPU-to-GPU bus, then it could help; if the bottleneck is on the CPU, it will make things worse. (GC of Java memory is a CPU-bound activity.)

zarch · December 2, 2013, 9:56pm

@pspeed said: GC = garbage collection. Welcome to Java. Meet your friend the Garbage Collector. He's nice but don't turn around... he's not to be trusted. ;)

I wouldn’t say you can’t trust him. More that he tends to turn up with his truck and make as much noise as possible emptying the bins on the one morning you have off and are trying to have a lie-in.

amirkr · December 2, 2013, 10:00pm

Thanks all. I’m taking everything you said into account and trying different solutions.
It’s a tricky business. I’ll update if I find the exact cause of the low Fps (was it really the image size? etc).

Amir.

amirkr · December 5, 2013, 4:14am

News and further details:

If what I’m about to say sounds awfully trivial, please excuse me, but I found it interesting.
I tweaked and played with my “real” program for a while.
As a short reminder - I was loading images as textures into quad geometries in threads other than the render thread, then placing those in a ConcurrentLinkedQueue which the main thread checked on every iteration of the update loop. If a new geometry was “ready”, the update loop attached it to rootNode (actually, it attached to rootNode a node that the geometry is attached to).

Here are the new findings:

1. If I say “the hell with concurrency” and load all the images sequentially through a method called by the update loop (say, the update loop calls a method which generates 10 “images”), using a Vector or something to hold the results when they are done, it works at the exact same speed as if I was using threads.
This leads me to think that somehow, somewhere, the “concurrency” is broken and the update loop somehow “does it all”. This is very confusing because, as you can see from the code example I gave (which is a simplified version of my code where I just create new geometries all the time), the update loop only attaches to rootNode the finished geometries, which were created on an entirely different thread without using anything from the render thread (as far as I can tell).
2. There were discussions regarding the size of the image I’m trying to place as a texture -
If I decide to go through all the creation steps of the geometry in a different thread (create a quad, set material, load the texture…), even adjust its position etc, but do not attach it to the rootNode, everything is smooth like freshly fallen snow (on a very cold day… not that icy snow). It’s the attaching that completely kills the update loop’s Fps. Why? I’m baffled now.

As you can see, I have not given up yet and will continue to work on the issue and post updates. If you have any thoughts, as usual, I’d be very grateful (maybe it’s that “enqueue” thing, which I don’t fully understand? etc).

Thanks,

Amir.

pspeed · December 5, 2013, 4:38am

Regarding the second point… I’m not sure how many different ways there are to say that the time is taken up transferring the image to the GPU (and possibly generating mipmaps, since you haven’t really said whether you are specifically turning that off or not).

I feel like I’m saying the same things over and over and yet the same things are still baffling to you.

@amirkr said: 2. There were discussions regarding the size of the image I'm trying to place as a texture - If I decide to undergo all creation steps of the geometry in a different thread (create a quad, set material, load the texture...) even adjust its position etc, but do not attach it to the rootNode, everything is smooth like freshly fallen snow (on a very cold day... not that icy snow). It's the *attaching* that completely kills the updateLoop's Fps. Why? I'm baffled now.

Because a lot of time is taken transferring the data to the GPU. The data transfer to the GPU takes a long time. The update loop blocks because the data is being transferred to the GPU. All of the time taken on the background thread is nice but the data still has to be transferred from Java memory to GPU memory… which takes time and blocks the update thread.

So hopefully it is no surprise that you still get frame hiccups on the GPU thread since the GPU is busy accepting your data.

P.S.: and again, try NOT generating mipmaps for the textures. Mipmaps take a lot of extra time and space and you may not need them. So, mipmaps are bad. Turn mipmaps off. Load the textures without generating mipmaps. Do not generate mipmaps and see if anything is better.

Oh, and don’t generate mipmaps.

pspeed · December 5, 2013, 4:44am

Actually, while we’re on the subject… the really strange thing is why the asset manager is even loading the image more than once to begin with. They are supposed to be cached.

Maybe post your latest code.

toolforger · December 5, 2013, 7:15am

@pspeed I heard you loud and clear, it’s just unclear whether the to-GPU transfer times are really the culprit.

@amirkr The “enqueue thing” is that you’re preparing a piece of code that’s supposed to be run in the graphics thread (a.k.a. update loop a.k.a. OpenGL thread).
You prepare data in your worker thread, but you can’t insert it into the scenegraph from there if you don’t like race conditions, so you wrap up the data and an anonymous inner class with a function to insert that data, and send that wrapped task to the graphics thread. It’s essentially the same as Swing’s invokeLater.
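
A stripped-down sketch of that pattern using only `java.util.concurrent`, so it runs without JME. `EnqueueSketch`, its `enqueue` and `update` methods, and the `sceneGraph` list are stand-ins invented for illustration, not JME's actual implementation; a lambda is used where older Java would need the anonymous inner class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class EnqueueSketch {
    // Tasks waiting to be run by the "graphics" thread.
    private final ConcurrentLinkedQueue<FutureTask<?>> pending = new ConcurrentLinkedQueue<>();
    // Stand-in for the scene graph; only ever modified inside update().
    final List<String> sceneGraph = new ArrayList<>();

    /** Safe to call from any thread: wraps the task and queues it. */
    <T> Future<T> enqueue(Callable<T> task) {
        FutureTask<T> wrapped = new FutureTask<>(task);
        pending.add(wrapped);
        return wrapped;
    }

    /** Called once per frame by the thread that owns the scene graph. */
    void update() {
        FutureTask<?> task;
        while ((task = pending.poll()) != null) {
            task.run(); // executes on the calling (graphics) thread
        }
    }

    public static void main(String[] args) throws Exception {
        EnqueueSketch app = new EnqueueSketch();
        Thread worker = new Thread(() -> {
            String geometry = "quad-with-texture"; // prepared off the graphics thread
            app.enqueue(() -> app.sceneGraph.add(geometry));
        });
        worker.start();
        worker.join();
        app.update(); // the scene graph is modified here, not in the worker
        System.out.println(app.sceneGraph);
    }
}
```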

pspeed · December 5, 2013, 7:17am

He’s already doing this… just not using JME’s queue because it’s better in this case to perform only one operation per frame. From experience.
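
The one-operation-per-frame idea can be sketched like this, again with plain `java.util.concurrent` and invented names (`OnePerFrameSketch`, `ready`, `attached`, `updateOnce`). It is an illustration of the throttling described here, not the poster's actual code: each call to `updateOnce` stands for one pass of the update loop and attaches at most one finished item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class OnePerFrameSketch {
    // Filled by worker threads with finished items (stand-ins for geometries).
    final ConcurrentLinkedQueue<String> ready = new ConcurrentLinkedQueue<>();
    // Stand-in for what has been attached to the scene.
    final List<String> attached = new ArrayList<>();

    /** One frame's worth of work: attach at most one ready item. */
    boolean updateOnce() {
        String next = ready.poll();
        if (next == null) {
            return false; // nothing finished yet, frame proceeds as usual
        }
        attached.add(next);
        return true;
    }

    public static void main(String[] args) {
        OnePerFrameSketch loop = new OnePerFrameSketch();
        loop.ready.add("image-a");
        loop.ready.add("image-b");
        loop.ready.add("image-c");
        for (int frame = 1; frame <= 4; frame++) {
            boolean didAttach = loop.updateOnce();
            System.out.println("frame " + frame + ": attached=" + didAttach
                    + ", total=" + loop.attached.size());
        }
    }
}
```

Spreading the attaches over frames caps the work per frame but does not make any single expensive upload cheaper.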

amirkr · December 5, 2013, 2:31pm

Thank you both:

toolforger - that helped understand the tutorial
pspeed - I did read your comments carefully: I thought the “heavy GPU” task, or the “heavy GPU transfer”, occurs when I take an image file from my computer and tell jmonkey - “see that? make that a geometry!”. I was surprised to see that this part was very, very fast, but just calling “rootNode.attachChild(myGeometry)” made the render thread freeze. I see now you mean the transfer to the GPU happens only when I want to display the geometry?
As for caching - you are very right. In the “test code” example, when I load the same image over and over again, it is cached and quick after a few times. But in the real code I’m attempting to load ~20 geometries from completely new images every time a user moves to a new “room”, so nothing can really be cached.
Regarding mipmaps - I’m searching for how to turn them off (seems hard to find).
[EDIT: I applied this to the material - t.setMinFilter(Texture.MinFilter.BilinearNoMipMaps); but it didn’t increase the Fps. Is that what you meant?]

Empire_Phoenix · December 5, 2013, 3:10pm

If your target is a PC you might go a bit more brute-force here: just load all rooms into memory.
The mipmap generation only slows down the first frame, as the graphics driver needs to create them on the fly when none are provided, but they should be used.
Btw, if you load something other than j3o models, the loading itself can take quite some time.

amirkr · December 5, 2013, 3:14pm

@Empire Phoenix said: If your target is a pc you might go a bit more bruteforce here, just load all rooms into the memory :)

I’m afraid that is not possible (I wish it was!).
The game has a situation (the problematic situation…) in which images are automatically downloaded from a server at some intervals. These images were (potentially) uploaded by other players. So I can’t just download them all into main memory; I need to do it dynamically as the player moves around. (I could load “a lot” of images, but then when he/she reaches “the end” and I want to load new ones, the movement would again be super laggy, if not completely unresponsive.)

toolforger · December 5, 2013, 3:21pm

You could load all images, and just replace/add those images that were added later.
If that’s too many images: Preload images of the neighbouring rooms.

Though that’s just workarounds. Loading images should not produce noticeable lag.
Did you find out whether the effect is influenced by image size? Image count?

amirkr · December 5, 2013, 3:31pm

The Fps decrease occurs only when I do the attaching to rootNode - I can load as many images as I like in the “background” (create their geometries), and as long as I do not attach them, there is no problem. Consider the following scenario: a player is in one room, then he/she moves to another room. That is when I need to attach all the relevant geometries to the rootNode (put the images in the “empty placeholders” dynamically). I could try a trick and place them there beforehand - but what happens if the user moves to yet another room afterwards? The problem is that even attaching a single geometry with a “heavy” texture knocks out the Fps, making the game unresponsive. It is impossible for me to “guess” when the player decides to stop moving (so I could quickly load all the images! haha), which leaves me with a severe problem: he/she wants to move, but I have to attach a geometry (either for this room, or as preparation for the next room), and the movement gets stuck.
For various reasons (related to the objective of the game itself, not code-wise) I cannot place a “loading” screen of some sort between rooms (they are not, for example, different “stages” in a game, but more like a maze you walk in).

I was able to trace the effect to the image’s size/dimensions (I can’t distinguish one from the other effectively). Very small images (say 32x32) load quickly and the movement is almost smooth. Larger images (say 100x200) also load decently quickly, but the movement is a bit laggy… etc. Huge images (1920x1200) load about 10 times slower than the small ones and make the Fps drop to 0-2. This matches what pspeed said (I think) about the data transfer to the GPU hindering the render thread.

Is there a way to separate the data transfer from the actual movement of objects? All I want is for what’s on screen to move nicely while other things load in the background! Why is this so difficult?

toolforger · December 5, 2013, 4:13pm

A 1920x1200 image is 2.2 megapixels, i.e. 6.6 MB for RGB and 8.8 MB for RGBA.
Now if you have just one image of that size, it shouldn’t cause any noticeable lag.
Dozens of them, possibly in combination with hardware or driver problems… maybe.

No idea where the exact limits are. CPU->GPU bandwidths are generous but not unlimited, and if a room contains hundreds of images of that size, this could indeed cause lag.
OTOH you never really need more than a single 1920x1200 image, and only if it’s screen-filling. You could upload only size-limited versions of the images, and upload the higher resolutions only if the user moves near a specific image - which will remain slightly blurred until the upload has happened, but the upload of a single image should be fast (within less than one frame I’d say) and the user shouldn’t see more than a slight and transient blur.

There’s also the possibility that you’re having a hardware or driver problem.
Can you provide a self-contained package somewhere so we can run the thing on our machines? You posted the sources, but we don’t have image mixes so any findings that we might report might be meaningless for a comparison with what you’re seeing.
I.e. a self-contained example would be very nice.

amirkr · December 5, 2013, 4:20pm

Yes, I can certainly do that if you’re willing to take a few minutes to run it on your machine.
The 1920x1200 case is extreme - images of a much smaller size (for example 200x200) slow down the movement as well.
This is very peculiar.

I’m working on the self-contained code. What is the easiest way to put it here? Simply copy-paste as code the 3-4 classes I’m using, including the main?

toolforger · December 5, 2013, 4:52pm

You could upload it as a zip somewhere.
If a set of images is on a publicly accessible server, you could simply post the URL.

200x200 images should be a non-issue. That’s just 40k pixels per image, i.e. 120 or 160 KB per image. That’s next to nothing.
I suspect either some issue with your code, a yet undetected issue in JME, or a driver/hardware problem. (Which, of course, doesn’t say anything about what the cause is.)