VSync vs. No VSync

I have a question.

When I have VSync turned on, the game menu runs at around 54 FPS. Inside the game it runs at around 30 FPS.

If I turn VSync off and set the frame rate setting to 120, the game menu runs at 118 FPS and the game runs at around 56 FPS.

If JME can run the game menu at 118 FPS when not limited, why is VSync in JME not able to hold a steady 60 FPS? The game runs at a low 30 FPS with VSync, but without it, it runs at almost 60 FPS.

I'm trying to wrap my head around why VSync makes JME so much slower. In my own engine, I never saw any issues with VSync on or off… but my engine was built specifically for the game, so no generic coding was done.

Does anyone know what might cause the game to run at only around 30 FPS with VSync when, without it, it runs at almost 60 FPS?


What refresh rate does your monitor have? VSync will cap the frame rate to the monitor's refresh rate.

Might be related to this bug which was fixed in 3.4.0-stable:

Vsync doesn’t update the graphics until the vertical blanking period. This means that you will essentially never get tearing and your frame rate is nicely locked to that of the monitor. Also, in a lot of cases, the code can continue building the next frame while the previous frame waits for vsync.

Setting the frame rate sleeps for whatever time is left on the frame to make X FPS. Essentially your CPU wastes time to wait for the next frame. If all of your game logic is on the render thread then you pause your whole game while that waits.
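The sleep-based cap described above can be sketched in a few lines of Java. This is a simplified illustration of the idea, not JME's actual limiter code:

```java
public class FrameCap {
    /**
     * How long to sleep after a frame's work so the loop hits the target FPS.
     * If the work already overran the frame budget, there is nothing left to
     * sleep and the cap is effectively skipped for that frame.
     */
    public static long sleepMillisFor(double targetFps, double frameWorkMillis) {
        double budgetMillis = 1000.0 / targetFps;
        double remainingMillis = budgetMillis - frameWorkMillis;
        return remainingMillis > 0 ? (long) remainingMillis : 0;
    }

    public static void main(String[] args) {
        // 10 ms of work at a 60 FPS target leaves ~6 ms to sleep.
        System.out.println(sleepMillisFor(60.0, 10.0));
        // At a 120 FPS target the same work overruns the ~8.3 ms budget: no sleep.
        System.out.println(sleepMillisFor(120.0, 10.0));
    }
}
```

During that sleep the render thread does nothing, which is why game logic living on that thread stalls along with it.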

Neither of those are “unlimited”. You can run without either of them and things will run as fast as they can and sometimes you will hear your GPU capacitors screaming through your speakers, too.

What happens if you set the frame rate to 60?

When you set it to 120 and lose two frames, that automatically implies to me that some frames are taking longer than 1/120th of a second. Depending on why that is, it could happen more or less often with different timing… just depending on how things line up.

Depending on how the frame rate is being capped, “sleep at the end of frame” can be more forgiving than vsync. If a frame runs a tiny bit long, a sleep might just get skipped. VSync would have to wait for the next frame so you drop a whole frame.

In JME, there is a detailed frame stats state you can set that will show you where all of the time is being spent per frame. It’s sometimes illuminating.

Edit: as an example of what I’m talking about with vsync versus sleep.
If your frames are consistently 17.2 ms, then running flat out would be 58 FPS, but with vsync, if every frame takes longer than 16.6 ms (even 0.6 ms longer), then you drop every other frame and get 30 FPS.
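That quantization effect can be made concrete with a small helper (my own illustration, not engine code): under vsync, a frame that misses a blanking interval waits for the next one, so the effective rate is the refresh rate divided by the number of refresh intervals each frame spans.

```java
public class VsyncMath {
    /** Effective FPS under vsync for a given frame time and monitor refresh rate. */
    public static double effectiveFps(double frameMillis, double refreshHz) {
        double refreshMillis = 1000.0 / refreshHz;
        // A frame that runs even slightly past one interval must wait for the next.
        long intervalsSpanned = (long) Math.ceil(frameMillis / refreshMillis);
        return refreshHz / intervalsSpanned;
    }

    public static void main(String[] args) {
        System.out.println(effectiveFps(17.2, 60.0)); // 17.2 ms frames on a 60 Hz monitor -> 30.0
        System.out.println(effectiveFps(16.0, 60.0)); // just under the budget -> 60.0
    }
}
```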

…but really, frames should target being faster than 16 ms for consistency. Like, half that on average if possible. (And everything you do on the render thread counts towards this.)


Thanks. That makes sense. I will take a look at the frame times; you are probably right.

The frames are taking a bit too long, and with vsync that can cause an entire extra refresh interval to pass before the frame is shown.

Without vsync, that explains the 56 FPS on the frame lock instead of 60. I have stuff taking longer than 1/60th of a second, but not by much, because it only drops 4 frames per second.


I'm using JME 3.5 with LWJGL3, so it wouldn't have that issue, and I didn't see it. I think the response pspeed wrote applies more to my case.


Yes, I figured it out. I turned off particles and now it runs at 120 FPS. I knew JME's particle system is not good, so I switched to ParticleMonkey, but the performance there is not good either. My testing above was ParticleMonkey with 40,000 instances.

But going through the code, I don't see ParticleMonkey using instancing either. So I guess I will need to write my own particle system that uses instancing to gain the speed required.

I see examples of code using InstancedNode. In a quick test, it was able to handle 80k instances at over 6000 FPS on my machine. But I would have to write my own particle system to handle what is needed.

Does anyone know of a particle system for JME that uses instancing?


JME's particle system uses one single mesh for all the particles, so instancing won't do much, I think, at least compared to point-sprite particles. It might be a bit better compared to quad particles.

If point-sprite particles are not good enough, you should probably look into reducing overdraw and implementing as much logic as possible in the shader (to reduce data transfer to the GPU).


Thanks. A single mesh means that every frame the mesh is updated and the GPU is sent new mesh data.

Instancing is far better for rendering the same mesh 10k times. You have one object with one mesh, and if it is a quad, then you have 2 triangles shared across 10k objects.

I noticed JME really doesn’t use instancing anywhere. It is more about changing the mesh over and over and over and over…

Instancing is not free. 10k quad instances is definitely faster than 10k separate quad objects but it’s not as fast as a 10k quad mesh. You’ve pushed the 10k draw setups to the driver is all. They still have to be done.

Given that an instance will need a buffer of transform data per object (10k of them), for triangles (and sometimes quads) it can be better to update 30k (or 40k) vertexes in a mesh buffer than it is to update 10k transforms… just to let the GPU do 10k different object setups.
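A back-of-the-envelope comparison of the two upload costs, assuming vec3 float positions in the mesh and a full 4x4 float matrix per instance (a common worst case; JME's actual instance data layout may be more compact):

```java
public class InstancingTradeoff {
    static final int FLOAT_BYTES = 4;

    /** Bytes uploaded per frame when every vertex position of the big mesh is rewritten. */
    public static int meshBytes(int objects, int vertsPerObject) {
        return objects * vertsPerObject * 3 * FLOAT_BYTES; // vec3 per vertex
    }

    /** Bytes uploaded per frame for per-instance transforms (4x4 float matrix each). */
    public static int instanceBytes(int instances) {
        return instances * 16 * FLOAT_BYTES;
    }

    public static void main(String[] args) {
        // 10k quads: 40k vertex positions vs. 10k matrices.
        System.out.println(meshBytes(10_000, 4));  // 480000 bytes
        System.out.println(instanceBytes(10_000)); // 640000 bytes
    }
}
```

Under these assumptions, rewriting 40k quad vertices actually uploads less data than 10k full per-instance matrices, which is the tradeoff being described.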

Where this tradeoff is, is not always 100% clear and may vary somewhat from card to card, or bus to bus, I guess.

I would probably never instance triangles. I’d test quads on a variety of platforms before deciding. For anything larger than a quad, instancing would be a fine choice.

…and point sprites will be the rock star of all of them when you can get away with it.

In my experience, I've not seen that be faster than instancing. I see a couple of different approaches in the examples: one uses multiple meshes, another uses BatchNode, then there is updating a single mesh so that one mesh reflects all the particles, and then there is instancing.

From what I can see in FPS, instancing comes out far better than any of the latter.
If you have an example showing the mesh approach faster than instancing, please point me to it so I can check it out.

Both instancing and your all-in-one mesh make just one draw call, so thinking about the render times, I don't see how the one mesh is faster. You still have a call on every vertex and on every fragment, even when most of that mesh is outside the frustum.

From what I see, in 3.2+ point sprites have been removed; more and more newer GPUs do not support point sprites anymore.

That's not true: the constant that was used to enable and disable them was deprecated, because now they are always enabled when you draw points.

As @pspeed noted, you are still going to need to update the particle positions on your GPU, since these emitters compute them on the CPU.

If you render 1000 particles with point sprites, you are going to have 1000 vec3 in your position vertex buffer; if you do the same with instancing, you will still have 1000 vec3 to update, but in the instance data vertex buffer. So, at best, it will perform the same.

But if your particles are quads, then you might have a point in wanting to use instancing, since you could save 3/4 of the bandwidth for positions and also the entire index buffer and texture buffer (assuming your effect doesn’t need to change it). With more complex shapes it makes even more sense.
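To put numbers on the bandwidth argument, here is a toy comparison of per-frame position floats for the three representations (positions only; colors, sizes, and sprite indices would add to each):

```java
public class ParticleBandwidth {
    /** Point sprites: one vec3 position per particle. */
    public static int pointSpriteFloats(int particles)   { return particles * 3; }

    /** Instanced quads: still one vec3 per particle, but in the instance buffer. */
    public static int instancedQuadFloats(int particles) { return particles * 3; }

    /** Plain (non-instanced) quads: four vertices per particle, each a vec3. */
    public static int plainQuadFloats(int particles)     { return particles * 4 * 3; }

    public static void main(String[] args) {
        System.out.println(pointSpriteFloats(1000));   // 3000
        System.out.println(instancedQuadFloats(1000)); // 3000
        System.out.println(plainQuadFloats(1000));     // 12000: instancing saves 3/4
    }
}
```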

However, I don't see why anyone would want to use something other than point sprites for particles.

Down in the deepest levels there is still per-instance setup. You do that once for a big mesh. You do that 10k times for 10k instances. How much per-instance setup there is will be hard to say and is completely hidden behind the draw call. (It will also matter a lot if it's the driver doing it instead of the GPU.) Just the fact that you can have some buffers that repeat every instance, some every 5 instances, some never (the transforms) means there is some setup for the geometry that wouldn't happen for a simple mesh that can simply stream vertexes and edges, etc…

Plenty of articles on this when I looked back in the day. Like you, I saw instancing as a silver bullet for all of my “lots of things” needs. Someone much smarter about graphics than I am pointed out that there is a hidden cost so I dug in and played around.

It should be easy enough to test. Just create two versions of the same mesh, one custom mesh full of all the data and one instanced version. Then make them big enough or numerous enough until there is a time difference. I no longer have my test code.

At least you aren’t trying to use it for static things like blades of grass… that’s when it gets really crazy. The comparison for static geometry is quite a bit different than for dynamic geometry.

…and in the end, in either approach, if you can find a way to do the animation in the shader you will double-win.

I should caveat this all with the fact that I never use JME’s built in conveniences for this and always choose to make my Mesh and VertexBuffers directly. So if you are playing around with InstancedNode and/or BatchNode you could be hitting things unrelated to Mesh and JME’s low-level integration of buffers.

There is a lot more to it than that.

If you look at ParticleMonkey, EVERYTHING is a DYNAMIC buffer.

Your vertices, texture coords, and normals are all dynamic. You have 10k * 3 * 2 (simple quad) vertices, 10k * 3 texture coords, and 10k * 3 normals.

These are being updated every frame.

With instancing: 3 * 2 (simple quad) vertices (STATIC), 3 texture coords (STATIC), 3 normals (STATIC), and 10k world positions (the ONLY DYNAMIC data).

Mesh approach:
10k * 3 * 2 = 60k
10k * 3 = 30k
10k * 3 = 30k

60k + 30k + 30k = 120k of data EVERY FRAME.

Instanced approach:
10k * 3 = 30k

30k EVERY FRAME; the rest is static, so you don't have to update the GPU.

So the difference per frame is 120k of data vs. 30k of data EVERY frame.

There is more to think about than just world position.

Yes, one HUGE mesh is one call. Instancing can be one call or more, depending on how you batch your instances. Every CPU-to-GPU call is a large hit on performance, and sending that kind of data every frame will take its toll.

So those numbers aren’t technically correct and they also aren’t comparing apples to apples either.

Particle monkey at the moment only updates positions, vertex colors (so you can change the color of particles / alpha) and texture coords (so you can use animated sprites). Normals aren’t updated which is probably a bug if you want particles lit from the scene. There is some room for optimization here based on what influencers you use.

In your example of instancing you miss out on particle rotation, particle sizing, animated sprites, particle coloring, and transparency. Once you do that, instancing doesn’t give you any sort of speed advantage. If you don’t need rotations, you could use point sprites to reduce the vertex count. The numbers basically come out identical at that point.

Particle monkey was patterned off an emitter design which was an extension to the jme emitter so I would expect performance to be similar at the moment.

Thanks for all your help. I was able to go into the engine and tweak it, and now I'm getting a steady 120 FPS in the game. I changed several BaseAppStates to make them run faster. Also, I tweaked the particle system for more speed.

And since I was using BatchNode in my game in a 16x16 grid, I turned that into chunks and added a simple chunk manager to handle adding and removing chunks from rendering.
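A chunk manager along those lines can be sketched roughly like this. This is a hypothetical, engine-agnostic version; in the real thing each chunk key would map to a batched Node that gets attached or detached from the scene graph:

```java
import java.util.HashSet;
import java.util.Set;

public class ChunkManager {
    private final int radius; // how many chunks to keep loaded in each direction

    public ChunkManager(int radius) {
        this.radius = radius;
    }

    /** Chunk keys that should be attached for a camera in the given grid cell. */
    public Set<Long> visibleChunks(int camChunkX, int camChunkZ) {
        Set<Long> visible = new HashSet<>();
        for (int x = camChunkX - radius; x <= camChunkX + radius; x++) {
            for (int z = camChunkZ - radius; z <= camChunkZ + radius; z++) {
                visible.add(key(x, z));
            }
        }
        return visible;
    }

    /** Pack two chunk coordinates into a single long map key. */
    public static long key(int x, int z) {
        return ((long) x << 32) | (z & 0xFFFFFFFFL);
    }
}
```

Each frame, diff the result of visibleChunks against the currently attached set: attach the new keys and detach the missing ones.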

Now it is a steady 120 FPS on my machine. The two biggest areas for improvement were tweaking particles and the chunk manager.



If I understood correctly, you made some optimizations to the jME engine code? If so, are any of them general enough to be included in the engine? :stuck_out_tongue:


No, sorry, I didn't explain myself correctly: I optimized my own code. When I said I optimized the particle system, I was talking about the one in my code.
I didn't touch JME.