Best/easiest way to fade in a sprite?

I’m trying to fade in a sprite, starting with an alpha of 0 and ramping that up to alpha = 1.
Complications:

  • The sprite image is a PNG with an alpha channel, which needs to be honored (i.e. pixel alpha and fade-in alpha need to be multiplied)
  • Implementation techniques I have seen are Picture, quad, billboard, and point sprite. I’m currently using Picture but I’d be willing to use whatever would work.
  • Performance would be nice. @tralala used point sprites and had over a million sprites @ 200 fps or something like that.
  • I’ll use it in the tutorial I’m making, so I’d like to use best practices as far as any are established.

For point sprites, color.

For quads, color (either material or vertex).

For Picture, use Quad.


There are a few GUI libraries that do this… you have the source code available… see how they implement it?

@pspeed That was a bit too terse, sorry.
Point sprite… what class is that? There’s no PointSprite.java in the engine.
Quad… what shader (material) do I use? Is this fast?
Picture… no idea how this relates to Quad. (Picture is a Java class in the engine; its primary purpose seems to be GUI stuff.)

@t0neg0d Heh, I’m pretty sure a lot of libraries exist, but I have no idea which ones do it well.

@toolforger said: @pspeed That was a bit too terse, sorry. Point sprite... what class is that? There's no PointSprite.java in the engine. Quad... what shader (material) do I use? It this fast? Picture... no idea how this relates to Quad. (Picture is a Java class in the engine, primarily purpose seems to be GUI stuff.)

Point sprite is a type of mesh, really. The particle emitter uses it internally and that’s the code I cribbed to do mine that weren’t particles. I assumed you already knew about them since you referenced them in your post and pointed to an example.

Either shader would technically work but obviously Unshaded is the more appropriate one. (There are really only two shaders in JME after all). Then you can either use vertex color attributes (would allow batching) or the material “Color” parameter. This color would be multiplied by whatever the texture color is.

My point with Picture is that you cannot easily do what you want to do. So if you are using Picture then switch to quad. You said you were already using Picture so I mentioned it. Whatever you do, you probably shouldn’t be using Picture.

I thought you knew your way around JME more than this or I would have provided more info initially. Sorry.

Personally, I’d use batched quads. Point sprites are harder to set up and have some built-in limitations (always screen aligned, on some cards cannot be more than 64 pixels on screen, etc.). A set of batched quads will use roughly four times as much data, but in general I think it’s worth it. And they are easy to understand and work with an existing JME material.

@toolforger said: @pspeed That was a bit too terse, sorry. Point sprite... what class is that? There's no PointSprite.java in the engine. Quad... what shader (material) do I use? It this fast? Picture... no idea how this relates to Quad. (Picture is a Java class in the engine, primarily purpose seems to be GUI stuff.)

@t0neg0d Heh, I’m pretty sure a lot of libraries exist, but I have no idea which ones do it well.

They all do it the same way:

color.a *= alpha;
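That one line is the core of every implementation: the texture’s per-pixel alpha and the fade-in alpha are simply multiplied, so both are honored. In plain Java terms (a trivial sketch of the math, not engine code):

```java
// The math behind `color.a *= alpha`: the PNG's per-pixel alpha and the
// current fade-in alpha are multiplied, so a half-transparent pixel at
// half fade renders at a quarter opacity.
class AlphaCombine {
    static float combinedAlpha(float pixelAlpha, float fadeAlpha) {
        return pixelAlpha * fadeAlpha;
    }
}
```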

@toolforger said: I'm trying to fade in a sprite, starting with an alpha of 0 and ramping that up to alpha = 1. Complications: - The sprite image is a png with an alpha channel, which needs to be honored (i.e. pixel alpha and fade-in alpha need to be multiplied) - Implementation techniques I have seen are Picture, quad, billboard, and point sprite. I'm currently using Picture but I'd be willing to use whatever would work. - Performance would be nice. @tralala used point sprites and had over a million sprites @ 200 fps or something like that. - I'll use it in the tutorial I'm making, so I'd like to use best practices as far as any are established.

Picture = fine, I guess… no clue on rotation.
Quad = ideal
Billboard = kinda silly in Ortho
PointSprite = not sure how this applies.

@t0neg0d Rotation works for Picture, the example video does exactly that :slight_smile:
Agreeing about the silliness of Billboard :smiley:

@pspeed okay, quad it is then. I guess point sprites aren’t rotatable anyway (is this true?)

That would be Unshaded.j3md using the Texture and the Color parameter, right?
With the color as White-with-alpha?
(Just trying to lay out the strategy before I sink time into a path that won’t work in the end.)

@toolforger said: @pspeed okay, quad it is then. I guess point sprites aren't rotatable anyway (is this true?)

That would be Unshaded.jm3d using the Texture and the Color parameter, right?
With the color as White-with-alpha?
(Just trying to lay out the strategy before I sink time into a path that won’t work in the end.)

Point sprites cannot be rotated… they are always screen aligned (as I mention above) and sometimes they have a max size.

For quads you can go Texture + Color or Texture + vertex color. The latter will let you batch but now I’m repeating myself.

Yes, white + alpha. They are multiplied together textureColor * color… so whatever color gets you the result you want from that…
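pspeed’s Texture + Color recipe boils down to clamping an elapsed-time ratio into [0, 1] each frame and writing it into the material color’s alpha. A minimal sketch of that logic, with the JME-specific wiring shown only as comments (treat the exact material and render-state setup as an assumption, not verified engine code):

```java
// Fade-in alpha that ramps 0 -> 1 over `duration` seconds.
class FadeIn {
    private final float duration; // seconds until fully opaque
    private float elapsed = 0f;

    FadeIn(float duration) { this.duration = duration; }

    // Advance by tpf (time per frame, in seconds) and return the fade alpha.
    float update(float tpf) {
        elapsed += tpf;
        return Math.min(1f, elapsed / duration);
    }
}
// Per frame (e.g. in a Control's controlUpdate or in simpleUpdate), roughly:
//   material.setColor("Color", new ColorRGBA(1f, 1f, 1f, fade.update(tpf)));
// with the material's BlendMode set to Alpha and the geometry in the
// Transparent bucket, so textureColor * color blends as described above.
```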


I wasn’t 150% sure what exactly screen alignment meant, I just wanted to make sure :slight_smile:

Not sure how batching would help. If each sprite moves independently or has an independent Color setting, I’d be uploading the batched mesh for each frame and not gain anything, right?

Ticking off White+alpha and no-point-sprites as solved sub-aspects, thanks.

This has been gone over here a few times… but which do you think is faster of these two approaches?

  1. Package up a ping-pong ball. Wrap it up nicely. Write an address on it. Call the postman to pick it up. Wait by the door. Hand it to him. Repeat 500 times.
  2. Package up 500 ping-pong balls… and send them at once.

Fewer objects = better performance. Your GPU will eat through vertexes like a wood-chipper through ping-pong balls.

Whereas each object will require an update logical state traversal, an update geometric state traversal, a cull check, resending all Material uniforms to the GPU, and a draw dispatch to the GPU. All for four little vertexes.

You could batch hundreds and hundreds of quads, send them all at once, and your system will barely blink.

I guess using a lot of draw calls might be more costly than updating a buffer once… I say “I guess” because in my testing experience I have never gained much by tweaking mesh sizes to reduce draw calls. That might be different if all your objects are always on screen… and it seems yours are only simple objects.

I can believe that this has been gone over a lot of times.
Obviously I’m misunderstanding the trade-offs between batched and non-batched updates; I just don’t know where my mental model is wrong. (I read the Javadoc for the batched node but it isn’t very detailed about what it actually does.)

Here’s my model:

  • The number of vertices doesn’t change from batching. Each quad has four vertices, batched or not.
  • The number of shaders sent is unaffected. Shader instances are generated for each combination of #IFDEF outcomes that each shader’s use has, and each instance is sent once as a scene needs it. I don’t know when, if at all, shader instances are deleted.
  • Scenegraph spatials that have not changed since the last frame aren’t resent; they are simply left as-is on the GPU. This means that it’s a good idea to keep changing and unchanging parts of the scene graph in different batches (I previously thought that you never batch stuff that’s changing because it’s useless, since you resend the mesh anyway.)
  • A “mesh” is sent as a list of vertex coordinates, plus some settings. I have no idea how much these additional settings would be. A sprite would be four coordinates = 16 floats, plus an index buffer with six entries to circumscribe two triangles; a texture id would be another int, plus four floats for the RGBA color. Meaning a per-quad overhead of about 25% if unbatched, but then no overhead for reassembling the batch on each frame… I’m not sure whether that would affect performance.

I’d be grateful to hear confirmation or correction for each bullet point; that’s the kind of information that allows me to correct my mental model and improve my predictions of what will go fast and what will go slowly.

I might be missing relevant parts; hints in that direction would also be highly welcome.
Right now, I think I’m missing

  • how multipass rendering is initiated, and what data transfers that causes;
  • how to retrieve data from a GPU-side buffer, and whether that causes overhead beyond the raw data.
    In both cases, that’s because I haven’t used these yet and there’s so much other stuff to learn :slight_smile:

What you are missing is the actual commands sent to the GPU…
Each of those calls costs time, actually more than it would take to transfer a large buffer.

In a world with 1000 squares you have to call, for each square:

glBindBuffer() for each vertex, texcoord, and color buffer
glUpdateUniform(modelMatrix)
glDraw(indexBuffer)

In a 1000-square world that is at least 3000 GPU interactions…

If you batch them instead:

glUpdateBuffer(vertexBuffer, theDirectBufferWithVertices)
glBindBuffer() for each buffer as above (texCoord and color might not change, so they don’t need an update)
glDraw(indexBuffer)

That’s it, maybe 5 GPU interactions…

But on a larger scale (in terms of map size, and with lots of objects outside the FOV) my experience has shown:

Culling > Batching

@toolforger said: 1) how multipass rendering is initiated, and what data transfers that causes; 2) how to retrieve data from a GPU-side buffer, and whether that causes overhead beyond the raw data.
  1. Usually multipass rendering refers to rendering the same object multiple times with different shaders… I don’t know what you are trying to do, so I don’t know if you really need multipass.
  2. That really depends on whether you actually need the data on the CPU. For example, rendering to a texture and then using that texture in another render pass is nearly free. But if you really need the rendered image on the CPU, it is going to be costly, since AFAIK you need to wait until the GPU has finished the job…

It could also be that the data you read back is ‘random’ (if you read it before the GPU has finished).

However, most GPU data is only used on the GPU. If you only want to calculate something, you might use OpenCL.


Skipping a few posts just to add this…

I said “batch the quads”. I never ever said “use batch node”. The tradeoffs are very different for batch node as it will still do some of the scene graph management.

The fastest way is to make your own mesh of quads and manage it yourself. You will be doing some but not all of the same work as batch node, and if you have a lot of quads that difference may be significant. Also, you can choose exactly how to batch them. If you want to let JME manage your separate quads, you will still gain an improvement over a whole bunch of regular separate Geometries (assuming they all have the same material), but not as much as if you compose the mesh yourself.
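Hand-batching quads the way pspeed describes mostly means assembling the vertex and index buffers yourself. Here is a sketch of that buffer layout in plain arrays; in JME these arrays would then be handed to `mesh.setBuffer(...)` for the Position, Color, TexCoord, and Index buffers (that wiring is assumed, not shown), with per-sprite fade alpha carried in the vertex colors so the whole batch still draws in one call:

```java
// Sketch of hand-batching N sprites into one mesh.
class QuadBatch {
    // Two triangles per quad: 0-1-2 and 0-2-3, offset by 4 vertices per quad.
    static short[] buildIndices(int quadCount) {
        short[] idx = new short[quadCount * 6];
        for (int q = 0; q < quadCount; q++) {
            int v = q * 4, i = q * 6;
            idx[i]     = (short) v;
            idx[i + 1] = (short) (v + 1);
            idx[i + 2] = (short) (v + 2);
            idx[i + 3] = (short) v;
            idx[i + 4] = (short) (v + 2);
            idx[i + 5] = (short) (v + 3);
        }
        return idx;
    }

    // x,y,z for the 4 corners of an axis-aligned quad of size w x h
    // placed at (xs[q], ys[q]) for each sprite q.
    static float[] buildPositions(float[] xs, float[] ys, float w, float h) {
        float[] pos = new float[xs.length * 12];
        for (int q = 0; q < xs.length; q++) {
            float x = xs[q], y = ys[q];
            float[] corners = {x, y, 0,  x + w, y, 0,  x + w, y + h, 0,  x, y + h, 0};
            System.arraycopy(corners, 0, pos, q * 12, 12);
        }
        return pos;
    }
}
```

When sprites move or fade, you rewrite the affected buffer and send it once, instead of issuing one draw dispatch per sprite.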

@toolforger said: I can believe that this has been gone over a lot of times. Obviously I'm misunderstanding the trade-offs between batched and non-batched updates; I just don't know where my mental model is wrong. (I read the Javadoc for the batched node but it isn't very detailed about what it actually does.)


Aside from the information Zuegg provided, I will reiterate in bullet form the stuff that happens for every object in the scene graph every frame regardless of state. I will include the state-optional stuff with big "IF"s in front. (Each of these is CPU time.)
-updateLogicalState() is run on every object
-updateGeometricState() is run on every object (always)
-IF the transform has changed then the world transform and world bounds are recalculated.
-culling is performed on every object
-depth sorting is done on every non-culled object

For the non-culled, now sorted objects: (Each is a GPU call.)
-shader is bound (not recompiled unless needed) for every visible object
-uniforms are resent for every visible object (every time every frame. yes.)
-all of the buffers are bound (they are resent if any have changed)
-transform is setup
-render state is setup (pretty sure only changes are sent since last render state)
-object is drawn

So you can see why 1000 objects might be kind of expensive as compared to 1. You can send a pretty sizable buffer in the time it takes to do all of that.

A real-world example is the drop shadow filter that I wrote. In the first “get it working” version, I sent every shadow as a separate 8-vertex box. I’m bypassing all of the normal rendering stuff because I’m calling render(Geometry) myself, so it’s nearly all the GPU dispatch side. When there were 1000 shadows in view, this dropped FPS from 200 or so down to 30. When I batched them up, FPS dropped from 200 down to 190, and most of that time was taken sorting and prepping the boxes themselves.

Pretty significant, I think.


Thanks again to you both, this is very useful information.

@zzuegg said: What you are missing is actual commands send to the gpu.. Each of those call's cost time, actually more than it would cost time transfer a large buffer.

Wow. I wouldn’t have thought that OpenGL calls are that heavyweight.
Advice duly noted on the “things to factor in” list.

@pspeed I guess I’ll try Monkey Blaster without batching first, then try the various batching options.
As far as I understood the engine sources, the update*State() functions do nothing unless the corresponding flag is set. I.e. if an entire subtree had no geometry changes, the entire subtree is ignored by updateGeometricState().

Uniform resend - okay, that’s gonna hurt if it’s done for all nodes, modified or not.
Is this a limitation of OpenGL, or just an optimization not (yet?) done in the JME engine?

The new particle system will handle all the batching, billboarding (or not billboarding), rotating, fading, etc for you.

https://wiki.jmonkeyengine.org/legacy/doku.php/jme3:contributions:particles

Just set the texture, spawn the particle in the correct position/orientation, set a colour influencer, job done.

@zarch said: Just set the texture, spawn the particle in the correct position/orientation, set a colour influencer, job done.

Wow. That’s awesome. I thought I’d have to build my own meshes, which I don’t like because it’s a bit hard to document vertex position and vertex index.
This seems to fit the bill perfectly.

Since this is going to be a tutorial, I think I’ll first use quads, then demonstrate the speedup that a particle emitter gives.

Is there a way to give a +3? :wink: