I am trying to create a shader that can draw 2D images onto the screen. I chose not to use the HUD for this, because there will be many of them, so I will be instancing them, and later on I will also do some more complex shapes using this shader.
In my understanding, 2D shaders can be kept very simple.
Let’s say I have an indexed mesh with these four vertices:
Check the index order.
GL screen-space coordinates range from (-1, -1) to (1, 1).
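For reference, a known-good indexed quad in jME looks roughly like this; the coordinates and winding below are just one valid choice, not necessarily your data:

```java
import com.jme3.scene.Mesh;
import com.jme3.scene.VertexBuffer.Type;

// A quad covering the whole screen in normalized device coordinates:
// (-1,-1) is the bottom-left corner, (1,1) the top-right.
Mesh quad = new Mesh();
quad.setBuffer(Type.Position, 3, new float[] {
        -1, -1, 0,    // 0: bottom-left
         1, -1, 0,    // 1: bottom-right
         1,  1, 0,    // 2: top-right
        -1,  1, 0 }); // 3: top-left
quad.setBuffer(Type.TexCoord, 2, new float[] {
        0, 0,   1, 0,   1, 1,   0, 1 });
// Counter-clockwise winding so the quad is front-facing with default culling.
quad.setBuffer(Type.Index, 3, new short[] {
        0, 1, 2,
        0, 2, 3 });
quad.updateBound();
```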
Check your custom geometry with a working shader, and check your shader with a working geometry.
I don’t know how far you are going with this. But I have used com.jme3.ui.Picture to do what you are maybe trying to do. I have also made some custom shaders on top of that, but this is my base.
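For reference, the basic Picture setup is only a few lines, assuming a SimpleApplication where assetManager and guiNode are available (the path and sizes are placeholders):

```java
import com.jme3.ui.Picture;

// A 2D image attached to the GUI node; coordinates are in screen pixels.
Picture pic = new Picture("my image");
pic.setImage(assetManager, "Textures/myImage.png", true); // true = use alpha
pic.setWidth(128);
pic.setHeight(128);
pic.setPosition(10, 10); // lower-left corner of the picture, in pixels
guiNode.attachChild(pic);
```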
From what I understand, com.jme3.ui.Picture uses a regular shader. The regular vertex shaders do a lot of stuff that is overkill, like all the transformation matrix calculations, just to project coordinates that already are screen coords to coords that are, in a way, also screen coords. I have read some examples that skip all that and do the vertex calculations as simply as I described.
Yes… well, explaining it to you after all your help with Lemur feels a bit awkward: I am working on a library to extend the HUD functionality. Currently one part of my interface is done with Lemur, another part is lines, text and images to visualize the neural networks, and a third part is custom meshes to create graphs like bar and pie charts. And it does not work well together: there are features missing, and I cannot get the overall user experience right with this combination.
And I think there is some performance to be gained. I don’t know about Lemur, but the latter two are not that optimized for performance.
So the library I want to make must be able to draw a diverse set of basic elements, like images, lines, boxes with borders, text and circles. All these elements will have a uniform way of handling interaction and can be combined to create custom GUI controls.
I have already done a lot of complex shaders with nice tricks for UngaMunga; the only new thing I am doing here is 2D.
Oh, and I am some sort of shader addict. I love putting stuff in shaders…
It’s true that Unshaded will project 3D coordinates to 2D “view space” coordinates, but this is really just a handful of multiplies… and to project screen-space to view-space coordinates you will end up doing a similar number of multiplies.
…which I hope you are doing as a mesh at least.
…essentially the same things, I guess.
Optimizing the vertex matrix multiplication is the wrong thing to worry about. Your primary concern should be reducing draw calls… do everything you can to reduce draw calls. The graphics card can handle millions and millions of vertexes and triangles… but a few thousand draw calls will kill everything.
If you think of a draw call like a truck… you can send 1000 trucks with a quad each or you can send 1 truck with 1000 quads.
Everything else is noise in the statistics, really.
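In jME terms that usually means batching geometries that share a material; a minimal sketch (myQuads is just a placeholder for your per-element geometries):

```java
import com.jme3.scene.BatchNode;
import com.jme3.scene.Geometry;

// All attached children that share the same material get merged into one mesh,
// so the whole batch becomes a single draw call instead of one call per quad.
BatchNode batch = new BatchNode("gui batch");
for (Geometry quadGeom : myQuads) {
    batch.attachChild(quadGeom);
}
batch.batch();
guiNode.attachChild(batch);
```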
Edit:
Plenty of opportunities for custom shader work when you aggressively batch… especially if you change your mind about your network being 2D and want to project it in 3D with billboarding for the text+images… you will absolutely want to do that billboarding in the shader.
I am sure you’re right, but knowing how much I hated doing arithmetic, I wanted to make the lives of those poor GPUs a bit easier.
My main target is limiting draw calls: all these elements will be instances of a limited number of meshes. The meshes have a list of definitions in a FloatArray (hence my other questions), and I am messing with the InstanceData to pass, per instance, which definition to use and its parameters. The definition tells the shader things like where on the atlas texture an image is. This is something I already do with other shaders.
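Roughly, the idea is to put the per-instance data in an instanced vertex buffer; a minimal sketch with jME’s VertexBuffer API (the vec4-per-instance layout is purely illustrative, the real layout is whatever your custom .vert declares, and mesh is the shared element mesh):

```java
import com.jme3.scene.VertexBuffer;
import com.jme3.scene.VertexBuffer.Format;
import com.jme3.scene.VertexBuffer.Type;
import com.jme3.scene.VertexBuffer.Usage;
import com.jme3.util.BufferUtils;
import java.nio.FloatBuffer;

int instanceCount = 1000;

// One vec4 per instance, e.g. (definitionIndex, x, y, rotation) -- illustrative only.
FloatBuffer data = BufferUtils.createFloatBuffer(instanceCount * 4);
// ... put 4 floats per instance, then data.flip() ...

VertexBuffer instanceData = new VertexBuffer(Type.InstanceData);
instanceData.setInstanced(true); // advance once per instance instead of once per vertex
instanceData.setupData(Usage.Stream, 4, Format.Float, data);
mesh.setBuffer(instanceData);
// The material/technique must have instancing enabled, and the shader must declare
// a matching instanced attribute; more per-instance data means packing more into the
// layout or adding further instanced attributes (a 4x4 matrix needs several vec4s).
```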
Yeah, but in your other one you were passing all of that as a material parameter… and really that kind of data should be vertex attributes.
But you wouldn’t be. There will be vertex math either way, and GPUs eat that stuff for lunch and ask for more. You could even do trig functions in your .vert shader and you’d hardly notice the performance hit. A few multiplies is nothing.
Ok, let’s take an example: textured boxes with borders are 3x3 quads, like TBT in Lemur. If I read your suggestion right, you would propose 16 vertices for every element.
Each with three position floats and two UVs. Let’s say we have around a thousand of those. That would mean a vertex buffer of 16 × 3 × 1,000 = 48,000 floats and a UV buffer of 16 × 2 × 1,000 = 32,000 floats.
My solution would have just 48 vertex floats and 32 UV floats. The rest is in InstanceData: 16 × 1,000 = 16,000 floats.
Let’s add some animation to some of the elements: move them or have them turn slightly. In the first scenario, the CPU would have to do a matrix multiply on the vertices of the animated elements every frame, and then send 48,000 floats back to the GPU.
In the second scenario, I only need to change the matrix data in the InstanceData for the animated elements, and send just 16,000 floats back to the GPU. The GPU will do the real work with the matrix.
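The per-frame update in that second scenario is just poking the instanced buffer and re-uploading it; a rough sketch, reusing the instanceData buffer from the sketch above (elementIndex, floatsPerInstance and newInstanceFloats are placeholders):

```java
import java.nio.FloatBuffer;

// Overwrite only the per-instance floats of an animated element, in place...
FloatBuffer fb = (FloatBuffer) instanceData.getData();
fb.position(elementIndex * floatsPerInstance);
fb.put(newInstanceFloats); // e.g. the 16 floats of its new transform matrix
fb.rewind();
// ...and re-upload just this buffer; the shared Position/TexCoord/Index buffers stay untouched.
instanceData.updateData(fb);
```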
In both cases, by the way, most stuff is in the buffers anyway. The only uniform that helps is a float array with the definitions that accompany the atlas texture.
Maybe we’re talking about different things because I don’t understand how a 20,000 element float array would be needed for atlas stuff.
I wasn’t proposing a specific implementation… just that you try to reduce draw calls and put object data in vertex attributes… which seems like exactly what you describe with “instance data”… which generally, by definition, are vertex attributes.
I don’t understand enough about what you are actually doing with the data to comment further. Like, I don’t even know why you’d need a TBT quad… those really only exist so that the default shaders can be used to stretch texture borders. It’s possible that you could encode the border information into vertex attributes and do math in the frag shader instead.
Instancing itself is not fast for small meshes. Batching objects is way faster. Updating uniforms is faster again, and there is a way to get a kind of instancing behaviour with only uniforms. But it comes with a whole bunch of downsides, such as having to duplicate vertices, and indexing is not possible.