Deferred Texturing (~10% performance increase)

This is more of a quickly tested idea than a solution for everything.

What I did is the following:

  1. Pack all your textures into a textureArray
  2. Modify your material parameters to accept an “Int” instead of a Texture
  3. Pass the array position of your texture to your shaders instead of the actual texture
  4. Modify the output to write gl_FragColor = vec4(texCoord, textureId, 0);
  5. Create a post filter that reads the texCoord and textureId and looks up the color in the textureArray

[java]
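#extension GL_EXT_texture_array : enable

uniform sampler2DArray m_Textures; // all scene textures packed into one array
uniform sampler2D m_Texture;       // first-pass output: (u, v, textureId, 0)
varying vec2 texCoord;             // coordinate on the full-screen post filter quad

void main()
{
    // Fetch the texture coordinate and array layer written by the first pass.
    vec3 realTexCoord = texture2D(m_Texture, texCoord).xyz;
    // One array lookup per visible pixel; the z component selects the layer.
    gl_FragColor = texture2DArray(m_Textures, realTexCoord);
}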
[/java]
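
For reference, the first pass from step 4 could look roughly like the fragment shader below. This is only a minimal sketch under a few assumptions: the “Int” material parameter is called TextureId (a made-up name, which would show up in the shader as m_TextureId), and the scene is rendered into a float render target so that a texture id above 1 and tiled texture coordinates are not clamped.

[java]
// Assumption: "TextureId" is a hypothetical Int material parameter
// holding the layer of this material's texture inside the texture array.
uniform int m_TextureId;
varying vec2 texCoord;

void main()
{
    // No texture lookup here: only store where the post filter should look.
    // Assumes a float render target, otherwise the id would be clamped to [0,1].
    gl_FragColor = vec4(texCoord, float(m_TextureId), 0.0);
}
[/java]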

Voila, you saved a lot of unnecessary texture lookups.


Seems like a neat experiment. The longer I thought about it, the more issues/limitations I thought of… so I don’t want to sound critical but I thought I’d mention some things in the slight chance you hadn’t considered them yet.

First, I’ll start with a performance clarification…
905 FPS = 1.105 ms per frame
828 FPS = 1.208 ms per frame

So about an 8.5% or 9.3% increase depending on which way you calculate it (time per frame or frames per second). I’m being pedantic but I thought I’d get the most nit-picking thing out of the way first. :)

Second, I would think this would have the most impact on scenes with a lot of overdraw. Otherwise, texture lookups will be about one-to-one.

So now the limitations:
-no lighting
-no alpha blend (at all… no second pass can fix it except as a separate viewport which comes with its own limitations)
-no proper AA I guess
-post processing means only works on hardware that supports non-power of two textures… so something like Android (where I’d think a trick like this might have the most impact) won’t be able to use it
-requires texture array support… which I guess pretty much everything does now but the cards that will be weakest on over-draw are likely the ones without.

Actually, thinking about it some more, I wonder if in some configurations this ends up being worse than regular texture mapping because it is harder to optimize texture access in this case. At least with normal fragment processing, the GPU already knows where the next lookup is likely to come from as it draws the mesh for that material. In this case, the locality of texture lookups will be potentially all over the place. This may have more of an impact as texture size vs GPU RAM increases.

Also, in the no overdraw case you’ve traded one texture lookup for two now. You end up saving the most performance only on the pixels that are overdrawn more than twice.

Another interesting thing would be to know the impact of the post-processing pass… more out of curiosity than anything else. What kind of performance do you get if you just render the texture ID but don’t fix it up in post? Some cards seem to do worse with post-processing than others.

Sorry to sound like I’m poo-poo’ing all over it. It’s a neat experiment in any case.

The limitations are mostly the same as with deferred lighting. If it’s worth exploring further in that direction, I have a combination with deferred lighting in mind.
It would be possible to do lighting in forward mode too, but it would require an additional render target.

As for Android, if you are not going to design your map/level/game in a way that there is basically no overdraw, then you are already lost :)

@pspeed said: Another interesting thing would be to know the impact of the post-processing pass.... more out of curiosity than anything else. What kind of performance do you get if you just render the texture ID but don't fix it up in post? Some cards seem to do worse with post-processing than others.

It’s 1110 fps without the post-processing. I would not have thought that a single post pass makes such a big difference. It’s actually so big that I am going to revalidate the results first.

Another thing is, I currently don’t know why the uniform number is higher in the “post texturing” mode.

Since I am curious, I would also like to bake the texture id into the mesh, just to measure the impact of the uniform changes.
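
A rough idea of what baking the id into the mesh could look like on the vertex shader side. This is an untested sketch, assuming the texture id is stored in the x component of a second texture coordinate buffer (inTexCoord2 here, which is just one possible place to put it) and that every vertex of a triangle carries the same id.

[java]
uniform mat4 g_WorldViewProjectionMatrix;
attribute vec3 inPosition;
attribute vec2 inTexCoord;
attribute vec2 inTexCoord2; // assumption: x holds the texture array layer
varying vec2 texCoord;
varying float textureId;

void main()
{
    texCoord = inTexCoord;
    // Same id on all vertices of a triangle, so interpolation keeps it constant.
    textureId = inTexCoord2.x;
    gl_Position = g_WorldViewProjectionMatrix * vec4(inPosition, 1.0);
}
[/java]

The fragment shader would then write vec4(texCoord, textureId, 0.0) without needing a per-material uniform.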

@pspeed said:

Actually, thinking about it some more, I wonder if in some configurations this ends up being worse than regular texture mapping because it is harder to optimize texture access in this case. At least with normal fragment processing, the GPU already knows where the next lookup is likely to come from as it draws the mesh for that material. In this case, the locality of texture lookups will be potentially all over the place. This may have more of an impact as texture size vs GPU RAM increases.

GPU RAM is definitely an issue; memory paging does not work anymore if you use a textureArray that is larger than the GPU RAM.
On empty scenes with no overdraw it will be worse. Again, the same issue as with deferred lighting.

The nice upside (for me) would be that once you have paid the nearly constant texture lookup fee, the cost of adding overdrawn objects to the scene is very small (at least from a texture lookup’s point of view).

No problem at all regarding the poo-pooing, keep it going :)

Sparse Texture Arrays might solve the GPU RAM problem later on (read: when using OpenGL 4 as the baseline). I already had a talk about that on IRC today.