[SOLVED] Need help to improve my shader code


In my terrain material, I am using the below code in my fragment shader to fix the texture repeating effect.

  // The MIT License
  // Copyright © 2017 Inigo Quilez
  // Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  // A technique to avoid texture tile repetition when using one small
  // texture to cover a huge area. Basically, it creates 8 different
  // offsets for the texture and picks two to interpolate between.
  // Unlike previous methods that tile space (https://www.shadertoy.com/view/lt2GDd
  // or https://www.shadertoy.com/view/4tsGzf), this one uses a random
  // low frequency texture (cache friendly) to pick the actual
  // texture's offset.
  // Also, this one mipmaps to something (ugly, but that's better than
  // not having mipmaps at all like in previous methods)
  // More info here: https://iquilezles.org/articles/texturerepetition
  vec4 textureNoTile3(sampler2D diffuseMap, in vec2 texCoord, float diffuseScale, float noiseScale, int noiseLayerCount, float noiseOffsetFactor, float noiseRotationFactor) {
      vec2 uv = texCoord * diffuseScale;
      vec4 color;
      #ifdef NOISEMAP
        if (noiseScale > 0.0) {
          // sample variation pattern
          vec4 noise = texture(m_NoiseMap, noiseScale * texCoord); // cheap (cache friendly) lookup

          // compute index
          float index = noise.x * float(noiseLayerCount);
          float indexA = floor(index + 0.5);
          float indexB = floor(index);

          float fraction = fract(index);
          fraction = min(fraction, 1.0 - fraction) * 2.0;

          // compute offsets for the different virtual patterns
          vec2 offsetA = sin(vec2(3.0, 7.0) * indexA) * noiseOffsetFactor; // can replace with any other hash
          vec2 offsetB = sin(vec2(3.0, 7.0) * indexB) * noiseOffsetFactor;

          // compute rotations for the different virtual patterns
          const float TWO_PI = 2.0 * 3.141592;
          float angleA = offsetA.x * offsetA.y * TWO_PI * noiseRotationFactor;
          float angleB = offsetB.x * offsetB.y * TWO_PI * noiseRotationFactor;

          // compute derivatives for mip-mapping
          vec2 dx = dFdx(uv);
          vec2 dy = dFdy(uv);

          // sample the two closest virtual patterns
          // (rotateUV and sum are helper functions defined elsewhere in the shader)
          vec3 colorA = textureGrad(diffuseMap, rotateUV(uv, angleA) + offsetA, dx, dy).xyz;
          vec3 colorB = textureGrad(diffuseMap, rotateUV(uv, angleB) + offsetB, dx, dy).xyz;

          // interpolate between the two virtual patterns
          color = vec4(mix(colorA, colorB, smoothstep(0.2, 0.8, fraction - 0.1 * sum(colorA - colorB))), 1.0);
        } else {
          color = texture(diffuseMap, uv);
        }
      #else
        color = texture(diffuseMap, uv);
      #endif
      return color;
  }

It reads the diffuse texture twice, each time with a slightly offset UV, and mixes the two colors to remove the texture repeating effect. There is also one extra texture read for the noise map.

I noticed it costs around 5 fps per texture layer on the terrain. I am using 4 texture layers on the terrain, so I am losing 20 fps just by running the above method.

Any suggestion on how I can improve its performance?

Is it possible to bake the result to vertex color or into an in-memory texture and only run it when re-baking is required?

I also found out about extensions like ARB_shader_image_load_store and SSBOs, but it sounds like they require OpenGL 4. I also could not find an example of their usage in JME.

Note: “losing fps” is not a good measure. Losing 20 off of 5000 FPS is not the same as losing 20 off of 30 FPS.

ms/frame is a better comparison metric.

As to the other, I guess you look up the noise and make the offsets for every texture sample? Maybe you could just look up the noise offset once. Still, you’d be doing twice the number of texture lookups as normal. There isn’t much way around that.

Bake what result? What would be an acceptable value to stretch over the whole triangle?


Thanks for your feedback

Well, in my case I am losing 20 out of 60. (It is an old AMD/ATI Robson CE Radeon HD 6370M.) I will look into detailed profiler results.

Ah, wait, I am stupid, was confusing per vertex vs per pixel!

I noticed it decreases by around 5 fps per texture layer on terrain. I am using 4 texture layers on terrain and am losing 20 fps just for running the above method.

That's why I split the terrain into grids (Geoms), where each grid has about 4 texture layers.

Blending multiple textures takes time, sadly. In your case I believe you need all of the textures available for every part of the terrain, so that is the problem here.


Yes, it could vary for different textures. Currently, I let each texture have its own scale factor for the noise, which is multiplied into the texture coordinates when reading from the noise map here: texture(m_NoiseMap, noiseScale * texCoord); but I might need to change this to use one scale factor for all layers if I must.

60 with vsync enabled?

I mean, it only cuts it down from 12 texture fetches to 8, right?

Back in the 90s, with the fixed function pipeline, there used to be a technique where you’d render the same mesh multiple times using depthTest=equals on the subsequent passes. This was used to layer textures similar to how we do it all in one shader today. As per-layer processing goes up, I’ve been curious if this old technique would have some benefits. On the one hand, we’re doing multiple draw calls but on the other hand we get to do things like alpha discard to skip some fragments and stuff… and maybe the shader optimizes a little better dealing with one texture at a time.

I wanted to try this when I was doing triplanar mapping of terrain with bumps+normals… it gets quite expensive to sample 3 textures from multiple directions… and throw a similar noise scheme into the mix and it doubles all of that. In simple side + top + bottom ‘triplanar’ mapping that’s 18 texture fetches… even in the case where you throw 2 of the layers completely away.

Two reasons I never tried it: 1) I suspect it would give no real benefit in practice, 2) in triplanar mapping sometimes you mix based on the bumps that you only have when you have sampled all values together.


Nope, without vsync


Are you still doing the pow() in your shaders? For every texture?

Yes still using the pow() but now I apply it to the final diffuseColor instead of per texture color.

My 1990s brain still has pow() wedged into it as “avoid in perf critical code” along with sqrt. But like sqrt, maybe it’s not so bad anymore.


I changed this as you said, so it reads the noise just once now.

Some detailed profiler info (note vsync is disabled)

Without noise map:

With noise map:

With the noise map, but without the below code from my other post:

float s = smoothstep(0.0, 1.0, noise.g) * 0.25;
diffuseColor.rgb = pow(diffuseColor.rgb, vec3(0.9 + s));

It looks like the effect of pow() is negligible.

I am ok with this if it is just my 2010 GPU chip, I suppose it should be fine on today’s gaming GPU beasts! :slightly_smiling_face:


How does the player see the map? From above, like in your screenshots, or close up? Or both?

Well, they will see the map from close up, but they can zoom out to some distance to look around. When entering the map, though, I would like to zoom in from very high up, like in my screenshot. Terrain LOD will also be active during gameplay.

Different cards+drivers may also have different random bottle necks. If you haven’t already, you may want to add some settings to turn on/off different non-critical shader features. Something using #ifdef in the shaders so it’s not even compiled in if turned off.

That way if you see strange slow downs on some platform users can at least toggle things on/off until they find something that works… and maybe you learn something about why it’s slow on platform X, etc…


Yeah, I was thinking of this as well. It can be disabled by unsetting the noise map or I can activate it just for some texture layers (e.g. grass and dirt)…

It seems there is a simple trick I can use that has a great impact!

So before reading the diffuse color, I check whether the color read is required for that layer and only do it if needed. I check whether the alpha value from the alpha map is 0 for that layer, or whether an upper layer has alpha = 1; in either case the layer is skipped.

bool requiresDiffuseBlend(int alphaIndex, vec4 alphaBlend) {
      bool disableBlending = false;
      float minThreshold = 0.01;
      float maxThreshold = 0.99;
      if (alphaIndex == 0) { // r
          disableBlending = (alphaBlend.r <= minThreshold || alphaBlend.g >= maxThreshold || alphaBlend.b >= maxThreshold || alphaBlend.a >= maxThreshold);
      } else if (alphaIndex == 1) { // g
          disableBlending = (alphaBlend.g <= minThreshold || alphaBlend.b >= maxThreshold || alphaBlend.a >= maxThreshold);
      } else if (alphaIndex == 2) { // b
          disableBlending = (alphaBlend.b <= minThreshold || alphaBlend.a >= maxThreshold);
      } else if (alphaIndex == 3) { // a
          disableBlending = (alphaBlend.a <= minThreshold);
      }
      return !disableBlending;
}

Note that on some platforms the shader might still execute all of the branches and simply discard the unused results. I like to hope this is rare these days; conditional branching has been fine for a long time now.

I don’t remember… do you have an nvidia card or ATI? Traditionally, nvidia will handle a bunch of stuff that other GPUs fall over on. So it’s worth testing on a couple different GPUs if you can.


I am doing it like this, if that is what you mean:

    #ifdef DIFFUSEMAP
      if (requiresDiffuseBlend(0, alphaBlend)) { // r
          diffuseColor = textureNoTile3(m_DiffuseMap, texCoord, m_DiffuseMap_0_scale, noise, m_NoiseLayerCount, m_NoiseOffsetFactor, m_NoiseRotationFactor);
          #ifdef USE_ALPHA
            alphaBlend.r *= diffuseColor.a;
          #endif
          diffuseColor *= alphaBlend.r;
      }
    #endif
    #ifdef DIFFUSEMAP_1
      if (requiresDiffuseBlend(1, alphaBlend)) { // g
          vec4 diffuseColor1 = textureNoTile3(m_DiffuseMap_1, texCoord, m_DiffuseMap_1_scale, noise, m_NoiseLayerCount_1, m_NoiseOffsetFactor_1, m_NoiseRotationFactor_1);
          #ifdef USE_ALPHA
            alphaBlend.g *= diffuseColor1.a;
          #endif
          diffuseColor = mix( diffuseColor, diffuseColor1, alphaBlend.g );
      }
    #endif
    #ifdef DIFFUSEMAP_2
      if (requiresDiffuseBlend(2, alphaBlend)) { // b
          vec4 diffuseColor2 = textureNoTile3(m_DiffuseMap_2, texCoord, m_DiffuseMap_2_scale, noise, m_NoiseLayerCount_2, m_NoiseOffsetFactor_2, m_NoiseRotationFactor_2);
          #ifdef USE_ALPHA
            alphaBlend.b *= diffuseColor2.a;
          #endif
          diffuseColor = mix( diffuseColor, diffuseColor2, alphaBlend.b );
      }
    #endif
    #ifdef DIFFUSEMAP_3
      if (requiresDiffuseBlend(3, alphaBlend)) { // a
          vec4 diffuseColor3 = textureNoTile3(m_DiffuseMap_3, texCoord, m_DiffuseMap_3_scale, noise, m_NoiseLayerCount_3, m_NoiseOffsetFactor_3, m_NoiseRotationFactor_3);
          #ifdef USE_ALPHA
            alphaBlend.a *= diffuseColor3.a;
          #endif
          diffuseColor = mix( diffuseColor, diffuseColor3, alphaBlend.a );
      }
    #endif

Thanks for the hint, will test it on a different card as well, mine is ATI.

Yes, presuming that all of the #ifdefs are true then all of that code will be compiled in. On some platforms, the GPU may choose to run parts of all of the if blocks even if the if() would return false… then it will ignore the result.
