Hello everyone,
I’m seeing a strange performance problem when discarding fragments in a fork of PBRLighting. The affected geometry is a stack of 16 large quads in close proximity. I’m discarding fragments based on a displacement map to simulate the height of the material, much like how fur is done.
When the camera is very close to the geometry, rendering it takes roughly 0.5ms, but when the camera moves further away it takes around 20ms, even though the geometry still covers a good portion of the screen. With the discard statement commented out in the shader, the time ranges from 0.15ms to 1ms instead.
No lights, probes, or other meshes are present in the scene. My shader is set up to do gbuffer writes, but that is currently disabled.
#import "Common/ShaderLib/GLSLCompat.glsllib"
#if defined(DIFFUSE_GBUFFER) && defined(NORMALS_GBUFFER)
#define GBUFFER_WRITE 1
#endif
// enable apis and import PBRLightingUtils
#define ENABLE_PBRLightingUtils_getWorldPosition 1
//#define ENABLE_PBRLightingUtils_getLocalPosition 1
#define ENABLE_PBRLightingUtils_getWorldNormal 1
#define ENABLE_PBRLightingUtils_getWorldTangent 1
#define ENABLE_PBRLightingUtils_getTexCoord 1
#define ENABLE_PBRLightingUtils_readPBRSurface 1
#ifndef GBUFFER_WRITE
#define ENABLE_PBRLightingUtils_computeDirectLightContribution 1
#define ENABLE_PBRLightingUtils_computeProbesContribution 1
#endif
#import "Common/ShaderLib/module/pbrlighting/PBRLightingUtils.glsllib"
#import "RenthylPlus/ShaderLib/GBuffers/PBRCompactModel.glsllib"
#ifdef DEBUG_VALUES_MODE
uniform int m_DebugValuesMode;
#endif
uniform vec3 g_CameraPosition;
#ifdef USE_FOG
#import "Common/ShaderLib/MaterialFog.glsllib"
#endif
uniform sampler2D m_DisplacementMap;
uniform vec2 m_DisplacementRange;
uniform int m_NumSlices;
uniform float m_StackHeight;
varying float sliceLayer;
float mapRange(float value, float fromMin, float fromMax) {
    return (value - fromMin) / (fromMax - fromMin);
}
#ifndef GBUFFER_WRITE
uniform vec4 g_LightData[NB_LIGHTS];
void computeLighting(inout PBRSurface surface) {
    // Calculate necessary variables from pbr surface prior to applying lighting. Ensure all texture/param reading and blending occurs prior to this being called!
    //PBRLightingUtils_calculatePreLightingValues(surface);
    // Calculate direct lights
    for (int i = 0; i < NB_LIGHTS; i += 3) {
        vec4 lightData0 = g_LightData[i];
        vec4 lightData1 = g_LightData[i + 1];
        vec4 lightData2 = g_LightData[i + 2];
        PBRLightingUtils_computeDirectLightContribution(
            lightData0, lightData1, lightData2,
            surface
        );
    }
    // Calculate env probes
    PBRLightingUtils_computeProbesContribution(surface);
    // Put it all together
    gl_FragColor.rgb = vec3(0.0);
    gl_FragColor.rgb += surface.bakedLightContribution;
    gl_FragColor.rgb += surface.directLightContribution;
    gl_FragColor.rgb += surface.envLightContribution;
    gl_FragColor.rgb += surface.emission;
    gl_FragColor.a = surface.alpha; // this line seems to cost about 4ms
    #ifdef USE_FOG
        gl_FragColor = MaterialFog_calculateFogColor(vec4(gl_FragColor));
    #endif
    //outputs the final value of the selected layer as a color for debug purposes.
    #ifdef DEBUG_VALUES_MODE
        gl_FragColor = PBRLightingUtils_getColorOutputForDebugMode(m_DebugValuesMode, vec4(gl_FragColor.rgba), surface);
    #endif
}
#endif
void main() {
    // discard layer fragments
    float height = texture2D(m_DisplacementMap, texCoord).r;
    height = mapRange(height, m_DisplacementRange.x, m_DisplacementRange.y);
    if (sliceLayer >= 0.0 && sliceLayer > height) {
        discard; // this line seems to cost about 14ms
    }
    vec3 wpos = PBRLightingUtils_getWorldPosition();
    vec3 worldViewDir = normalize(g_CameraPosition - wpos);
    // Create a blank PBRSurface.
    PBRSurface surface = PBRLightingUtils_createPBRSurface(worldViewDir);
    // Read surface data from standard PBR matParams. (note: matParams are declared in 'PBRLighting.j3md' and initialized as uniforms in 'PBRLightingUtils.glsllib')
    PBRLightingUtils_readPBRSurface(surface);
    #ifdef GBUFFER_WRITE
        GBufferWrite_writeSurfaceToGBuffers(surface);
    #else
        computeLighting(surface);
    #endif
    // visualize top and bottom layers for tuning displacement range
    #ifdef LAYER_USAGE_DEBUG
        if (sliceLayer >= 1.0) {
            gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
        } else if (sliceLayer <= 0.0) {
            gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
        }
    #endif
}
It’s also worth noting that commenting out gl_FragColor.a = surface.alpha gives back roughly 4ms with discarding enabled, even though surface.alpha is always 1.0. With both discarding and alpha writing disabled, I get from 0.1ms to 0.8ms.
Edit: rendering only 1 quad rather than 16 makes the problem less noticeable with discarding and alpha writing enabled (only 0.15ms to 1.9ms), but there is still too much rise in rendering time. Disabling both discarding and alpha write with only 1 quad gives roughly the same time as with 16 quads (0.1ms to 0.8ms).
Edit: I swapped out my custom material for the standard PBRLighting material with the same material parameters (where possible), and I get a constant 0.09ms render time with one quad, and 1.5ms with 16 quads, regardless of the camera distance.
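A reduced fragment shader along the following lines could help show whether the cost comes from the discard itself or from its interaction with the PBR code. This is only a sketch and not something measured above. It assumes the vertex shader writes both texCoord and sliceLayer as varyings (in the full shader, texCoord comes from PBRLightingUtils), and it reuses the same m_DisplacementMap and m_DisplacementRange parameters. The flat grey output is purely illustrative.

```glsl
#import "Common/ShaderLib/GLSLCompat.glsllib"

// Same material parameters as the full shader.
uniform sampler2D m_DisplacementMap;
uniform vec2 m_DisplacementRange;

// Assumption: the vertex shader writes both varyings. In the full shader,
// texCoord is provided by PBRLightingUtils; here it is declared directly.
varying vec2 texCoord;
varying float sliceLayer;

// Remap value from [fromMin, fromMax] to [0, 1] (not clamped).
float mapRange(float value, float fromMin, float fromMax) {
    return (value - fromMin) / (fromMax - fromMin);
}

void main() {
    float height = texture2D(m_DisplacementMap, texCoord).r;
    height = mapRange(height, m_DisplacementRange.x, m_DisplacementRange.y);

    // Same test as the full shader: drop fragments above the surface height.
    if (sliceLayer >= 0.0 && sliceLayer > height) {
        discard;
    }

    // Flat grey scaled by layer, so no PBR lighting code is involved.
    gl_FragColor = vec4(vec3(clamp(sliceLayer, 0.0, 1.0)), 1.0);
}
```

If this version shows the same growth in render time as the camera moves away, the displacement lookup and discard alone would be enough to reproduce the problem. If it stays flat, the interaction with the lighting and alpha write would be the next thing to look at.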