Hello fellow jMonkeys,
It's been a while since I wrote my blocky voxel mesh shader, so I figured it was time to look it over again, and I stumbled upon some questions. Some are directly related to this shader, others are more about GPUs in general.
First question:
I use the 'Normal' buffer: for each vertex of a face I store a number representing one of the 6 directions a face can be facing, e.g. 0 for a face whose normal points towards negative x, 1 for a face whose normal points towards positive x, and so on. (That is to reduce the data that has to be sent to the GPU: I only need 1 byte instead of 3x3x4=36 bytes, i.e. 3 vectors (tangent, bitangent and normal) with 3 floats each, and a float is 4 bytes.)
Now I read that the GPU is fast (no exact quote, but roughly) "when executing in parallel something that is considered 'one program' on the GPU". As I understand it, when branching has to be done, one branch might have to execute first before any calculations can be done in the programs that chose the other branch.
so i changed my vertex shader from
if (inNormal < 0.5) {
t = vec3(0.0,0.0,1.0);
b = vec3(0.0,1.0,0.0);
n = vec3(-1.0,0.0,0.0);
} else if (inNormal < 1.5) {
...
} else if...
TBN = mat3(t, b, n);
to
//some lookup tables
const vec3 normals[6] = vec3[6](
vec3(-1.0, 0.0, 0.0),
vec3( 1.0, 0.0, 0.0),
...
const vec3 tangents[6] = vec3[6](
vec3( 0.0, 0.0, 1.0),
vec3( 0.0, 0.0,-1.0),
...
const vec3 bitangents[6] = vec3[6](
vec3( 0.0,-1.0, 0.0),
vec3( 0.0,-1.0, 0.0),
...
float weightZP = step(32.0, inNormal);
float weightZN = step(16.0, inNormal) - weightZP;
float weightYP = step( 8.0, inNormal) - weightZP - weightZN;
float weightYN = step( 4.0, inNormal) - weightZP - weightZN - weightYP;
float weightXP = step( 2.0, inNormal) - weightZP - weightZN - weightYP - weightYN;
float weightXN = step( 1.0, inNormal) - weightZP - weightZN - weightYP - weightYN - weightXP;
vec3 t = vec3(0.0,0.0,0.0);
vec3 b = vec3(0.0,0.0,0.0);
vec3 n = vec3(0.0,0.0,0.0);
t += tangents[0] * weightXN;
b += bitangents[0] * weightXN;
n += normals[0] * weightXN;
t += tangents[1] * weightXP;
b += bitangents[1] * weightXP;
n += normals[1] * weightXP;
t += tangents[2] * weightYN;
b += bitangents[2] * weightYN;
n += normals[2] * weightYN;
t += tangents[3] * weightYP;
b += bitangents[3] * weightYP;
n += normals[3] * weightYP;
t += tangents[4] * weightZN;
b += bitangents[4] * weightZN;
n += normals[4] * weightZN;
t += tangents[5] * weightZP;
b += bitangents[5] * weightZP;
n += normals[5] * weightZP;
TBN = mat3(t, b, n);
Instead of sending 0, 1, 2, 3, 4 or 5 for the faceDirection, I now send 1<<faceDirection, so 1, 2, 4, 8, 16 or 32, as a byte in the Normal buffer.
i did that because now every vertex shader program runs the exact same code and no branching needs to be done at all
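To make sure the step() chain really produces exactly one weight of 1.0 per face direction, the decode can be mirrored on the CPU. This is only a rough sketch in Java: the class and method names are made up for illustration and nothing here is part of jME or of my actual shader, it just repeats the arithmetic from above.

```java
// Hypothetical CPU-side mirror of the shader's weight decode (illustration only).
final class FaceWeightCheck {
    // Same semantics as GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
    static float step(float edge, float x) {
        return x < edge ? 0.0f : 1.0f;
    }

    // Returns the weights in lookup-table order: XN, XP, YN, YP, ZN, ZP.
    static float[] weights(float inNormal) {
        float zp = step(32.0f, inNormal);
        float zn = step(16.0f, inNormal) - zp;
        float yp = step(8.0f, inNormal) - zp - zn;
        float yn = step(4.0f, inNormal) - zp - zn - yp;
        float xp = step(2.0f, inNormal) - zp - zn - yp - yn;
        float xn = step(1.0f, inNormal) - zp - zn - yp - yn - xp;
        return new float[] { xn, xp, yn, yp, zn, zp };
    }

    public static void main(String[] args) {
        // Encode each face direction as 1 << faceDirection and print the decoded weights.
        for (int faceDirection = 0; faceDirection < 6; faceDirection++) {
            float encoded = 1 << faceDirection;
            System.out.println(faceDirection + " -> "
                    + java.util.Arrays.toString(weights(encoded)));
        }
    }
}
```

For every encoded value only the weight at index faceDirection should be 1.0 and all others 0.0, so the sums for t, b and n each pick exactly one table entry.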
And guess what, I got 0 difference in performance. Why is that? What is considered efficient code on the GPU?
I always read that code that's efficient on the CPU is not efficient on the GPU and vice versa, and now that I thought I got what that means, I'm confused because I see no difference in performance.
Btw, there are millions of vertices using this vertex shader in the scene I use to test the performance difference. Using the DetailedProfilerState I see that flushQueue - opaqueBucket takes by far the most of my frame time, and almost all objects in the scene use this shader.
Second question:
I use the steepParallaxMapping function that jME uses, but I copied it and moved some of the calculations from the fragment shader to the vertex shader, namely:
vec2 vParallaxDirection = normalize( vViewDir.xy );
// The length of this vector determines the furthest amount of displacement: (Ati's comment)
float fLength = length( vViewDir );
float fParallaxLength = sqrt( fLength * fLength - vViewDir.z * vViewDir.z ) / vViewDir.z;
// Compute the actual reverse parallax displacement vector: (Ati's comment)
vec2 vParallaxOffsetTS = vParallaxDirection * fParallaxLength;
// Need to scale the amount of displacement to account for different height ranges
// in height maps. This is controlled by an artist-editable parameter: (Ati's comment)
parallaxScale *= 0.3;
vParallaxOffsetTS *= parallaxScale;
vec3 eyeDir = normalize(vViewDir).xyz;
float nMinSamples = 6.0;
float nMaxSamples = 1000.0 * parallaxScale;
float nNumSamples = mix( nMinSamples, nMaxSamples, 1.0 - eyeDir.z ); //
If I'm not mistaken, that's basically 3 square roots that are calculated for each fragment, and I was expecting a performance gain from moving that to the vertex shader. But guess what, exactly 0 fps difference again. So how does that make sense?
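Side note while looking at those lines: since fLength * fLength - vViewDir.z * vViewDir.z is just the squared length of vViewDir.xy, the whole offset (before the parallaxScale multiplication) should collapse to vViewDir.xy / vViewDir.z. Here is a small Java sketch to check that numerically. The class and method names are hypothetical, and it assumes vViewDir.xy is not zero and vViewDir.z is positive.

```java
// Hypothetical check that the parallax offset equals viewDir.xy / viewDir.z (illustration only).
final class ParallaxOffsetCheck {
    // Mirrors the shader lines: normalize(xy) * sqrt(len^2 - z^2) / z.
    static double[] offsetAsWritten(double x, double y, double z) {
        double xyLength = Math.sqrt(x * x + y * y);
        double dirX = x / xyLength;
        double dirY = y / xyLength;
        double fLength = Math.sqrt(x * x + y * y + z * z);
        double fParallaxLength = Math.sqrt(fLength * fLength - z * z) / z;
        return new double[] { dirX * fParallaxLength, dirY * fParallaxLength };
    }

    // Algebraically simplified form without any square root.
    static double[] offsetSimplified(double x, double y, double z) {
        return new double[] { x / z, y / z };
    }

    public static void main(String[] args) {
        double[] a = offsetAsWritten(0.3, -0.4, 0.8);
        double[] b = offsetSimplified(0.3, -0.4, 0.8);
        System.out.println(a[0] + ", " + a[1] + " vs " + b[0] + ", " + b[1]);
    }
}
```

Both versions should agree up to floating point rounding, which would mean the square roots for the offset itself are avoidable whichever shader stage does the work.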
Third question:
This one is not shader specific but still GPU related.
I have a notebook with a GeForce 840M, and when I run dxdiag I can see 8GB of memory for that card, so I guess it uses my RAM. That makes me wonder: does that mean that on such setups it takes no time to send, for example, a mesh's buffers to the VRAM because it is shared memory (so the data is already where it has to be to be accessible to the GPU), but on the other hand texture lookups and such take longer because it takes the GPU longer to get data from RAM than from VRAM?
Last question:
When exactly does a shader need to be recompiled?
I first thought that happens whenever a material parameter changes, but from what I understand now, material parameters are sent to the GPU and updated when necessary but are not compiled into the shader, meaning the shader does not have to recompile but the GPU has to look up the values in VRAM. So does a shader only have to recompile when a material parameter that is bound to a define changes, or are there other cases?
EDIT: So for performance, should I bind all material parameters that I expect to change never or really rarely to defines and use these defines in the shader instead of looking up the values in VRAM, while those that I expect to change somewhat frequently I should look up in VRAM?
EDIT: No, I don't have vsync enabled, and the bottleneck is not on the CPU side: as soon as I toggle parallax mapping off (which changes a define so that none of the parallax mapping functions run), fps rises from 40 to 46.
Many thanks in advance and many greetings from the shire also,
samwise