A quick question! Suppose I have loaded some 3D models (a sphere, for example) into the scene, and the camera will never go inside them. In this case, will enabling/disabling backface culling on the material make any difference in performance?
I don't know for sure, but my understanding is that depth-testing a triangle against the depth buffer takes some time, as does the rendering itself. For triangles that are culled, these steps can be skipped. If you have very expensive fragment shaders, it certainly has an impact. (E.g. for deferred rendering, you need to render the light shapes from the inside as well, with quite a lot of calculations going on.)
But if you have a whole sphere of 1,000,000 triangles, with half facing away from the camera and half facing toward it, then with backface culling you render 500,000 triangles and without it you render 1,000,000… perhaps 1,000,000 fully fragment-shaded triangles, depending on draw order.
To me, 500,000 triangles is always going to be faster than 1,000,000 triangles.
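For reference, the GPU decides "back-facing" from the winding of the triangle after projection to screen space, essentially a 2D signed-area test. Here is a minimal sketch of that test in plain Python (the function names are my own, not any engine's API; it assumes the usual OpenGL convention of counter-clockwise front faces):

```python
def signed_area(p0, p1, p2):
    # Twice the signed area of the projected (screen-space) triangle.
    # Positive for counter-clockwise winding, negative for clockwise.
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])

def is_back_facing(p0, p1, p2, front_face_ccw=True):
    # With CCW front faces, a clockwise projected triangle is
    # back-facing and can be culled before rasterization.
    area = signed_area(p0, p1, p2)
    return area < 0 if front_face_ccw else area > 0

# CCW triangle -> front-facing
print(is_back_facing((0, 0), (1, 0), (0, 1)))  # False
# Same triangle with two vertices swapped -> back-facing
print(is_back_facing((0, 0), (0, 1), (1, 0)))  # True
```

The key point for the performance question: this test runs per triangle before rasterization, so a culled triangle never produces fragments at all.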
Well, for the same reason you cannot do perfect per-triangle sorting on the CPU, you also cannot do it on the GPU. It turns out it's hard to answer the question "is that triangle completely behind already-rendered objects?" without asking the counter-question "is any part of that triangle visible?" And as long as "that triangle" doesn't have a specific size, "any part of that triangle" doesn't have a specific size either, which means you have to ask the question recursively until at some point you are asking "is this pixel of the triangle visible?" (which finally is a question you can answer). Once you have answered "no" for every pixel of a triangle, you could safely cull that triangle, but at that point you have already rasterized the whole triangle, so all you can do now is not run the fragment shader.
And because all pixels are rasterized already, if you find that some are visible you of course don't have to shade the whole triangle; you can still avoid running the fragment shader for each individual pixel that is occluded. That is what happens, as long as you don't change the fragment's depth in the fragment shader. There is also an extension (conservative depth; not sure if it made it into core later) that lets you declare, for example, that you might change the fragment's depth but will only ever increase it and never decrease it. In that case the fragment shader can still be skipped if the depth buffer already contains a value lower than the initial fragment depth, given the depth test condition is "less" (you can also use depth tests that only pass when the fragment's depth is equal to or greater than the depth buffer's value, etc.).
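The early depth test described above can be sketched as toy software-rasterizer logic (hypothetical names, depth function "less"; a real GPU does this in fixed-function hardware, this is only an illustration):

```python
def process_fragment(x, y, frag_depth, depth_buffer, shade):
    # Early depth test: compare against the depth buffer BEFORE shading.
    # Legal as long as the fragment shader does not modify depth.
    if frag_depth >= depth_buffer[y][x]:   # depth func "less": fail -> reject
        return None                        # fragment culled, shader never runs
    depth_buffer[y][x] = frag_depth
    return shade(x, y)                     # expensive shading only for survivors

depth = [[1.0]]                            # 1x1 depth buffer cleared to far plane
shaded = []
shade = lambda x, y: shaded.append((x, y))

process_fragment(0, 0, 0.5, depth, shade)  # passes the test, gets shaded
process_fragment(0, 0, 0.7, depth, shade)  # occluded, shader skipped
print(len(shaded))  # 1
```

The conservative-depth idea then says: if the shader promises it will only ever push `frag_depth` further away, this early rejection stays valid even though the shader writes depth.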
And because the fragment shader, with complex lighting, texture lookups and whatnot, is way more taxing on the GPU than rasterizing a triangle, you still get decent performance improvements.
EDIT: I am sort of lying here; you could potentially cull a whole triangle. Imagine you have a single triangle centered on the screen. You could check all three of its corners to see whether their depth values are smaller than the corresponding values in the depth buffer (you have the positions of the vertices on screen and the screen size / depth buffer size). If you could additionally make sure that no pixel in the depth buffer between any of the corners has a lower value than the value interpolated from the corners, then you could cull that triangle. And you can do that by generating a mipmap chain of the depth buffer and using the level in which the corner lookups hit adjacent pixels (because then there are no pixels between the ones you looked up). That technique is called Hi-Z culling (or Lo-Z culling, depending on which way it goes), and I actually implemented it; there is a link somewhere in the "Suggestions for 3.4" topic that I created, if you want to go on that adventure. Just note that the technique is not used to cull individual triangles; instead it is used to test the bounds of whole objects against the depth buffer mipmap chain and cull the whole object.
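The Hi-Z idea above can be sketched in miniature: build a max-reduction mip chain of the depth buffer, then an object whose nearest possible depth is still greater than the coarse tile's max depth is fully occluded. A toy 1D version (all names are my own; assumes a power-of-two buffer and depth func "less"):

```python
def build_hiz_chain(depth_row):
    # Each mip level stores the MAX depth of the two texels below it,
    # so one coarse texel conservatively covers a whole tile.
    chain = [list(depth_row)]
    while len(chain[-1]) > 1:
        prev = chain[-1]
        chain.append([max(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return chain

def occluded(chain, x0, x1, object_min_depth):
    # Walk up to a mip level where the span [x0, x1] fits in ~2 texels,
    # then compare the object's nearest depth against the tile max.
    level = 0
    while (x1 >> level) - (x0 >> level) > 1 and level < len(chain) - 1:
        level += 1
    tile_max = max(chain[level][x0 >> level], chain[level][x1 >> level])
    return object_min_depth > tile_max  # nearest point still behind -> cull

depth_row = [0.2, 0.3, 0.25, 0.3, 0.9, 0.9, 0.9, 0.9]
chain = build_hiz_chain(depth_row)
print(occluded(chain, 0, 3, 0.5))  # True: wall at ~0.3 hides object at 0.5
print(occluded(chain, 4, 7, 0.5))  # False: background there is at 0.9
```

Because the coarse level stores the max, a "yes, occluded" answer is always safe; a "no" just means the object goes through the normal pipeline.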
Your mesh's triangles are drawn in the order they appear in the mesh.
So if you have back-face culling off and the far side of the sphere happens to be drawn first, then all of those triangles are fully rendered… lighting, texture lookups, etc. …and then the front-facing triangles are rendered on top of them, essentially wasting that time.
If the front faces are drawn first, then the back-face spans may be aborted early if the GPU can already determine that a particular span is fully obscured according to the Z-buffer. But more likely, the individual fragments are rejected early because the GPU already knows their Z is behind the current Z-buffer value.
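The draw-order effect can be illustrated with a toy count of shaded fragments at a single pixel (simplified to one fragment per triangle, depth func "less"; the function name is my own):

```python
def shaded_fragment_count(triangle_depths):
    # Count how many fragments actually get shaded at one pixel,
    # given triangles arriving in draw order with these depths.
    nearest = float("inf")
    shaded = 0
    for d in triangle_depths:
        if d < nearest:     # early depth test passes -> shader runs
            shaded += 1
            nearest = d
    return shaded

# Front face first: the back face is rejected before shading.
print(shaded_fragment_count([0.3, 0.7]))  # 1
# Back face first: both get shaded; the first one's work is wasted.
print(shaded_fragment_count([0.7, 0.3]))  # 2
```

With back-face culling on, the back-facing triangle never reaches this stage at all, so the count is 1 regardless of order.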