Question about backface culling performance

Ali_RS · August 24, 2021, 1:34pm

Hi

A quick question! Suppose I have loaded some 3D models (a sphere, for example) in the scene and the camera will never go inside them. In this case, will enabling or disabling backface culling on the material make any difference in performance?

Regards

Apollo · August 24, 2021, 1:48pm

I don't know for sure, but my understanding is that depth testing a triangle against the depth buffer may take some time, as does the rendering itself. For triangles that are culled, however, these steps can be omitted. If you have very complicated fragment shaders, it sure has an impact. (E.g. for deferred rendering, you need to render the light shapes from the inside as well, with quite a lot of calculations going on.)

pspeed · August 24, 2021, 2:10pm

If you render twice as many triangles, it will take more time than rendering half as many.

Backface culling is very fast in the GPU rasterizer. e1 < e2 : yes, e1 > e2 : no

So with backface culling off then you render twice as much stuff… which is even worse if the backfaces end up being drawn first since you get overdraw.
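
As an illustration of why the facing test is so cheap, here is a minimal sketch in plain Java (not jMonkeyEngine or GPU code). It assumes counter-clockwise winding means front-facing and a y-up screen space; the class and method names are hypothetical, and the comparison the post abbreviates as "e1 < e2" may differ in detail from this signed-area form.

```java
public class BackfaceTest {

    /**
     * Twice the signed area of a projected 2D triangle.
     * Positive means the vertices run counter-clockwise (y axis pointing up).
     */
    static float signedArea2(float x0, float y0, float x1, float y1, float x2, float y2) {
        return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
    }

    /**
     * Assumption for this sketch: counter-clockwise triangles are front faces,
     * so anything with non-positive area is treated as a back face and culled.
     */
    static boolean isBackFacing(float x0, float y0, float x1, float y1, float x2, float y2) {
        return signedArea2(x0, y0, x1, y1, x2, y2) <= 0f;
    }

    public static void main(String[] args) {
        // Counter-clockwise triangle: kept.
        System.out.println(isBackFacing(0f, 0f, 1f, 0f, 0f, 1f)); // false
        // Same triangle with two vertices swapped (clockwise): culled.
        System.out.println(isBackFacing(0f, 0f, 0f, 1f, 1f, 0f)); // true
    }
}
```

The whole decision is two multiplications, a few subtractions and one comparison per triangle, which is tiny compared with rasterizing and shading the triangle.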

Ali_RS · August 24, 2021, 2:16pm

I see. I was thinking only the face that is toward the camera will be rendered.

pspeed · August 24, 2021, 2:17pm

Then I don’t understand the question… because that is the literal definition of backface culling.

Backface culling on = facing away not rendered. Backface culling off = facing away rendered.

Ali_RS · August 24, 2021, 2:44pm

So a triangle will always be rendered twice. But what is the use case?

I was thinking the back face will be rendered only if the camera moves to the back side (think of a quad, for example).

oxplay2 · August 24, 2021, 2:47pm

I also thought it is like you said.

But I also think Paul means the same thing; he is not talking about a quad (leaves, for example) but about models whose back faces get rendered, which means 2x more rendering.

So I believe there was some miscommunication.

pspeed · August 24, 2021, 3:02pm

No. A triangle will always be rendered once.

But if you have a whole sphere of 1,000,000 triangles with half facing away and half facing the camera, then with backface culling you render 500,000 triangles and without it you render 1,000,000 triangles… perhaps 1,000,000 fully fragment-shaded triangles, depending on order.

To me, 500,000 triangles is always going to be faster than 1,000,000 triangles.

Ali_RS · August 24, 2021, 3:08pm

It’s clear now. Thanks!

Ali_RS · August 24, 2021, 4:28pm

And by the way, I am curious to know: besides the backface culling technique, is there some kind of Z-ordering happening on triangles on the GPU to cull triangles away? Or would that be overkill?

Samwise · August 24, 2021, 6:03pm

Well, for the same reason you cannot do perfect sorting at the triangle level on the CPU, you also cannot do it on the GPU. It turns out it's hard to answer the question “is that triangle completely behind other already rendered objects” without asking the counter-question “is any part of that triangle visible”. As long as “that triangle” doesn't have a specific size, “any part of that triangle” doesn't have one either, which means you have to ask the question recursively until at some point you are asking “is that pixel of the triangle visible” (which, finally, is a question you can answer). Once you have answered that question with “no” for all pixels of a triangle, you can safely cull the triangle, but at that point you have already rasterized the whole triangle, so all you can do now is not run the fragment shader.
And because all pixels are already rasterized, if you find that some are visible you of course don't have to render the whole triangle; you can still avoid running the fragment shader for any pixel that is occluded. That is what happens, as long as you don't change the depth of the fragment in the fragment shader. (There is also an extension, not sure if it made it into core later, that lets you declare, for example, that you might change the fragment's depth but will only ever increase it and never decrease it. In that case the fragment shader can still be skipped if the depth buffer already contains a value lower than the initial fragment depth, given that the depth test condition is “less”; you can also set up depth tests that only pass when the fragment's depth is equal to or greater than the depth buffer's value, etc.)
And because the fragment shader, with complex lighting, textures and whatnot, is way more taxing on the GPU than rasterizing a triangle, you still get decent performance improvements.

EDIT: I am sort of lying here, because potentially you could cull a whole triangle. Imagine you have a single triangle centered on the screen. You could check whether the depth values of all 3 corners are larger than the corresponding values in the depth buffer (you have the position of the vertices on screen and the screen size / depth buffer size). If you could also make sure that no pixel in the depth buffer between the corners has a higher value than the interpolated value of the corners, then, with a “less” depth test, you could cull that triangle. You can do that by generating a mipmap chain of the depth buffer and using the level at which the corner lookups land on adjacent pixels (because then there are no pixels between the ones you used for the lookups). That technique is called Hi-Z culling (or Lo-Z culling, depending on which way it goes), and I actually implemented it; there is a link somewhere in the “Suggestions for 3.4” topic that I created, if you want to go for the adventure. In practice, though, the technique is not used to cull individual triangles; instead it is used to check the bounds of objects against the depth buffer mipmap chain to cull the whole object.
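
To make the mipmap-chain idea a bit more concrete, here is a rough CPU-side sketch in plain Java. It is not the implementation referred to in the post: it assumes a square, power-of-two depth buffer, a “less” depth test, a chain that stores the farthest (maximum) depth per texel, and an object described only by a screen-space rectangle plus its nearest depth. All names are hypothetical.

```java
public class HiZDemo {

    /** Builds a chain where each texel holds the MAX (farthest) depth of the 2x2 texels below it. */
    static float[][][] buildMaxChain(float[][] depth) {
        // Assumption: depth is square and its size is a power of two.
        int levels = Integer.numberOfTrailingZeros(depth.length) + 1;
        float[][][] chain = new float[levels][][];
        chain[0] = depth;
        for (int l = 1; l < levels; l++) {
            float[][] prev = chain[l - 1];
            int n = prev.length / 2;
            float[][] cur = new float[n][n];
            for (int y = 0; y < n; y++) {
                for (int x = 0; x < n; x++) {
                    float a = Math.max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1]);
                    float b = Math.max(prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1]);
                    cur[y][x] = Math.max(a, b);
                }
            }
            chain[l] = cur;
        }
        return chain;
    }

    /**
     * Conservative test for the pixel rectangle [x0..x1] x [y0..y1] (inclusive).
     * Returns true only if the object's nearest depth is behind the farthest
     * depth already stored in every texel the rectangle touches.
     */
    static boolean isOccluded(float[][][] chain, int x0, int y0, int x1, int y1, float minDepth) {
        // Pick the first level where the rectangle spans at most 2 texels per axis,
        // so there are no unchecked texels between the ones we look up.
        int level = 0;
        while (level < chain.length - 1
                && (((x1 >> level) - (x0 >> level)) > 1 || ((y1 >> level) - (y0 >> level)) > 1)) {
            level++;
        }
        float[][] mip = chain[level];
        for (int y = (y0 >> level); y <= (y1 >> level); y++) {
            for (int x = (x0 >> level); x <= (x1 >> level); x++) {
                if (minDepth <= mip[y][x]) {
                    return false; // something in this texel may be farther away: possibly visible
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float[][] depth = new float[4][4];
        for (float[] row : depth) java.util.Arrays.fill(row, 0.3f); // a wall close to the camera
        depth[2][2] = 1.0f;                                         // one pixel still at "far"
        float[][][] chain = buildMaxChain(depth);

        System.out.println(isOccluded(chain, 0, 0, 1, 1, 0.5f)); // true: fully behind the wall
        System.out.println(isOccluded(chain, 0, 0, 3, 3, 0.5f)); // false: the hole may show it
        System.out.println(isOccluded(chain, 0, 0, 1, 1, 0.1f)); // false: in front of the wall
    }
}
```

The test can only say "definitely hidden" or "maybe visible", which is why it suits whole-object culling: a false "maybe visible" just costs a normal draw, while the per-pixel depth test still resolves the final image.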

pspeed · August 24, 2021, 6:54pm

In a word: no.

Your mesh triangles are drawn in the order they are in the mesh.

So if you have back face culling off and the far side of the sphere happens to be drawn first then all of those are fully rendered… lighting, texture lookups, etc. and then the front facing faces are rendered on top of them essentially wasting that time.

If the front faces are drawn first then the back face spans may be aborted early if the GPU can already determine that the particular span is fully Z-buffer obscured. But more likely, the fragments themselves are aborted early because they already know the Z is behind the current z-buffer value.
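
The effect of draw order can be sketched for a single pixel in plain Java. This is a simplified model, not GPU or jMonkeyEngine code: it assumes a “less” depth test performed before shading, a fragment shader that does not write depth, and hypothetical names throughout.

```java
public class OverdrawDemo {

    /**
     * Counts how many fragments get shaded at one pixel when fragments with
     * the given depths arrive in draw order (smaller depth = closer to the camera).
     */
    static int shadedFragments(float[] depthsInDrawOrder) {
        float depthBuffer = Float.POSITIVE_INFINITY; // cleared to "far"
        int shaded = 0;
        for (float d : depthsInDrawOrder) {
            if (d < depthBuffer) { // early depth test with the "less" condition
                shaded++;          // lighting, texture lookups, etc. would run here
                depthBuffer = d;
            }
            // otherwise the fragment is rejected before the shader runs
        }
        return shaded;
    }

    public static void main(String[] args) {
        // Far side of the sphere drawn first, then the near side: both are shaded.
        System.out.println(shadedFragments(new float[] {0.8f, 0.2f})); // 2
        // Near side first: the far-side fragment is rejected early.
        System.out.println(shadedFragments(new float[] {0.2f, 0.8f})); // 1
    }
}
```

With backface culling on, the far-side fragment never reaches this stage at all, so the result no longer depends on the order of triangles in the mesh.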

Ali_RS · August 24, 2021, 7:58pm

Guys, thanks for the detailed information.