Occlusion Queries and Conditional Rendering

For example, I can render the same geometry to two different framebuffers with the same camera. What will your code do in this case?

I have no explicit test case for it, but I assume it behaves the same as with multiple shadow maps, because the implementation doesn't take the target framebuffer into account:
there is a separate state for each rendering within a single frame, so there would be 4 query objects in total for your 2 render calls that happen each frame.
However, I could add support for setting up a scenario where you know that if a geometry is occluded in one render call, it will also be occluded in another render call in the same frame. But I'm not sure how far the built-in implementation should go then, because the user can always provide a custom implementation that extends AdvancedControl.
Actually, maybe I could change it to only do the test in the first render of each camera within a frame, and reuse that result for all following renders with the same camera within the same frame (so when the geometry is occluded in the highest-resolution shadow map, there is no need to render it into the lower-resolution shadow maps at all).
This might have to be optional, though, or specific to the shadow maps, because in other cases you might want to render the geometry several times per frame from the same camera but in another scene, meaning it could be occluded in one scene but visible in another (is that actually possible?).
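To make that "test once per camera, reuse within the frame" idea concrete, here is a minimal sketch (everything below is hypothetical, not existing engine API): a small cache keyed on the camera that is cleared whenever the frame id changes, so only the first render with a given camera pays for the actual query.

import com.jme3.renderer.Camera;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-frame cache: the first render with a given camera runs the
// real occlusion test, all later renders in the same frame reuse that result.
public class PerCameraOcclusionCache {

    private final Map<Camera, Boolean> visibleByCamera = new HashMap<>();
    private long cachedFrame = -1;

    public boolean isVisible(Camera cam, long frameId, OcclusionTest test) {
        if (frameId != cachedFrame) {       // new frame: drop all cached results
            visibleByCamera.clear();
            cachedFrame = frameId;
        }
        Boolean cached = visibleByCamera.get(cam);
        if (cached == null) {
            cached = test.run(cam);         // only the first render per camera pays for the query
            visibleByCamera.put(cam, cached);
        }
        return cached;
    }

    // Placeholder for whatever actually issues the GL query.
    public interface OcclusionTest {
        boolean run(Camera cam);
    }
}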

The problem is that the engine doesn't enforce the way you render the scene; one could call RenderManager.renderScene (or any other render method). Do you think it would be possible to implement occlusion culling entirely as a scene processor, and do it specifically only on the scene nodes of the viewport to which it is attached? At that point we could say that this processor requires specific constraints on the scene, and you know that you can't just pick a node and render it outside of the viewport.

The queries can be stored in a weak map on the processor.

I don't know how that would be possible. We need a way to start a query before a specific geometry is rendered and end the query after the render command has been issued, and with what the engine currently has there is no way to do that (at least none that I know of).

Right, I was still thinking about my implementation where the depth is rendered beforehand in its own pass…
Maybe we can have a toggle in the viewport and enable occlusion culling only for RenderManager.renderScene? What do you think about this?

I'm not sure I follow that approach:
You suggest something like a boolean field in the ViewPort class so you can toggle occlusion culling on and off,
and in the RenderManager method (I guess you mean RenderManager.renderViewPort()) we would check that boolean, pass it all the way from flushQueue to renderGeometry, and then inject occlusion query calls for every geometry in the viewport?
That would be a solution, but I'm not sure doing occlusion queries for all geometries is what we want: small geometries, those with lightweight shaders, or fast-moving objects are not good candidates; it's rather useful for bigger geometries or ones with heavyweight shaders, many vertices, etc.
Or would the usage then look like this: create a post ViewPort, enable occlusion culling, set its output buffer to be the same as the main viewport's framebuffer, and add the geometries you want to be occlusion-cullable to a new root node that you set as the scene of the occlusion-culling-enabled viewport?
So the occlusion queries would be built into the engine, which means I would also have to take other query types and conditional rendering into account while doing that, because there is still no way for the user to inject code.

Another approach is to add a RenderListener interface and methods to set a RenderListener on a Renderer object, so this listener can be notified of specific tasks during the render stage (e.g. when a new viewport has started rendering, when specific buckets are processed, when a specific geometry is rendered or has finished rendering, etc.) and give the user a tool to extend the renderer that way. I personally prefer the AdvancedControl approach, though; I just cannot believe there is no really nice way to fit it into the engine.
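For illustration, a rough sketch of what such a RenderListener could look like (the interface and its methods are hypothetical, named here only to show the shape of the idea; default no-op methods would keep existing renderers unaffected):

import com.jme3.renderer.ViewPort;
import com.jme3.renderer.queue.RenderQueue.Bucket;
import com.jme3.scene.Geometry;

// Hypothetical listener that a Renderer (or RenderManager) could notify during the
// render stage; default methods keep existing code untouched.
public interface RenderListener {

    default void viewPortRendering(ViewPort vp) {}

    default void bucketRendering(ViewPort vp, Bucket bucket) {}

    // Called right before a geometry's draw call is issued; returning false could
    // mean "skip it" (e.g. it was occluded last frame).
    default boolean geometryRendering(Geometry geom) {
        return true;
    }

    // Called right after the draw call was issued, e.g. to end an occlusion query.
    default void geometryRendered(Geometry geom) {}
}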

EDIT: What about the in-pass shadows? Doesn't that do a depth render at the beginning of the frame? If that made it into core (there is that branch that is said to need testing and might make sense to include in 3.3), it would probably give us a place to put the occlusion queries and later do conditional renderings based on those queries.

Most of those issues could be fixed by having a programmable pipeline in jME, but since we don't have one at the moment, my idea was to find a way to associate the result of the render query for each geometry with the viewport, since the viewport is the thing that contains the camera, scene, and render target in jME, and all of those must remain constant for the occlusion culling to work. I'm not sure how this should be implemented, but I think you are right that there should be a way to filter out some geometries, and I'd add that there should also be a way to specify the shader to use for the occlusion test.

I’m still not sure why something couldn’t be done with buckets, techniques, etc… and having the render manager keep the appropriate state based on the context within which it is run. To me, the “keeping the state” part is the trickiest bit but not impossible.

controlRender() could ask the RenderManager for this state and decide to do things… like the LOD control does. The state could even be made available in something like the global uniforms if the shader wants it.

The new occlusion query geometry list would already know that every geometry in there should be processed for occlusion culling and the state kept. It doesn’t have to call a set of new methods on Geometry to do it.

In the suggested implementation, the rendering happens this way (pseudocode):

foreach (geometry) {

    // occlusion query: draw a cheap depth-only proxy (the bounding box)
    currentFrameBuffer.disableDepthWrite()
    occlusionQuery.start()
    currentFrameBuffer.drawDepth(geometry.boundingBox)
    occlusionQuery.end()
    result = occlusionQuery.getResult()
    // end occlusion query

    // conditional rendering: draw the real mesh only if the
    // GL_ANY_SAMPLES_PASSED query reported at least one visible fragment
    if (result) {
        currentFrameBuffer.enableDepthWrite()
        currentFrameBuffer.drawColorsAndDepth(geometry.mesh)
    } else {
        // skip the geometry entirely
    }
    // end conditional rendering

}

Basically we have to "draw" a placeholder geometry right before the actual geometry is drawn, and we need to do this for every geometry that should have occlusion culling. occlusionQuery.getResult() is not available immediately, so we need to either wait for it or read it the next frame.

There are other ways we can do it (e.g. a dedicated depth pre-pass), but the one proposed in this thread is probably the cheapest and best-suited alternative for the current structure of the engine.
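For reference, the same two-draw-call pattern expressed directly against the GL might look roughly like this. This is only a sketch, assuming an LWJGL 3 binding; drawBoundingVolume() and drawMesh() are hypothetical helpers standing in for the engine's actual draw calls:

import static org.lwjgl.opengl.GL11.glColorMask;
import static org.lwjgl.opengl.GL11.glDepthMask;
import static org.lwjgl.opengl.GL15.GL_QUERY_RESULT;
import static org.lwjgl.opengl.GL15.glBeginQuery;
import static org.lwjgl.opengl.GL15.glDeleteQueries;
import static org.lwjgl.opengl.GL15.glEndQuery;
import static org.lwjgl.opengl.GL15.glGenQueries;
import static org.lwjgl.opengl.GL15.glGetQueryObjecti;
import static org.lwjgl.opengl.GL33.GL_ANY_SAMPLES_PASSED;

import com.jme3.scene.Geometry;

public class OcclusionQuerySketch {

    // Sketch only: wrap the bounding-volume draw in a GL_ANY_SAMPLES_PASSED query,
    // then draw the real mesh only if at least one fragment passed the depth test.
    void renderWithOcclusionTest(Geometry geometry) {
        int query = glGenQueries();

        glDepthMask(false);                       // keep the proxy out of the depth buffer
        glColorMask(false, false, false, false);  // ...and out of the color buffer
        glBeginQuery(GL_ANY_SAMPLES_PASSED, query);
        drawBoundingVolume(geometry);             // hypothetical helper: draws the bounding box
        glEndQuery(GL_ANY_SAMPLES_PASSED);
        glColorMask(true, true, true, true);
        glDepthMask(true);

        // glGetQueryObjecti(query, GL_QUERY_RESULT) blocks until the GPU has the result,
        // which is exactly the stall discussed later in this thread.
        if (glGetQueryObjecti(query, GL_QUERY_RESULT) != 0) {
            drawMesh(geometry);                   // hypothetical helper: the real draw call
        }
        glDeleteQueries(query);
    }

    void drawBoundingVolume(Geometry geometry) { /* hypothetical */ }
    void drawMesh(Geometry geometry) { /* hypothetical */ }
}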

About that, PR 1078 introduces QueryObject to handle queries.
It also introduces BufferObject, which could then be used to implement a query buffer object for conditional rendering.


In your pseudo-code example…

If the foreach(geometry) is a GeometryList from a new OcclusionSensitive (whatever) bucket…
And if instead of drawing the geometry.boundingbox you draw its mesh…
And instead of using the result right away, you tuck it away based on the current ‘context’.
Then provide a method on RenderManager so that controlRender() could swap out the mesh with a placeholder if the object was invisible last frame.

…that’s what I’m suggesting.

As I understand it, the disableDepthWrite()/enableDepthWrite() is not a necessary part of occlusion queries; it's just the usual optimization when you are always using a separate placeholder. This too could be up to controlRender.

We could probably also provide the information in a global uniform for the shader… so the material could just draw flat-shaded, for example.

The benefit of my approach is that it doesn't require adding a bunch of extra apparatus to Geometry that will mostly never be used. It also can work for multiple viewports, etc…
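As an illustration of that suggestion, a sketch of such a control, assuming a hypothetical RenderManager accessor (called wasOccludedLastFrame() here, which does not exist in the engine) that exposes the stored per-context query result:

import com.jme3.renderer.RenderManager;
import com.jme3.renderer.ViewPort;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.control.AbstractControl;

// Sketch only: swaps the rendered mesh based on last frame's occlusion result.
// wasOccludedLastFrame(...) is a hypothetical RenderManager method from this discussion.
public class OcclusionSwapControl extends AbstractControl {

    private final Mesh fullMesh;
    private final Mesh placeholderMesh;   // e.g. bounding box or low-LOD stand-in

    public OcclusionSwapControl(Mesh fullMesh, Mesh placeholderMesh) {
        this.fullMesh = fullMesh;
        this.placeholderMesh = placeholderMesh;
    }

    @Override
    protected void controlUpdate(float tpf) {
        // nothing to do per update; the swap is a render-time decision
    }

    @Override
    protected void controlRender(RenderManager rm, ViewPort vp) {
        Geometry geom = (Geometry) spatial;
        boolean occluded = rm.wasOccludedLastFrame(geom, vp);  // hypothetical accessor
        geom.setMesh(occluded ? placeholderMesh : fullMesh);
    }
}

Because controlRender() receives the ViewPort, the same control could in principle make a different decision per viewport, which is what keeping the result "based on the current context" would enable.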

We need to draw the occlusion test geometry on top of the real framebuffer to see if it passes the test, and only then fill in the real depth data, and then repeat for the next geometry. It seems to me that what you are proposing is to have an occlusion-test-only pass before the geometries are rendered; is that correct?

In the proposed approach that started this thread, it was up to the application layer to decide whether the geometry was a placeholder or not, and there was a frame delay. That's the approach I was taking in my post.

Render your regular mesh until it's occluded… then render a placeholder until it isn't. Lose one frame of the object in the process, as the original approach was doing.

OK, but if we use buckets we would be testing the geometry against all the other geometries in the occlusion bucket, while we need to test one geometry in the occlusion bucket against all the geometries in the opaque bucket minus itself. That way we aren't testing (for example) bounding boxes against all other bounding boxes, but one bounding box against the real geometries, so that we always render more than what is visible (remember the query will pass if at least one fragment of the bounding box is visible) and not less.

I’m just following the original approach.

The control that’s querying the last frame’s occlusion information can decide to turn off depth write if it wants to. Maybe it’s not using bounding boxes but stand-in low LOD geometry and it’s fine for occluded geometry to occlude other geometry.

OK, that's right: the initial approach was to render either the original mesh or the simplified mesh (with a simplified material), based on the occlusion result of the last frame.
This was the approach I liked most, because the documentation of the Renderer interface (stopProfilingTask) states that the query result will be available to the CPU in the next frame and thus would not cause stalls.
Although this is true, querying for the result, even if it is already available, implicitly flushes the queue, and I realized that it would in fact cause stalls if we issued some number of render commands before querying for the result (even though that result is available).
So, assuming the 2-frame solution is the way to go, the solution could include querying for the last frame's results as early as possible in the current frame.
For the actual queries, however, it's rather the opposite: the more stuff has already been drawn, the more likely the new geometry is occluded (unless you heavily sort by distance only instead of by material; then there won't be much of an advantage in rendering it later).
Then again, for conditional rendering (which is based on the same queries) the result might not even be available to the GPU right away, because it might not have finished rendering it; thus rendering the geometries that use queries should happen some time before the conditional renderings.
If using the queries is built into the engine, and not only a standalone tool that the user can inject wherever they want, then it could be built in to check query results at the beginning of a frame, then render the viewports and thus their buckets, and have the occlusion-query bucket after the opaque bucket, and it would not stall the CPU because the RenderManager collected the results at the beginning of the frame already?
And then have another bucket at the very end for the conditional renderings, so the user can decide to either read the query results back to the CPU and do the 2-frame approach, or use the occlusion queries in the same frame several buckets later to ensure the results will be available?
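To make the "read last frame's results as early as possible" point concrete, a non-blocking poll could look like this sketch (LWJGL binding assumed): checking GL_QUERY_RESULT_AVAILABLE does not force a sync, so the render manager could run it at the start of the frame and fall back to the previous result whenever the GPU is not done yet.

import static org.lwjgl.opengl.GL11.GL_TRUE;
import static org.lwjgl.opengl.GL15.GL_QUERY_RESULT;
import static org.lwjgl.opengl.GL15.GL_QUERY_RESULT_AVAILABLE;
import static org.lwjgl.opengl.GL15.glGetQueryObjecti;

public class QueryPollSketch {

    // Sketch only: poll a query without stalling; keep last frame's answer when the
    // result is not available yet.
    static boolean pollVisibility(int query, boolean lastKnownVisible) {
        if (glGetQueryObjecti(query, GL_QUERY_RESULT_AVAILABLE) == GL_TRUE) {
            return glGetQueryObjecti(query, GL_QUERY_RESULT) != 0;
        }
        return lastKnownVisible;   // GPU not finished: reuse the previous frame's result
    }
}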
About the context: what would define a specific context? Would it be a combination of viewport + framebuffer + camera, or is that not a "unique" context?

There is no guarantee that the result will be available even several buckets later.

I have a different proposal that still uses scene processors. We can add two methods, beforeGeometry and afterGeometry, that are called from RenderManager.renderViewPortQueues(ViewPort, boolean) (we can get the processors from the viewport, so we don't even need to change the method signature).

beforeGeometry will return true if the geometry can be rendered, or false if the geometry must be skipped.

This is all we'd have to change in the core.

In our processor we will have access to the ViewPort, FrameBuffer, Bucket, and Geometry, so we will be able to easily test the occlusion culling against the framebuffer and also use different logic for different buckets (e.g. skip the transparent and sky buckets).

To filter which geometries to test we can use a control as you proposed, or some sort of tagging (e.g. user data or a generic TagControl), and we can decide that the occlusion test will use the last LOD level or, if no LOD is available, the bounding box.

The result of the query can be cached in a weak hash map inside the processor and updated only when a new result is available.

The pros of this approach are:

  • we keep all the code outside of the core
  • the culling logic can be modified by just extending the processor
  • all the requirements are automatically met, since processors are unique and tied to the viewport
  • this can also potentially open new possibilities for different types of culling, e.g. instance culling with transform feedback, which would still need some way to inject code before and after the render call for each geometry
  • the only required user interaction will be to add the OcclusionTestProcessor to the viewport

We can define the new methods as default methods in the interface to avoid breaking existing code.
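A minimal sketch of that proposal, assuming the beforeGeometry/afterGeometry hooks end up being called per geometry as described; the query-issuing helpers are hypothetical placeholders for the actual GL calls:

import com.jme3.post.SceneProcessor;
import com.jme3.renderer.queue.RenderQueue.Bucket;
import com.jme3.scene.Geometry;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch only: an occlusion-culling processor built on the proposed
// beforeGeometry/afterGeometry hooks. issueQuery()/readQueryIfAvailable() are
// hypothetical helpers that would wrap the actual GL query calls.
public abstract class OcclusionTestProcessor implements SceneProcessor {

    // Cached per-geometry visibility from the last available query result.
    private final Map<Geometry, Boolean> lastResult = new WeakHashMap<>();

    // Proposed new hook: return false to skip rendering this geometry.
    public boolean beforeGeometry(Geometry geom, Bucket bucket) {
        if (bucket == Bucket.Transparent || bucket == Bucket.Sky) {
            return true;                       // transparent/sky geometries are not culled
        }
        if (!isOcclusionCandidate(geom)) {
            return true;                       // e.g. not tagged via a control or user data
        }
        issueQuery(geom);                      // hypothetical: draw last LOD or bounding box inside a query
        return lastResult.getOrDefault(geom, true);
    }

    // Proposed new hook: called after the draw call was issued (or skipped).
    public void afterGeometry(Geometry geom, Bucket bucket) {
        Boolean visible = readQueryIfAvailable(geom);  // hypothetical, non-blocking
        if (visible != null) {
            lastResult.put(geom, visible);
        }
    }

    protected abstract boolean isOcclusionCandidate(Geometry geom);
    protected abstract void issueQuery(Geometry geom);
    protected abstract Boolean readQueryIfAvailable(Geometry geom);
}

The WeakHashMap keeps the cache from pinning geometries that have been removed from the scene, which matches the "weak map on the processor" idea mentioned earlier.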

Not available to the CPU maybe, but it will be available to the GPU, won't it? I don't mean at the moment we put the conditional rendering into the queue, but at the moment the GPU actually processes it.
It was meant more as an option:
You can either do the CPU-based approach with the 1-frame delay, which only uses a single draw call because you render either the original mesh/material or the simplified versions based on last frame's query result,
or you can do the queries, let the GPU render some other things, issue the rendering commands wrapped in a conditional render based on the queries from the beginning of the frame, and not have to read the result back to the CPU. You then don't have the 1-frame delay, but you use 2 draw calls per geometry: one to do the check and a second one in the conditional rendering (which the GPU might skip, but which will still be issued from the CPU side).
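That second option maps onto GL conditional rendering. A sketch of the two-draw-call path (LWJGL binding assumed; drawBoundingVolume()/drawMesh() are hypothetical helpers): the second draw call is still issued from the CPU, but the GPU discards it if the query from earlier in the frame reported no visible samples.

import static org.lwjgl.opengl.GL15.glBeginQuery;
import static org.lwjgl.opengl.GL15.glEndQuery;
import static org.lwjgl.opengl.GL15.glGenQueries;
import static org.lwjgl.opengl.GL30.GL_QUERY_WAIT;
import static org.lwjgl.opengl.GL30.glBeginConditionalRender;
import static org.lwjgl.opengl.GL30.glEndConditionalRender;
import static org.lwjgl.opengl.GL33.GL_ANY_SAMPLES_PASSED;

import com.jme3.scene.Geometry;

public class ConditionalRenderSketch {

    // Sketch only: issue the occlusion query earlier in the frame...
    int issueQuery(Geometry geometry) {
        int query = glGenQueries();
        glBeginQuery(GL_ANY_SAMPLES_PASSED, query);
        drawBoundingVolume(geometry);     // hypothetical helper: proxy draw
        glEndQuery(GL_ANY_SAMPLES_PASSED);
        return query;
    }

    // ...and later wrap the real draw in a conditional render. No readback to the
    // CPU, so no one-frame delay, but the draw call itself is still issued.
    void conditionalDraw(int query, Geometry geometry) {
        glBeginConditionalRender(query, GL_QUERY_WAIT);
        drawMesh(geometry);               // hypothetical helper: the real draw call
        glEndConditionalRender();
    }

    void drawBoundingVolume(Geometry geometry) { /* hypothetical */ }
    void drawMesh(Geometry geometry) { /* hypothetical */ }
}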

I like the SceneProcessor approach because it's somewhat similar to mine in that it would give the user some access to do things right before and right after rendering a specific geometry, and would thus allow for more use cases.
However, I'm not sure how to notify from RenderManager.renderViewPortQueues(ViewPort, boolean): the geometry-specific calls are indirectly delegated to the RenderQueue.renderGeometryList(GeometryList, RenderManager, Camera, boolean) method (which is private and could be changed, but RenderQueue.renderQueue(Bucket, RenderManager, Camera, boolean) is public), and I'm not sure iterating over all scene processors for each geometry we want to render is worth it compared to how often it would be used.

Maybe a combination of both: introduce a new bucket and call the new SceneProcessor methods only for geometries put into that bucket?

What about using a new TechniqueLogic and putting the before- and after-rendering logic there, just as @pspeed mentions?

I'm not sure I understood this proposal of using a bucket correctly, but the scene processor has a postQueue method that is called before every bucket is rendered; if the goal is to test the occlusion before everything else, then this is what I would use. But in order to work correctly it would require the occlusion test meshes to be close to the actual mesh shape, because your depth buffer would be made up of only occlusion test meshes. It would also still require special logic for the transparent bucket: it would have to be skipped or tested only against the opaque bucket (since transparent geometries can't occlude each other).

RenderQueue.renderQueue can be overloaded to receive an array of scene processors, so that's not a big problem.

You are right, we would have to iterate over the scene processors every time, but even with the control proposal you would have to iterate over the controls, right? You probably have more controls than scene processors.