Occlusion Queries and Conditional Rendering

Good morning/evening jmonkeys,

I was recently pointed towards OpenGL compute shaders. While trying to implement them I found it was more work than I expected, and since my OpenGL knowledge was basically null, I started reading.
Soon I read about occlusion queries and conditional rendering and figured implementing them would be a good exercise in adding some OpenGL functionality to jME.

I guess none of the changes is a breaking change (the GL3 interface and the Renderer interface do get new methods, but I assume nobody provides custom implementations of those).
However, it required a way to inject code right before a specific geometry is rendered and right after it was rendered (to query only the relevant draw calls instead of all draw calls of a frame), so I added two methods to Geometry:
notifyRenderStarted(RenderManager rm) and
notifyRenderFinished(RenderManager rm)
(I might rename them to preRender() and postRender().)

Otherwise I would have had to add methods to the Control interface (the usual way to extend a spatial's behaviour), and in addition to notifying the controls when the render stage is entered, we would have to notify them again when the specific geometry is rendered and a third time when rendering finished; for nodes it would not work at all.
But strictly speaking I don't want to extend the functionality of SPATIALS, only the functionality of GEOMETRIES, so the Geometry class seemed like a good place.
Inside the Geometry class those methods are empty, so if you don't use them they are a no-op.
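A minimal sketch of the hook mechanism (using simplified stand-in classes for jME's Geometry and RenderManager; the real signatures and call sites may differ):

```java
// Simplified stand-ins for the real jME classes, only to illustrate the hook idea.
class RenderManager {
    void renderGeometry(Geometry geo) {
        geo.notifyRenderStarted(this);  // injected right before the draw call
        // ... the actual draw call would happen here ...
        geo.notifyRenderFinished(this); // injected right after the draw call
    }
}

class Geometry {
    // empty by default: geometries that don't use the hooks pay (almost) nothing
    void notifyRenderStarted(RenderManager rm) {}
    void notifyRenderFinished(RenderManager rm) {}
}

class OcclusionQueryGeometry extends Geometry {
    int queriesStarted, queriesEnded;

    @Override
    void notifyRenderStarted(RenderManager rm) {
        queriesStarted++; // the real class would call glBeginQuery here
    }

    @Override
    void notifyRenderFinished(RenderManager rm) {
        queriesEnded++; // the real class would call glEndQuery here
    }
}
```

The point is that renderGeometry() stays a single plain virtual call, and a plain Geometry's hooks are empty method bodies.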

However, a new spatial was added: OcclusionQueryGeometry (extends Geometry). It overrides those methods and injects the query. If a query result is available (from the last frame) and it says 0 fragments were drawn, then instead of rendering the original mesh it renders a box mesh with an Unshaded.j3md material with the same extent as the original mesh's bounds. In the next frame it is checked whether any fragments of the box made it to the screen, and if so, the original mesh is rendered again.
Note: the box material has both color write and depth write disabled; its fragments still count in the query though, so you can see whether any fragment WOULD have made it to the screen if it had actually been drawn.

Another new spatial was added as well: StaticConditionalRenderNode (extends Node). Just like the OcclusionQueryGeometry, it uses a box mesh the size of the node's bounds to see whether any part of the node is visible, and skips rendering all geometries in that node if no fragments would have been drawn. It only works with opaque objects that stay within the bounds the node had when it was initialized.
The StaticConditionalRenderNode is quite tricky though:
Usually, all geometries that are not culled are added to a geometry list, which is then sorted and finally rendered. That means there is no guarantee that all geometries of a node are rendered in a row, which is needed for easier conditional rendering.
Thus all geometries added to the StaticConditionalRenderNode have their CullHint set to Always (resulting in fewer geometries in the geometry list and shorter sorting times, but also meaning the sorting might not be perfect -> overdraw; that depends on your scene graph structure though).
When the test box mesh has finished rendering, the node injects the start of the conditional render, renders all geometries of that node, and stops the conditional render, before the RenderManager continues rendering the geometries from the queue. If the result of drawing the test box in the last frame was 0 fragments, the conditional render and the rendering of all those geometries is skipped completely.

About the OcclusionQueryGeometry: for small and/or fast-moving objects this might not be ideal, because of the one-frame delay (objects might “pop in”) and because there is not much benefit in drawing a box instead of another simple geometry. For big and/or static geometries with many vertices, however, this should give a performance improvement, especially if the original mesh uses heavyweight shaders (many uniforms, a tessellation stage, many texture lookups, dynamic loops, a high number of vertices, etc.), for example terrain chunks. It does NOT save draw calls, but it replaces complex meshes/shaders with simple ones while the geometry is occluded.

About the StaticConditionalRenderNode: as mentioned, it only works with static geometries because the test box mesh does not react to bounding volume changes. It is especially useful when you have several geometries close to each other, plus some bigger geometries around that occlude the view. Example: rooms with furniture (these could be batched, but if you set the node's bounds to fit the room, you can still move the geometries around separately, and they would all still be culled when you leave the room and look at it from within another room). Another example is chunks with several geometries per chunk.

Is there any interest in adding this to the engine? I would clean it up and document it then, which I would of course skip if I only used it for myself :smiley:

If you made it down here, thanks for your attention, and many greetings from the Shire,
samwise

Some of the advantages you mentioned are awesome.

but…

I lack OpenGL knowledge myself, and I'm afraid we would need opinions from people who know the OpenGL draw system very well.

They might say for example that it could break something else, etc.

Yes, of course you're right, but I had breaking changes in mind. As for more information about how it works, I guess doing a commit and reading the commit would be easier.

The mentioned notifyRenderStarted(RenderManager rm) and notifyRenderFinished(RenderManager rm) methods are called in the RenderManager's renderGeometry(Geometry geo) method.
As I said, in the Geometry class those methods are empty.
The OcclusionQueryGeometry overrides them: in notifyRenderStarted() it sets the geometry's ‘mesh’ field to the box mesh if zero fragments were drawn last frame, otherwise to the original mesh, and does the same for the material.
In notifyRenderFinished(), the mesh and material fields are reset to the originals, so calls to getMesh() or getMaterial() in user code return the original mesh/material.
This is of course not thread-safe; however, it only happens while they are being rendered (thus attached to the scene graph), and those getters should, just like getLocalTranslation() etc., be called from the main thread.
It is a breaking change, though, if you use multithreading and relied on your personal knowledge that the mesh won't ever change: if you have occasional calls to getMesh() from another thread, there is a small chance you get the box mesh instead of the original mesh.
On the other hand, that does not happen with a normal Geometry, and if you decide to use OcclusionQueryGeometry you can take it into account.
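The swap-and-reset logic described above can be sketched like this (Mesh is a placeholder type here, the class name SwappingGeometry is mine, and lastFrameSamplesPassed stands for the GL_SAMPLES_PASSED result read back from the previous frame):

```java
// Placeholder for jME's Mesh; only object identity matters for this sketch.
class Mesh {}

class SwappingGeometry {
    final Mesh originalMesh = new Mesh();
    final Mesh boxMesh = new Mesh();   // bounds-sized proxy, color/depth writes off
    private Mesh mesh = originalMesh;  // the field the renderer actually draws
    long lastFrameSamplesPassed = -1;  // -1 means "no query result yet"

    // called right before the draw call
    void notifyRenderStarted() {
        // 0 fragments passed last frame: draw the cheap proxy instead
        mesh = (lastFrameSamplesPassed == 0) ? boxMesh : originalMesh;
    }

    // called right after the draw call
    void notifyRenderFinished() {
        mesh = originalMesh; // reset so user code sees the original mesh again
    }

    Mesh getMesh() { return mesh; }
}
```

The window in which getMesh() can return the proxy is exactly the span between the two notifications, which is the thread-safety caveat mentioned above.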

The StaticConditionalRenderNode sets all its children's CullHints to Always, so those children won't be rendered the traditional way (but via the “injection” explained above), and their controls' render() method won't be called, because the spatial's runControlRender() method is not called. This can be overcome by calling that method on all children's controls when the StaticConditionalRenderNode's runControlRender() method is called. The only difference would then be that you could no longer use the CullHint to toggle calls to render() on and off for specific geometries (I doubt anyone uses CullHints that way).
And again, when you decide to switch to StaticConditionalRenderNode you can take that into account, provided I document it accordingly.

I still think there are several uses for this functionality, not only for voxel-based games with chunks. I can definitely report a performance improvement; it really shines when big parts of the screen are occluded by a few geometries, like in buildings, caves, mountainous terrain, cities, etc. As always it requires some knowledge or some trial and error, just like batching and instancing, for example.

EDIT: I know that for fragment shaders there is the early depth test, which tries to avoid running fragment shaders, but that happens quite late in the pipeline. Using occlusion queries we can avoid drawing the original mesh completely, at the cost of one frame of latency before objects pop up on the screen.

Anyone who is using multithreading to call “get” methods thinking “because this will never change” doesn’t have any real experience with multithreading and will eventually have random problems that they will never solve.

It is not ok to even READ unsafe values from multiple threads as different cores/CPUs might be handling them with their own stale cache of local memory.

I will have to read your proposal in detail but I’m off-the-cuff against any new “notify this specific thing for this specific reason” methods. It seems unnecessary.

Yes, I know; that's why I gave the getLocalTranslation() example, and I don't do it in my code. I was just trying to point out potentially breaking changes, because I can only guess how people use the code.

In general I agree with you, but it would only be unnecessary if there were another way of achieving what I want, and I cannot think of any.
Also, if it's a no-op it won't ever make a difference in your FPS (I can only guess, but I cannot see how it would be measurable). Especially if you don't use OcclusionQueryGeometry, that class won't ever be loaded by the class loader, and thus there wouldn't be any polymorphism in calling geo.notifyRenderStarted(this) in the RenderManager.

And it's not notifying about a specific thing, it's notifying about rendering (which the engine already does, just not in enough detail), and it's not limited to occlusion queries and conditional rendering (which are great, if you ask me). You could also profile the rendering of specific geometries (in terms of time spent on the GPU, or fragments that made it to the screen, to cull objects that are far away and, although inside the view frustum, only draw a handful of pixels). Or, when you have a bunch of geometries close to each other, you could add only one geometry to the render queue and inject rendering the others after that one.

I’m currently working on turning the StaticConditionalRenderNode into a DynamicConditionalRenderNode that accounts for BoundingVolume changes, because I really like the node approach: it actually reduces the number of draw calls. It can be used in two ways: let the CPU decide what to occlusion-cull, or let the GPU decide. When you have many geometries, saving the draw calls can be worth reading the result back from the GPU; when you have few geometries, you can still issue the draw calls and let the GPU skip rendering them, and despite issuing the actual draw calls still get an improvement over only skipping the fragment shader via the early depth test.

EDIT: Occlusion queries require OpenGL 1.5, which is required by the engine anyway, and conditional rendering requires OpenGL 3.0, which is from 2008, just to have it mentioned.

Untrue. The “render” part is basically instant since the draw calls are queued.

I think the base version we support is 1.2.

I’m hoping someone else can.

Calling dead methods on every spatial all the time, just in case there is the rare possibility that some of them want to hook it, seems wasteful to me. For example, there was a noticeable speed-up when I fixed updateLogicalState() not to call every spatial in the tree… though admittedly some of that was because there was no traversal either.

I’m hoping someone more familiar with the specifics of OpenGL occlusion culling can weigh in on different engine approaches that aren’t this invasive.

Untrue: you query the GPU for the time spent on the commands between glBeginQuery and glEndQuery when querying for GL_TIME_ELAPSED. You can also query for more things.

That might be true for the GLSL shading language, but I'm sure the engine requires OpenGL 2.0, as that's when shaders were actually added.

I don't think it will ever be noticeable, considering what else happens each frame, but maybe someone else will have a better idea for a design.

Quickly reading your approach, I think it may not work in some important cases… for example, with certain material techniques you might still want to draw the mesh (for example, shadows). A specific Geometry may be handled by MANY renderers in a frame, and you are only concerned with keeping the state of one of them. It will also cause problems with viewports, for example.

Also, with your “draw a box instead” approach, it seems likely that in many cases the object will be drawn every other frame, as the box is quite likely visible when the object is only just out of view. Especially for roundish objects, a bounding box is way bigger than the object.

I think the “JME way” would probably be to hook into material techniques somehow, so that certain techniques could continue to ignore the occlusion results. Maybe there is even a specific occlusion query technique. You will still have an issue with where to keep the state, though. The only downside of this approach is that the mesh stays the same.

To get the mesh involved, I think you’d have to use buckets or something… and then solve the state keeping problem. It’s definitely something that would require core changes to the renderer itself, I think.

BTW, the state storage problem is similar to why we don’t have clipping. If someone can come up with a way to tag sections of the scene graph with data specific to certain viewports/renderers that also doesn’t leak, doesn’t put burden on every non-using spatial, etc… we could add 2D clipping also.

Oh yeah, that's a valid point. I'll think about solutions and tell you when I find some, but there must be a way of fitting it nicely into the engine.

I did not mention everything in the description, but you can provide a custom mesh simplifier to create a replacement sphere or pyramid or anything instead. You're still right though: some objects will still be drawn every other frame. From my tests this is a minority, however, and even then it's still an improvement, since they are only drawn every other frame.

I'm not sure I understand what you mean: how would 2D clipping work? Do you mean having your nodes organized in a grid and then setting specific nodes to CullHint.Always, just without a relation between the node and the region, so you could say “don't render region (1, 1, 1) to (10, 10, 10)” regardless of the scene graph?

For things like scroll panes in a UI, you want to clip everything outside of some 2D box… but it’s viewport specific. Right now the only way to do this is with viewports but that’s kind of inconvenient for UI things like list boxes and scroll panes.

It occurs to me that this is an entirely different problem that is only superficially similar, though. Similar because you have a piece of data you want to tag to a thing + a spatial. In the clipping case, the data needs to be tagged to a viewport + spatial, and the user will be managing it.

In the occlusion case, you want to keep state based on renderer + spatial… and then automatically clean it up later. It’s going to be the renderer that is managing this.

And if the renderer is keeping track of this data then perhaps it’s also accessible in controlRender(). It’s hypothetical so anything works. :wink:

You’d just need a way to indicate that the information should be captured. Then it could be treated similarly to LOD at that point.

Thanks for the explanation, I see now.

That was the initial idea: to have an OcclusionCullingControl just like a LodControl. However, there need to be calls to glBeginQuery() and glEndQuery(), and all fragments that pass the depth test between those calls increase the counter that is returned when querying for the result.
And exactly because I did not want to put a performance burden onto anyone not making use of occlusion queries, I decided not to add this functionality to the Renderer (which would do it for each geometry) or to the controls (because it would require the controls to be called right before and right after rendering that specific geometry, which is not compatible with the current setup). Instead I added a no-op method to the Geometry class and only override it in the OcclusionQueryGeometry; I felt there was no lower-burden way.

I'll go to sleep now, but I'll continue to think about what you said tomorrow. Thanks for your input already, and have a nice evening.

To me this implies that you’d want the object for sure rendered after all other opaque bucket geometry.

To me this also implies that you’d want a special bucket for these to make sure that was the case.

…and a special bucket could therefore have special processing in the renderer. Whatever that might be… whether rendering a special occlusion technique or just keep track of the counter.

I guess that has advantages as well as disadvantages:
In a frame where the simplified mesh is rendered, it's an advantage: the simplified mesh does not write to the depth and color buffers anyway, and the result will be more precise than trusting the geometry sorting on the CPU (which cannot be precise in most cases).
In a frame where the original mesh is rendered, it's a disadvantage: all opaque geometry will be rendered before that special geometry, which might mean overdrawing big areas when it is finally rendered close to the camera.

It still sounds like this is the jME way of doing it, but looking into the code I'm not sure how to set it up properly. Basically the RenderManager manages everything (for example setting the depth ranges for the different buckets), but consider adding something like this to renderViewPortQueues():

if (!rq.isQueueEmpty(Bucket.OcclusionQueries)) {
    if (prof != null) {
        prof.vpStep(VpStep.RenderBucket, vp, Bucket.OcclusionQueries);
    }
    renderer.setDepthRange(0, 1);
    renderer.setOcclusionQueriesEnabled(true);
    rq.renderQueue(Bucket.OcclusionQueries, this, cam, flush);
    renderer.setOcclusionQueriesEnabled(false);
}

That is already more work in the renderer than the empty method call, because it means there has to be an ‘if (occQueryEnabled)’ before and right after each draw (it's not like the depth range, where you make those calls anyway and only change the value of a variable; here new GL calls need to be injected).
Another solution is to have some RenderListener that you can set on the Renderer, which will then be notified when a geometry starts and finishes rendering.
This listener could then be set to a NullListener for all buckets other than the OcclusionQueryBucket, so there would not need to be any ‘if (…)’. I'm not yet sure what a clean solution would look like.

A couple more notes, not because I expect you to solve the problem for me, but because I guess you might be interested in it yourself:

OpenGL has so-called query objects; using them for occlusion queries is only one type of information you can query for. Basically you have four operations: begin a query, end a query, check whether a query result is available to the CPU, and get the result (which turns into a blocking call if the result is not available right away).
Actually, query objects are already part of the engine (the DetailedProfiler uses them to query the GPU for the time spent on the several app steps; the numbers in the right column actually show the time spent on the GPU).
Now, because of that setup, it is guaranteed that the result of a query that was ended with glEndQuery last frame is available in the current frame. So if we check for the result the frame after we issued the query, we never have to wait.
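The never-wait scheme boils down to ping-ponging between two query ids (the ids would come from glGenQueries; this helper, whose name is mine, only tracks which id to write this frame and which one is safe to read):

```java
// Ping-pong between two query objects: write one per frame, read the other,
// so reading the result never blocks (it is always at least one frame old).
class QueryPingPong {
    private final int[] ids;        // two query ids, e.g. from glGenQueries
    private int current = 0;        // index of the id being written this frame
    private boolean primed = false; // true once at least one query was ended

    QueryPingPong(int idA, int idB) { this.ids = new int[]{idA, idB}; }

    // id to pass to glBeginQuery/glEndQuery this frame
    int writeId() { return ids[current]; }

    // call after glEndQuery; next frame writes the other id
    void swap() {
        current = 1 - current;
        primed = true;
    }

    // id whose result is guaranteed available (ended last frame), or -1 if none yet
    int readId() { return primed ? ids[1 - current] : -1; }
}
```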

Now the point is, I do not want the changes to allow only for occlusion queries. I would rather add some interface or similar that gives the user more control over injecting GL calls whenever they feel it is the right time. Another example:

Conditional rendering: you do the same glBeginQuery and glEndQuery, querying for GL_SAMPLES_PASSED just as for occlusion queries, only you never read the result back to the CPU; you never even check whether it is available. Instead you start a conditional render section with glBeginConditionalRender, then issue as many rendering commands as you like (drawing commands, framebuffer clears, framebuffer blits (guess not used) and compute dispatches (hopefully soon used)) and finally call glEndConditionalRender. Although the rendering commands are still sent to the GPU, the GPU skips doing any work entirely when the condition is false (for GL_SAMPLES_PASSED that's when it is == 0). Those conditional renders can be set to wait for the result, or to execute the rendering commands anyway if the result is not available.
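The skip semantics can be modelled with a toy class (not actual GL bindings; the real calls would be glBeginConditionalRender(queryId, GL_QUERY_WAIT) and glEndConditionalRender()):

```java
// Toy model of conditional rendering: draw commands issued inside a
// begin/end section execute only if the referenced query counted > 0 samples.
class ConditionalRenderModel {
    private final long samplesPassed; // result of the GL_SAMPLES_PASSED query
    private boolean inSection = false;

    ConditionalRenderModel(long samplesPassed) { this.samplesPassed = samplesPassed; }

    void beginConditionalRender() { inSection = true; }  // glBeginConditionalRender
    void endConditionalRender()   { inSection = false; } // glEndConditionalRender

    // returns whether the GPU would actually execute this draw command
    boolean draw() {
        return !inSection || samplesPassed > 0;
    }
}
```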

Another use of more control over the GL calls is profiling, for example of your grass shaders: getting the time spent rendering, or checking the number of primitives that made it to the geometry shader or were written by it, to see the result when playing around with LOD on the GPU, anything like that.

So the changes should not be limited to occlusion queries; instead I'm trying to find a way to inject custom code where I want it. The OcclusionQueryGeometry and ConditionalRenderNode could be put into external projects; they are basically examples of what you can do with the changes, maybe the most useful ones, but as mentioned there are other uses.

Some new findings/ideas:
Not only is the state dependent on the viewport or the camera used for rendering, but also on the “invocation index”.
For example, if you specify more than one shadow map, each geometry is rendered several times in the same viewport, even with the same camera.

Also, the PreShadow technique forces depth writes, which means that shadows would be calculated even for the simplified versions whose material has depth write disabled in its additional render state.

Another note about the query objects as they are currently implemented:
It probably doesn't matter much, since they are only used in the DetailedProfiler, which is not meant for final builds, but since they are OpenGL objects just like buffers etc., we need to delete them when they are no longer used.
Unlike buffers and images, which have classes extending NativeObject (managed by the renderer's NativeObjectManager), there is no such class for the query objects, and no method was added to the interface to delete them.

I still think calling an empty method on the geometries is a no-op; considering what else happens per frame for each geometry, it would definitely be negligible, and thus the best place to add the functionality.
Here is why I think so:

  • If the Renderer were to care about it, the Renderer would have to check it for each geometry, and regardless of how fast the check is, it is slower than an empty method.

  • It extends what we already have an interface for: the controls. Only, since controls are for spatials and nodes are spatials too, their render method is not called before the spatial is actually rendered (nodes are not directly rendered, and the geometries might be rendered in any order, independent of their parent node), but when the render stage is entered.

If we had a special kind of control that we could explicitly add to geometries, we could be notified right before the geometry is actually rendered (say preRender(RenderManager rm, Camera cam)) and right after rendering has finished (say postRender(RenderManager rm, Camera cam)).
Although the runControlsRender() method is probably pretty fast, I guess it's better not to route preRender and postRender through it, since that would already be more than a no-op. So there would be a single new Geometry subclass, which would override the two empty methods of the Geometry class, add some special treatment for the new control interface, and call the added controls' methods inside the previously empty methods.
That means a user doesn't have to extend Geometry; instead they can add those controls to inject specific code.

  • Also, as people always like to underline: a geometry is not a game object, so I don't think it is strange to give it some control over the render calls that will happen “to” it. The geometry is basically only there to tie together the pieces needed for rendering, so why not give the user a little control over that via controls? Then we could have things like an OcclusionCullingControl, a ConditionalRenderControl or a DetailedProfilerControl that could be added to geometries.

I added a QueryObject class plus some constants and methods to the GL classes, as well as methods to the Renderer interface to handle them. It would be pretty simple and quite clean: when you write one of those special controls, you have your QueryObject of a specified type like SAMPLES_PASSED or TIME_ELAPSED; in preRender you ask the renderer to start the query, in postRender you ask it to stop it, and you can use the QueryObject directly to see whether the result is available and to get it.
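The QueryObject usage I have in mind would look roughly like this (a sketch only; the lifecycle checks and method names are my assumptions, and a real implementation would back begin/end with glBeginQuery/glEndQuery through the Renderer):

```java
// Sketch of a QueryObject exposing the four operations: begin, end,
// check availability, get result. The corresponding GL calls are in comments.
class QueryObject {
    enum Type { SAMPLES_PASSED, TIME_ELAPSED } // GL_SAMPLES_PASSED / GL_TIME_ELAPSED

    final Type type;
    private boolean running = false;
    private Long result = null; // null until the GPU delivered a result

    QueryObject(Type type) { this.type = type; }

    void begin() { // renderer would call glBeginQuery(glType, id)
        if (running) throw new IllegalStateException("query already running");
        running = true;
        result = null;
    }

    void end() { // renderer would call glEndQuery(glType)
        if (!running) throw new IllegalStateException("query was not running");
        running = false;
    }

    // glGetQueryObjectuiv(id, GL_QUERY_RESULT_AVAILABLE, ...)
    boolean isResultAvailable() { return result != null; }

    void deliverResult(long r) { result = r; } // filled in by the renderer

    // glGetQueryObjectui64v(id, GL_QUERY_RESULT, ...); in real GL this may
    // block, so call it only once isResultAvailable() is true
    long getResult() { return result; }
}
```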

If anyone has any thoughts, maybe even someone who has experience with OpenGL or has used another engine that supports this kind of thing, please chime in.
On the other hand, if someone is strictly against adding it in any way, tell me too; I would stop looking into it then.

I use occlusion queries and conditional rendering in jME, but I hacked them into the engine, I think in the same way you are proposing… The problem is that the core of the renderer in jME is not really easy to extend with new features, and to have this properly implemented you need to take a lot of different cases into account.

I don’t have the time to read the whole thread, but if you can make a quick summary of your final proposal I will take a look at it more carefully.

Thanks for your answer!

The (currently) final idea is to add the needed query-object-related methods and constants to the GL and Renderer interfaces, and additionally make these changes:

add 2 methods to Geometry.java:

public void preRender(RenderManager rm, Camera cam) {} // it's empty

and

public void postRender(RenderManager rm, Camera cam) {} // empty too

In the RenderManager's renderGeometry(Geometry geo) method, the two corresponding calls are added:

//right at the beginning of the method
geo.preRender(this, prevCam);

//and at the very end of this method
geo.postRender(this, prevCam);

That is it for the changes definitely needed for my approach.
Additionally, to keep users from having to extend Geometry, there would be one special Geometry subclass added (let's call it AdvancedGeometry) and one special interface that extends the Control interface (let's call it AdvancedControl), which would, in addition to render() and update(), have preRender() and postRender() (those refer to rendering the geometry itself, as opposed to render(), which basically means the render stage is entered).
The AdvancedGeometry adds a private SafeArrayList field that keeps track of the added AdvancedControls and overrides the two newly added methods of the Geometry class to call the corresponding methods of all added AdvancedControls; addControl(), removeControl() etc. handle the new controls accordingly.
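The dispatch the AdvancedGeometry would do can be sketched like this (simplified: the RenderManager/Camera parameters and the SafeArrayList are left out to keep the stand-ins self-contained, and CountingControl is just an illustrative control):

```java
import java.util.ArrayList;
import java.util.List;

// What the proposed AdvancedControl interface would add on top of Control.
interface AdvancedControl {
    void preRender();  // right before the geometry's own draw call
    void postRender(); // right after it
}

// AdvancedGeometry overrides Geometry's empty preRender/postRender and
// fans the calls out to the AdvancedControls that were added to it.
class AdvancedGeometry {
    private final List<AdvancedControl> advancedControls = new ArrayList<>();

    void addControl(AdvancedControl control) { advancedControls.add(control); }

    void preRender() {
        for (AdvancedControl c : advancedControls) c.preRender();
    }

    void postRender() {
        for (AdvancedControl c : advancedControls) c.postRender();
    }
}

// Example control, e.g. the skeleton of an OcclusionCullingControl.
class CountingControl implements AdvancedControl {
    int pre, post;
    public void preRender()  { pre++; }  // would ask the renderer to begin the query
    public void postRender() { post++; } // would ask the renderer to end the query
}
```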

EDIT: Also, how exactly the methods added to the Renderer and GL interfaces would look depends on whether we add a QueryObject class or keep treating queries as is, although I would really favour adding the QueryObject class, as currently those objects are not deleted from the GPU.
Methods for conditional rendering would be nice too; only two methods are needed, and they could work with the QueryObject instead of the ‘taskId’ that currently identifies a query object.

EDIT2: This would leave it to the implementation of the specific AdvancedControl to handle several renders per frame, but you would be given everything you need.

If you want to sum up your approach, I can think about it and maybe find a better solution than my current one, although I guess it is currently as lightweight as can be in terms of changes.

OK, so your plan is to perform the occlusion query with the simplified representation (bounding box/LOD) right before each geometry is rendered, and proceed with the rendering only if it is not fully occluded.

Mine is a bit different, since I do the occlusion query long before the actual rendering, but yours is fine and much simpler to implement with the current renderer. However… how will you wait for the query result? It would be easy to read it between frames, but this might be a problem, since afaik the engine doesn’t force one geometry to be rendered only once per frame.

As the specific controls wouldn't be necessary additions, I did not explain them in detail before; let me add some points:

My current “OcclusionCullingControl extends AdvancedControl” implementation can be toggled between working for every camera and only working for specific cameras, ignoring others (so you can enable it just for the main cam and leave it disabled for the shadow cam or reflection cam). It still takes into account the number of times the geometry is rendered with that same cam (for the shadow map case), and it can also override the PreShadow technique's render state to depthWrite false prior to rendering the simplified version and reset it afterwards.
You can manually set a “MeshSimplifier” that produces a simplified version of the mesh (the default implementation uses the bounding volume; yes, LOD, if available, is awesome too, but you can provide any implementation).
And the result of a query is available in the next frame, so you only need two queries and switch between them (this introduces the one-frame lag where objects might only become visible in the second frame, but it means there is no additional render call: you render either the proxy or the original, never both. I guess you first render all your test meshes, then render some other stuff, and then do a conditional render based on the queries you issued at the beginning of the frame?)

edit: Once a good solution is found, I would like to work on compute shaders, and I guess it would be nice to have control over when to dispatch them as well, which could be done with the AdvancedControl.