Writable buffer in fragment shader

codex · October 29, 2024, 9:07pm

How would one add a writable buffer to a fragment shader, and be able to read the buffer on the cpu afterward? I am trying to implement occlusion culling, and the algorithm requires writing a ‘1’ to a buffer indexed by object id if the current fragment is visible. The cpu can then refer to the buffer to determine if the geometry should be rendered later.

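#version 430
layout(early_fragment_tests) in; // depth-test before shading, so only visible fragments write
layout(std430, binding = 0) buffer visibilityBuffer { // binding 0 is an assumption
    int visibility[];
};
flat in int objectId; // assumed: object id passed in from the vertex shader
void main() {
    visibility[objectId] = 1;
}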

Sample code:

https://github.com/nvpro-samples/gl_occlusion_culling/blob/master/cull-raster.frag.glsl

zzuegg · October 29, 2024, 10:01pm

You have to use a shader storage buffer object (SSBO) if you want to write to a buffer.
I have not yet played with jme's UBO/SSBO support, so I cannot give jme-related tips.

Reading back the buffer can be a blocking operation, and it's up to you to synchronize access to the buffer (I think this is usually done with a mapped buffer). See: Buffer Object Streaming - OpenGL Wiki

Another solution would be to use a query object and check for GL_ANY_SAMPLES_PASSED (see: Query Object - OpenGL Wiki), and then follow up with a conditional rendering operation to avoid a blocking readback.

Or you can write draw commands to a buffer and then use drawIndirect to execute them without having to read anything back to the CPU.

The latter two options, of course, only work if you do not need the data on the CPU.

Note that the resource you posted uses multi draw indirect, and without nv_bindless you are limited to one set of textures. So at first glance I see lots of features used that are not (yet) supported by jme.
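
For reference, here is a rough sketch of what "draw commands in a buffer" look like. OpenGL's indexed indirect draws read one 20-byte DrawElementsIndirectCommand per draw (count, instanceCount, firstIndex, baseVertex, baseInstance), and a culled object can be skipped by setting its instanceCount to 0. In a real setup a shader would write these on the GPU; the Java below only fills the same layout on the CPU to show it. The class name IndirectCommands, the pack helper, and the use of baseInstance as an object id are assumptions for illustration, not jme API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: packs commands in the DrawElementsIndirectCommand layout.
final class IndirectCommands {
    /** One command is five 32-bit fields = 20 bytes. */
    static final int COMMAND_BYTES = 5 * Integer.BYTES;

    /** Writes one command per object; culled objects get instanceCount = 0. */
    static ByteBuffer pack(int[] indexCounts, int[] firstIndices, boolean[] visible) {
        ByteBuffer buf = ByteBuffer.allocateDirect(indexCounts.length * COMMAND_BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < indexCounts.length; i++) {
            buf.putInt(indexCounts[i]);     // count: number of indices to draw
            buf.putInt(visible[i] ? 1 : 0); // instanceCount: 0 means nothing is drawn
            buf.putInt(firstIndices[i]);    // firstIndex: offset into the index buffer
            buf.putInt(0);                  // baseVertex
            buf.putInt(i);                  // baseInstance: used as object id here (assumption)
        }
        buf.flip(); // make the written bytes readable from position 0
        return buf;
    }
}
```

The point of the layout is that visibility never has to leave the GPU: whatever decides visibility only has to flip instanceCount for each command.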

pspeed · October 29, 2024, 10:20pm

OP, I think you are a more experienced developer so probably you are on the right track with whatever you are trying to do.

…but I just wanted to mention that “occlusion queries” are a common thing some devs think they need when there are other ways to accomplish their larger goals (or occlusion queries will not actually solve those goals). They are useful in the right use-case, to be sure… but they come with enough caveats and downsides that those use-cases are very narrow.

“I need the data back on the CPU”… is an indicator that “This may not do what you want”… because you may have to waste effectively 2-3 frames just getting that data back.

So I would be interested in knowing what the end goal is. Maybe it still fits but it might still be useful to know.

codex · October 30, 2024, 12:07am

Thanks for the resources. Query object looks especially interesting. Unfortunately though, I haven’t seen anything directly related to SSBOs in jme. The closest thing maybe being FrameBufferBufferTarget, but oddly it doesn’t seem related to buffers at all.

That is concerning. I think this approach might still work, as long as I can get SSBOs to work.

codex · October 30, 2024, 12:23am

In this case, I’m just interested in occlusion culling for the sake of occlusion culling. I thought it’d be a relatively easy feature to work on as a sort of break from banging my head against some other things.

You could be right, I’ll have to see what sort of performance I get. There are probably ways to keep everything on the gpu, but I’m not certain at this stage how useful it will be to also have that information on the cpu.

pspeed · October 30, 2024, 2:13am

I think for occlusion queries (and experts can correct me if I’m wrong), you need to not strongly care about frame coherence… in the sense that you may only get the occlusion results on some future frame. So it’s good for giant occluding buildings and stuff but not for “every object in the game”.

…and then there are other scene-graph side tricks that might be used in that case. Or just tailored scene/cull hint management based on zone, etc…
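
To make the "results on some future frame" point concrete, here is a minimal CPU-only sketch in Java. It makes no GL calls; it only simulates results that become usable a fixed number of frames after they were requested, with everything treated as visible until the first result arrives. The class name DelayedVisibility and the fixed latencyFrames value are assumptions for illustration; real latency depends on the driver and on how the readback is synchronized.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical helper: applies visibility results a fixed number of frames late.
final class DelayedVisibility {
    private final int latencyFrames;
    private final ArrayDeque<boolean[]> inFlight = new ArrayDeque<>();
    private final boolean[] lastKnown;

    DelayedVisibility(int objectCount, int latencyFrames) {
        this.latencyFrames = latencyFrames;
        this.lastKnown = new boolean[objectCount];
        Arrays.fill(lastKnown, true); // conservative: draw everything until results arrive
    }

    /** Call once per frame with the result the GPU would produce for that frame. */
    void submitFrame(boolean[] gpuResult) {
        inFlight.addLast(gpuResult.clone());
        // Only results older than latencyFrames are treated as ready to read.
        if (inFlight.size() > latencyFrames) {
            boolean[] ready = inFlight.removeFirst();
            System.arraycopy(ready, 0, lastKnown, 0, lastKnown.length);
        }
    }

    /** Uses the most recent result that has actually arrived, which may be stale. */
    boolean shouldDraw(int objectId) {
        return lastKnown[objectId];
    }
}
```

With latencyFrames = 2, a result submitted on frame 0 first affects shouldDraw after frame 2 has been submitted. An object that becomes visible in between would stay culled for those frames, which is why this fits large static occluders better than every object in the scene.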

zzuegg · October 30, 2024, 7:22am

I think that the UBO/SSBO are set up as material parameters. There is a test included in the 3.7 release. I'm not at the PC at the moment, so I can't check.

It's always good to play around with things and learn from them. Unfortunately, it looks like an easier problem to solve than it is. It comes down to what the current bottleneck is, and how much you can save by paying the cost of the additional testing workload. So it is going to be highly scene/game dependent in any case.

As a general hint, GPUs do not like to be queried. And as Paul stated, current-frame queries (back to the CPU) are a big no-no. If you test things like this, make sure you have vsync off, because certain GPU stalls can easily be masked if your scene is not demanding enough.