Bloom improvement: 25 fps to 110 fps on GeForce 7600 GT

Thanks for the tip about re-rendering, nice catch there! Bringing the throttle down to 1/30 causes a very noticeable trail effect (at 1/50 you still see it slightly, but it is not as bad), so I'll leave it where it is. Of course, as you note, you can change that in your own app.



Thanks again!

You are right, 1/30 throttle leaves a noticeable trail…

To counter that, I have come up with another solution: I need to buy a new computer and set the throttle to 1/50 :)

Heh. Well, at least you now get some speedup.

Yeah.

Thanks for the heads-up about the trail at low throttle levels; I didn't notice it at first.

I just checked the code in CVS, thanks for the update!



There was also a small improvement I haven't mentioned, though it was in the posted code. For the intensity and blur passes, it's possible to save some memory bandwidth by not clearing the buffers: they get entirely overwritten, so there is no need to clear them.


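// last argument false: don't clear the target texture before rendering into it
tRenderer.render(fullScreenQuad, mainTexture, false); // or secondTexture, depending on the pass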



The only time the render buffer needs to be cleared is when the scene is being rendered (if useCurrentScene = false).

This improvement won't show up in the basic test, but it might make a difference at high resolutions and with multiple blur passes, when memory bandwidth is tight due to heavy texturing.

On 1600x1200 with a renderScale factor of 4, the effect textures would be 400x300, or 0.458 MB each (at 4 bytes per texel). The default number of passes is 3, so each time the bloom pass runs you would save 1.37 MB of bandwidth, or 68.7 MB per second with the throttle set to 1/50.

That doesn't seem like much, but it scales quickly if you increase your quality settings:
On 1600x1200 with renderScale 2, you get 1.83 MB per cleared texture. With an extra blur pass, you would have 4 passes in total and 7.32 MB of clearing bandwidth per bloom run. With a 1/100 throttle, that amounts to 732 MB/second.
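The arithmetic above can be reproduced with a small stand-alone sketch. It assumes 4 bytes per texel (RGBA8 effect textures), one cleared texture per effect pass, and that a throttle of 1/50 means the bloom pass runs 50 times per second. "MB" here means 2^20 bytes, which is what the 0.458 figure implies. The class and method names are made up for illustration and are not part of jME.

```java
import java.util.Locale;

public class ClearSavings {

    // Assumption: RGBA8 effect textures, 4 bytes per texel.
    static final int BYTES_PER_TEXEL = 4;

    // Size of one effect texture in MB (2^20 bytes) = bytes written by clearing it once.
    static double clearMB(int screenWidth, int screenHeight, int renderScale) {
        long texels = (long) (screenWidth / renderScale) * (screenHeight / renderScale);
        return texels * BYTES_PER_TEXEL / (1024.0 * 1024.0);
    }

    // Clearing bandwidth saved per second: one cleared texture per effect pass,
    // bloom pass executed runsPerSecond times per second (throttle 1/50 -> 50).
    static double savedMBPerSecond(int screenWidth, int screenHeight, int renderScale,
                                   int passes, int runsPerSecond) {
        return clearMB(screenWidth, screenHeight, renderScale) * passes * runsPerSecond;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "%.3f MB per texture%n", clearMB(1600, 1200, 4));
        System.out.printf(Locale.ROOT, "%.1f MB/s saved%n", savedMBPerSecond(1600, 1200, 4, 3, 50));
        System.out.printf(Locale.ROOT, "%.1f MB/s saved%n", savedMBPerSecond(1600, 1200, 2, 4, 100));
    }
}
```

The three lines it prints correspond to the 0.458 MB, 68.7 MB/s and roughly 732 MB/s figures quoted in the post.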

I'm trying to understand why not clearing the buffer would mean less texture data being copied. Could you explain that?

The buffer is a piece of memory. To clear it, the hardware has to go and set everything to 0 (or to the specified background color). So not clearing the buffer should save an extra memory write for each byte of the texture buffer.

Sure, but since you are actively working with that texture, I don't think it would result in any data being sent across the AGP or PCIe bus (regarding your bandwidth statement).

lex said:

The buffer is a piece of memory. To clear it, the hardware has to go and set everything to 0 (or to the specified background color). So not clearing the buffer should save an extra memory write for each byte of the texture buffer.


I literally know squat about how these kinds of things work, but I have to think that any decently engineered video card/driver combination would not waste time resetting every byte in a buffer just to clear it.

Quote:

Sure, but since you are actively working with that texture, I don't think it would result in any data being sent across the AGP or PCIe bus (regarding your bandwidth statement).


Warning: I will have to go into a lot of detail, sorry if it's too much.
Yes, no data transfer takes place over the AGP bus. However, the memory on the graphics card still has to be written. Graphics cards have an internal bus between the GPU and the video memory. The memory bandwidth of a graphics card is calculated as

bus width (in bytes) * effective memory clock (in GHz) = bandwidth (in GB/second)
so with a 128-bit bus and memory at an effective 1.4 GHz, your bandwidth would be (128/8) * 1.4 = 22.4 GB/second (GeForce 7600 GT).
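A minimal sketch of the same formula, assuming the clock you plug in is the effective (data) rate, which is how the 1.4 GHz figure is used above. The result is in decimal gigabytes (10^9 bytes) per second; the class name is hypothetical.

```java
import java.util.Locale;

public class PeakBandwidth {

    // Peak bandwidth in GB/s = bus width in bytes * effective memory clock in GHz.
    static double gigabytesPerSecond(int busWidthBits, double effectiveClockGHz) {
        int busWidthBytes = busWidthBits / 8;   // 128-bit bus -> 16 bytes per transfer
        return busWidthBytes * effectiveClockGHz;
    }

    public static void main(String[] args) {
        // Figures from the post for the GeForce 7600 GT: 128-bit bus, 1.4 GHz effective clock.
        System.out.printf(Locale.ROOT, "%.1f GB/s%n", gigabytesPerSecond(128, 1.4));
    }
}
```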

Any read from or write to graphics memory consumes part of that bandwidth. When an application's performance is limited by this bandwidth, the application is called "fill rate" limited.
In a fill-rate limited application, you can get a significant performance gain by reducing the memory bandwidth consumed on the graphics card, usually by reducing texture resolution (that's why games often offer texture quality options: high, medium and low).

These days, high-end graphics cards have very high memory bandwidth; even the cheaper (but not the cheapest) cards have 20 to 40 GB/sec. That's why I said the improvement was minor. All I'm saying is: if something involves an overhead that can be avoided with half a line of code, why not do it? :P

Quote:

I literally know squat about how these kinds of things work, but I have to think that any decently engineered video card/driver combination would not waste time resetting every byte in a buffer just to clear it.


Most likely it's done by setting big chunks of memory at once with a single command. That way it doesn't load the graphics processor; however, it still consumes the same amount of video memory bandwidth as setting it one byte at a time.

I would have to think that some bit or byte is set simply saying that the buffer is no longer valid.

Ok, so then you are reading AND writing to some other buffer that keeps a bit for every pixel of the frame buffer.
You still have to clear that other buffer somehow, and you would have to clear it every frame. Then you have to set a bit for every pixel that is written to. On top of that, at the end you have to read the buffer and do more processing to check whether each pixel was modified. That's 3 memory operations plus processing...
And then you decide to draw transparent objects on top of each other and on top of the background and things get really complicated...
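To put rough numbers on that argument, here is a toy software model (not how any real GPU or driver works) that counts reads and writes for an ordinary clear versus the per-pixel "modified" flag scheme described above. It ignores blending and transparency and treats a flag access like a pixel access; all names are hypothetical.

```java
import java.util.Arrays;

public class LazyClearSketch {

    // Toy counter of memory accesses (color or flag reads/writes); not a model of real hardware.
    static long accesses;

    // Ordinary clear: write the background into every pixel, then draw.
    static int[] eagerClear(int pixels, int[] drawn, int background, int color) {
        int[] buf = new int[pixels];
        for (int i = 0; i < pixels; i++) { buf[i] = background; accesses++; }
        for (int p : drawn) { buf[p] = color; accesses++; }
        return buf;
    }

    // Flag scheme: never clear the color buffer, track writes in a per-pixel flag buffer instead.
    static int[] flagBuffer(int pixels, int[] drawn, int background, int color) {
        // Stands in for stale, uncleared memory (Java zero-fills it, but the result never relies on that).
        int[] buf = new int[pixels];
        boolean[] modified = new boolean[pixels];
        for (int i = 0; i < pixels; i++) { modified[i] = false; accesses++; } // 1) clear the flags
        for (int p : drawn) {
            buf[p] = color; accesses++;
            modified[p] = true; accesses++;      // 2) set a flag for every written pixel
        }
        for (int i = 0; i < pixels; i++) {
            accesses++;                          // 3) read every flag at the end
            if (!modified[i]) { buf[i] = background; accesses++; }
        }
        return buf;
    }

    public static void main(String[] args) {
        int pixels = 100;
        int[] drawn = {1, 2, 3, 2, 50};          // pixel 2 is drawn twice (overdraw)
        accesses = 0;
        int[] a = eagerClear(pixels, drawn, 0, 7);
        long eager = accesses;
        accesses = 0;
        int[] b = flagBuffer(pixels, drawn, 0, 7);
        long flagged = accesses;
        // Same final image, but the flag scheme touches memory about three times as often here.
        System.out.println(Arrays.equals(a, b) + " eager=" + eager + " flagged=" + flagged);
    }
}
```

With 100 pixels and 5 draws it prints "true eager=105 flagged=306": both strategies produce the same image, but the flag scheme still has to touch every pixel's flag twice per frame on top of the actual drawing.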

Literally, I could be totally wrong, but common sense would point to something a bit more lightweight than your example.

Sleep on it :P

I was thinking more along the lines of how files are handled. When I create a file, space is allocated for it. Somewhere in the file system itself, a reference to that new file is created, and it holds things like the size and state of that file. When I add data to the file, the file itself is updated and so is the file system information. When I delete the file, the file itself stays intact and nothing in it gets changed; instead, the state of the file in the file system data is changed so that the space consumed by the file becomes available for a new file should it be needed. Only if that space needs to be used is the data contained there overwritten. This makes deletes very efficient, because only a small amount of data is changed. This is also why I can undelete a file, assuming the space allocated for it has not been overwritten: I just toggle that small bit of state data back, and there's my file again. Yes, this is a bit simplified, but it is accurate as to how most file systems operate.



Why wouldn't a memory buffer behave the same way? A buffer is created and the space for it is allocated; perhaps some extra memory (a few bytes?) is allocated at the head of the buffer to hold data like the size and state of the buffer. When the buffer is cleared, only those few bytes at the head of the buffer are changed (1 operation, very low overhead) to reflect that the buffer data is no longer valid. Functionally, this has the same effect as setting it to all 0s, but without the overhead of doing so.



Is there some reason that a buffer in graphics memory would literally NEED to be reset to a zero state? Unless there is, it would seem terribly wasteful to do so before the next write to that memory area actually requires it.



We are not trying to organize many buffers; we are working with one buffer, and for the purpose of this discussion the buffer is already allocated. We never reallocate, move or delete it. We are constantly modifying the buffer.




If you simply say "the whole buffer is clear", that implies every pixel in the buffer has the background color. Once you write a pixel into the buffer, the buffer is no longer "clear". But you are not going to write just one pixel: you have to write many pixels, and many of them will overwrite each other.

A single bit saying the buffer is "clear" or "in use" is not enough information to describe the state of the frame buffer. How would you know which pixels are clear (have the background color) and which are "in use" (have a color assigned)?
The reason one value works for file systems is that files are written sequentially: a file is either clear or occupied up to byte X, and after byte X you have random garbage.
Comparing a frame buffer to a file system is like comparing apples to oranges.
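The difference can be illustrated with a tiny sketch of a single "written up to here" marker, the kind of one-value bookkeeping the file system analogy relies on. It is hypothetical and heavily simplified: with sequential appends the marker stays exact, but after a single framebuffer-style write in the middle it can no longer tell drawn pixels from undrawn ones.

```java
public class HighWaterMark {

    // Single piece of state per buffer: indices [0, end) are treated as written.
    private int end = 0;

    // File-like sequential write: the marker stays exact.
    void append(int count) { end += count; }

    // Framebuffer-like write at an arbitrary index: one number can only grow to cover it.
    void writeAt(int index) { end = Math.max(end, index + 1); }

    boolean looksWritten(int index) { return index < end; }

    public static void main(String[] args) {
        HighWaterMark file = new HighWaterMark();
        file.append(100);
        System.out.println(file.looksWritten(99) + " " + file.looksWritten(100)); // true false

        HighWaterMark frame = new HighWaterMark();
        frame.writeAt(50);                          // draw a single pixel at index 50
        System.out.println(frame.looksWritten(10)); // true, although pixel 10 was never drawn
    }
}
```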

Quote:

Is there some reason that in graphics memory a buffer would literally NEED to be reset to a zero state?


When you render objects into the frame buffer, you are reading from and modifying the memory bound to the buffer. Many rendering operations, such as blending, need to know the current color in the buffer to work properly.
After you write some pixels into the buffer, you might need to read some of them back. So you should either reset everything to the background color or keep track of whether each pixel was modified. And you have to keep track of that information for every pixel, which means yet another buffer... see my previous post for what would happen if you created such a buffer.
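As a small illustration of why blending depends on what is already in the buffer, here is the common "source alpha / one minus source alpha" blend equation for a single color channel, written as plain Java rather than OpenGL calls. The values are made up; the point is that the result differs depending on whether the destination was cleared or still holds last frame's data.

```java
public class BlendReadsDestination {

    // result = src * alpha + dst * (1 - alpha), one channel in [0, 1].
    // The dst term is a read of whatever the frame buffer currently holds.
    static float blend(float src, float alpha, float dst) {
        return src * alpha + dst * (1 - alpha);
    }

    public static void main(String[] args) {
        float cleared = 0.0f;  // value a clear to black would have written
        float stale = 0.8f;    // hypothetical leftover value from the previous frame
        System.out.println(blend(1.0f, 0.5f, cleared)); // 0.5
        System.out.println(blend(1.0f, 0.5f, stale));   // 0.9
    }
}
```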

Quote:

Unless there is it would seem terribly wasteful to do so until the next write to that memory area was required.

Yes, it is wasteful. On the other hand, you have full control over whether the buffer gets cleared or not. So it's up to you not to be wasteful and to leave the buffer uncleared if you know everything will be overwritten.

Thanks for the explanation, Lex. It seems counter-intuitive, but your explanation sounds reasonable given how the buffers are used. I guess you learn something new every day.

Right now, jME's bloom pass runs at about 0-1 fps on my machine. I made a similar setup in RenderMonkey and optimized it, and got it to run 10 times faster and look much better. I will try to optimize it some more and maybe post the setup later.

With all the talk about the impact of the glClear() call, I've made a simple test that makes many clearBuffers() calls and estimates how much video card memory bandwidth these calls consume.



You can increase or decrease the number of clear calls (keys 2 and 1) and observe that the bandwidth rate stays nearly the same, roughly close to the maximum bandwidth of your video card. The bandwidth calculation assumes a 32-bit RGBA color buffer and a 24-bit Z-buffer (7 bytes per pixel). If you are using different buffers, the calculated bandwidth will not correspond to that of your video card, but you can still observe that the bandwidth stays nearly constant.



The more clearBuffers() calls are made each frame, the more accurate the test is, because the video memory bandwidth becomes more of a bottleneck and CPU time and other OpenGL calls become less significant.



import java.text.DecimalFormat;
import java.util.logging.Level;
import java.util.logging.Logger;

import com.jme.app.BaseSimpleGame;
import com.jme.app.SimpleGame;
import com.jme.input.KeyBindingManager;
import com.jme.input.KeyInput;
import com.jme.renderer.Renderer;
import com.jme.scene.Text;
import com.jme.scene.shape.Quad;
import com.jme.scene.state.TextureState;

public class ClearBufferTest extends BaseSimpleGame {

   private KeyBindingManager keyboard =
      KeyBindingManager.getKeyBindingManager();
   
   private Quad sampleObject;
   private DecimalFormat format = new DecimalFormat("0.000");
   
   private int minClears = 9;
   private int extraClears = minClears;
   private int increaseAdditive = 10;
   
   private String cmdIncreaseClear = "increase";
   private String cmdDecreaseClear = "decrease";
   
   @Override
   protected void simpleInitGame() {
      display.getRenderer().enableStatistics(false);
      
      sampleObject = new Quad("Sample", 10, 10);
      sampleObject.updateRenderState();
      
      keyboard.add(cmdIncreaseClear, KeyInput.KEY_2);
      keyboard.add(cmdDecreaseClear, KeyInput.KEY_1);
      
      Text info = Text.createDefaultTextLabel("infoKeys");
      info.setTextureCombineMode(TextureState.REPLACE);
      info.setLocalTranslation(0, display.getHeight() - 20, 0);
      info.print("Keys: 1 = decrease, 2 = increase " +
            "the number of clearBuffers() calls.");
      fpsNode.attachChild(info);
   }
   
   @Override
   protected void update(float interpolation) {
      super.update(interpolation);
      
      /*
       * assumption is that we are using 32 bits for RGBA color buffer and
       * 24 bits for ZBuffer, total of 7 bytes per pixel
      */
      float bandwidthPerClear = display.getWidth() * display.getHeight() * 7
                                 / (1e9f);
      float bandwidthPerSecond = bandwidthPerClear * (extraClears + 1)
                              * timer.getFrameRate();
      
      updateBuffer.setLength(0);
      
      updateBuffer.append("FPS: ");
      updateBuffer.append(format.format(timer.getFrameRate()));
      
      updateBuffer.append(" Estimated clear bandwidth: ");
      updateBuffer.append(format.format(bandwidthPerSecond));
      updateBuffer.append(" GB/s.");// gigabytes per second
      
      updateBuffer.append(" clearBuffers() calls: ");
      updateBuffer.append(extraClears + 1);
      
      fps.print(updateBuffer);
      
      simpleUpdate();
   }
   
   @Override
   protected void updateInput() {
      super.updateInput();
      
      if (keyboard.isValidCommand(cmdIncreaseClear, false)) {
         extraClears += increaseAdditive;
      }
      else if (keyboard.isValidCommand(cmdDecreaseClear, false)) {
         extraClears -= increaseAdditive;
         if (extraClears < minClears) extraClears = minClears;
      }
   }
   
   @Override
   protected void render(float interpolation) {
      Renderer renderer = display.getRenderer();
      
      renderer.clearBuffers();
      
      for (int i = 0; i < extraClears; i++) {
         renderer.clearBuffers();
      }
      
      renderer.draw(sampleObject);
      renderer.draw(fpsNode);
      renderer.renderQueue();
   }
   
   public static void main(String[] args) {
      Logger.getLogger("").setLevel(Level.WARNING);
      
      ClearBufferTest game = new ClearBufferTest();
      game.setDialogBehaviour(
            SimpleGame.ALWAYS_SHOW_PROPS_DIALOG);
      game.start();
   }

}