Project RealSlimShader (Single Pass Lighting and Multi Pass Parallel Lighting)

@atomix said:
This test should be in the show case of JME3 ! 8)
As it runs at 25 FPS, I consider the performance quite playable... Can you tell me which parameters affect the performance here (too lazy to read the article)? I just need the effect in a specific scene...


But wouldn't the other stuff require its own investment in frame-rate-sapping resources?

The 25 or 50 FPS are misleading. I used maximum AA to make a good video and FRAPS clamps the FPS to a multiple of the recording FPS (25 in my case). It's not as bad as it seems.



The performance hitting parameters are:

  • ParallaxShadows (bool): Toggles approximate soft shadows on or off. Obviously, if it's off it can't hurt. If it's on, it reduces the FPS by a constant amount (in a static scene) since it's basically 8 additional height map lookups per fragment.
  • SteepParallax (bool): Toggles between POM and classic parallax mapping (not steep!).
  • ParallaxHeight (float): The maximum displacement height influences the (maximum) number of steps in the linear search for an intersection with the height field profile. I use the same number of steps and scale as SteepParallax of "Lighting.j3md".



    If desired, I can create a patch to parallax.glsllib that adds POM and shadows in a compatible way. But I think the fake shadows are not the way to go. I'm currently investigating a Silhouette POM algorithm that corrects the depth buffer so it can be correctly used with shadow mapping.
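To illustrate why the step count matters, here is a minimal CPU-side sketch of a linear search through a height field. It is not the actual shader code; the class, the method `lookupsUntilHit`, and its parameters are hypothetical names chosen for the illustration, and the height field is reduced to one dimension.

```java
import java.util.function.DoubleUnaryOperator;

final class ParallaxStepSketch {
    /**
     * Walks the view ray through `steps` equally thick depth layers and
     * returns how many height map lookups were needed to find the first
     * layer at or below the surface. All names here are hypothetical.
     *
     * @param depthAt surface depth in [0, 1] for a 1D texture coordinate
     * @param u0      texture coordinate where the ray enters the surface
     * @param uShift  total coordinate shift over the full depth range
     *                (assumed to grow with the displacement height)
     * @param steps   number of layers, i.e. the maximum number of lookups
     */
    static int lookupsUntilHit(DoubleUnaryOperator depthAt, double u0, double uShift, int steps) {
        double layer = 1.0 / steps;
        int lookups = 0;
        for (int i = 0; i < steps; i++) {
            double rayDepth = i * layer;
            double u = u0 + uShift * rayDepth;
            lookups++; // one height map fetch per iteration
            if (rayDepth >= depthAt.applyAsDouble(u)) {
                return lookups;
            }
        }
        return lookups; // ray left the depth range without a hit
    }

    public static void main(String[] args) {
        DoubleUnaryOperator flat = u -> 0.5; // flat surface halfway down
        System.out.println(lookupsUntilHit(flat, 0.0, 0.1, 8));
        System.out.println(lookupsUntilHit(flat, 0.0, 0.1, 16));
    }
}
```

Doubling the number of layers roughly doubles the lookups needed to reach the same surface, which is the kind of per-fragment cost a larger step count brings.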
@survivor said:
You can see the LoD transition. Maybe one could scale the LoD threshold by angle.


I wonder whether the depth of the bricks can be decreased to gain some frame rate. And yes, in some cut-scenes the effect can still be good even if it can't work with physics or AI at the same time, though.
@atomix said:
I wonder whether the depth of the bricks can be decreased to gain some frame rate. And yes, in some cut-scenes the effect can still be good even if it can't work with physics or AI at the same time, though.
Yes, that's the parameter ParallaxHeight. It has a strong impact on the frame rate since it influences the number of iterations in the fragment shader.

This looks great. I'm using this in the Forester lib now.


Nice! I'm looking forward to using the Forester in a polished version of my ball maze game.

@androlo said:
This looks great. I'm using this in the Forester lib now.

@survivor said:
Nice! I'm looking forward to using the Forester in a polished version of my ball maze game.


That's a Win-Win :p

@survivor

Using this for my water now. At least the world space lighting stuff (only to collect everything in one space). Great stuff.


Nice to read that my stuff was useful. Thanks!



Although Diablo 3 is a huge time sink (some say waste), I managed to work a bit on this project. I refactored a lot. Here are the new "features":


  1. AccumulationBuffer
  2. MaterialEx with LightingRenderer
  3. MultiPassParallelLightingRenderer + Lighting_MPPLR.vert + Lighting_MPPLR.frag



    Download:
  1. AccumulationBuffer

    The AccumulationBuffer is a SceneProcessor that provides a floating point frame buffer for accumulating a high number of blend operations without too much precision loss. It's like glAccum(), but faster, more flexible and supported by most drivers. As I mentioned before, Lighting.j3md and all multi pass blending shaders suffer from an ugly effect that occurs when adding too much stuff to the low precision RGBA8 frame buffer. This is how it looks with 20, 40, 120, 250 lights:





    My test sphere with 64 lights looks this ugly:





    When using the AccumulationBuffer, these artifacts are almost invisible even with 1024 lights (this is no fiction, see 3.). The AccumulationBuffer costs almost no FPS. It's used by SimpleTestApplication, my new base class for testing. If the tests don't work, you might try to comment it out there. Please give feedback whether the AccumulationBuffer works or not.
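    The precision loss can be reproduced with a small simulation. This is a simplified model, not the AccumulationBuffer code and not an exact description of GPU blending: it assumes the 8-bit target rounds to the nearest of 256 levels after every additive blend, while the float target keeps the running sum. Class and method names are made up for the example.

```java
final class AccumulationPrecisionSketch {
    /** Adds `lights` equal contributions, rounding to 8 bits after each blend. */
    static float accumulate8Bit(int lights, float perLight) {
        int stored = 0; // channel value as stored in an RGBA8 buffer, 0..255
        for (int i = 0; i < lights; i++) {
            float blended = stored / 255f + perLight;
            stored = Math.min(255, Math.round(blended * 255f));
        }
        return stored / 255f;
    }

    /** Adds the same contributions in a float buffer, clamped only at the end. */
    static float accumulateFloat(int lights, float perLight) {
        float sum = 0f;
        for (int i = 0; i < lights; i++) {
            sum += perLight;
        }
        return Math.min(1f, sum);
    }

    public static void main(String[] args) {
        float perLight = 0.5f / 1024; // each light is far below one 8-bit step
        System.out.println(accumulate8Bit(1024, perLight));
        System.out.println(accumulateFloat(1024, perLight));
    }
}
```

    In this model, contributions smaller than half an 8-bit step are rounded away on every pass, so many dim lights add up to nothing in the 8-bit target while the float target keeps their sum.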


  2. MaterialEx with LightingRenderer

    MaterialEx is MaterialSP refactored and extended by a mechanism called LightingRenderer that lets you take control of how MaterialEx renders the light and material parameters. It was required to implement the next point.


  3. MultiPassParallelLightingRenderer + Lighting_MPPLR

    This is a multi pass lighting renderer that renders quads of 4 lights in parallel. Multiple quads can be rendered per pass. This gives quite a boost compared to Lighting.j3md.



    32 sphere segments, 1024 lights on GTX 670

    => Lighting: 14 FPS, Lighting_MPPLR: 196 FPS



    32 sphere segments, 32 lights on ATi Mobility 9700 (without parallax mapping since Lighting.j3md doesn't compile otherwise)

    => Lighting: 11 FPS, Lighting_MPPLR: 41 FPS
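    As a rough back-of-the-envelope sketch of where the savings come from, the number of passes per object can be estimated as below. It assumes the baseline renders one light per pass and that quads are filled completely; how the renderer actually pads or sorts lights is not shown here, and the class and method names are hypothetical. The pass count alone does not explain the measured FPS.

```java
final class LightPassSketch {
    static final int LIGHTS_PER_QUAD = 4;

    /** Passes needed when each pass handles quadsPerPass quads of 4 lights. */
    static int passesBatched(int lights, int quadsPerPass) {
        int perPass = LIGHTS_PER_QUAD * quadsPerPass;
        return (lights + perPass - 1) / perPass; // integer ceiling division
    }

    /** Assumed baseline: one pass per light. */
    static int passesOneLightEach(int lights) {
        return lights;
    }

    public static void main(String[] args) {
        System.out.println(passesOneLightEach(1024));
        System.out.println(passesBatched(1024, 1));
        System.out.println(passesBatched(1024, 2));
    }
}
```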



    ToDo:
  • Spot Light
  • POM
  • Light culling (simple bounding volume test in LightingRenderer)



    After that, I'll try a light pre pass renderer. Here is another technique and also an interesting paper discussing lighting techniques.

You got me so excited that I can't find anything adequate to say XD

Awesome stuff, I am currently using your single pass lighting methods, and seeing it continued made my day!



Gonna fiddle around with it now :roll:

Small update:

I made some bugfixes and also added SpotLight support for all Lighting_* shaders. Since I'm trying to use SIMD instructions as much as possible, I do the spot light calculations for all lights even if there's only one spot light in the scene. This keeps the number of instructions low and is even faster, especially on old hardware without dynamic branching support. I also pass some parameters in SIMD friendly SoA (Structure of Arrays) format so I don't need to swizzle. Loops are also unrolled with some preprocessor magic. The result is that this shader runs on my old ATi Mobility Radeon 9700 (R300), which is limited to 64 instructions. On that old hardware, it renders up to 8 lights per pass. If there is a spot light in the scene, it can only render 4 because of the 44 varying floats limit.
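The SoA idea can be sketched on the CPU side as follows. This is only an illustration of the layout, not the shader or the uniform packing used in the project; the names `toSoa` and `diffuse4` are hypothetical. With one array per axis, the diffuse term for four lights becomes three lane-wise multiply-adds, which in a shader would be plain vec4 arithmetic without swizzling.

```java
final class SoaLightingSketch {
    /** Converts up to four light directions ({x, y, z} each) into rows xs, ys, zs. */
    static float[][] toSoa(float[][] dirs) {
        float[][] soa = new float[3][4]; // unused lanes stay 0
        for (int light = 0; light < dirs.length && light < 4; light++) {
            for (int axis = 0; axis < 3; axis++) {
                soa[axis][light] = dirs[light][axis];
            }
        }
        return soa;
    }

    /** N dot L for four lights at once, clamped at 0, with no per-light branch. */
    static float[] diffuse4(float[] n, float[][] soa) {
        float[] out = new float[4];
        for (int lane = 0; lane < 4; lane++) {
            float d = n[0] * soa[0][lane] + n[1] * soa[1][lane] + n[2] * soa[2][lane];
            out[lane] = Math.max(0f, d);
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] dirs = { {0f, 0f, 1f}, {1f, 0f, 0f}, {0f, 0f, -1f} };
        float[] result = diffuse4(new float[] {0f, 0f, 1f}, toSoa(dirs));
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Unused lanes are simply zero, so they contribute nothing; this mirrors the idea of computing the same thing for every light instead of branching per light.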



I also improved the test application(s). You can now specify the most important parameters in one place:

[java]
/* PARAMETERS TO PLAY WITH */
mpplr.setQuadsPerPass(1);      // 1 is safe, > 1 yields more fps
useAccumulationBuffer = false; // enable for better quality with many lights
sphereSegments = 32;           // increase for more vertex shader load
numDirectionalLights = 2;
numPointLights = 2;
numSpotLights = 4;
[/java]



The default should work on most hardware. If, e.g., the AccumulationBuffer is not completely supported by your hardware…

…it will look like this:





Next on my list was POM for Lighting_MPPLR, but I think I'll give Quadtree Displacement Mapping a try. It sounds really simple and promising.



Download:


@survivor

Running latest version. Very excited about all the new stuff. Probably gonna update all Forester shaders to the new ones in the next build.



Btw I wanna link to your site on Forester's Google page and reference it in the lib description now. I am of course referring to you in the code, but I wanna make slimshader an "official" part of the lib now. If you don't mind.

AccumulationBuffer works for me, with a slight FPS drop. Using a Radeon HD 63xx (can't remember the last numbers).

I'm really happy that my stuff is useful. Of course you can use it as you like. That's what it's for.



At the moment, I'm hunting down a nasty bug in the new QDM code.







As you can see, I've localized the problem in the shader and will create a new snapshot once I have fixed it and cleaned up SVN (it's messed up atm). This snapshot will also contain fixes for some other bugs I found (v_View had wrong sign, …).


@Momoko_Fan

@nehon



In the new UniformBindingManager WorldMatrixInverseTranspose is not set. Please fix that because my stuff needs it.

@Momoko_Fan was there an issue with this code?

g_WorldMatrixInverse seems to be broken, too. Multiplying with it gives NaN values.



[java]
bool isnan(float value)
{
    return !(value == value);
}
[/java]



Edit: My fault, g_WorldMatrixInverse is mat4, not mat3, but g_WorldMatrixInverseTranspose (mat3) is not set, so multiplying with it in the shader gives NaN values. Did I mention this refers to nightly? Stable works fine. I just want to make sure it's not in the next stable release.

Yeah, this change was made about a week ago.

Small update:

  • General bugfixes
  • POM optimized and fixed
  • Experimental "ParallaxDepthCorrection"
  • QDM removed / postponed



    Snapshot: RealSlimShader-2012-07-31.zip



    Mostly cleanup and fixes to make a stable snapshot. I cut the useless crap from POM (which was in the reference shader from the DirectX SDK). It's now mostly the same as Steep Parallax Mapping from the Lighting shader.



    There is an experimental switch "ParallaxDepthCorrection". When enabled, the shader tries to correct the fragment depth according to the parallax displacement. See here:



    http://www.youtube.com/watch?v=sG48ZNEeunQ


http://www.youtube.com/watch?v=OpA7SRRK_Vg


Unfortunately, this doesn't work with PSSMRenderer / PreShadow. Do any of the experts know whether it is possible, though?

I removed / postponed QDM because it's a lot slower than POM at the moment and needs texture2DLod(). I'm planning a revival without mipmaps / texture2DLod() where the mip levels are stored / interleaved "near" the original height pixel. That means fewer cache misses and no need for texture2DLod(). Maybe it can compete with POM.

That's really impressive.



Correcting the depth for shadows would be really complicated… you'd have to correct it before comparing the shadows, but the shadow maps are rendered with their own view-projection matrix and you don't have it in your pass.

You could try to do it in the post shadow pass.