Project RealSlimShader (Single Pass Lighting and Multi Pass Parallel Lighting)

mcbeth · March 5, 2012, 2:18pm

@atomix said:
This test should be in the show case of JME3 ! 8)
As it runs at 25 fps, I consider the performance quite playable... Can you tell me which parameters affect the performance here (too lazy to read the article)? I just need the effect in a specific scene...

But wouldn't the other stuff require its own investment in frame-rate-sapping resources?

survivor · March 5, 2012, 2:33pm

The 25 or 50 FPS are misleading. I used maximum AA to make a good video and FRAPS clamps the FPS to a multiple of the recording FPS (25 in my case). It’s not as bad as it seems.

The performance hitting parameters are:

ParallaxShadows (bool): Toggles approximate soft shadows on or off. Obviously, if it’s off it can’t hurt. If it’s on, it reduces the FPS by a constant amount (in a static scene) since it’s basically 8 additional height map lookups per fragment.
SteepParallax (bool): Toggles between POM and classic parallax mapping (not steep!).
ParallaxHeight (float): The maximum displacement height influences the (maximum) number of steps in the linear search for an intersection with the height field profile. I use the same number of steps and scale as SteepParallax of “Lighting.j3md”.
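
To illustrate why ParallaxHeight is the expensive parameter, here is a minimal CPU-side sketch of a linear search through a height field. It is not the shader code: the constants `MIN_STEPS` and `STEPS_PER_UNIT_HEIGHT`, the 1D height field and all method names are assumptions made for illustration. The only point is that the step budget, and with it the worst-case number of height map lookups per fragment, grows with the displacement height.

```java
import java.util.function.DoubleUnaryOperator;

final class ParallaxLinearSearch {
    // Hypothetical constants: the step budget grows linearly with the height.
    static final int MIN_STEPS = 4;
    static final double STEPS_PER_UNIT_HEIGHT = 256.0;

    static int maxSteps(double parallaxHeight) {
        return Math.max(MIN_STEPS, (int) Math.ceil(STEPS_PER_UNIT_HEIGHT * parallaxHeight));
    }

    /**
     * Marches a ray down through a 1D height field (values in [0,1]) and
     * returns how many height samples were taken before the ray dipped
     * below the surface. "slope" is the horizontal travel per unit of depth.
     */
    static int countSamples(DoubleUnaryOperator heightField, double u0,
                            double slope, double parallaxHeight) {
        int steps = maxSteps(parallaxHeight);
        double layer = 1.0 / steps;                 // normalized depth per step
        double du = slope * parallaxHeight * layer; // texture coordinate per step
        double rayHeight = 1.0;                     // ray starts at the top of the profile
        double u = u0;
        int samples = 0;
        for (int i = 0; i < steps; i++) {
            samples++;                              // one height map lookup
            if (heightField.applyAsDouble(u) >= rayHeight) {
                break;                              // ray is now below the surface
            }
            rayHeight -= layer;
            u += du;
        }
        return samples;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator flatFloor = u -> 0.0;   // worst case: no early hit
        for (double h : new double[] {0.0625, 0.125, 0.25}) {
            System.out.println("height " + h + " -> "
                    + countSamples(flatFloor, 0.0, 1.0, h) + " lookups");
        }
    }
}
```

In this model a flat floor (no early intersection) always uses the full step budget, so quadrupling the height quadruples the lookups.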

If desired, I can create a patch to parallax.glsllib that adds POM and shadows in a compatible way. But I think the fake shadows are not the way to go. I’m currently investigating a Silhouette POM algorithm that corrects the depth buffer so it can be correctly used with shadow mapping.

atomix · March 5, 2012, 2:37pm

@survivor said:
You can see the LoD transition. Maybe one could scale the LoD threshold by angle.

I wonder whether the depth of the bricks can be decreased to gain some frame rate. And yes, in some cut-scenes the effect can still be good, even if it can't run alongside physics or AI at the same time.

survivor · March 5, 2012, 2:41pm

@atomix said:
I wonder whether the depth of the bricks can be decreased to gain some frame rate. And yes, in some cut-scenes the effect can still be good, even if it can't run alongside physics or AI at the same time.

Yes, that's the parameter ParallaxHeight. It has a strong impact on the frame rate since it influences the number of iterations in the fragment shader.

androlo · March 20, 2012, 7:44pm

This looks great. I’m using this in the Forester lib now.

survivor · March 20, 2012, 8:43pm

Nice! I’m looking forward to using the Forester in a polished version of my ball maze game.

atomix · March 21, 2012, 1:38am

@androlo said:
This looks great. I'm using this in the Forester lib now.

@survivor said:
Nice! I'm looking forward to using the Forester in a polished version of my ball maze game.

That's a Win-Win :p

androlo · May 28, 2012, 7:52pm

@survivor

Using this for my water now. At least the world space lighting stuff (only to collect everything in one space). Great stuff.

survivor · June 18, 2012, 12:08am

Nice to read that my stuff was useful. Thanks!

Although Diablo 3 is a huge time sink (some say waste), I managed to work a bit on this project. I refactored a lot. Here are the new “features”:

AccumulationBuffer
MaterialEx with LightingRenderer
MultiPassParallelLightingRenderer + Lighting_MPPLR.vert + Lighting_MPPLR.frag

Download:

Repository: Google code
Snapshot: RealSlimShader-2012-06-18.zip.

Read on…

AccumulationBuffer

The AccumulationBuffer is a SceneProcessor that provides a floating point frame buffer for accumulating a high number of blend operations without too much precision loss. It’s like glAccum(), but faster, more flexible and supported by most drivers. As I mentioned before, Lighting.j3md and all multi pass blending shaders suffer from an ugly effect that occurs when adding too much stuff to the low precision RGBA8 frame buffer. This is how it looks with 20, 40, 120, 250 lights:

[image: 6KSC6.png, 640×640]

My test sphere with 64 lights looks this ugly:

[image: iYZIY.png, 656×518]

When using the AccumulationBuffer, these artifacts are barely visible even with 1024 lights (this is no fiction, see MultiPassParallelLightingRenderer below). The AccumulationBuffer costs almost no FPS. It’s used by SimpleTestApplication, my new base class for testing. If the tests don’t work, you might try commenting it out there. Please give feedback on whether the AccumulationBuffer works or not.
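
The precision problem can be reproduced on the CPU with a simplified model. The sketch below is an assumption-laden illustration, not the AccumulationBuffer code: it models additive blending into one 8-bit channel by rounding to 1/255 steps after every pass and compares that with a float channel. The class and method names are hypothetical.

```java
final class AccumulationPrecisionDemo {
    /** Adds "contribution" once per pass into an 8-bit channel, rounding after every blend. */
    static double accumulate8Bit(double contribution, int passes) {
        int stored = 0; // channel value in 0..255
        for (int i = 0; i < passes; i++) {
            double blended = Math.min(1.0, stored / 255.0 + contribution);
            stored = (int) Math.round(blended * 255.0);
        }
        return stored / 255.0;
    }

    /** Same accumulation, but the channel keeps 32-bit float precision between passes. */
    static double accumulateFloat(double contribution, int passes) {
        float stored = 0f;
        for (int i = 0; i < passes; i++) {
            stored = Math.min(1f, stored + (float) contribution);
        }
        return stored;
    }

    public static void main(String[] args) {
        int lights = 1024;
        double perLight = 0.5 / lights; // many dim lights that should sum to 0.5
        System.out.println("8-bit buffer: " + accumulate8Bit(perLight, lights));
        System.out.println("float buffer: " + accumulateFloat(perLight, lights));
    }
}
```

In this model each contribution is smaller than half an 8-bit step, so the 8-bit channel rounds every pass away and stays black, while the float channel reaches the intended 0.5.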

MaterialEx with LightingRenderer

MaterialEx is MaterialSP, refactored and extended by a mechanism called LightingRenderer that lets you control how MaterialEx renders the light and material parameters. It was required to implement the next point.

MultiPassParallelLightingRenderer + Lighting_MPPLR

This is a multi pass lighting renderer that renders quads of 4 lights in parallel. Multiple quads can be rendered per pass. This gives quite a boost compared to Lighting.j3md.

32 sphere segments, 1024 lights on GTX 670

=> Lighting: 14 FPS, Lighting_MPPLR: 196 FPS

32 sphere segments, 32 lights on ATi Mobility 9700 (without parallax mapping since Lighting.j3md doesn’t compile otherwise)

=> Lighting: 11 FPS, Lighting_MPPLR: 41 FPS
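
Part of the speedup is plausibly down to the pass count. The following sketch only does the arithmetic: it assumes 4 lights per quad as described above, that a partially filled last pass still costs one pass, and that the baseline renders one light per pass. The class and method names are hypothetical, not part of the project.

```java
final class LightBatching {
    static final int LIGHTS_PER_QUAD = 4;

    /** Number of geometry passes needed to render all lights (rounded up). */
    static int passesNeeded(int numLights, int quadsPerPass) {
        if (quadsPerPass < 1) {
            throw new IllegalArgumentException("quadsPerPass must be >= 1");
        }
        if (numLights <= 0) {
            return 0;
        }
        int lightsPerPass = LIGHTS_PER_QUAD * quadsPerPass;
        return (numLights + lightsPerPass - 1) / lightsPerPass; // ceiling division
    }

    public static void main(String[] args) {
        int lights = 1024;
        System.out.println("one light per pass: " + lights + " passes");
        for (int quads = 1; quads <= 4; quads *= 2) {
            System.out.println(quads + " quad(s) per pass: "
                    + passesNeeded(lights, quads) + " passes");
        }
    }
}
```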

ToDo:

Spot Light
POM
Light culling (simple bounding volume test in LightingRenderer)

After that, I’ll try a light pre pass renderer. Here is another technique and also an interesting paper discussing lighting techniques.

Setekh · June 18, 2012, 5:51am

You got me so excited, that i can’t find anything adequate to say XD

Awesome stuff, i am currently using your single pass lighting methods, and seeing it continued made my day!

Gonna fiddle around with it now :roll:

survivor · June 22, 2012, 3:27pm

Small update:

I fixed some bugs and added SpotLight support for all Lighting_* shaders. Since I’m trying to use SIMD instructions as much as possible, I do the spot light calculations for all lights even if there’s only one spot light in the scene. This keeps the number of instructions low and is even faster, especially on old hardware without dynamic branching support. I also pass some parameters in SIMD-friendly SoA (Structure of Arrays) format so I don’t need to swizzle. Loops are also unrolled with some preprocessor magic. The result is that this shader runs on my old ATi Mobility Radeon 9700 (R300), which is limited to 64 instructions. On that old hardware, it renders up to 8 lights per pass. If there is a spot light in the scene, it can only render 4 because of the limit of 44 varying floats.
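
For readers unfamiliar with the SoA idea, here is a small CPU-side sketch. It is an illustration under assumptions, not the project's uniform layout: four light positions are repacked so that all x values sit in one 4-wide array, all y values in another, and so on. A shader could then process the four lights of a quad with plain vec4 arithmetic. All names are hypothetical.

```java
final class LightQuadSoA {
    /**
     * Repacks up to 4 positions given as {x, y, z} into SoA form:
     * result[0] holds all x, result[1] all y, result[2] all z.
     * Unused lanes stay 0.
     */
    static float[][] toSoA(float[][] positions) {
        float[][] soa = new float[3][4];
        for (int lane = 0; lane < positions.length && lane < 4; lane++) {
            for (int c = 0; c < 3; c++) {
                soa[c][lane] = positions[lane][c];
            }
        }
        return soa;
    }

    /** Squared distance from p to each of the 4 lights, computed lane by lane. */
    static float[] squaredDistances(float[][] soa, float[] p) {
        float[] d2 = new float[4];
        for (int lane = 0; lane < 4; lane++) {
            float dx = soa[0][lane] - p[0];
            float dy = soa[1][lane] - p[1];
            float dz = soa[2][lane] - p[2];
            d2[lane] = dx * dx + dy * dy + dz * dz;
        }
        return d2;
    }

    public static void main(String[] args) {
        float[][] lights = {{1, 0, 0}, {0, 2, 0}, {0, 0, 3}, {1, 1, 1}};
        float[] d2 = squaredDistances(toSoA(lights), new float[] {0, 0, 0});
        System.out.println(java.util.Arrays.toString(d2));
    }
}
```

The inner loop over lanes is what a GPU does in one vector instruction per line when the data is already in this layout, so no swizzling is needed.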

I also improved the test application(s). You can now specify the most important parameters in one place:

[java]
/* PARAMETERS TO PLAY WITH */
mpplr.setQuadsPerPass(1); // 1 is safe, > 1 yields more fps
useAccumulationBuffer = false; // enable for better quality with many lights
sphereSegments = 32; // increase for more vertex shader load
numDirectionalLights = 2;
numPointLights = 2;
numSpotLights = 4;
[/java]

The default should work on most hardware. If, for example, the AccumulationBuffer is not completely supported by your hardware…

…it will look like this:

Next on my list was POM for Lighting_MPPLR, but I think I’ll give Quadtree Displacement Mapping a try. It sounds really simple and promising.

Download:

Repository: Google code
Snapshot: RealSlimShader-2012-06-22.zip.

androlo · July 15, 2012, 5:21pm

@survivor

Running latest version. Very excited about all the new stuff. Probably gonna update all Forester shaders to the new ones in the next build.

Btw I wanna link to your site on Forester's Google page and reference it in the lib description now. I am of course referring to you in the code, but I wanna make slimshader an “official” part of the lib now. If you don’t mind.

AccumulationBuffer works for me, with a slight fps drop. Using a Radeon HD 63xx (can’t remember last numbers).

survivor · July 15, 2012, 5:42pm

I’m really happy that my stuff is useful. Of course you can use it as you like. That’s what it’s for.

At the moment, I’m hunting down a nasty bug in the new QDM code.

As you can see, I’ve localized the problem in the shader and will create a new snapshot once I have fixed it and cleaned up SVN (it’s messed up atm). This snapshot will also contain fixes for some other bugs I found (v_View had wrong sign, …).

survivor · July 17, 2012, 12:30pm

@Momoko_Fan

@nehon

In the new UniformBindingManager WorldMatrixInverseTranspose is not set. Please fix that because my stuff needs it.

nehon · July 17, 2012, 12:37pm

@Momoko_Fan was there an issue with this code?

survivor · July 17, 2012, 12:47pm

g_WorldMatrixInverse seems to be broken, too. Multiplying with it gives NaN values.

[java]
// GLSL helper: NaN is the only value that is not equal to itself
bool isnan(float value)
{
    return !(value == value);
}
[/java]

Edit: My fault, g_WorldMatrixInverse is mat4, not mat3, but g_WorldMatrixInverseTranspose (mat3) is not set, so multiplying with it in the shader gives NaN values. Did I mention this refers to nightly? Stable works fine. I just want to make sure it’s not in the next stable release.
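
The same self-comparison trick works on the CPU, which makes it easy to see how a single bad matrix entry poisons a result. The sketch below is only an illustration: it puts a NaN into a 3x3 matrix as a stand-in for a uniform that was never set (what an unset uniform really contains is up to the driver), and all names are hypothetical.

```java
final class NanCheck {
    /** NaN is the only float value that compares unequal to itself. */
    static boolean isNan(float value) {
        return value != value;
    }

    /** Plain 3x3 matrix times vector. */
    static float[] mul(float[][] m, float[] v) {
        float[] r = new float[3];
        for (int row = 0; row < 3; row++) {
            r[row] = m[row][0] * v[0] + m[row][1] * v[1] + m[row][2] * v[2];
        }
        return r;
    }

    public static void main(String[] args) {
        float[][] broken = {
            {Float.NaN, 0f, 0f}, // stand-in for garbage in an unset uniform
            {0f, 1f, 0f},
            {0f, 0f, 1f}
        };
        float[] normal = mul(broken, new float[] {1f, 2f, 3f});
        for (float component : normal) {
            System.out.println(component + " isNan=" + isNan(component));
        }
    }
}
```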

nehon · July 17, 2012, 1:57pm

Yeah this change has been made like a week ago.

survivor · July 31, 2012, 10:03pm

Small update:

General bugfixes
POM optimized and fixed
Experimental “ParallaxDepthCorrection”
QDM removed / postponed

Snapshot: RealSlimShader-2012-07-31.zip

Mostly cleanup and fixes to make a stable snapshot. I cut the useless crap from POM (which was in the reference shader from DirectX SDK). It’s now mostly the same as Steep Parallax Mapping from Lighting shader.

There is an experimental switch “ParallaxDepthCorrection”. When enabled, the shader tries to correct the fragment depth according to the parallax displacement. See here:

http://www.youtube.com/watch?v=sG48ZNEeunQ

http://www.youtube.com/watch?v=OpA7SRRK_Vg

Unfortunately, this doesn't work with PSSMRenderer / PreShadow. Do any of the experts here know whether it is possible at all?
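
For anyone wondering what correcting the fragment depth involves, here is a rough sketch of the math, not the actual shader. It assumes a standard OpenGL perspective projection with the default depth range, and that the parallax search yields the extra distance of the displaced surface point along the view axis. The names are hypothetical. In a fragment shader the result would be written to gl_FragDepth.

```java
final class ParallaxDepthSketch {
    /**
     * Window-space depth in [0,1] for a point at distance d in front of the
     * camera (measured along the view axis), for a standard OpenGL
     * perspective projection with the given near and far planes.
     */
    static double windowDepth(double d, double near, double far) {
        double ndc = (far + near) / (far - near)
                - (2.0 * far * near) / ((far - near) * d);
        return 0.5 * ndc + 0.5;
    }

    /** Depth after pushing the fragment "displacement" units further away. */
    static double correctedDepth(double eyeDepth, double displacement,
                                 double near, double far) {
        return windowDepth(eyeDepth + displacement, near, far);
    }

    public static void main(String[] args) {
        double near = 1.0, far = 100.0, eyeDepth = 10.0;
        System.out.println("polygon depth:   " + windowDepth(eyeDepth, near, far));
        System.out.println("displaced depth: " + correctedDepth(eyeDepth, 0.1, near, far));
    }
}
```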

I removed / postponed QDM because it's a lot slower than POM at the moment and needs texture2DLod(). I'm planning a revival without mipmaps / texture2DLod() where the mip levels are stored / interleaved "near" the original height pixel. That means fewer cache misses and no need for texture2DLod(). Maybe it can compete with POM.

nehon · August 1, 2012, 8:25am

That’s really impressive.

Correcting the depth for shadows would be really complicated… you’d have to correct it before comparing the shadows, but the shadow maps are rendered with their own view-projection matrix and you don’t have it in your pass.

You could try to do it in the post shadow pass.