@atomix said:
This test should be in the show case of JME3 ! 8)
As it runs at 25 fps, I consider the performance quite playable... Can you tell which parameters affect the performance here (too lazy to read the article)? I just need the effect in a specific scene...
but wouldn't the other stuff require their own investment in frame-rate-sapping resources?
The 25 or 50 FPS are misleading. I used maximum AA to make a good video and FRAPS clamps the FPS to a multiple of the recording FPS (25 in my case). It's not as bad as it seems.
The performance hitting parameters are:
ParallaxShadows (bool): Toggles approximate soft shadows on or off. Obviously, if it's off it can't hurt. If it's on, it reduces the FPS by a constant amount (in a static scene) since it's basically 8 additional height map lookups per fragment.
SteepParallax (bool): Toggles between POM and classic parallax mapping (not steep!).
ParallaxHeight (float): The maximum displacement height influences the (maximum) number of steps in the linear search for an intersection with the height field profile. I use the same number of steps and scale as SteepParallax of "Lighting.j3md".
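To illustrate why the step count drives the cost, here is a minimal Java sketch of a linear height field search. It is not the actual shader code: `ParallaxSearchSketch`, `HeightField`, and `countLookups` are hypothetical names, the height map is reduced to one dimension, and the ray marches through equally thick depth layers. In the worst case (the ray never hits the surface early), the number of height map lookups equals the maximum number of steps.

```java
/** Hypothetical sketch of a linear height field search (not the actual shader code). */
final class ParallaxSearchSketch {

    /** Stand-in for a height map lookup: returns a height in [0, 1] for coordinate u. */
    interface HeightField {
        float sample(float u);
    }

    /**
     * Marches the view ray through maxSteps equally thick depth layers and stops
     * as soon as the ray is at or below the surface. Returns the number of height
     * lookups, which is what costs time per fragment.
     */
    static int countLookups(HeightField field, float startU, float uShiftAtFullDepth, int maxSteps) {
        float layerDepth = 1.0f / maxSteps;          // depth gained per step
        float deltaU = uShiftAtFullDepth / maxSteps; // texture coordinate shift per step
        float u = startU;
        float rayDepth = 0.0f;
        int lookups = 0;
        for (int i = 0; i < maxSteps; i++) {
            float surfaceDepth = 1.0f - field.sample(u); // height 1 means depth 0
            lookups++;
            if (rayDepth >= surfaceDepth) {
                break; // intersection found
            }
            u -= deltaU;
            rayDepth += layerDepth;
        }
        return lookups;
    }
}
```

With 16 steps, a surface at full height is found after one lookup, while a surface at height 0 needs all 16 lookups. Raising the step limit therefore raises the worst-case cost per fragment.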
If desired, I can create a patch to parallax.glsllib that adds POM and shadows in a compatible way. But I think the fake shadows are not the way to go. I'm currently investigating a Silhouette POM algorithm that corrects the depth buffer so it can be correctly used with shadow mapping.
@survivor said:
You can see the LoD transition. Maybe one could scale the LoD threshold by angle.
I wonder whether the depth of the bricks can be decreased to gain some frame rate. And yes, in some cut-scenes, the effect can still be good even if it can't work with physics or AI at the same time, though.
@atomix said:
I wonder whether the depth of the bricks can be decreased to gain some frame rate. And yes, in some cut-scenes, the effect can still be good even if it can't work with physics or AI at the same time, though.
Yes, that's the parameter ParallaxHeight. It has a strong impact on the frame rate since it influences the number of iterations in the fragment shader.
The AccumulationBuffer is a SceneProcessor that provides a floating point frame buffer for accumulating a high number of blend operations without too much precision loss. It's like glAccum(), but faster, more flexible and supported by most drivers. As I mentioned before, Lighting.j3md and all multi pass blending shaders suffer from an ugly effect that occurs when adding too much stuff to the low precision RGBA8 frame buffer. This is how it looks with 20, 40, 120, 250 lights:
When using the AccumulationBuffer, these artifacts are barely visible even with 1024 lights (this is no fiction, see 3.). The AccumulationBuffer costs almost no FPS. It's used by SimpleTestApplication, my new base class for testing. If the tests don't work, you might try to comment it out there. Please give feedback whether the AccumulationBuffer works or not.
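The precision problem can be reproduced on the CPU with a simplified model. The sketch below is an assumption-laden illustration, not the AccumulationBuffer code: `AccumulationPrecisionSketch` and its methods are hypothetical, it looks at a single color channel, and it models additive blending into RGBA8 as rounding to the nearest of 256 levels after every pass.

```java
/** Hypothetical single-channel model of additive blending (not the AccumulationBuffer itself). */
final class AccumulationPrecisionSketch {

    /** Adds the same contribution once per light, rounding to 8 bits after every blend. */
    static float accumulateRgba8(float contribution, int lights) {
        int stored = 0; // channel value as stored in the frame buffer, 0..255
        for (int i = 0; i < lights; i++) {
            float blended = stored / 255.0f + contribution;
            stored = Math.min(255, Math.round(blended * 255.0f)); // quantize and clamp
        }
        return stored / 255.0f;
    }

    /** Adds the same contribution once per light in floating point, clamping only at the end. */
    static float accumulateFloat(float contribution, int lights) {
        float sum = 0.0f;
        for (int i = 0; i < lights; i++) {
            sum += contribution;
        }
        return Math.min(1.0f, sum);
    }
}
```

In this model, 250 lights that each contribute 0.001 add up to about 0.25 in the float buffer, but stay at 0 in the 8-bit buffer because every single contribution is below half a quantization step.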
MaterialEx with LightingRenderer
MaterialEx is MaterialSP refactored and extended by a mechanism called LightingRenderer that allows taking control of how MaterialEx renders the light and material parameters. It was required to implement the next point.
This is a multi pass lighting renderer that renders quads of 4 lights in parallel. Multiple quads can be rendered per pass. This gives quite a boost compared to Lighting.j3md.
32 sphere segments, 1024 lights on GTX 670
=> Lighting: 14 FPS, Lighting_MPPLR: 196 FPS
32 sphere segments, 32 lights on ATi Mobility 9700 (without parallax mapping since Lighting.j3md doesn't compile otherwise)
=> Lighting: 11 FPS, Lighting_MPPLR: 41 FPS
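Part of the gain comes from the reduced number of passes. As a rough sketch (the class and method names are hypothetical, and the comparison assumes that the stock multi pass lighting renders one pass per light), the pass count for quads of 4 lights can be computed like this:

```java
/** Hypothetical helper that counts render passes when lights are batched in quads of 4. */
final class LightPassSketch {

    static final int LIGHTS_PER_QUAD = 4;

    /** Returns how many passes are needed for numLights with quadsPerPass quads per pass. */
    static int passesNeeded(int numLights, int quadsPerPass) {
        if (numLights < 0 || quadsPerPass < 1) {
            throw new IllegalArgumentException("numLights must be >= 0 and quadsPerPass >= 1");
        }
        int lightsPerPass = LIGHTS_PER_QUAD * quadsPerPass;
        return (numLights + lightsPerPass - 1) / lightsPerPass; // ceiling division
    }
}
```

For 1024 lights and one quad per pass this gives 256 passes instead of 1024, and two quads per pass halve that again to 128.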
ToDo:
Spot Light
POM
Light culling (simple bounding volume test in LightingRenderer)
After that, I'll try a light pre pass renderer. Here is another technique and also an interesting paper discussing lighting techniques.
I made some bugfixes and also added SpotLight support for all Lighting_* shaders. Since I'm trying to use SIMD instructions as much as possible, I do the spot light calculations for all lights even if there's only one spot light in the scene. This keeps the number of instructions low and is even faster, especially on old hardware without dynamic branching support. I also pass some parameters in SIMD friendly SoA (Structure of Arrays) format so I don't need to swizzle. Loops are also unrolled with some preprocessor magic. The result is that this shader runs on my old ATi Mobility Radeon 9700 (R300) which is limited to 64 instructions. On that old hardware, it renders up to 8 lights per pass. If there is a spot light in the scene, it can only render 4 because of the 44 varying floats limit.
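The SoA idea can be sketched on the CPU side in Java. This is only an illustration under assumptions: `LightQuadSoA`, `pack`, and `squaredDistances` are hypothetical names and not part of the shader library. Each array holds one component for all 4 lights of a quad, so an operation on one array corresponds to one 4-wide vector operation in the shader, and unused slots are simply left at zero.

```java
/** Hypothetical Structure-of-Arrays container for the positions of one quad of 4 lights. */
final class LightQuadSoA {

    final float[] posX = new float[4]; // x of light 0..3
    final float[] posY = new float[4]; // y of light 0..3
    final float[] posZ = new float[4]; // z of light 0..3

    /** Packs up to 4 positions given as {x, y, z} (Array-of-Structures) into SoA form. */
    static LightQuadSoA pack(float[][] positions) {
        if (positions.length > 4) {
            throw new IllegalArgumentException("a quad holds at most 4 lights");
        }
        LightQuadSoA quad = new LightQuadSoA();
        for (int i = 0; i < positions.length; i++) {
            quad.posX[i] = positions[i][0];
            quad.posY[i] = positions[i][1];
            quad.posZ[i] = positions[i][2];
        }
        return quad;
    }

    /** Squared distance from point (px, py, pz) to all 4 lights, computed component-wise. */
    float[] squaredDistances(float px, float py, float pz) {
        float[] result = new float[4];
        for (int i = 0; i < 4; i++) {
            float dx = posX[i] - px;
            float dy = posY[i] - py;
            float dz = posZ[i] - pz;
            result[i] = dx * dx + dy * dy + dz * dz;
        }
        return result;
    }
}
```

The same calculation runs for every slot without any branch, which mirrors the approach of doing the spot light math for all lights regardless of their type.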
I also improved the test application(s). You can now specify the most important parameters in one place:
[java]
/* PARAMETERS TO PLAY WITH */
mpplr.setQuadsPerPass(1); // 1 is safe, > 1 yields more fps
useAccumulationBuffer = false; // enable for better quality with many lights
sphereSegments = 32; // increase for more vertex shader load
numDirectionalLights = 2;
numPointLights = 2;
numSpotLights = 4;
[/java]
The default should work on most hardware. If, e.g., the AccumulationBuffer is not completely supported by your hardware...
Running latest version. Very excited about all the new stuff. Probably gonna update all Forester shaders to the new ones in the next build.
Btw I wanna link to your site on Forester's Google page and reference it in the lib description now. I am of course referring to you in the code, but I wanna make slimshader an "official" part of the lib now. If you don't mind.
AccumulationBuffer works for me, with a slight fps drop. Using a Radeon HD 63xx (can't remember last numbers).
As you can see, I've localized the problem in the shader and will create a new snapshot once I have fixed it and cleaned up SVN (it's messed up atm). This snapshot will also contain fixes for some other bugs I found (v_View had wrong sign, ...).
g_WorldMatrixInverse seems to be broken, too. Multiplying with it gives NaN values.
[java]
// GLSL helper: NaN is the only value that compares unequal to itself.
bool isnan(float value)
{
    return !(value == value);
}
[/java]
Edit: My fault, g_WorldMatrixInverse is mat4, not mat3, but g_WorldMatrixInverseTranspose (mat3) is not set, so multiplying with it in the shader gives NaN values. Did I mention this refers to nightly? Stable works fine. I just want to make sure it's not in the next stable release.
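The self-comparison test in the snippet above relies on the IEEE 754 rule that NaN compares unequal to every value, including itself. A minimal Java sketch of the same idiom (the class and method names are hypothetical) shows that it agrees with the built-in `Float.isNaN` and that NaN propagates through a multiplication:

```java
/** Hypothetical Java counterpart of the self-comparison NaN test. */
final class NaNCheckSketch {

    /** Returns true only for NaN, because NaN is the only value that is not equal to itself. */
    static boolean isNaNBySelfCompare(float value) {
        return !(value == value);
    }

    /** Multiplies a vector component by a matrix entry; any NaN input yields a NaN output. */
    static float multiply(float component, float matrixEntry) {
        return component * matrixEntry;
    }
}
```

Infinity is not reported as NaN by this test, so it only catches genuinely undefined results.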
Mostly cleanup and fixes to make a stable snapshot. I cut the useless crap from POM (which was in the reference shader from DirectX SDK). It's now mostly the same as Steep Parallax Mapping from Lighting shader.
There is an experimental switch "ParallaxDepthCorrection". When enabled, the shader tries to correct the fragment depth according to the parallax displacement. See here:
Unfortunately, this doesn't work with PSSMRenderer / PreShadow. Do any of the experts know if it is possible, though?
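For readers unfamiliar with depth correction, the idea can be sketched in Java. This is not the shader's actual code: the names are hypothetical, and the sketch assumes a standard OpenGL perspective projection with near and far planes and the default depth range [0, 1]. The displaced fragment lies further away from the camera, so its distance is increased before it is converted to a depth buffer value (which a fragment shader would write to gl_FragDepth).

```java
/** Hypothetical sketch of parallax depth correction under a standard perspective projection. */
final class DepthCorrectionSketch {

    /** Converts a positive eye-space distance along the view axis to a window depth in [0, 1]. */
    static float windowDepth(float eyeDistance, float near, float far) {
        float ndcZ = (far + near) / (far - near)
                - (2.0f * far * near) / ((far - near) * eyeDistance);
        return 0.5f * ndcZ + 0.5f; // map NDC [-1, 1] to depth range [0, 1]
    }

    /** Depth of a fragment pushed back by displacementAlongView due to parallax displacement. */
    static float correctedDepth(float eyeDistance, float displacementAlongView, float near, float far) {
        return windowDepth(eyeDistance + displacementAlongView, near, far);
    }
}
```

A fragment on the near plane maps to depth 0, one on the far plane to depth 1, and any positive displacement yields a larger depth than the undisplaced surface.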
I removed / postponed QDM because it's a lot slower than POM at the moment and needs texture2DLod(). I'm planning a revival without mipmaps / texture2DLod() where the mip levels are stored / interleaved "near" the original height pixel. That means fewer cache misses and no need for texture2DLod(). Maybe it can compete with POM.
Correcting the depth for shadows would be really complicated... you'd have to correct it before comparing the shadows, but the shadow maps are rendered with their own viewprojection matrix and you don't have it in your pass.