Enhancements to the JME3 Rendering Engine

This is what I vote for… and yes, users who want to try it would pull it separately.

Essentially your branch would be an incubator of the “new technology”.

As a side goal, something to consider: once your code reaches a certain point, it might be nice to consider which parts could be extracted and applied to the engine separately.

Smaller incremental improvements (if possible) have a lot of benefits, including being easier to review (so faster to get in) and potentially just being improved by the process. Sometimes extracting an enhancement makes it clearer how it could be better, or whether it could indeed be backwards compatible with jme3.

The goal would be incremental engine improvements that give you the things you need to implement the foundation for the “new way”… with the possibility that the “new way” can all be done with incremental improvements.

The code will be better for it even if it is extra work.

4 Likes

Thank you for your patient guidance :wink:. I basically agree with your viewpoint. (Sorry for the late reply.)

Some of the content in the PR I submitted is actually extracted from my previous JME3.4 modifications as the minimally integrable parts into the engine.

I understand the advantages of minimal incremental improvements (in addition to reducing the difficulty and risk of code review, they also make it easier to roll back modifications, etc.). However, for a functional PR, a single minimal incremental improvement may not amount to just one or two file changes. For example, if I need to enhance some renderer functionality, I may first need to develop a minimal framework A, then add compatibility code on top of framework A to keep it compatible with existing code. So for that one piece of functionality, the minimal change could still involve dozens of new code files.

So I’m stopping here for now, because I basically submitted the PR following the minimal incremental approach you mentioned: although there are many file changes, most are new additions, and these new files are the minimal version (aka version 1.0) of the current incremental improvement.
This PR can be considered to skip the step of creating a “new tech branch” in the jme3 core GitHub repo, but it is indeed the minimum part I could extract from my local JME3.4 modifications that can be applied separately to the core engine.
One final question: should I close this PR and restart the process, i.e. first create a “new tech branch” in the jme3 core GitHub repo, then extract parts from the new branch that can be merged independently into master? I don’t have permission to create branches in the jme3 repo, nor to merge branches into the jme3 core repo; I can only fork the code as a contributor and submit PRs.

3 Likes

I guess all is fine; I believe Paul just wanted to point out that minimal incremental improvements are best.

So based on what you said, the pull request is fine, I guess.

As I can see, in the pull request you already made it as small incremental improvements anyway.

I see Sgold mentioned marking the pull request as a draft, which I have never done myself on GitHub before, but it seems a very good way to allow faster incremental changes while still allowing a PR and better testing.

I guess by marking it as a draft, Sgold means:


Though this screenshot might be a little outdated with the new GitHub GUI.

1 Like

Compatibility for terrain unlit and lighting shading models has been added in the render path architecture:







(For the terrain PBRLighting shading model, I may work on it tomorrow. :slightly_smiling_face:)

10 Likes

Nice :slight_smile:

Though I’m a little scared, since for example I have some custom shaders: a TextureArray-based shader for PBR terrain (also using the jMonkeyEngine | Library lib), and a PBR character shader with tattoos, etc.

Hope it will be easy to adjust.

1 Like

I think it’s time to explain my current implementation details here, mainly what a ShadingModel is and how it is associated with the DeferredRenderPath.
Regarding built-in shading models, I currently added 4 of them: LEGACY_LIGHTING (i.e. Phong Lighting), STANDARD_LIGHTING (i.e. PBRLighting), UNLIT (ColoredTextured, Unshaded, Terrain — actually any material shader that only needs to output a single color value), and SUBSURFACE_SCATTERING (realistic skin shading, subsurface scattering materials). More may be added later, such as EYE (eye shading model), Hair, and Cloth (fabrics).
The shading model determines the shading result. Simply put, DeferredShading usually has two passes: first the GBufferPass, then the DeferredShadingPass. The GBufferPass is responsible for packing the data required to compute the specified shading model, and the DeferredShadingPass fetches that data and shades according to the specified shading model.
To keep it general, the GBuffer allocates 4 RTs and 1 depth buffer. Let’s look at the GBuffer for LEGACY_LIGHTING, as follows:


Then the GBuffer for STANDARD_LIGHTING, as follows:

The GBuffer for UNLIT, as follows:

Finally the GBuffer for SUBSURFACE_SCATTERING, as follows:

The overall flow is as below:

I think if the custom shader you use conforms to one of the built-in shading models (Lighting, PBRLighting, Unlit), then you just need to add a Technique (named GBufferPass) to your custom material and pack data into the GBufferData following the specified format for that shading model.
However, if you need a custom shading model, currently you may need to modify the global shaders DeferredShading.frag and TileBasedDeferredShading.frag in the engine source code.
I’m seriously considering whether I should provide an external interface so JME3 users can copy DeferredShading.frag and TileBasedDeferredShading.frag, modify them, and set them on the pipeline directly to override the default global shaders, so that extending custom shading models won’t require modifying engine source code.
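
To make the dispatch idea concrete, here is a toy Java sketch (the class name, enum and channel encoding are illustrative only, not the PR’s actual API): the GBufferPass writes a shading-model ID into a GBuffer channel, and the deferred pass decodes it to pick the shading routine.

```java
// Hypothetical sketch: dispatching a deferred pass on a shading-model ID
// stored in a normalized 8-bit GBuffer channel. Names are illustrative.
public class ShadingModelDemo {

    // The four built-in shading models described in the post.
    enum ShadingModel {
        LEGACY_LIGHTING,       // Phong lighting
        STANDARD_LIGHTING,     // PBR lighting
        UNLIT,                 // single color output
        SUBSURFACE_SCATTERING; // skin/ceramic materials

        // Pack the model ID into a normalized 8-bit channel value,
        // as a GBufferPass fragment shader might do.
        float encode() {
            return ordinal() / 255.0f;
        }

        // Recover the model ID in the deferred shading pass.
        static ShadingModel decode(float channel) {
            return values()[Math.round(channel * 255.0f)];
        }
    }

    public static void main(String[] args) {
        for (ShadingModel m : ShadingModel.values()) {
            // Each model ID round-trips through the encoded channel value.
            assert ShadingModel.decode(m.encode()) == m;
            System.out.println(m + " -> " + m.encode());
        }
    }
}
```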

9 Likes

@JhonKkk
I looked at your PR and it’s huge. ~8500 lines of code added / 84 files changed …
Can you describe what are the benefits that JME’s end users will get from your version 1.0 PR? better lighting / performance etc. and what is your vision for next versions?
Thanks for your efforts & detailed explanations.

2 Likes

hi, adi.barda!
I will briefly describe the features provided in this PR (1.0) version for JME3 users:

  1. Out-of-the-box support for multiple render paths (forward/deferred); JME3 users should need very little or even no code change to use them.
  2. The deferred render path allows JME3 users to use more lights than before; complex scenes in particular usually need more lights.
  3. A FrameGraph is used internally in the renderer to manage the rendering workflow of each frame (invisible to users); this is to better manage more complex graphics rendering and to prepare for upcoming features.
  4. Added a subsurface scattering shading model, which allows JME3 users to achieve more realistic skin and ceramic effects.
  5. Added a Pre-Pass; this is very important. In my many years of UE4 graphics experience, especially on mobile, it eliminates a lot of overdraw overhead, saving about 40% of power on average and improving frame rate by about 30% (even more for complex scenes).
  6. Implemented AMD FSR 1.0. Many may not know that many games now do not render at the target resolution. Instead, they render into a lower-resolution sceneColorBuffer, then perform an “upscaling pass” to output to the backbuffer, resulting in huge frame rate gains (I’ve lost count of how many mobile games use these techniques; it’s also standard for AAA games now. AMD already has FSR 3.0, and Nvidia DLSS has also been updated).
  7. Implemented VRS. Unlike point 6, which renders the entire screen at a lower resolution, VRS adjusts the shading rate at primitive granularity or even pixel-block granularity to balance image quality and fill rate (however, currently only NV and Qualcomm Snapdragon 660+ devices support VRS in OpenGL).
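
Regarding point 5, the overdraw saving from a depth pre-pass can be shown with a toy simulation (illustrative Java only, not the PR’s code): without a pre-pass, every fragment that passes the depth test runs the full shader; with a pre-pass, only the nearest surface per pixel is shaded (the color pass uses an EQUAL depth test).

```java
// Toy 1D "framebuffer" simulation of why a depth pre-pass reduces overdraw.
import java.util.Arrays;

public class PrePassDemo {

    // Each surface covers the whole buffer at a constant depth;
    // surfaces are submitted in array order.
    static int shadeWithoutPrePass(float[] surfaceDepths, int width) {
        float[] depth = new float[width];
        Arrays.fill(depth, Float.MAX_VALUE);
        int shaded = 0;
        for (float d : surfaceDepths) {
            for (int x = 0; x < width; x++) {
                if (d < depth[x]) {   // depth test passes...
                    depth[x] = d;
                    shaded++;         // ...so the (expensive) shader runs
                }
            }
        }
        return shaded;
    }

    static int shadeWithPrePass(float[] surfaceDepths, int width) {
        // Pass 1: depth only (cheap), find the nearest depth per pixel.
        float[] depth = new float[width];
        Arrays.fill(depth, Float.MAX_VALUE);
        for (float d : surfaceDepths) {
            for (int x = 0; x < width; x++) {
                depth[x] = Math.min(depth[x], d);
            }
        }
        // Pass 2: shade only fragments whose depth EQUALs the pre-pass result.
        int shaded = 0;
        for (float d : surfaceDepths) {
            for (int x = 0; x < width; x++) {
                if (d == depth[x]) shaded++;
            }
        }
        return shaded;
    }

    public static void main(String[] args) {
        // Worst case: back-to-front submission of 4 overlapping layers.
        float[] depths = {4f, 3f, 2f, 1f};
        System.out.println("without pre-pass: " + shadeWithoutPrePass(depths, 100)); // 400
        System.out.println("with pre-pass:    " + shadeWithPrePass(depths, 100));    // 100
    }
}
```

In this worst case, the pre-pass cuts shaded fragments by 4x, which is where the power and frame-rate gains come from.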

The topic for the next PR is the “Global Illumination” part. I plan to implement:

  1. Adjust the ShadowPost stage. Currently the shadows are not physically accurate: in the rendering equation, shadowing should only affect direct lighting rather than the accumulated lighting, otherwise subsequent global illumination will be incorrect.
  2. Implement Light Probe Volume 1.0 and Light Propagation Volumes 1.0. Both are global illumination techniques, but they are used in different scenarios. Light Probe Volume is for static scenes + dynamic objects: prebaked, with immutable lighting information (i.e. precomputed realtime GI). Light Propagation Volumes is for all objects: no prebaking required, with mutable lighting information (i.e. realtime dynamic GI).
  3. Since the two GI techniques in point 2 only provide diffuse GI, specular GI needs to be supplemented through other means. I plan to split the specular part of the existing lightProbe into a separate reflectionProbe (known as a reflection probe in other engines) to provide local indirect reflections. I noticed someone has implemented screen space reflection, which is also a supplement for specular GI.
  4. Implement SSGI. Unlike SSR, which is just a supplement for specular GI, SSGI contains bounced lighting.
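
A tiny numeric illustration of point 1 (hypothetical values, just to show the difference between the two ways of applying the shadow factor):

```java
// Sketch of point 1 above: the shadow factor should attenuate only the
// direct-lighting term. Multiplying the accumulated (direct + indirect)
// result by the shadow factor would wipe out GI bounce light in shadow.
public class ShadowTermDemo {

    // Physically wrong: shadow scales the accumulated lighting.
    static float shadowOnAccumulated(float direct, float indirect, float shadow) {
        return shadow * (direct + indirect);
    }

    // Correct per the rendering equation: shadow scales direct light only.
    static float shadowOnDirectOnly(float direct, float indirect, float shadow) {
        return shadow * direct + indirect;
    }

    public static void main(String[] args) {
        float direct = 0.8f, indirect = 0.2f, shadow = 0.0f; // fully in shadow
        // Wrong version loses all GI; correct version keeps the bounce light.
        System.out.println("accumulated: " + shadowOnAccumulated(direct, indirect, shadow)); // 0.0
        System.out.println("direct only: " + shadowOnDirectOnly(direct, indirect, shadow));  // 0.2
    }
}
```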

After completing these two major PRs, I may plan to implement:

  1. Optimize the deferred render path, currently the deferred render path is not fully optimized. Also, mobile generally does not use MRT to implement deferred rendering, but uses some hardware features to implement faster deferred rendering.
  2. Optimize and improve global illumination (try adding LightProbeVolume tool in SDK), may try implementing SDF DDGI.
  3. Optimize the culling system…

15 Likes

Wow. You just made the feature I was thinking about. I heard FSR works on both Nvidia and Radeon, while DLSS only works on Nvidia.

Could you explain more about this? I thought it requires a little AI learning to know how to upscale. But maybe FSR 1.0 does not require it?

I ported FSR 1.0 to the UE4 mobile renderer a year ago, and also did some tests in jme3. FSR 1.0 is a pure image algorithm that does not rely on special hardware, while FSR 2.0 relies on temporal caching (i.e. a velocity buffer). Until JME3 has a complete CS (Compute Shader) module, I think it is currently impossible to implement FSR 2.0 well, so only FSR 1.0 is feasible.
As for Nvidia DLSS, it is a super-resolution technology based on deep learning, which requires hardware support; the core library is provided by Nvidia, and currently it seems to only support DX11/DX12 and Vulkan.
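
For reference, FSR 1.0’s documented per-axis scale factors are 1.3x (Ultra Quality), 1.5x (Quality), 1.7x (Balanced) and 2.0x (Performance); a quick sketch of what that means in shaded pixels (illustrative Java, not engine code):

```java
// Arithmetic for FSR 1.0 upscaling: the scene is rendered at a fraction
// of the target resolution, then an upscaling pass fills the backbuffer.
public class FsrScaleDemo {

    // Render resolution for a target size and an FSR per-axis scale factor.
    static int[] renderRes(int targetW, int targetH, double scale) {
        return new int[]{
            (int) Math.round(targetW / scale),
            (int) Math.round(targetH / scale)
        };
    }

    public static void main(String[] args) {
        // FSR 1.0 "Quality" mode: 1.5x per-axis upscale.
        int[] r = renderRes(1920, 1080, 1.5);
        double shadedFraction = (double) (r[0] * r[1]) / (1920.0 * 1080.0);
        System.out.println("render at " + r[0] + "x" + r[1]);        // 1280x720
        System.out.printf("shaded fraction: %.2f%n", shadedFraction); // 0.44
    }
}
```

So at Quality mode only about 44% of the target pixels are shaded, which is where the frame-rate gain comes from.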

2 Likes

Thank you for explaining :slight_smile:

This is indeed a very important feature for modern games; I see upscaling myself in almost every modern game, like nearly all new AAA games now and many AA ones. (Last I played Starfield, and of course there is even texture upscaling, not just resolution upscaling, and for both you can configure the % you want.)

I’m not sure of the quality (mostly speed) difference between FSR 1.0, FSR 2.0 and DLSS, but I believe even FSR 1.0 is already a huge advantage, so I’m very, very happy you implemented this in the PR. :+1:

1 Like

I do not want to always bring up the negative parts; it may sound like I do not approve of your changes and effort, which is not the case.

I only fear that the size of this PR is going to be an issue. Reviewing it takes ages even if there were more reviewers around. I just do not want you to put in a lot of work and get frustrated when the merging process does not go as planned.

I would identify the minimal changes the core requires to support all of this as a user project.
I would make these changes as PRs. I know this is a lot more work than a large core modification, but in the end the jme core would probably be more flexible. And people could probably use your code faster than if you go the full integration route. (Speaking as an observer of how past large PRs got handled.)

I personally am going to look through your code and possibly steal some ideas as well as code :wink:
But I have to say that I have never done a professional code review, so I am not a good choice.

VRS is something @richtea might be able to use for the VR library.

While I agree and understand, as Sgold already mentioned, JME lacks core devs in that area (except Riccardo, of course; who knows, maybe JhonKkk might become one of the core devs later), so I believe we should do everything possible not to prolong PRs here, while maintaining code reviews / tests at least at some reasonable quality.

Of course it’s all up to the current core devs to decide the route, but I feel like they feel the same.

Also a little history:
I still remember many PRs that went into core without full testing / code review, like the new animation system or even PBR, which needed a little adjustment after being merged. It’s very similar here: these features would keep the engine more modern, so it might go a similar route. Also, for example, when Nehon was still around, I do not think (I might be wrong) many people did proper code reviews of his PRs, so the process was faster.

But yes, in general a smaller PR would for sure be merged faster.

1 Like

@oxplay2 @zzuegg
I noticed your suggestions, just as I discussed with @pspeed before, so perhaps I should only include the 1.0 version of the framegraph, renderPath, and shadingModel in this PR, and leave subsequent iterations and optimizations to additional PRs (otherwise this PR may keep growing).
As for other features, I have implemented some of them locally; however, I have not submitted them to this PR, so it still only contains the “1.0 version of framegraph, renderPath, shadingModel”. I did want to submit the FSR and VRS related content in this PR; however, given some of the issues just discussed (long review time, obstacles to code integration, etc.), I will not continue to add other features to it for now. The theme of this PR is the “1.0 version of framegraph, renderPath, shadingModel”, and I have done most of the testing on local examples. After I fix compatibility with PBR terrain and add the relevant comments, header information, etc. according to the engine code standard, I don’t think there will be any other content submitted to this PR. Maybe this weekend would be the time for this PR to enter review?

5 Likes

I think it’s mostly about who will do the code review; it’s all about the exact person who will make it (of course more people can do it too). zzuegg, I think you have enough knowledge of this PR’s topic, so even if you never did a professional code review, you could try. In general I see around 3-4 people who know the topic well enough to do a code review.

1 Like

VRS sounds a lot like foveated rendering in the VR world, where the area the eye is looking straight at is rendered at a higher resolution than the peripheral view (because our eyes see much more detail where we are looking). I assumed that wouldn’t be possible in JME, but perhaps VRS would unlock it. Exciting possibilities for the future!

1 Like

No. VRS was initially applied to VR games; however, many AAA games now use it to improve fill rate in order to increase frame rate. (FSR renders at a lower full-screen resolution and then stretches back to the target resolution, while VRS reduces the shading rate for specified shaders, specified primitives, or even specified pixel blocks.) I have already implemented VRS in our mobile UE4 game using the GL extension provided by Qualcomm Snapdragon 660 to improve fill rate, increasing frame rate by 40%.

1 Like

Screenshots of my testing and performance analysis in our game:


6 Likes

I am going to review the code for personal reasons anyway, so I will leave some comments.

Yet I still think the best solution would be making it possible to switch from the current rendering system to this one (or any other custom solution) at runtime/init time. That is the last I’ll say on this topic; from now on I’ll stay on the technical side.

How did you use VRS in the final product? Reducing the shading rate for the whole frame, or only for things further away?

2 Likes

Thank you for participating in code review and providing valuable feedback and suggestions for improvement.

These modules currently allow switching at runtime or during the initialization phase (switching can be done with minimal code).

I won’t go into all the details here. VRS has 2 levels of granularity: primitive granularity and pixel granularity; we chose primitive granularity. In UE4’s Material system we added a ShadingRate setting: each Mesh has its own Material, and each Material can set the required ShadingRate (e.g. 1x1 default, 1x2, 2x2, 4x2, etc.). Since UE4 allows setting an independent Material for each LOD, we can also set it per-LOD Material, for example LOD0_Material: 1x1, LOD1_Material: 2x2, LOD2_Material: 4x2, etc. The whole system can be turned off and on with one line of code.
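
To make the per-LOD scheme above concrete, here is a small illustrative sketch (class and method names are hypothetical, not the UE4 or jME API): each coarse shading rate of w x h shades one fragment per w*h pixels.

```java
// Illustrative sketch of per-LOD VRS shading rates and the fraction of
// fragments actually shaded at each rate. Names are hypothetical.
public class VrsRateDemo {

    enum ShadingRate {
        RATE_1X1(1, 1), RATE_1X2(1, 2), RATE_2X2(2, 2), RATE_4X2(4, 2);

        final int w, h;
        ShadingRate(int w, int h) { this.w = w; this.h = h; }

        // One shader invocation covers a w x h pixel block.
        double shadedFraction() { return 1.0 / (w * h); }
    }

    // LOD -> rate mapping as in the post: LOD0 1x1, LOD1 2x2, LOD2+ 4x2.
    static ShadingRate rateForLod(int lod) {
        switch (lod) {
            case 0:  return ShadingRate.RATE_1X1;
            case 1:  return ShadingRate.RATE_2X2;
            default: return ShadingRate.RATE_4X2;
        }
    }

    public static void main(String[] args) {
        for (int lod = 0; lod <= 2; lod++) {
            ShadingRate r = rateForLod(lod);
            System.out.println("LOD" + lod + " -> " + r + " shades "
                    + r.shadedFraction() + " of the pixels");
        }
    }
}
```

So a distant mesh at LOD2 shades only 1/8 of its covered pixels, trading image quality in low-detail areas for fill rate.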

3 Likes