VR rendering in one pass with instancing

Maybe InstancedGeometry is always in world space. I seem to remember that the full transform stack (the world transform) is incorporated into the instance data of the mesh. Probably easier to calculate than transforming relative to the node… but I don’t actually know why these decisions were made the way they were.

Attempting to simply add a new field & function to InstancedGeometry that will use another spatial’s world bounds when asked for the InstancedGeometry’s world bounds. That way, whatever fishy decisions were made for whatever reasons (where changing them may break something else) can be bypassed when using instancing for this specific purpose. Building & testing shortly…
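
Something like this, roughly (a sketch; the field & method names are made up, not actual engine API):

    // inside InstancedGeometry (sketch):
    private Spatial worldBoundsOverride;

    public void setWorldBoundsOverride(Spatial spatial) {
        this.worldBoundsOverride = spatial;
    }

    @Override
    public BoundingVolume getWorldBound() {
        // answer bounds queries with another spatial's world bounds,
        // bypassing whatever the instance data would otherwise report
        if (worldBoundsOverride != null) {
            return worldBoundsOverride.getWorldBound();
        }
        return super.getWorldBound();
    }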

Edit: HEELLLLZZZZ YEAH WORKS BRUTE FORCE BABY :smiley: Ahem, OK, back to work. :wink:

2 Likes

I love following your progress, keep up the good work phr00t!

Rendering the simple test scene of cubes (as seen in the screenshots above) with VR instancing, I get around 760 FPS. Without instancing, using two viewports & passes, I get around 410 FPS. That’s an 85% performance improvement so far. Not too shabby.

Skybox is still being skipped, though… so maybe just an 80% performance bump? :wink:

Can’t wait to get this into 5089 & test!

2 Likes

Got VR instancing into 5089 today – kinda works, kinda doesn’t:

1) Handheld weapon isn’t getting split
2) Laser pointer isn’t getting split
3) Some objects are not being removed from the left eye, like vertical spawn columns & fired projectiles
4) Shaders that rely on world space positional texture mapping are messed up
5) The fisheye effect seems pronounced, and objects in the center render farther away than expected, causing things to pass the camera’s far frustum plane (when they normally wouldn’t in non-VR or non-instancing mode)

Happy to see many things are working, though. If you have any suggestions on fixing the above points, that’d be really appreciated. I’m going to be working through the list myself either way!

3 Likes

Problem #5 doesn’t happen when submitting textures to the VR Compositor, which is a good thing. Must only be an oddity when drawing to a regular monitor, which is irrelevant. Striking it off the problem list.

I significantly cleaned VR instancing up, fixing problems #1, #3 & #4 above:

Problems still exist, particularly in the positioning of nested instanced objects (as seen in the arms of the robot).

However, a bigger problem is showing up in the performance numbers: for complex scenes (where improvement is needed the most), VR instancing so far appears to be slower. I don’t fully understand it, and it might be an implementation issue. It might be the additional vertex calculations needed to transform instanced data once vertex counts get high. For simple scenes and “medium” complexity scenes, VR instancing does seem to be faster; for example, in the menu, and right when I start a game. However, as robots spawn & objects increase, the VR-instanced scene degrades faster.

I’ll definitely leave all the VR instancing stuff in, since it may be useful in certain situations. I’d love to have an “ah-ha” moment and figure out why the very complex scene is slow… profiling does point to roughly 7% of time spent filling buffers with all of the instancing data. However, even cutting that out completely wouldn’t likely get us into the green in complex scenes.

It’s hard to keep working on all the instanced rendering oddities if consistent performance increases won’t be where they are needed most.

1 Like

I just found a potentially big implementation problem. I do a bunch of geometry batching, for things like buildings and trees: I create all of the geometry, attach it to a node, and then batch that node. However, the instancing system is likely duplicating all the geometry both before & after batching… so who knows how much geometry is actually being drawn. I’ll have to find a way to make it play nice with batching… might see some resolution there.

Do you use BatchNode or GeometryBatchFactory?
BatchNode retains the base geometries for reference when transforming; GeometryBatchFactory doesn’t, because it’s static.
If that helps…
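
Roughly, as a quick sketch (buildingGeom1/2 & cityNode standing in for your scene):

    // dynamic: BatchNode keeps the originals so you can still move them & re-batch
    BatchNode batch = new BatchNode("buildings");
    batch.attachChild(buildingGeom1);
    batch.attachChild(buildingGeom2);
    batch.batch(); // merges meshes per material; base geometries stay as children

    // static: GeometryBatchFactory merges in place, originals are collapsed
    GeometryBatchFactory.optimize(cityNode);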

VR instancing is nearly visually perfect now:

Made a big improvement by no longer duplicating geometry. I just add the same geometry twice to the InstancedGeometry handler. This way, when the application updates a piece of geometry, both instances look at that same place for updates. It also simplifies the node tree, since there are no longer two culled geometries for every InstancedGeometry, but one.

I figure out what geometry to instance by monitoring the renderGeometry function. If something gets passed to that function that hasn’t been instanced (or at least checked for instancing), I add it to a list. In VRApplication, I go through the list, instance the recently added stuff & remove it from the list. This has the side effect of things not being rendered for their first frame, which isn’t ideal… but it should do a good job of catching everything that is meant to be part of the scene.
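
In rough code, it looks something like this (a simplified sketch, not my exact code; the names igChecked, igToInstance & instanceCaughtGeometry are made up, and the double addInstance relies on my tweaked instancing code):

    private final Set<Geometry> igChecked = new HashSet<>();
    private final List<Geometry> igToInstance = new ArrayList<>();

    // called from the render path: note anything that slips through uninstanced
    public void renderGeometry(Geometry geom) {
        if (igChecked.add(geom)) {
            igToInstance.add(geom); // picked up by VRApplication next frame
        }
        // ... normal (non-instanced) rendering continues ...
    }

    // called from VRApplication's update loop: pair up whatever was caught
    public void instanceCaughtGeometry() {
        for (Geometry geom : igToInstance) {
            InstancedGeometry ig = new InstancedGeometry(geom.getName() + "-stereo");
            ig.setMaxNumInstances(2);
            ig.addInstance(geom); // the SAME geometry twice, so app-side updates
            ig.addInstance(geom); // get picked up by both eye instances
        }
        igToInstance.clear();
    }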

Unfortunately, it is still slow. The main menu is faster with instancing, but now even when I start the game & many objects are added, it starts out slower with instancing. Disheartening. I’m still not sure if it is simply a fact that instancing won’t be quicker with complexity, or an implementation issue. Instancing just feels like it should be quicker, so I’m really scratching my head trying to figure out what is going wrong here (if anything).

Instancing sends more data over the bus per draw call. Normally with instancing, you are instancing lots of things, and the data isn’t growing that fast. Even though only one draw call is made, instancing is not free internally: there is still per-object setup and so on at the driver/GPU level.

One theory: with only two instances of something, you don’t really gain, because there aren’t enough instances to properly amortize the cost. Now, if your stuff is instanced a lot beyond the stereo pairing, then this theory won’t hold water.

I think I may have traced the cause of the slowdown: all instanced geometry is being “updated”, even when it is culled out of the camera’s frustum. I’m trying to come up with a way to skip updating an instance pair when the original geometry is culled. This doesn’t seem to work, and I’m not sure why:

public void renderFromControl() {
    int sent = 0;
    for (int i = 0; i < igToRender.size(); i++) {
        InstancedGeometry ig = igToRender.get(i);
        Geometry g = ig.getLinkedGeometry();
        // only update the instance pair if the source geometry is
        // at least partially inside the camera's frustum
        if (g != null &&
            VRApplication.getCamera().contains(g.getWorldBound()) != FrustumIntersect.Outside) {
            sent++;
            ig.updateInstances();
        }
    }
    System.out.println("Sent: " + sent);
}

… everything that has previously been viewed & duplicated still gets sent.

Sooooooooo the main problem is, I’ve been trying to manage all instanced geometry in one VRInstancedNode that also happens to be the root node. I did this so it’d be easy to integrate with existing projects (e.g. no need for special nodes across the whole scene). However, this kinda breaks the scene graph’s inherent culling & update trees. As a result, everything is being updated & sent over the bus every frame, killing performance. I need to redesign how geometry is split & rendered. Perhaps when geometry gets rendered, it will check if an InstancedGeometry copy of it exists, and render that instead…
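
Maybe something like this (very rough idea; instancedCopies would be a Geometry-to-InstancedGeometry map, and all names here are made up):

    // check for a stereo copy at render time & substitute it
    public void renderGeometry(Geometry geom) {
        InstancedGeometry stereo = instancedCopies.get(geom);
        if (stereo != null) {
            renderInstanced(stereo); // draw the culled-and-updated stereo pair instead
            return;
        }
        renderNormally(geom); // anything not (yet) instanced draws as usual
    }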

Progress! This did the trick:

    // skip updating instance pairs that were culled last frame
    for (int i = 0; i < igToRender.size(); i++) {
        InstancedGeometry ig = igToRender.get(i);
        if (ig.getLastFrustumIntersection() != FrustumIntersect.Outside) {
            ig.updateInstances();
        }
    }

… this skips updating instances that are no longer visible, saving a ton of performance in complex scenes. Think we are finally in the green! However, the igToRender list isn’t getting properly cleaned as geometry is removed from the scene, so more performance improvements to come…
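
The cleanup will probably look something like this (just a sketch, assuming getLinkedGeometry() as used earlier):

    // drop instance pairs whose source geometry has left the scene graph
    // (uses java.util.Iterator)
    for (Iterator<InstancedGeometry> it = igToRender.iterator(); it.hasNext(); ) {
        Geometry g = it.next().getLinkedGeometry();
        if (g == null || g.getParent() == null) {
            it.remove();
        }
    }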

VR instancing is virtually (no pun intended) done:

Cleaned up a ton. Even relatively complex situations, like geometries using controls (e.g. BillboardControl), are working. There is an odd fog difference in the screenshot above, but I suspect it is related to the odd warping that happens when the image gets scaled down to a non-VR device at a lower resolution. I plan on actually testing this with my Vive in the next day or two.

Performance improvements in really simple scenes can reach 80% using this method. In the really complex scene above, gains were around 10%. Most of the time there is spent in fragment shaders, so the draw-call overhead saved by VR instancing matters less. However, VR instancing should now be faster in all scenarios!

A few minor changes are required to vertex shaders & material definitions, as seen in the Unshaded files here:

The above scene should be able to hit 90+ Hz in VR on a 760M!

5 Likes

Just got done testing this on my Vive. The projection is a bit messed up: it seems the warping isn’t just happening when scaling to non-VR resolutions, but also in the headset. It might be related to the field of view, which somehow may be getting set higher than it should be. Anyway, still some work to do (but this shouldn’t be that hard to fix).

EDIT: If it is field-of-view related, performance improvements should be better than quoted above, because more things are being rendered than should be! I’ve got a busy weekend, but I’ll find the root cause & get this fixed ASAP.

Found the root cause: when I doubled the resolution width of the main viewport (since we are doing 2 eyes in one viewport), the aspect ratio was being “fixed”, which broke the projection matrix. I’ve made some changes that should produce the proper projection matrix at the proper resolution. I’ll test tomorrow.
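
The change boils down to something like this (a sketch; fovY & the per-eye sizes come from the HMD):

    // double the viewport width for side-by-side stereo...
    Camera cam = new Camera(eyeWidth * 2, eyeHeight);
    // ...but build the frustum from the PER-EYE aspect ratio,
    // not the doubled one an automatic aspect "fix" computes
    float perEyeAspect = (float) eyeWidth / (float) eyeHeight;
    cam.setFrustumPerspective(fovY, perEyeAspect, 0.1f, 1000f);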

1 Like

Just wanted to say I love following this. You do an awesome job of explaining the issues you ran into and how you fixed them. I’m not even doing any VR and I’ve learned a few things from this.

2 Likes

I got the projection / warping problem fixed:

… came with a nice boost in performance, since we are not rendering more than we need!

I also added a function to scale the resolution down to improve performance. However, I’m noticing some oddities with the GUI, but only when wearing the headset (it looks fine when using the “null” driver & SteamVR compositor). When I cut the resolution in half, only about the bottom-left quadrant of the GUI is visible in the headset. If I use the “null” driver, or just display the split-eye scene on my main monitor, the whole GUI (at a reduced resolution) shows up fine. Scratching my head on that one…

There are some well-known OpenGL performance problems with the SteamVR compositor, which makes it hard to actually experience the performance improvements when using the Vive (or Rift). Converting OpenGL textures to DirectX ones in the SteamVR compositor seems to be the culprit. It is a pain, because I can get well over 90 FPS on my mid-range laptop, but on my high-end desktop with the Vive it stutters like crazy through the compositor (whether I am using instancing or not). Anyway, the Valve team is aware of it & I can’t wait until it is fixed…

I moved the VR instancing stuff into my main jMonkeyEngine branch. I had been hacking in support via jMonkeyVR, but this is something that had to be more closely integrated with the main engine. Now it is handled much more efficiently & things are processed just as they are being rendered: instanced stuff no longer “skips a frame” while waiting to be picked up by jMonkeyVR. jMonkeyVR now enables the VR instancing support in the engine as needed. No more VRInstancedNode. Overall, much cleaner.

1 Like