Joining heightmaps

jayfella · November 8, 2013, 5:37pm

At this point I actually think the delay we “see” is when we’re adding the chunks to the scene frame after frame. So we’re running at hundreds of frames per second and nothing is happening, happy clappy, then we move out of a chunk and for 14 frames (assuming a view distance of 3: 7 chunks removed, 7 chunks added) something is going on. These actions result in a slight drop in frame rate, which the user perceives as a kind of lag spike. You can counter this by enabling vsync or using a fixed frame rate. If we threw everything into one frame (instead of spreading it out over multiple frames), it would only make the visible stutter worse.

We can lessen this behaviour by reducing the load required to add and remove chunks, but let’s see exactly what we are doing here and what we can do about it.

We return true if work was carried out, else false.

for every frame…

> Check for chunks outside our view distance and remove them by:

Iterating through the worldTiles (loaded tiles) hashmap
if any of these tiles are lessThan or greaterThan our viewdistance, iterator.remove();
return true if we removed a tile, else return false.

> Check for new chunks inside our view distance by:
> check the worldTiles.size() to see if it matches the total tiles (e.g. 7x7 = 49 tiles) and return false if the size is what we expect for a fully loaded scene.
> .poll() a ConcurrentLinkedQueue to see if any loaded tiles are in the queue and ready to add. Add it and return true if .poll() != null.
> iterate over our view distance.
> check the worldTiles (that contains our loaded tiles) hashmap using .get(); to see if this tile is loaded. if found, continue;
> check the cache hashset using .contains();
> if found in the cache, add it to the scene, add it to the loaded tiles, return true;
> if not found, create a new runnable to load the tile, and return true
> nothing happened, so return false
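The steps above can be sketched roughly as follows. This is not the code from the repo, just a minimal stand-alone illustration: `TilePagerSketch`, `requestLoad` and the `pending` set are hypothetical names, tiles are reduced to packed grid keys so it runs without jME, and the "load" completes immediately instead of going through a thread pool.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical pager: tiles are reduced to packed grid keys so the sketch runs without jME.
class TilePagerSketch {
    final int viewDistance;
    final Set<Long> worldTiles = new HashSet<>();                   // tiles attached to the scene
    final Set<Long> cache = new HashSet<>();                        // loaded but not attached
    final Set<Long> pending = new HashSet<>();                      // load requested, not delivered yet
    final Queue<Long> loadedQueue = new ConcurrentLinkedQueue<>();  // filled by loader threads

    TilePagerSketch(int viewDistance) { this.viewDistance = viewDistance; }

    static long key(int x, int z) { return ((long) x << 32) | (z & 0xffffffffL); }
    static int keyX(long key) { return (int) (key >> 32); }
    static int keyZ(long key) { return (int) key; }

    boolean inRange(long key, int camX, int camZ) {
        return Math.abs(keyX(key) - camX) <= viewDistance
            && Math.abs(keyZ(key) - camZ) <= viewDistance;
    }

    // Stand-in for firing a Runnable into a thread pool; here the "load" finishes at once.
    void requestLoad(long key) {
        pending.add(key);
        loadedQueue.add(key);
    }

    /** Performs at most one action per call; returns true if work was carried out. */
    boolean update(int camX, int camZ) {
        // 1. Remove one tile that is outside the view distance.
        for (Iterator<Long> it = worldTiles.iterator(); it.hasNext(); ) {
            if (!inRange(it.next(), camX, camZ)) {
                it.remove();
                return true;
            }
        }
        // 2. Scene is fully loaded: nothing to add.
        int side = viewDistance * 2 + 1;
        if (worldTiles.size() == side * side) {
            return false;
        }
        // 3. Attach one tile delivered by the loader threads.
        Long loaded = loadedQueue.poll();
        if (loaded != null) {
            pending.remove(loaded);
            worldTiles.add(loaded);
            return true;
        }
        // 4. Find one missing tile: attach it from the cache, or request a load.
        for (int x = camX - viewDistance; x <= camX + viewDistance; x++) {
            for (int z = camZ - viewDistance; z <= camZ + viewDistance; z++) {
                long k = key(x, z);
                if (worldTiles.contains(k) || pending.contains(k)) continue;
                if (cache.remove(k)) {
                    worldTiles.add(k);
                    return true;
                }
                requestLoad(k);
                return true;
            }
        }
        return false; // nothing happened
    }
}
```

The `pending` set is an addition of this sketch; without some bookkeeping like it, the same tile would be requested again on every frame until its load finished.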

Not a great deal of work is being done here. Worst case frame, we iterate over 49 items (assuming a viewdistance of 3 in all directions plus the center chunk), issue a .get() on a hashmap, issue a .contains() on a hashset and fire a runnable into the threadpool.

At this point in time, I’m all out of ideas on condensing that logic.

monkeychops · November 8, 2013, 5:43pm

@jayfella

wow, this is epic, it’s hard to believe this is the same system as yesterday, the performance is so much better. I can get away with a view distance of 4 or 5 now vs 2 or 3 yesterday and movement is fluid.

The block sizes seem a bit messed up though: if I set the block size to 257 rather than 129 I seem to get islands.

pspeed · November 8, 2013, 5:44pm

" if found in the cache, add it to the scene, add it to the loaded tiles, return true;"

…what happens if you find a lot of items in the cache? Do they get added to that scene in the same frame or are they also staggered one per frame? (ie: are you adding them to the result queue like the background loads?)

reveance · November 8, 2013, 5:51pm

@jayfella said: At this point I actually think the delay we "see" is when we're adding the chunks to the scene frame after frame. So we're running at hundreds of frames per second and nothing is happening, happy clappy, then we move out of a chunk and for 14 frames (assuming a view distance of 3: 7 chunks removed, 7 chunks added) something is going on. These actions result in a slight drop in frame rate, which the user perceives as a kind of lag spike. You can counter this by enabling vsync or using a fixed frame rate. If we threw everything into one frame (instead of spreading it out over multiple frames), it would only make the visible stutter worse.

That would be true, however when all tiles are coming from (memory-) cache there's no 'framedrop' whatsoever...So I'm doubting that that's actually the case...

@monkeychops said: @jayfella
wow, this is epic, it’s hard to believe this is the same system as yesterday, the performance is so much better. I can get away with a view distance of 4 or 5 now vs 2 or 3 yesterday and movement is fluid.

The block sizes seem a bit messed up though: if I set the block size to 257 rather than 129 I seem to get islands.

This is because those tiles are being grabbed from the cache. Remove everything from ./world/* before trying to change the block size…

jayfella · November 8, 2013, 5:52pm

@monkeychops - delete your saved chunks in the “./world” folder lol - it contains the old sized data.

@pspeed - if it is found in the cache, it is added to the scene right there and then, and returns true.

So basically what this update loop is showing is that if anything is ever “done” - if something is added, removed, or loaded - return true. The update loop then won’t try to do anything else until the next frame. This basically spreads every intensive action over multiple frames.

pspeed · November 8, 2013, 6:00pm

Try wrapping your update() in some nanoTime timings:
long start = System.nanoTime();
// ... the body of update() goes here ...
long time = System.nanoTime() - start;
if( time > someThreshold ) then log it.

You can see if your spikes are caused by the update() implementation or something else.
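A slightly fuller version of that timing idea, as a rough sketch. `UpdateTimer` and its callback are hypothetical names, and the threshold is whatever you decide counts as a spike; only `System.nanoTime()` comes from the suggestion above.

```java
import java.util.function.LongConsumer;

// Hypothetical helper: times one update and reports it when it exceeds a threshold.
class UpdateTimer {
    private final long thresholdNanos;
    private final LongConsumer onSlowFrame;

    UpdateTimer(long thresholdNanos, LongConsumer onSlowFrame) {
        this.thresholdNanos = thresholdNanos;
        this.onSlowFrame = onSlowFrame;
    }

    /** Runs the update, returns the elapsed nanoseconds and reports slow runs. */
    long time(Runnable update) {
        long start = System.nanoTime();
        update.run();
        long elapsed = System.nanoTime() - start;
        if (elapsed > thresholdNanos) {
            onSlowFrame.accept(elapsed); // e.g. log it
        }
        return elapsed;
    }
}
```

Usage could look like `timer.time(() -> world.update(tpf))` with a callback that logs the elapsed time. If the spikes never show up in the log, they are coming from somewhere other than update().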

Note: attach() and detach() are not free operations even if the mesh has already been rendered once. In Mythruna I had to make sure that I only do one add and one remove per frame and nothing more. I also force my meshes to have a fixed size bounding box to avoid the expensive calculation of the mesh bounds (in Mythruna I can get away with this since my ‘chunks’ are always the same size). You can probably make sure this is calculated on the loading thread, though.

jayfella · November 8, 2013, 6:14pm

@reveance said: That would be true, however when all tiles are coming from (memory-) cache there's no 'framedrop' whatsoever...So I'm doubting that that's actually the case...

That actually reinforces my assumption, because getting from the cache is just a .get() on a hashmap - practically free in terms of performance - whereas loading from disk (loading pre-generated tiles) or generating from Perlin noise is either reliant on your disk (7ms average seek time for a mechanical drive, plus conversion from object to float) or at the whim of the Perlin algorithm.

Edit: actually, if a tile isn’t in the cache, it is sent to a thread to be loaded or generated, so any delay in loading or generation would be in the threadpool, not the GL thread. I’m talking out my arse.

pspeed · November 8, 2013, 6:22pm

Some (most?) of the update performance drops will come from adding objects to the scene that have not yet been sent to the GPU. This always happens when loading new tiles but it also may happen when pulling something from cache. It just depends on whether the native objects are still active. Actually, maybe because references are held then they are never released anyway.

I just know that even removing things from the scene could be occasionally slow for me. It has to recalculate all of those world bounds and stuff… same with attach. When your scenes become more complicated than just terrain tiles you will probably want a way to universally gate how many things are added/removed from the scene per frame… instead of only doing it for loaded tiles.
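One possible shape for such a universal gate, as a sketch only. `SceneChangeGate` is a hypothetical name and the changes are plain Runnables; in a jME app each Runnable would wrap an attachChild() or detachChild() call, and runFrame() would be called once per frame from the render thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical gate: any thread may enqueue a scene change, but only a fixed
// number of them are applied per frame.
class SceneChangeGate {
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
    private final int maxPerFrame;

    SceneChangeGate(int maxPerFrame) {
        this.maxPerFrame = maxPerFrame;
    }

    /** Safe to call from loader threads. */
    void enqueue(Runnable sceneChange) {
        pending.add(sceneChange);
    }

    /** Call once per frame on the render thread; returns how many changes ran. */
    int runFrame() {
        int done = 0;
        while (done < maxPerFrame) {
            Runnable change = pending.poll();
            if (change == null) {
                break; // queue drained
            }
            change.run();
            done++;
        }
        return done;
    }
}
```

With a budget of 1 or 2 this gives roughly the "one add and one remove per frame" behaviour described above, except that it does not distinguish adds from removes; separate queues would be needed for that.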

reveance · November 8, 2013, 6:35pm

@pspeed said: Some (most?) of the update performance drops will come from adding objects to the scene that have not yet been sent to the GPU. This always happens when loading new tiles but it also _may_ happen when pulling something from cache. It just depends on whether the native objects are still active. Actually, maybe because references are held then they are never released anyway. <shrug>
I just know that even removing things from the scene could be occasionally slow for me. It has to recalculate all of those world bounds and stuff… same with attach. When your scenes become more complicated than just terrain tiles you will probably want a way to universally gate how many things are added/removed from the scene per frame… instead of only doing it for loaded tiles.

Ah alright, so it is also possible that the items that are in the hashmap cache are still cached on the gpu and because of that the adding wouldn’t cause any delay?

I barely have any knowledge of how things are sent to the gpu etc., but when the LodControl transforms an object, would it be resent to the gpu as well? If so, it may be that the object is sent with full vertices on the first frame, then lod would kick in and send it again, and that would happen for every tile.

jayfella · November 8, 2013, 6:48pm

I have a plan. The world tiles are stored in a hashmap, but I think in this instance, since the worldTiles are of static size (49 tiles in a 7x7 layout, for example) it would be quicker to use an array because we have to iterate over them every frame, and iterating over a hashmap is suicide.

pspeed · November 8, 2013, 6:51pm

@jayfella said: I have a plan. The world tiles are stored in a hashmap, but I think in this instance, since the worldTiles are of static size (49 tiles in a 7x7 layout, for example) it would be quicker to use an array because we have to iterate over them every frame, and iterating over a hashmap is suicide.

It’s probably worthwhile… but iterating over a hashmap is not expensive compared to everything else going on. I doubt it will have any measurable performance impact.

If you really want to go all out, though, you’ll store your tiles in a one dimensional array and do the x + y * width index math. It avoids extra array access, bounds checks, etc… and in the case of iterating over all of them then you are only iterating one array. Very fast.

But I’d be very surprised if it matters at all.
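For reference, the x + y * width layout looks something like this. `TileGrid` is a made-up name for the sketch and the tile type is left generic, so nothing here is taken from the repo.

```java
// Hypothetical fixed-size grid backed by a single one-dimensional array.
class TileGrid<T> {
    private final int width;
    private final Object[] cells;

    TileGrid(int width, int height) {
        this.width = width;
        this.cells = new Object[width * height];
    }

    /** Maps a 2D grid coordinate onto the flat array. */
    int index(int x, int y) {
        return x + y * width;
    }

    void set(int x, int y, T tile) {
        cells[index(x, y)] = tile;
    }

    @SuppressWarnings("unchecked")
    T get(int x, int y) {
        return (T) cells[index(x, y)];
    }

    int size() {
        return cells.length;
    }

    /** Iterating every tile is a single pass over one array. */
    int countLoaded() {
        int loaded = 0;
        for (Object cell : cells) {
            if (cell != null) {
                loaded++;
            }
        }
        return loaded;
    }
}
```

Note that index() does no per-axis range check, so an x outside 0..width-1 silently lands in a neighbouring row. The grid coordinates also need to be relative to the current centre tile, which means shifting or re-indexing the array when the camera moves to a new tile.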

Another thing you can do is not check every frame. Only check when you’ve crossed a known threshold boundary.
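A minimal sketch of that boundary check, assuming square tiles of a known world size. `TileBoundaryTracker` is a hypothetical name; the idea is just to compare the tile cell the camera is in now with the cell it was in at the last check.

```java
// Hypothetical tracker: reports true only when the camera enters a different tile cell.
class TileBoundaryTracker {
    private final float tileSize;
    private boolean initialised = false;
    private int lastCellX;
    private int lastCellZ;

    TileBoundaryTracker(float tileSize) {
        this.tileSize = tileSize;
    }

    /** Returns true on the first call and whenever the camera's tile cell has changed. */
    boolean hasCrossed(float worldX, float worldZ) {
        // floor() rather than a cast, so negative coordinates map to the correct cell
        int cellX = (int) Math.floor(worldX / tileSize);
        int cellZ = (int) Math.floor(worldZ / tileSize);
        if (initialised && cellX == lastCellX && cellZ == lastCellZ) {
            return false;
        }
        initialised = true;
        lastCellX = cellX;
        lastCellZ = cellZ;
        return true;
    }
}
```

The add/remove scan would then start when hasCrossed() returns true and keep running on following frames until it reports that nothing is left to do, since the work itself is still spread over several frames.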

monkeychops · November 8, 2013, 6:56pm

@jayfella I’m pretty happy with the view distance now
With the speed improvements I can get away with a view of 3 x 257 tiles and then hide the far distance with fog and DOF, it works pretty well.

I can imagine with a properly sculpted terrain and textures, some detail grass, normal mapping and some rock meshes etc, this is going to look really nice!

jayfella · November 8, 2013, 10:55pm

Ok, I’ve added some logic to the github repo so that it won’t keep checking for new/old chunks when it doesn’t need to; it should get you a few more frames. I think we’re at the point of micro-optimization now, so time to make good on my word and implement jme scene support.

We now have:
> Noise-based and Image-based support.
> Heightmap data saving for quicker re-attachment after generation
> Ability to specify a view distance in all 4 directions.
> Multi-threaded tile loading.
> Caching of recently loaded tiles.
> Better Lod thread management.
> A TileLoaded(TerrainQuad) and TileUnloaded(TerrainQuad) event.
> Ability to cancel a TileLoaded or TileUnloaded event - just return false to the event.

Todo:
> Add jme scene support
> modify the cache to pre-load skirt tiles instead of just keeping a history of old tiles.
> Add support for per-tile static rigid and non-rigid content (trees, grass, rocks, etc).

monkeychops · November 8, 2013, 11:11pm

@jayfella I’m keeping my [s]eyes[/s] bananas peeled

jayfella · November 9, 2013, 11:10am

Changed the cache to pre-load tiles around the visible area instead of keeping a tile history. Not only is this more efficient in terms of memory, but it’s a more intelligent approach.
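To illustrate what "tiles around the visible area" could mean, here is a sketch that lists the ring of tiles one step beyond the view distance. This is an assumption about the layout, not the repo's implementation, and `SkirtTiles` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: coordinates of the one-tile-wide ring just outside the visible area.
class SkirtTiles {
    static List<int[]> around(int centerX, int centerZ, int viewDistance) {
        int ring = viewDistance + 1;
        List<int[]> result = new ArrayList<>();
        for (int dx = -ring; dx <= ring; dx++) {
            for (int dz = -ring; dz <= ring; dz++) {
                // keep only cells exactly on the outer edge of the square
                if (Math.max(Math.abs(dx), Math.abs(dz)) == ring) {
                    result.add(new int[] { centerX + dx, centerZ + dz });
                }
            }
        }
        return result;
    }
}
```

For a view distance of 3 that ring is 32 tiles (a 9x9 square minus the visible 7x7), so whichever direction the camera moves, the next row of 7 is already in memory.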

Github Repo: https://github.com/jayfella/TerrainWorld

monkeychops · November 9, 2013, 12:13pm

@jayfella

Seems really good now… I am still getting holes in the terrain though.

Also, do you think there’s any chance we could add an alpha to the terrain shader and lerp it in so that it notices less when tiles appear? I think that would allow a much shorter view range without being visually jarring and double the available FPS.

jayfella · November 9, 2013, 1:03pm

The holes are to do with the Lod control, @Sploreg wrote that, so I guess he is the man with the plan.

Hiding terrain loading is an art in itself, and specific to each game. It’s also worth mentioning that if I designed a “copy/paste” situation with everything, all games would just look the same. Personally, I’m not ashamed to admit that I look at how others got around it, re-write it for my situation, realize it’s not the best solution, try again, and again, and again… and eventually I come up with the goods.

Make a cup of tea, grab yo self some crisps and a donut, lock the doors, close the curtains, turn off the TV, stretch them fingers and represent what goes on in that crazy head of yours in the form of java. When you’ve done that, scrutinize the justification of that integer in that loop for days - can we use a char instead? We could save 5 bytes of data. We need to know, We must know. Microbenchmark until you feel like those never-ending lists of nanoseconds are making pretty pictures to mock you, realize someone came up with a more efficient solution just as you are about to present it to the world… and finally… when your sanity is on the edge, when you look at the world and all you see are ways to interpret everything into neat classes and methods, and argue with yourself whether it’s actually energetically more efficient to skip instead of walk, whether the garden path should be part of the garden class or in a class of its own (paths are everywhere after all, we should probably sub-class it) you can present your 14 lines of code to the world, only for someone to point out that you were wrong, that integer has no right in that loop, and you should probably re-think the whole implementation anyway, and you silently, on your own, in the cold, dark, stale room you see yourself in, develop a mild aneurysm. But it’s ok. I mean you may need glasses now, you’ve been staring at a bright screen 2 feet away from your face in the dark for weeks - but you’re not alone. Welcome to the club, my son, for now you are part of something special. You are officially a nerd. And that’s ok too. We’ve come too far to go back now Frodo, we must destroy the ring, and it’s going to take weeks to get back into that fitness regime anyway, who needs women these days. We are keyboard warriors! We are nobody and somebody at the same time! We are quantum! We need a bath too, but before that, Hear me ROAR! Shout from the comfort of your swivel chair! You did it, son.
You reduced those 14 lines of code to 11. You substituted that integer for a boolean, not a char, and it cut the loop time from 388 nanoseconds to 349. You won. You fucking won. God bless the internet and all the nerds that created it. And god bless that guy that made my sandwich for lunch. It was the best water cress and cheese sandwich I ever had. But had he known what you now know, had he travelled the journey you struggled through like a lonely warrior, he wouldn’t have put the rocket salad near the tomatoes. Tomatoes are a fruit, you single-state buffoon. They have no place in the vegetable class.

jayfella · November 9, 2013, 5:50pm

I’ve pushed some preliminary support for vegetation paging:

[java]
protected class WorldTileListener implements TileListener
{
    // return true to allow the tile to load.
    @Override public boolean tileLoaded(TerrainChunk terrainChunk) { return true; }

    // return true to allow the tile to unload.
    @Override public boolean tileUnloaded(TerrainChunk terrainChunk) { return true; }

    @Override public void tileLoadedThreaded(TerrainChunk terrainChunk)
    {
        Node staticRigids = new Node("staticRigids");
        Node staticNonRigids = new Node("staticNonRigids");

        float[] heightmap = terrainChunk.getHeightMap();

        // add trees
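        // roll the tree count once; calling nextInt() in the loop condition would re-roll it on every iteration
        int treeCount = FastMath.rand.nextInt(10);
        for (int i = 0; i < treeCount; i++)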
        {
            int x = FastMath.rand.nextInt(blockSize);
            int z = FastMath.rand.nextInt(blockSize);

            int pos = z * blockSize + x;
            float y = heightmap[pos] * worldHeight;

            // sea level
            if (y < 60)
            {
                continue;
            }

            Vector3f localLocation = new Vector3f(x - (tileSize), y - 1, z - (tileSize));
            Vector3f worldLocation = chunkPositionToWorldLocation(terrainChunk, localLocation);

            Spatial tree = trees.randomTree();
            tree.setLocalTranslation(worldLocation);
            tree.setShadowMode(RenderQueue.ShadowMode.CastAndReceive);

            // add some vertical rotation so that they aren't all stood perfectly straight
            float xRotation = FastMath.DEG_TO_RAD * FastMath.nextRandomInt(-10, 10);
            float zRotation = FastMath.DEG_TO_RAD * FastMath.nextRandomInt(-10, 10);

            // rotate them so they aren't all facing the same direction
            float yRotation = FastMath.DEG_TO_RAD * FastMath.nextRandomInt(0, 359);

            tree.rotate(xRotation, yRotation, zRotation);
            staticRigids.attachChild(tree);
        }

        Node optimizedStaticRigids = GeometryBatchFactory.optimize(staticRigids, true);
        optimizedStaticRigids.addControl(new RigidBodyControl(0));
        terrainChunk.setStaticRigidObjectsNode(optimizedStaticRigids);
    }

    // un-used in this instance - I am using a NoiseBasedWorld
    @Override public String imageHeightmapRequired(int x, int z)
    {
        throw new UnsupportedOperationException("Not supported.");
    }

    private Vector3f chunkPositionToWorldLocation(TerrainChunk chunk, Vector3f chunkPosition)
    {
        float x = chunk.getLocalTranslation().getX() + chunkPosition.getX();
        float z = chunk.getLocalTranslation().getZ() + chunkPosition.getZ();
        float y = chunkPosition.getY();

        return new Vector3f(x, y, z);
    }

}

[/java]

The code above implements TileListener. To add it to your world, add the following code after your world creation:
[java]
// add tile Listener
tileListener = new WorldTileListener();
world.setTileListener(tileListener);
[/java]

In a nutshell, we are taking advantage of the TileLoadedThreaded event to load some vegetation in a threaded manner, rather than doing so on the GL thread.

In this example, I’m just randomly picking positions from within the terrain tile and planting a tree there. Your vegetation positioning logic would probably want a little more attention than that, for example this article.

When a tile is loaded, the vegetation is automatically loaded. When a tile is unloaded, the vegetation is automatically unloaded too. Bear in mind that as a result of my crude vegetation position logic, the vegetation will appear in different positions each time a tile is loaded. This code is meant solely to provide proof of concept for paging vegetation.
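One way to keep the vegetation in the same place on every load, sketched under the assumption that each tile has integer grid coordinates and the world has a seed: create the random generator from those values instead of using a shared one. `TilePlacement`, `tileSeed` and `treePositions` are hypothetical names and are not part of the repo.

```java
import java.util.Random;

// Hypothetical helper: tree positions derived only from the world seed and tile coordinates,
// so the same tile always gets the same layout.
class TilePlacement {
    /** Mixes the world seed and tile coordinates into one seed. */
    static long tileSeed(long worldSeed, int tileX, int tileZ) {
        long h = worldSeed;
        h = h * 31 + tileX;
        h = h * 31 + tileZ;
        return h;
    }

    /** Returns {x, z} heightmap positions for up to maxTrees - 1 trees on this tile. */
    static int[][] treePositions(long worldSeed, int tileX, int tileZ, int blockSize, int maxTrees) {
        Random rng = new Random(tileSeed(worldSeed, tileX, tileZ));
        int count = rng.nextInt(maxTrees); // same upper bound style as the listener above
        int[][] positions = new int[count][2];
        for (int i = 0; i < count; i++) {
            positions[i][0] = rng.nextInt(blockSize);
            positions[i][1] = rng.nextInt(blockSize);
        }
        return positions;
    }
}
```

In the listener, the x and z values from treePositions() would replace the FastMath.rand.nextInt(blockSize) calls; the rotations could be drawn from the same seeded generator so they stay stable too.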

Video of it in action:
[video]http://www.youtube.com/watch?v=BXfDY6pJxPw[/video]

reveance · November 9, 2013, 6:02pm

@jayfella It looks like your push didn’t get through on github or am I missing something?

jayfella · November 9, 2013, 6:08pm

I added a video so you can see what the hell I’m on about.
The push to github just added the ability; it didn’t add a working demo because jmonkey doesn’t have any vegetation in the demo files for me to use. Just copy-paste the code I put in the previous post and it will work, but you’ll need a tree model or something.