VoxelEngine Design Questions

Samwise · August 26, 2019, 9:52am

Hello fellow monkeys,

over time my project has turned into a huge mess and since i was quite busy the last month i feel like to find back into my code its a good idea to tidy up and redo some parts. and i would like to make sure i do it a reasonable way
ill try to sum up my approach and then ask for any idea about improvements / alternatives.

note: the terrain does not need to be “infinite” as in minecraft eg, still big enough to require some sort of paging. and about terminology: by “chunk” i mean 32x32x32 blocks, by “column” i mean a vertical stack of chunks (32x256x32 blocks = 8 chunks stacked on top of each other)

when a player spawns / moves etc i check which columns need to be loaded and mark them, then i tick the column the player is in.
the tick causes the column to “basic terrain generate”, that is create some rock strata, caves and overhangs (basically everything that does not need any information about neighbour chunks). this happens on column-level meaning all chunks in a column are basic-terrain-generated at the same time since it allowed for some optimizations.
when the column is done basic terrain generating, it ticks all 8 neighbours, which in return causes them to basic terrain generate if they are marked as “need load”.

some time later (that is, when the last of the 8 surrounding columns has finished basic terrain generation and ticks its neigbours) the column that was basic terrain generated first is ticked again, realizes that all surrounding columns are already basic terrain generated now and starts complex terrain generation.
complex terrain generation places structures like trees and buildings that might overlap chunk-borders / column-borders, which is why it needs the surrounding columns to be basic generated (so trees stop growing if they reach into a neighbour chunk and there is a rock stopping the tree eg)
those ticks happen whenever a column / chunk has finished some task.

after complex terrain generation there is a step called basic light calculation.
basic light calculation again does not require any surrounding columns to be complex terrain generated or even basic terrain generated, which is why it basically starts right after a column has finished complex terrain generation.
it starts to floodfill sunlight from the top and emits light from light emittting blocks, but never crosses a chunk border.

later, when a column is ticked which is basic light calculated already and it realizes that all 8 surrounding columns are basic light calculated too, it starts complex light calculation.
complex light calculation takes the outer most blocks of a column and floods their light into the corresponding touching neighbour column (north east south and west, 4 floodfills total), those flood their light into the diagonals of the original chunk (2 floodfills per north east south and west = 8 floodfills), those diagonals floodfill their light back into the north east south and west columns (again 8 floodfills) which then flood their light back into the center column (another 4 floodfills).
this is because the floodfilled light can go around corners and a chunk might have a wall of blocks reaching all the way to its chunk border, which means i have to floodfill the light out of the chunk, and then floodfill it back in, maybe even via the diagonal column.
when a column is done complex light calculating, it is considered fully loaded and can start meshing / collisionShaping in parallel

now although i feel like it works quite well, im sure there is a lot to optimize especially regarding the light calculation.
also my current approach requires me to have a loadrange that is at least 2 chunks higher than the viewrange (because to mesh a chunk it needs complex light which means its surrounding columns need basic light, so they are also complex terrain generated, which means their surrounding chunks needed to be basic terrain generated)

now i thought i could combine the 2 light calculation steps into 1 step and just fully floodfill the light thats emitted by blocks and not care about column borders because i know the surrounding columns are at least complex generated (meaning they got all their solid blocks placed and light cannot reach further than the length of a chunk-edge).

but thinking about it im not sure its an optimization because it might cause more calculations than needed: say i floodfill sunlight from the top in one column, but there is an overhang fully cutting the chunk horizontally. that causes the sunlight to floodfill into the neighbour column (losing strength because going sideways) then go down further (losing strength because its not at max strength anymore) and reach back into the original chunk underneach the overhang.
then the neighbour column is lit, sunlight is floodfilled from the top, and all the sunlight from the original chunk is basically overwritten because the new sunlight is brighter since it didnt need to go around such big corner

updating light upon block placements is no problem and im fine with the performance. im also fine with the performance of my current approach for light calculations upon generation but im sure the algorithm itsself is not efficient, i just dont have an idea to compare it to.
and in some article about occlusion culling in minecraft i read that 2 of their biggest improvements were occlusion culling and chunk scheduling (if im not mistaken) so i guess a good approach for loading chunks makes a good approach for scheduling that easier
i would like to base the chunk loading scheduling on the occlusion algorithm because that tells me which chunks are visible and thus which columns need to be loaded and i cannot fit my current approach with the ticks into this

as always, thanks a lot in advance for any input or idea, experience etc
many greetings from the shire,
samwise

oxplay2 · August 26, 2019, 7:06pm

im not voxel expert, but im not sure what you exactly do with this “light update”.
why cant you just use DirectionalLight and AmbientLight?

it starts to floodfill sunlight from the top and emits light from light emittting blocks, but never crosses a chunk border.

you really need to show me what you mean.

anyway if i understand correctly you dont have “infinite world” so why not just generate it all on “new game”.

“so trees stop growing if they reach into a neighbour chunk”

what if neightbour chunk have no obstacles, but tree anyway will look odd because it was unable to grow properly. it sounds like issue here. Why anyway all trees affected by chunks? not just part of tree?

i think chunk should contain “voxel blocks data within chunk boundings” but not full objects like trees. (again im not voxel specialist, but imo this should work this way, no exceptions)

So imo Tree if placed on border between 2 chunks, then 1 chunk should contain “half” of tree, and second chunk other “half” of tree. Because i understand your trees a made from voxel blocks too? (like minecraft trees)

Samwise · August 26, 2019, 8:10pm

first of all thanks for your answer!

well i cannot use directional light and amient light because i have a huge amount of light sources and they can be placed anywhere (just like in minecraft) so when meshing a chunk i already take the light at a face into account that i put into a vertexbuffer. so in the shader its super easy to get the light at the face

https://www.seedofandromeda.com/blogs/29-fast-flood-fill-lighting-in-a-blocky-voxel-game-pt-1 this basically explains it and definitely a lo better than i could

well i cannot generate it all on new game because the world might not be infinite but unfortunately is still too big to completly generate on start

im sorry if i wasnt clear, i meant texactly what you described, which is why the surounding chunks / columns need to be loaded already so i can check if there is a solid block. the chunks indeed only contain data about voxels within their bounds

jayfella · August 26, 2019, 8:11pm

I don’t know if you’ve read these before but in my endeavours it provided some handy tips for speeding things up.

It does include things like neighbour checking and fastest algorithms for flood filling.

There are two parts.

jayfella · August 26, 2019, 8:12pm

Hah you ninja’d me with the same links.

Samwise · August 26, 2019, 8:24pm

thank you too for your answer!
yes indeed i know these articles already, i have read a lot about all this on the internet and those are some really nice articles

just my problem is not updating the light (i actually use the algorithm from the article for the light updates) but instead light calculation upon world generation is my problem.

say a column has finished generating all the solid blocks including trees etc, and i can start lighting it.
currently i first check the whole column from top to bottom for the first solid block, setting all blocks on the way to full sunlight. then i start a floodfill from that lowest layer of sunlight (which touches the highest solid block). this is an actual floodfill with several blocks as source (the whole layer of sunlight)
in my current approach, for the lighting i dont require the neighbour chunks to have finished generating solid blocks, which is why all those floodfills stop at the chunk borders and i have to do that “complex light calculation” step to make sure light can pass through one chunk back into the emitting chunk behind a wall for example
if i change the approach to first start lighting when a column is surrounded by columns that all have finished generating their solid blocks, then it might happen that i do useless floodfills.
those floodfills are useless and the algorithm would actually skip them but the light that would cause them to be skipped because its brighter, is not there yet (it would come from the neighbour chunk which is not yet loaded)

im sorry im really bad at explaining things

oxplay2 · August 26, 2019, 8:32pm

oh man, i would really need spend a lot of time learning algorithm of this floodfills.
Anyway thanks for links im more clever now anyway knowing this thing.

but as i understand it just fill “free” space with light from lightsource.

I also belive your lightsources have something like “light radius”?

my question could be stupid because i dont know alhorithm fully, but is it possible you could do this “floodfill” alhorithm for light sources only if their “radius” have all chunks loaded and floodfill was not already done?

i mean lets say light source is between columns/chunks(where one chunk is not generated so you dont know how it will look for algorithm). So it will only “appear”(use alhorithm to calculate light) when player will move closer to load required chunk for this lightsource?

edit: but now when i re-think, it might be stupid anyway since Sun light is global(and i belive its also floodfill algorithm calculated.

pspeed · August 26, 2019, 8:38pm

I haven’t read the articles so I don’t know how they do it… but generally for a flood fill you have a list of pending cells… the ones you just injected light into. You work these one at a time to see what neighbors you need to add to the pending cells. (Lower light than you, yes, higher light than you, no.)

That out of the way…

A particular column has several “lit” states.
Not lit: no lighting has been calculated at all
Partially lit: all internal lighting has been done but only some sides have been calculated
All lit: all internally lighting and all neighboring lighting has been calculated and applied.

When you light column X,Y, you need to partially light X+/-1, Y+/-1 so that it can apply its lighting to you… part of that process will also reflect your own floods back to you. Edit: this is to fully light column x,y.

When you need to light those for real, you only have to seed the pending list with neighbor contributions (and your own edges).

Samwise · August 26, 2019, 8:42pm

they dont have an explicit radius, they have a strength instead. and since the strength decreases per step, they effectively got a radius. with my setup i guarantee that any light source can only reach into the surrounding chunks, meaning no light source is stronger than a chunk is long, to put it like that

and youre right, especially sunlight is the problem, i cannot find a way to get rid of useless floodfills that will be overridden when a neighbour is loaded (its not like this happens with every column, but with a terrain with many overhangs it will happen more often

Samwise · August 26, 2019, 8:51pm

thanks a lot for your answer too!
in my approach i also have several lit states:
not lit: no light at all
basic lit: all 32x32 top blocks have floodfilled their sunlight and all light emitting blocks have emitted their light, but none of these floodfills crossed a column border.
complex lit: the columns outer blocks (if in any color or sunlight brighter than the touching block in the neighbour column) are floodfilled into the 4 touching columns.
and now comes what i feel like must be a mess but still seems needed for correct light: i need to floodfill the light from those 4 neighbours into the 4 diagonal neighbours, from those neighbours back to the direct neighbours and then back into the center column to make sure the light has had all chances to reach around any corners
because i dont see how does light from diagonal neighbours reach into a chunk?

edit: all 8 surrounding columns are basic lit already when doing those tons of floodfills to complex light the center column. thus light from any light emitting blocks inside those columns has rached to the border already and all those floodfills are all only picking up the brighter blocks at the borders and floodfill those into the columns

pspeed · August 26, 2019, 8:56pm

I don’t understand the question.

In what I describe, to light column X, Y (and you are always basically lighting whole columns and not single chunks), you need to do a partial calculation on all 8 surrounding columns So lighting is calculated on 9 columns. Some of them may already be ‘mostly’ complete.

Personally, I keep a bit mask on the column to let me know when I’ve calculated lighting from a particular direction. So 0xff is full lit.

If the 9 columns have never had lighting before then the result is that the center is fully lit, the corners are lit from two sides, the directly up/down/left/right columns are lit from three sides, etc…

Actually, I’m fuzzy on this because I think in my analysis that I didn’t need to keep track of corners in the bit mask because tracking just the sides will take care of it if lighting is always done on a set of 9 columns.

Samwise · August 26, 2019, 9:05pm

although you said you didnt understand the question, you just answered it
i should just use a byte to keep track of which sides have been flooded already, so there is still 3 lit states: not lit, basic lit and complex lit. but complex lit would subdivide into states dependant on which directions have been flooded already and as you said only 0xf is full (and im sure youre right i dont need to keep track of the edges, i just need to ensure before flooding light from a column to the west side for example, this neighbour has had light flooded from north and south already (because those columns are effectively the edge-touching columns of the column that is to be filled)

and regarding the scheduling, do you also schedule the loading or only the meshing? like using my occlusion culling algorithm i can check which chunks are visible and schedule only those chunks for meshing, ordered by distance. but does it make sence for loading, too? i guess i should get a chunk trough all those generation steps as soon as possible instead of first basic generating all columns in range, then lighting them etc

pspeed · August 26, 2019, 9:13pm

I don’t really think of things in terms of a ‘scheduler’.

Things ask for stuff. That stuff might be computed by a thread pool and result provided synchronously or asynchronously depending on the request.

So the visualization says “give me this column/chunk/whatever”… that’s an asynchronous request that kicks of a thread pool worker. It collects the stuff it needs to generate the mesh… it asks the world for the voxel data synchronously. It asks the lighting system for the lighting data synchronously. These may have their own workers that calculate this stuff and return the results… or maybe they just return the results. The worker doesn’t care.

Once done, it puts its result in a queue so that the visualization can add one or two pending geometry to the scene per frame until the queue is drained.

Repeat.

Mix in caching at each stage.

As far as occlusion, whatever… I don’t think generating a chunk is as important as culling a chunk. It’s the per frame costs that you really care about and any chunk potentially needs to be generated eventually.

Also since lighting should be done on whole columns, if you need any lighting for that column then you are going to load/generate all of those chunks anyway.

(I tried lighting per chunk originally and nearly killed myself. I switched to keeping the lighting data separate in one giant one dimensional array for a whole column… life was simplified quite a bit.) (Edit: note that’s in unreleased Mythruna engine and not the released one. The released one is a hybrid where lighting is still stored in the chunk but lighting is always calculated for a whole column… I switched my world database to always load/generate full columns of chunks… always.)

Samwise · August 27, 2019, 4:41pm

i had to go to sleep because i had to work today so sorry for the delay.
this is really enlighting. although i also ALWAYS generate chunks in full columns (allowed for a lot of optimizations), i do still have the light packed into the chunk data (so a 2d array is given to those algorithms: one array containing 8 arrays, which contain the voxeldata for those 8 chunks per column)
thinking about cache efficiency, didnt you notice a drop in performance because you now have to switch between reading the voxel data array and reading the light array instead of having it at the same place? i can see that it might as well not make a difference because i can now use shorts instead of ints and when checking for a color to apply to a face i might have to jump in memory anyway

so say you got a chunk size of 32x32x32 and a mapheight of 256, then you got one array of length 32x32x256 for the light and 8 arrays of length 32x32x32 for the chunks? or do you keep the outer blocks that are needed for seamless chunk borders? because i guess i want to get rid of them now and im wondering if the lower memory is worth the performance for chunk-border-lookups while meshing and collisionShaping (because i use bullet i have to basically run the meshing algorithm twice and thus have double the performance loss)

pspeed · August 27, 2019, 9:08pm

If you are using multi-dimensional arrays then you probably should throw any thoughts about cache coherency (already a strange topic in Java) out the window… since a multidimensional array is really a bunch of one dimensional arrays potentially spread all over memory.

Note: you will reduce size a LOT and increase performance measurably if you implement things with single dimensional arrays.

Here is an example from the “not yet open sourced” MOSS libraries:

/*
 * $Id$
 * 
 * Copyright (c) 2016, Simsilica, LLC
 * All rights reserved.
 * 
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions 
 * are met:
 * 
 * 1. Redistributions of source code must retain the above copyright 
 *    notice, this list of conditions and the following disclaimer.
 * 
 * 2. Redistributions in binary form must reproduce the above copyright 
 *    notice, this list of conditions and the following disclaimer in 
 *    the documentation and/or other materials provided with the 
 *    distribution.
 * 
 * 3. Neither the name of the copyright holder nor the names of its 
 *    contributors may be used to endorse or promote products derived 
 *    from this software without specific prior written permission.
 * 
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
 * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 
 * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package com.simsilica.mblock;

import java.util.Arrays;

import com.google.common.base.MoreObjects;

import com.simsilica.mathd.*;

/**
 *  A flat-array based implementation of the CellData interface.
 *
 *  @author    Paul Speed
 */
public class CellArray implements CellData {

    private final int xSize;
    private final int ySize;
    private final int zSize;
    private int[] array;
    
    public CellArray( int size ) {
        this(size, size, size);
    }
    
    public CellArray( int xSize, int ySize, int zSize ) {
        this.xSize = xSize;
        this.ySize = ySize;
        this.zSize = zSize;
        array = new int[xSize * ySize * zSize];
    }

    public final void clear() {
        Arrays.fill(array, 0);
    }

    public final void clear( int value ) {
        Arrays.fill(array, value);
    }
 
    public final int getSizeX() {
        return xSize;
    }

    public final int getSizeY() {
        return ySize;
    }
    
    public final int getSizeZ() {
        return zSize;
    }

    public Vec3i getSize() {
        return new Vec3i(xSize, ySize, zSize);
    }

    public final int[] getArray() {
        return array;
    }
    
    private int index( int x, int y, int z ) {
        // We store things as a sequence of y strips, basically.  This 
        // makes it better for run-length encoding and so on.
        // Note: to save my brain later, xSize doesn't appear because 
        // it's the 'major' coordinate.  Think if we were just doing
        // x, z... index = x * zSize + z.  xSize doesn't appear.
        // It's the same here: (x * zSize + z) * ySize + y
        // I just wrote it differently.  (I'd change it but this second
        // I'm not 100% sure it wouldn't affect behavior and I don't have
        // time to test it... I'm only 99.999% sure.)
        return (x * ySize * zSize) + (z * ySize) + y;    
    }

    @Override
    public final int getCell( int x, int y, int z ) {
        return array[index(x, y, z)];
    }
 
    @Override
    public final int getCell( int x, int y, int z, int defaultValue ) {
        if( x < 0 || y < 0 || z < 0 ) {
            return defaultValue;
        }
        if( x >= xSize || y >= ySize || z >= zSize ) {
            return defaultValue;
        }
        return getCell(x, y, z); 
    }    
       
    @Override
    public final int getCell( int x, int y, int z, Direction dir, int defaultValue ) {
        Vec3i v = dir.getVec3i();
        return getCell(x + v.x, y + v.y, z + v.z, defaultValue); 
    }
    
    @Override
    public final void setCell( int x, int y, int z, int type ) {
        array[index(x, y, z)] = type;
    }
    
    @Override
    public String toString() {
        return MoreObjects.toStringHelper(getClass().getSimpleName())
                .add("xSize", xSize)
                .add("ySize", ySize)
                .add("zSize", zSize)
                .toString();
    }    
}

Samwise · August 28, 2019, 7:09am

No multidimensional arrays, i just wanted to show where the size comes from, thus 9 arrays per column: one per chunk and another one holding those 8 chunks as a column
and i really dont think cache efficiency is a strange topic in java. it definitely is nonsense for methods that are not hot. but already using 1 dimensional arrays and always iterating in y, x, z order i noticed a big increase in performance regarding this specific task when i realized that one of the light calculations still iterated in old x, y, z order and all i did was switch the order of iteration. I can only see cache efficiency as a reason.
And also specifically regarding the floodfill, when i switched from an object based queue with nodes just as provided by java to my own implementation that tries to mimic struct arrays by using a primitive array and a blocksize (blocksize basically specifies how many positions in the array belong to one “node”), i also noticed a HUGE increase in performance

my problem really is specifically about how to efficiently floodfill the sunlight when i have to consider that the neighbour chunk might not have floodfilled its sunlight already
but keeping a byte per chunk to flag from which sides light has been floodfilled already seems like the solution

just as you did, i also came to the conclusion calculating light can always be done on a per column basis, not on a per chunk basis because it allows for some optimizations. but since i still keep those outer most “buffer-blocks” for seamless chunk transitions and my light is stored inside the voxels (i use a int array of length 39304 (34x34x34) per chunk of which 16 bits are used for the light, 4 bits per color + sunlight) but you seem to use an array of length 32768 (32x32x32) of shorts i guess for the voxels plus an array of length 262144 per column for the light.
since i have to run my meshing algorithm twice per chunk (for the mesh that is used for the geometry and for the mesh that is used for the MeshCollisionShape) i was wondering if you think i can still get rid of those outer most buffer blocks that i keep so i can run the algorithm on a single array and dont have to do the border checks

pspeed · August 28, 2019, 7:38pm

I don’t know. I’ve gone back and forth on border checks.

Either you end up setting the same cell in up to 8 chunks (because of borders) or you do out-of-bound border checks.

For regular mesh generation, I ended up just packing the visible faces masks right into the chunk. Then I only need to do border checks at generation time and when cell values change, at least… and only near where they change.

Similarly for lighting, you actually don’t care about border checks. You can seed your flood fill with the influence from the neighbors all at once. It’s only the actual fill that cares about moving around and that’s a relatively sparse propagation anyway. Further optimizations can be made if your “pending cells” keep track of where they came from to avoid immediate back-tracking. Then a traversal into the neighboring column is like a one time ‘miss’… then the seed it in the next column and moves around as appropriate.