In my blocky voxel game, I still create new meshes each time a chunk needs to be remeshed, and I thought it's finally time to go for that optimization and reuse my meshes: rewind the buffers and put the new data in instead of creating new buffers again and again.
But now I'm wondering: if the mesh already exists (and thus is currently attached to the scene graph), can I overwrite the buffers from a thread other than the jME main thread, since they have already been sent to the GPU, and then only set the update hint on the jME main thread? If so, is this reliable, or does it depend on the hardware?
Thanks in advance and many greetings from the Shire,
Thanks for the fast reply!
Well, that sounds sad, though.
But I guess there are three cases: I might need a bigger buffer, I might already have a perfectly sized one, or I might need a smaller one, and all three cases are valid; I can think of scenarios for each. Since I can set the limit on the buffer (which works, looking at the particle emitters), I can at least sometimes skip creating new buffers and reuse the old ones instead. But I guess I will have to do that on the main thread then…
I’ve always erred on the side of the argument that says regen is better than modification with these types of surface. Generation is usually really quick anyway and it’s threadsafe (I presume). I’m not sure I’d complicate my code for the sake of a millisecond or three once in a blue moon.
But when it doesn’t… random nearly-undetectable bad stuff will happen that often cascades into worse things. So every new bug will be “I wonder if that’s because I have code that isn’t thread safe”. Even worse when native code is involved.
It’s just so easy to avoid it. The code to do proper threading is always way simpler than the dozen work-arounds that never end up really solving the problem.
In this case, if native buffer reuse is really the thing… generate your chunks into temporary in-memory buffers and then flush them to native on the main thread.
To me, in a block world game, memory is the biggest issue. Every byte you waste is potentially more chunk data you could have put in the view. I’d trade a little speed for memory savings almost every time. So to me it’s better to always have “right size” buffers… ie: create them when you need them. Even if a new chunk is generated every second, that’s 60 frames of more chunk data able to be displayed. Compared to how often it’s rendered, chunk data is almost never regenerated.
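pspeed's suggestion above (mesh on the worker thread into plain heap arrays, then copy to native memory on the main thread) could be sketched like this with plain java.nio; the `meshChunk` output here is a made-up placeholder, and in jME the resulting direct buffer would then be handed to the mesh on the main thread:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ChunkStaging {
    // Worker thread: mesh into a plain heap array (thread-safe, touches no GL state).
    static float[] meshChunk() {
        // placeholder meshing output: one triangle's positions
        return new float[] { 0, 0, 0,  1, 0, 0,  0, 1, 0 };
    }

    // Main thread: copy the staged data into a direct (native) buffer
    // that the renderer will upload on the next frame.
    static FloatBuffer flushToNative(float[] staged) {
        FloatBuffer fb = ByteBuffer.allocateDirect(staged.length * Float.BYTES)
                                   .order(ByteOrder.nativeOrder())
                                   .asFloatBuffer();
        fb.put(staged).flip(); // limit now marks the valid element count
        return fb;
    }

    public static void main(String[] args) {
        FloatBuffer fb = flushToNative(meshChunk());
        System.out.println(fb.limit()); // 9 floats staged
    }
}
```

The point of the split is that the heap array is ordinary Java memory the worker thread can own freely, while the direct buffer that the renderer reads is only ever written on the main thread.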
Thanks @RiccardoBlb and @pspeed for your answers, too!
I was actually hoping for something like what Riccardo said, but since I already do the meshing into lists (similar to ArrayLists, but for primitives), I can also follow pspeed's advice and check whether the buffers already exist, and if so, check their size:
If they are big enough, I should, on the main thread, put the new data from the primitive lists into the buffers and update the meshes' counts.
However, if they are not big enough, I can create new buffers and put the data from the primitive lists into them on the meshing thread I'm currently on, and then on the main thread only set them as the vertex buffers' data (which causes them to be resent) and also update the counts.
Is that a safe way to do it?
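As a sketch of that size check, in plain java.nio (in jME the final hand-off would go through something like `VertexBuffer.updateData` plus `Mesh.updateCounts` on the main thread; the method name below is just illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BufferReuse {
    // Reuse the old direct buffer if it is large enough, otherwise allocate
    // a fresh one. The caller then hands the result to the mesh on the
    // jME main thread.
    static FloatBuffer fillOrReplace(FloatBuffer old, float[] data) {
        FloatBuffer target;
        if (old != null && old.capacity() >= data.length) {
            target = old;    // big enough: rewind and overwrite in place
            target.clear();  // position = 0, limit = capacity
        } else {
            target = ByteBuffer.allocateDirect(data.length * Float.BYTES)
                               .order(ByteOrder.nativeOrder())
                               .asFloatBuffer();
        }
        target.put(data).flip(); // limit now marks the valid element count
        return target;
    }
}
```

When the old buffer is reused, its capacity stays larger than the limit; the limit is what tells the mesh how many elements are valid.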
I still think it might be worth it, first because there is more than one chunk update per second in my game, and second because when placing or removing blocks the sizes won't change much. On average I have (including LOD levels) ~12,300 indices and ~8,200 vertices per chunk, with a total size of ~130 KB, and when I remove a block that will only change by a relatively tiny amount.
I might as well add a check for whether the already existing buffers are bigger than the required size + 5%, and if so, create a fresh one, because it would be too much wasted space.
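That threshold check might look like the following; the 5% figure and the method name are just this thread's example, not anything from jME:

```java
public class ReuseCheck {
    // Reuse only when the old buffer fits the data and wastes at most ~5%
    // of its capacity; otherwise a right-sized buffer is worth allocating.
    static boolean shouldReuse(int capacity, int required) {
        return capacity >= required && capacity <= (int) Math.ceil(required * 1.05);
    }
}
```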
But when I only fill the buffers up to, say, 90% and then flip them, setting their limit to that position: when they are sent to the GPU, is only that 90% sent, so that the buffer on the GPU is no bigger than the data that could be read? Then I would only waste some main memory but would not send unnecessary data to the GPU. Does it work like that?
And also, a buffer is only sent once and each frame afterwards only referenced, right?
The point of using native buffers is that the RAM is in native memory. JME may use the limit to tell the GPU that there are only 250 elements of vertex data… but I think JME would have to send the whole buffer. And if it didn’t, then you’d be resending the whole buffer to the GPU every time it changed anyway… sort of negating any benefit you are deriving from this approach.
On the other hand, if you are allowing up to 10% overage on every buffer… for every 10 chunks, that’s another whole chunk of data you could have had in view. For a 10x10 flat area, you could have loaded a whole other row of chunks.
I still don't really get the point about potentially having more chunks in view. When the buffer is only resent when it changes and is usually only referenced, how does that impact the number of chunks in view? Or do you mean loaded in memory in general?
For me, GPU memory is not the bottleneck; the fragment shader is, and according to HWMonitor the bus interface has a maximum usage of only 62%.
And even when a chunk is loaded, it is still compressed as long as no blocks changed in that chunk within the last 5 seconds (in memory, not on the GPU).
On the other hand, I recently played around with particle emitters and looked at the buffer usage hints like Dynamic, Static, and Stream, and choosing Stream had a high impact on the bus interface (no surprise), so adding it to the game might shift the bottleneck to the bus interface, but I guess I will have to see what works best.
Regarding information sent to the GPU: do mesh buffers that were already sent to the GPU stay there until they are explicitly destroyed? That is, can I detach them from the scene graph and attach them again some frames later without having to send them to the GPU again?
I'm wondering because I guess it would usually be better to detach chunks that are not in view, but I noticed some hard FPS drops when detaching and re-attaching instead of changing cull hints.
On the other hand, that might have been due to other reasons. I guess it's better to do all spatial detaching and attaching at once, without any iterations in between, so the SafeArrayLists don't regenerate their arrays several times per frame, and it might be that I had messed something up back when I last tried.
And also about attaching and detaching: does it make sense to attach spatials at the beginning of an update loop, so they could already be sent to the GPU while doing other stuff in the update loop, before finally issuing a draw call for them at the end of the update loop? I'm afraid it doesn't work like that, but would it be possible in theory?
I'm sorry my questions tend to get long and numerous, but I know little about when which data is sent to the GPU, whether those are sync or async calls, and how to overcome the potential bus interface bottleneck that I read about everywhere but am not actually facing yet.
Bounding shapes, transforms, etc. all need to be recalculated all the way up the scene graph. Things that might not have been active before need to be active. So, the data might have still been on the GPU or it might not. Once you detach it then the system is free to decide that the space can be used for other things.
No. Nothing gets sent to the GPU until render.
You are better off limiting the number of things you attach per frame.
Let’s say you have 100,000,000 bytes of RAM and each chunk takes 90,000 bytes on average… but for a while they took 100,000 bytes. Because your new ones only took 90,000 bytes you kept the old buffers around and reused them. So now you can physically only have 1000 chunks in RAM… but if you right-sized your buffers, you could have 1,111 in RAM.
It doesn’t matter whether we are talking about direct memory or heap memory. For me, the single biggest thing I always wanted was a more distant horizon and the single biggest limiter to that was RAM usage. I took every opportunity possible to push out the clip distance… byte buffers instead of float for position, precalculated normals/tangents/etc. based on an int attribute… whatever I could do to get the direct memory down so I could have more chunks visible. 192 meter max clip was a constant RAM pig gobbling up the memory I needed to run the game itself. And that’s not even very far away. Even LOD has minimal impact there.
In the new engine, I opted to render far terrain completely different just so I could at least push the visible horizon out to 2 km or so… then 128 m clip was ok. It just creates a whole bunch of other “can I get away with this?!?” visual problems.
All righty, I'm going to keep using culling instead of detaching whenever I feel like I might need to reattach later.
But would it be possible? Or would that make no sense because the draw calls are async and the GPU is probably still busy rendering the last frame during an update anyway?
I'm doing that already; I just thought there might be an alternative.
OK, that's totally true of course, but I'm compressing my chunks (even the chunk you are currently standing in, as long as no blocks changed in it during the last 5 seconds), and although I still have those double-border blocks (meaning I get 34 × 34 × 34 ints = 157,216 bytes per chunk), I get an average compression rate of 87%, so one chunk on the heap is only ~20 KB and I can fit around 5,000 chunks into those 100 MB.
And regarding direct memory, greedy meshing was the optimization with the highest impact (aside from the packing that you also do; I actually just realized I was still using ShortBuffers for the positions and quickly changed that, resulting in around 96 KB of direct memory per chunk on average now, including LOD levels).
So heap and direct together is ~116 KB per chunk, and with the 500 MB I'm willing to spend on voxel-related stuff, I can fit ~4,300 chunks in that.
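Spelled out, that budget math looks like this (the 0.13 factor is the ~87% compression from above, and the 96 KB direct figure is the measured average from this thread):

```java
public class VoxelBudget {
    public static void main(String[] args) {
        int raw = 34 * 34 * 34 * Integer.BYTES;       // 157_216 bytes per chunk, double borders included
        int heap = (int) (raw * 0.13);                // ~87% compression -> ~20 KB on the heap
        int direct = 96_000;                          // measured avg direct memory incl. LOD levels
        long budget = 500_000_000L;                   // 500 MB voxel budget
        System.out.println(budget / (heap + direct)); // ~4300 chunks fit
    }
}
```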
Just to note, this includes removing, too. Garbage collection will cause microstutter if you're removing huge quantities per frame.
In my pagers I can specify how many chunks to add and remove per frame. I remove first, for obvious reasons. I generally get away with removing more than I can add, so say remove 10 per frame but only add 6 or 7. Whatever suits your setup.
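The per-frame budget described above is straightforward to sketch; the caps, queue type, and method names here are placeholders rather than any pager API:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PagerBudget {
    // Per-frame caps so no single frame stalls on attach/detach work.
    static final int MAX_REMOVE = 10;
    static final int MAX_ADD = 6;

    // Process at most `max` entries from the queue this frame.
    static int drain(Queue<String> queue, int max) {
        int done = 0;
        while (done < max && queue.poll() != null) {
            done++;
        }
        return done;
    }

    static void updateFrame(Queue<String> removals, Queue<String> additions) {
        drain(removals, MAX_REMOVE); // detach old chunks first to free memory
        drain(additions, MAX_ADD);   // then attach up to the add cap
    }
}
```

Anything not processed this frame simply stays queued for the next one, which spreads the cost instead of spiking it.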
Exactly the way I am doing it, too.
Although, playing around with instanced geometries (I instanced a single block so I can have an InstancedNode with hundreds of those instanced blocks that I can use to animate structure placement etc.), I can easily remove 250 of those instanced geometries from that InstancedNode per frame, although I keep them around to reuse later.
Then I had a look at the particle emitters and tried another approach: one geometry with a point mesh, with vertex buffers for position, rotation, scale, alpha, and texture, using Stream buffers and a geometry shader to emit one cube per point. When particles die, the buffers are no longer completely filled, and I see a constant decrease in the vertex counts along with a constant increase in FPS (so I thought setting the limit works to send less data to the GPU, but I guess that's only for Stream buffers, where less data is streamed simply because less data is looked up).
And that means I can remove all those blocks at once, because it's only one geometry.
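The shrinking-limit behaviour described above, in plain java.nio terms (assuming 3 floats per point for the position buffer; the names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ParticleLimit {
    // Shrink the limit so only the live points are considered on the next
    // stream/upload; the capacity stays at the pool maximum for reuse.
    static void setLiveCount(FloatBuffer positions, int liveParticles) {
        positions.limit(liveParticles * 3); // 3 floats (x, y, z) per point
    }
}
```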
Yeah, I got that when you said it would imply resending the buffer each time the limit changes; thanks for making it clear again, though. But now I'm wondering how it works with Stream buffers: are they only ever in main memory, with the GPU looking them up there when needed? Or what is the difference between Static and Stream?
And geometries that are added to the scene graph but are culled right from the beginning are also only sent to the GPU once they actually get rendered, right?
And I know this topic pops up again and again, but I want different culling behaviour for the main cam compared to the reflection cam from the water post-processor. Is it legitimate to extend Node in this case and override the checkCulling(Camera cam) method?
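The shape of that override would be roughly the following. Note that `Camera` and `Node` here are minimal stand-ins so the sketch is self-contained, not the real com.jme3 classes, and whether overriding checkCulling is a supported extension point is exactly the question being asked:

```java
// Stand-ins sketching the jME shapes involved (not the real engine classes):
class Camera {}

class Node {
    boolean checkCulling(Camera cam) {
        return true; // the real jME method does the cull-hint / frustum test
    }
}

// A node that is culled for every camera except the one it was given,
// so e.g. the water reflection camera never renders it.
class MainCamOnlyNode extends Node {
    private final Camera mainCam;

    MainCamOnlyNode(Camera mainCam) {
        this.mainCam = mainCam;
    }

    @Override
    boolean checkCulling(Camera cam) {
        if (cam != mainCam) {
            return false;               // always culled for other cameras
        }
        return super.checkCulling(cam); // normal culling for the main camera
    }
}
```

Comparing camera identity (`!=`) rather than equality keeps the check cheap and unambiguous when the same cameras are reused every frame.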