[SOLVED] Bullet Memory Footprint

Samwise · March 4, 2019, 9:04am

Hello jMonkeys,

I’m currently working on implementing physics into my blocky-voxel game.
I just decided to go for Bullet and it works astonishingly well so far. I create MeshCollisionShapes from the meshes that are already created for a chunk (so once again I’m happy about greedy meshing) and I can have about a hundred balls rolling around the world.
The only problem is the memory footprint. Without physics the game takes about 2GB (heap and direct memory together), but when I enable physics it takes more than 5GB. So that’s 3GB from Bullet alone, and I cannot even see that increase in the in-game memory stats, which I guess also points towards Bullet.

Now that I’m also working on writing chunks to disk that can be seen but are too far away to interact with, I was wondering if there is a similar way to handle the Bullet memory. Creating the collision shapes takes some time (I use the MeshCollisionShape ‘advanced constructor’, and the profiler tells me 99% of the creation time is spent in the native ‘createShape’ method), so I try to create collision shapes for all chunks even when they are too far away to interact with. Instead of adding them to the PhysicsSpace, I would like to write them to disk.

so question: is there a way to let bullet create the data it needs and then write all that data to disk, later read it back and initialize a collision shape from that?

and little side question: is there a way to ask bullet for the amount of memory it currently consumes?

As always, thanks in advance already and greetings from the shire,
Samwise

RiccardoBlb · March 4, 2019, 9:23am

so question: is there a way to let bullet create the data it needs and then write all that data to disk, later read it back and initialize a collision shape from that?

Yes, serialize and deserialize the rigidbodies.

and little side question: is there a way to ask bullet for the amount of memory it currently consumes?

Not in jme afaik.

Samwise · March 4, 2019, 9:38am

Thanks for the quick response.
I already had a look at the read and write methods of MeshCollisionShape, but they seem to store only the ByteBuffers that I need to provide in the constructor. I guess Bullet basically copies that data and initializes some more, like a bounding shape or similar.
In the end, the read method calls the same native createShape method, which is what actually takes the most time.

@sgold do you have something like that in Minie? I had a look at your rope demo (which is awesome btw), but for the sake of simplicity I haven’t changed anything so far and use the Bullet that shipped with the SDK when I downloaded it. I might switch sooner or later anyway, and if I understood the docs correctly they should be interchangeable as long as I don’t use BetterCharacterControl and some other classes, which I don’t.

EDIT: I just noticed I might be mistaken as long as I use the memory-optimized version. I’ll have a look into it, I don’t really get it yet.

EDIT2: It seems to be exactly like that. When I use the memory-optimized version, using a collision shape read from disk is 4 times faster than creating a fresh one. However, when I use the non-memory-optimized version, the collision shape created from the data read from disk takes as long as creating a fresh one.
Unfortunately, creating an optimized version takes twice as long as creating an unoptimized version, so the optimized one is not really a solution.

Any idea why the data Bullet needs is saved for the memory-optimized version, while for the non-optimized version it has to be recalculated every time?

sgold · March 4, 2019, 2:42pm

I believe Minie still constructs mesh collision shapes using the same algorithms that jme3-bullet uses. If there’s a better way to do it, I’d consider implementing it in Minie.

MeshCollisionShape is powerful and easy to use, but it’s overkill for many situations. If you can decompose any part of your world into convex primitives (like BoxCollisionShape) your game should run more efficiently.

Samwise · March 4, 2019, 3:03pm

Also thanks for your reply!
It’s not entirely about the algorithm used to construct the shapes.
The thing is, when a player walks far from a chunk, I want to remove that chunk from the PhysicsSpace.
But since there is a high chance that the player will walk back there later, I was hoping I could save the rigidBody to disk.
Now there are 4 situations in total: memory-optimized MeshCollisionShape TRUE or FALSE, combined with either CREATING a fresh MeshCollisionShape and rigidBody or READING the rigidBody from disk.

Memory-optimized version, create a fresh collision shape: takes the longest.
Normal version, create a fresh collision shape: twice as fast as the memory-optimized version.
Normal version, use BinaryImporter to load a rigidBody that was written to disk earlier: takes as long as creating a fresh one without reading anything from disk at all.
Memory-optimized version, use BinaryImporter to load a rigidBody that was written to disk earlier: fastest of all 4 situations.

Now I’m wondering: if the memory-optimized version can save the data that is critical for Bullet’s setup to disk, why can’t the normal version, too?

My struggle is: I cannot use the memory-optimized version since creation is much slower and I need to create the collision shapes rather quickly. However, since the player will walk around a lot, I would like to store the rigidBodies on disk instead of creating new ones each time the player comes back. This only seems to work with memory-optimized MeshCollisionShapes, though.

So it’s not about the actual algorithm that creates the collision shape. Rather, the algorithm could be skipped completely if there was a way to store the data on disk and use it instead of creating a new one.
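
The "skip the algorithm when a stored copy exists" idea can be sketched independently of Bullet. The following is a minimal, hypothetical cache (the class `ChunkShapeCache`, the file naming and the `.shape` extension are all assumptions, not jME or Minie API). It treats the serialized rigidBody as an opaque byte array and only runs the expensive build on a cache miss.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical per-chunk disk cache; not part of jME, jme3-bullet or Minie. */
final class ChunkShapeCache {
    private final Path dir;
    private int builds; // how often the expensive build path ran

    ChunkShapeCache(Path dir) {
        this.dir = dir;
    }

    /** Returns the stored bytes for a chunk, building and storing them on a cache miss. */
    byte[] loadOrBuild(int cx, int cy, int cz, Supplier<byte[]> expensiveBuild) {
        Path file = dir.resolve(cx + "_" + cy + "_" + cz + ".shape");
        try {
            if (Files.exists(file)) {
                return Files.readAllBytes(file); // cache hit: skip the build entirely
            }
            byte[] data = expensiveBuild.get();
            builds++;
            Files.createDirectories(dir);
            Files.write(file, data);
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int buildCount() {
        return builds;
    }
}
```

A cache like this only pays off if loading really avoids the native work, which is exactly the difference between the memory-optimized and the normal version described above.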

sgold · March 4, 2019, 3:23pm

I’m sure it’s possible. However, since the memory optimization is done in C++, capturing the optimized data in Java (in order to save it) might be tricky to implement. I’ll look into it.

pspeed · March 4, 2019, 3:29pm

I think he’s saying that the memory optimized version is saved… but it takes longer to generate.

The non-memory optimized version does some runtime optimization (takes a while to do) that is not saved/loaded… so it must do that again when reloaded.

Still, probably a similar issue. We hand some stuff to C++ and it still has to crunch on it.

Samwise · March 4, 2019, 4:39pm

exactly what @pspeed said.

Here is what the profiler told me:
CREATE FRESH rigidBody and MeshCollisionShape:
for the memory-optimized version:
6,192 ms (6,053 ms spent in the native createShape() method)
for the normal version:
2,560 ms (2,499 ms spent in the native createShape() method)

LOAD rigidBody and MeshCollisionShape from disk:
for the memory-optimized version:
734 ms (419 ms spent in the native createShape() method)
for the normal version:
2,677 ms (2,534 ms spent in the native createShape() method)

I increased the chunk size for this to make the differences more obvious.

The native saveBVH() method of MeshCollisionShape is only called on save when the memory-optimized version is used, so it seems that for the optimized version all the data Bullet needs is available to jME. I don’t see why the normal version should be any different?

sgold · March 4, 2019, 5:07pm

Sorry, I was confused. This does seem worth looking into.

sgold · March 4, 2019, 10:10pm

I believe I got this change working in Minie and uploaded a new 0.7.1 release. Would you care to try it out?

Release Minie v0.7.1 · stephengold/Minie · GitHub
or
Version Minie/0.7.1 - stephengold

Samwise · March 5, 2019, 12:07pm

Sorry, I went to bed shortly before you posted and I’m at work right now, but I’ll try it out this evening (that is in about 7 hours). Thanks already!

Samwise · March 5, 2019, 8:43pm

Seems to work fine!
It still takes some time to load the rigidBody, and half of that time is still spent in the native createShape method, but loading a rigidBody with a non-memory-optimized MeshCollisionShape is now 3x faster than before.
Thanks a lot for the quick change, I’ll stick to Minie now.

sgold · March 5, 2019, 9:24pm

It should be possible to add this feature to jme3-bullet. I’ll open an issue as a reminder.

sgold · March 6, 2019, 12:52am

serialize the BVH of a memory-optimized MeshCollisionShape in native Bullet · Issue #1032 · jMonkeyEngine/jmonkeyengine · GitHub

Empire_Phoenix · March 6, 2019, 7:20am

If I remember right, the last time I took a look at this, the whole BVH serialization was incompatible between platforms. Using a Linux-generated one on Windows resulted in hard crashes in the C code.

Samwise · March 6, 2019, 8:14am

I initially thought exporting rigidBodies was meant for editor work in the SDK, which would always produce memory-optimized MeshCollisionShapes because it only does so once and that makes for faster loading in-game (and thus it would not need to support loading non-memory-optimized collision shapes). Those exports would have to be compatible between platforms (well, they don’t HAVE TO, but it would make things easier).
For what I need, on the other hand, I create these rigidBodies and MeshCollisionShapes at runtime (and since I need that to be as fast as possible, I cannot go for the memory-optimized version) and still want to store them for later use when the player walks back. So they don’t have to be compatible between different platforms.

Maybe I can even go for the memory-optimized version. I’m planning to create new meshes (actually only the index and position buffers, of course) specifically for creating a MeshCollisionShape from them. I do use greedy meshing, and I greedy-mesh faces as long as they have the same identifier, but this identifier could be different for physics than it is for geometries. For geometries, for example, the identifier includes the light at the face, which prevents differently lit faces from being greedy-meshed together. For physics, light doesn’t matter, so I could exclude it from the identifier and merge differently lit faces. I could instead add something like restitution to the identifier, so that faces of different blocks (and textures) with the same restitution factor would be greedy-meshed for the MeshCollisionShape. All in all, I guess I could reduce the size of the MeshCollisionShapes because they would consist of fewer but bigger faces, and then creating the memory-optimized version might be as fast as creating the non-memory-optimized version currently is. It also seems a good way to get the collision shapes on the server without doing any geometry meshing. I’m drifting off, but thank god I have a free day today and can try it out.
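
To illustrate the identifier idea, here is a small self-contained sketch of a 2D greedy merge over one slice of faces. Everything in it is hypothetical: the `RESTITUTION_CLASS` table, the way `renderId` packs block and light, and the tiny demo slice are assumptions made for the example, not the game’s actual data. It only shows that dropping light from the identifier lets the same merge produce fewer quads.

```java
final class PhysicsGreedyMeshing {
    // Hypothetical lookup: block id -> restitution class (index 0 is air).
    static final int[] RESTITUTION_CLASS = {0, 1, 1, 2};

    /** Render identifier: block type and light level (0..15) both split quads. */
    static int renderId(int block, int light) {
        return block == 0 ? 0 : block * 16 + light;
    }

    /** Physics identifier: only the restitution class matters. */
    static int physicsId(int block) {
        return RESTITUTION_CLASS[block];
    }

    /** Counts the quads a simple greedy merge produces for one slice (0 = no face). */
    static int countQuads(int[][] ids) {
        int rows = ids.length, cols = ids[0].length;
        boolean[][] used = new boolean[rows][cols];
        int quads = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (used[r][c] || ids[r][c] == 0) continue;
                int id = ids[r][c];
                int w = 1; // grow to the right while the id matches
                while (c + w < cols && !used[r][c + w] && ids[r][c + w] == id) w++;
                int h = 1; // then grow downwards while whole rows match
                while (r + h < rows && rowMatches(ids, used, r + h, c, w, id)) h++;
                for (int dr = 0; dr < h; dr++)
                    for (int dc = 0; dc < w; dc++) used[r + dr][c + dc] = true;
                quads++;
            }
        }
        return quads;
    }

    private static boolean rowMatches(int[][] ids, boolean[][] used, int r, int c, int w, int id) {
        for (int dc = 0; dc < w; dc++) {
            if (used[r][c + dc] || ids[r][c + dc] != id) return false;
        }
        return true;
    }

    /** Returns {render quads, physics quads} for a small made-up slice. */
    static int[] demo() {
        int[][] blocks = {{1, 1, 2, 2}, {1, 1, 2, 2}};
        int[][] light = {{15, 14, 15, 14}, {15, 14, 15, 14}};
        int[][] render = new int[2][4];
        int[][] physics = new int[2][4];
        for (int r = 0; r < 2; r++) {
            for (int c = 0; c < 4; c++) {
                render[r][c] = renderId(blocks[r][c], light[r][c]);
                physics[r][c] = physicsId(blocks[r][c]);
            }
        }
        return new int[] {countQuads(render), countQuads(physics)};
    }
}
```

In this made-up slice the render identifiers split the faces into 4 quads, while the physics identifiers merge them into a single quad.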

sgold · March 8, 2019, 7:30pm

Good point. The next Minie release will include a failsafe to prevent such crashes.

jayfella · March 9, 2019, 7:55am

Have you considered not loading so many and such large shapes? Why not just load a few small shapes around the player like this? (Skip to 1:30)

In this video I create any sized collision mesh I want because I have the noise data to be able to do that. I do the same for any other vehicles. You can pool them for entities that are near each other. They can even collide outside the drawn scene if for some reason you want that.

The point is that collision is not tied to the scene as you see it. You can make smaller collision meshes, and less of them.

Samwise · March 9, 2019, 9:53am

Thanks for the hint, but that’s exactly what I’m doing. (Btw, since I’m an active forum lurker I have seen the video already, of course. Good job on that one.)
It’s just that, since it’s not one of those endless runner games, I would like to keep the collision shapes around so I don’t have to recreate them once the player walks back. And since the map is big (not ‘huge’, but still too big to keep completely in the PhysicsSpace), I would like to store them on disk instead of keeping them in memory and only removing them from the PhysicsSpace.

When I started with physics I thought “uhm, I’m so super smart, I’ll just use the meshes that I create for the chunks anyway and create MeshCollisionShapes from those”.

In the meantime I had time to implement the optimization I mentioned: I no longer use the render meshes to create the collision shapes, but instead create new index and position buffers that can greedy-mesh far more faces, since they do not have to care about different textures or different light. Once that is done for a chunk, I create a MeshCollisionShape from these index and position buffers.

There are also quite a lot of trees. Two transparent blocks next to each other need their “shared face” to be visible (with 2 leaves blocks, you need to see the faces in the middle since you can look through the other faces), but just because I can see those faces doesn’t mean I can touch them, so I don’t need them for physics.
Creating a NON-memory-optimized MeshCollisionShape with the first approach took about 60ms; now I can create memory-optimized MeshCollisionShapes (which previously took around 120ms to create) in only 25ms. It’s still by far the longest calculation done on a chunk (creating, light-flooding, postGenerationOptimization and meshing are way faster), but I’m OK with it because it’s less than one frame at 30fps.
When creating MeshCollisionShapes, about 95% of the time is spent in the native createShape method, so I’m fine with investing 2-3ms to create new index and position data: the buffers resulting from that physics-optimized greedy meshing are way smaller and thus result in far less time spent in the native createShape call.
Reading from disk is still a lot faster, so for chunks that already had a collision shape created, I read it from disk instead of investing 25ms to create a fresh one.
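
The shared-face point about leaves can be written down as two separate culling rules. This is a minimal sketch with made-up block ids (`AIR`, `STONE`, `LEAVES`) and hypothetical helper names. It assumes leaves are transparent for rendering but solid for physics, as described above.

```java
final class FaceCulling {
    static final int AIR = 0, STONE = 1, LEAVES = 2; // made-up block ids

    static boolean isTransparent(int block) {
        return block == AIR || block == LEAVES;
    }

    static boolean isSolid(int block) {
        return block != AIR;
    }

    /** Rendering: a face is needed whenever the neighbour can be seen through. */
    static boolean renderFaceNeeded(int block, int neighbour) {
        return block != AIR && isTransparent(neighbour);
    }

    /** Physics: a face is needed only where something could actually touch it. */
    static boolean physicsFaceNeeded(int block, int neighbour) {
        return isSolid(block) && !isSolid(neighbour);
    }
}
```

With these rules the face between two leaves blocks is kept for the render mesh but dropped from the collision buffers.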

greetings from the shire,
samwise

jayfella · March 9, 2019, 11:40am

You create them on the GL thread?

I don’t know how your setup goes, but I would have a thread pool that creates chunks ONLY, because they need priority. Their generation shouldn’t be hindered by a busy multi-use pool. They should also have the ability to be cancelled. So if I move into a chunk but then move back, and that chunk is no longer required, I should abort its generation if it hasn’t started being built. This means chunks that are required are generated faster, instead of generating a chunk and not even displaying it because it’s not needed.

In a second pool with one or more threads, create the collision meshes as required.

Since the mesh visual and the collision meshes are separate as I outlined previously they aren’t dependent on each other. The collision meshes could well finish before the chunk because you probably have 96 chunks and only 9 collision meshes to generate.
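
A minimal sketch of the two-pool setup with cancellable jobs, using only `java.util.concurrent`. The class `ChunkWorkers`, its method names and the pool sizes are assumptions for illustration. `Future.cancel(false)` drops a job that is still queued without interrupting one that is already running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ChunkWorkers {
    // Dedicated pool: chunk generation only, so other work cannot delay it.
    final ExecutorService chunkPool;
    // Separate pool for collision meshes, independent of chunk generation.
    final ExecutorService collisionPool = Executors.newSingleThreadExecutor();

    ChunkWorkers(int chunkThreads) {
        chunkPool = Executors.newFixedThreadPool(chunkThreads);
    }

    /** Cancels a job only if it has not started; a running job is left alone. */
    static boolean abortIfNotStarted(Future<?> job) {
        return job.cancel(false);
    }

    void shutdown() {
        chunkPool.shutdown();
        collisionPool.shutdown();
    }

    /** Queues two chunk jobs on one thread, aborts the queued one, returns how many ran. */
    static int demo() {
        ChunkWorkers workers = new ChunkWorkers(1);
        AtomicInteger generated = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            workers.chunkPool.submit(() -> {
                started.countDown();
                try {
                    release.await(); // keep the only worker thread busy
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                generated.incrementAndGet();
            });
            started.await();
            // This job is still queued: the player moved away again, so drop it.
            Future<?> stale = workers.chunkPool.submit(() -> {
                generated.incrementAndGet();
            });
            boolean aborted = abortIfNotStarted(stale);
            release.countDown();
            workers.shutdown();
            workers.chunkPool.awaitTermination(5, TimeUnit.SECONDS);
            return aborted ? generated.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            workers.shutdown();
            return -1;
        }
    }
}
```

In the demo only the first job runs; the second is removed from the queue before a worker picks it up.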

Edit: Don’t forget that you can also page vertically. A 16x61x16 (or whatever size) collision mesh would be better than an entire chunk column of 16x16x128, for example. The size reduction would be significant.

My point here is about the size of the mesh you are creating. You aren’t (or at least shouldn’t be) restricted.
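
As a rough sketch of vertical paging, the helper below picks the slab of a chunk column around the player and compares cell counts. The class name, the slab height of 16 and the clamping behaviour are assumptions for the example; it requires the slab to be no taller than the column.

```java
final class VerticalPaging {
    /** Returns {minY, maxYExclusive} of a slab centred on the player, clamped to the column. */
    static int[] slabAround(int playerY, int slabHeight, int columnHeight) {
        int min = playerY - slabHeight / 2;
        min = Math.max(0, Math.min(min, columnHeight - slabHeight));
        return new int[] {min, min + slabHeight};
    }

    /** Number of cells in a box of the given dimensions. */
    static int cells(int sizeX, int sizeY, int sizeZ) {
        return sizeX * sizeY * sizeZ;
    }
}
```

For a 16x16 column of height 128, a slab of height 16 covers 16 * 16 * 16 = 4096 cells instead of 16 * 16 * 128 = 32768, an 8-fold reduction in the volume that has to be meshed for collision.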