Memory usage of Spatial

llama · July 23, 2005, 4:18am

For the past week or so I've been looking at how to reduce the memory used by Spatial. First is (as on the list) the memory used by Scale/Translation/Rotation.

I've looked at some solutions, and I've implemented them using "Scale" as a testcase. The same would apply to the others. There are 3 basic techniques I've used:

1. Init both local and world scale to null. When no changes are made, the scale of the parent is returned, or a static final Vector3f(1f, 1f, 1f). Advantage: less memory used. Disadvantage: the obvious extra method calls.
2. During worldUpdate, set the world scale field of Spatials with a "null" value to that of its parent (to be clear: no extra memory is used for this, I'm just changing the reference). Advantage: eliminates all extra method calls. Disadvantages: an extra field (just a reference) in Node if used in combination with 3, and worldUpdate becomes slightly less efficient (branches).
3. Set local scale to null after it's been used in worldUpdate. This means each time that you want to change the scale of an object, you have to call setLocalScale(). The upside of this is that it potentially saves memory: the Spatial itself can "forget" about the local scale, and recalculate it by comparing its world scale to that of the parent. Another advantage is that the scale won't have to be recalculated every update (just when setLocalScale() is called). However, the
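A minimal sketch of technique 1, for illustration only. Vec3 and LazyScaleSpatial are hypothetical stand-ins, not jME's Vector3f or Spatial, and updateWorldScale() is an assumed name for the scale part of the world update. It shows both scale references staying null until a local scale is set, with getters falling back to the parent or to a shared constant.

```java
// Hypothetical stand-in for jME's Vector3f (public fields, as in the discussion).
final class Vec3 {
    float x, y, z;
    Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical stand-in for Spatial; only the scale handling is sketched.
class LazyScaleSpatial {
    // Shared default, returned when nothing in the parent chain has a scale.
    static final Vec3 UNIT_SCALE = new Vec3(1f, 1f, 1f);

    private final LazyScaleSpatial parent;
    private Vec3 localScale; // null until setLocalScale() is called
    private Vec3 worldScale; // null while this spatial has no local scale

    LazyScaleSpatial(LazyScaleSpatial parent) { this.parent = parent; }

    void setLocalScale(Vec3 scale) { localScale = scale; }

    Vec3 getLocalScale() {
        return localScale != null ? localScale : UNIT_SCALE;
    }

    // The "extra method calls": an unset world scale is looked up in the parent.
    Vec3 getWorldScale() {
        if (worldScale != null) return worldScale;
        return parent != null ? parent.getWorldScale() : UNIT_SCALE;
    }

    // Assumed update step: only spatials with a local scale allocate a world scale.
    void updateWorldScale() {
        if (localScale == null) { worldScale = null; return; }
        Vec3 p = parent != null ? parent.getWorldScale() : UNIT_SCALE;
        worldScale = new Vec3(p.x * localScale.x, p.y * localScale.y, p.z * localScale.z);
    }
}

class LazyScaleDemo {
    public static void main(String[] args) {
        LazyScaleSpatial root = new LazyScaleSpatial(null);
        LazyScaleSpatial child = new LazyScaleSpatial(root);
        root.setLocalScale(new Vec3(2f, 2f, 2f));
        root.updateWorldScale();
        child.updateWorldScale();
        // The child holds no vectors of its own; it returns its parent's world scale.
        System.out.println(child.getWorldScale() == root.getWorldScale());
    }
}
```

Because the getters hand out shared objects, a caller that writes to the returned vector would affect every spatial that falls back to it.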

irrisor · July 23, 2005, 5:39pm

1. sounds reasonable, but return an immutable vector in this case
2. this sounds problematic: attach a new spatial to a node, call updateGeometricState, call getLocalScale() on the spatial and alter the vector -> the whole node gets scaled!
3. this seems to require frequent creation of vectors upon getLocalScale() calls and to prepare setter calls…

Btw. let's keep in mind to change Scale/Translation/Rotation to a single matrix.

llama · July 24, 2005, 2:13am

irrisor said:

1. sounds reasonable, but return an immutable vector in this case

What's the normal technique to make it immutable? I can't think of anything that makes a class with a public field immutable, though maybe I am missing the obvious?

2. this sounds problematic: attach a new spatial to a node, call updateGeometricState, call getLocalScale() on the spatial and alter the vector -> the whole node gets scaled!

If the world scale matches the world scale of the parent, then we know we must first create our own world scale if we have a local scale.

3. this seems to require frequent creation of vectors upon getLocalScale() calls and to prepare setter calls...

When you do a setLocalScale, you provide your own object (so nothing is created). You can keep a reference to this object. With getLocalScale an object is created if there is no object present (in other words, it has not been set since the last world update). But I don't think getLocalScale is used a lot, and when it is used, it's only in a way that this change breaks (you'll have to call setLocalScale each time after you change it). I'll make sure to add a method getLocalScale(Vector3f) though, if we decide to go this way; that way you can avoid creating an object.

For that matter though, the way this is done now is not very consistent either. The Vector3f local scale is used as a sort of "animator"; however, there are no guarantees that the Vector3f you have gotten stays valid for any amount of time; as soon as someone else makes a call to setLocalScale it becomes invalid.
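A rough sketch of what a getLocalScale(Vector3f) style accessor could look like under technique 3. ScaleVec and WorldOnlySpatial are hypothetical names, and as a simplification the setter applies the scale immediately instead of waiting for the world update. The local scale is never stored; it is recovered by dividing the world scale by the parent's world scale and written into a vector the caller supplies.

```java
// Hypothetical stand-in for jME's Vector3f.
final class ScaleVec {
    float x, y, z;
    ScaleVec(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical spatial that keeps a world scale but no local scale field.
class WorldOnlySpatial {
    private final WorldOnlySpatial parent;
    private final ScaleVec worldScale = new ScaleVec(1f, 1f, 1f);

    WorldOnlySpatial(WorldOnlySpatial parent) { this.parent = parent; }

    // Reads the caller's vector and folds it into the world scale; the vector is not retained.
    // Simplification: applied right away rather than during the world update.
    void setLocalScale(ScaleVec local) {
        worldScale.x = parentX() * local.x;
        worldScale.y = parentY() * local.y;
        worldScale.z = parentZ() * local.z;
    }

    // Writes the recovered local scale into 'store' and returns it; allocates nothing.
    // Assumes the parent's world scale has no zero component.
    ScaleVec getLocalScale(ScaleVec store) {
        store.x = worldScale.x / parentX();
        store.y = worldScale.y / parentY();
        store.z = worldScale.z / parentZ();
        return store;
    }

    private float parentX() { return parent != null ? parent.worldScale.x : 1f; }
    private float parentY() { return parent != null ? parent.worldScale.y : 1f; }
    private float parentZ() { return parent != null ? parent.worldScale.z : 1f; }
}

class WorldOnlyDemo {
    public static void main(String[] args) {
        WorldOnlySpatial root = new WorldOnlySpatial(null);
        WorldOnlySpatial child = new WorldOnlySpatial(root);
        root.setLocalScale(new ScaleVec(2f, 2f, 2f));
        child.setLocalScale(new ScaleVec(3f, 1f, 0.5f));
        ScaleVec store = new ScaleVec(0f, 0f, 0f); // reused by the caller across calls
        child.getLocalScale(store);
        System.out.println(store.x + " " + store.y + " " + store.z);
    }
}
```

Changing the fields of store afterwards has no effect on the spatial, which is why setLocalScale has to be called again after every change.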

Btw. lets keep in mind to change

irrisor · July 25, 2005, 11:48am

Maybe we should consider removing the explicit storage of the world vectors and computing them each time they are needed, instead of computing them in advance?

irrisor · July 25, 2005, 6:38pm

llama said:

Big change for a little feature. Maybe we can make the fields protected instead? (and add accessors) The accessors can check whether the object is mutable or not.

What does protected help here? The interface would change, too (I assume you mean this with 'big change'). For subclasses of vector?!

This would be entirely handled by updateWorldData.

ok, if it's not exposed - fine

The same place it comes from now. Really there's not much of a difference here! If you want to reuse the same vector across updates all you have to do is keep a reference to it somewhere.

Hmm, I think keeping references to vectors of another spatial is one of the things we should change. In this case I would favour proper encapsulation over pure memory performance.

How do you store/retrieve rotation, scale and translation? I assume this involves calculation.

Yes, a little.

Basically you're breaking everything the same way as my nr. 3 proposal does

Quite - as you already stated getLocal*(* value) methods should be added, and maybe the old ones deleted...

As for calculating before we draw instead of storing the position. I know simpleGame does an updateWorldData for rootNode every frame, but that's not at all needed for normal nodes. I think in some cases it could be significantly faster to calculate the positions only when updating, rather than every frame.

I wonder if it even costs more than the current rendering process, when we calculate matrices while rendering - you have to send a matrix there for every spatial, either way, the computation could be done by GL.

llama · July 25, 2005, 7:02pm

irrisor said:

llama said:

Big change for a little feature. Maybe we can make the fields protected instead? (and add accessors) The accessors can check whether the object is mutable or not.

What does protected help here? The interface would change, too (I assume you mean this with 'big change'). For subclasses of vector?!

protected would mean only Vector3f, subclasses and other classes in the math package can change the data. This means we can keep most of the current code. An immutable vector would also refuse to execute methods that perform a local calculation. This way it's very hard to accidentally change the data of a Vector. It also keeps direct field access for calculations between two different vectors.

The same goes for a matrix of course.

As for calculating before we draw instead of storing the position. I know simpleGame does an updateWorldData for rootNode every frame, but that's not at all needed for normal nodes. I think in some cases it could be significantly faster to calculate the positions only when updating, rather than every frame.

I wonder if it even costs more than the current rendering process, when we calculate matrices while rendering - you have to send a matrix there for every spatial, either way, the computation could be done by GL.

Well yes, I think keeping a matrix instead of separate vectors will give you the faster rendering path. The updateWorldData could be a bit slower, though not much.

I'm starting to lean more and more towards the matrix idea. Even though I think you'll rarely benefit from spatials having the same matrix as their parent, the reduction in object overhead combined with eliminating the need for a permanent local matrix is a good saving compared to now. Any speedups during the render phase are a nice extra too. The only case where keeping a separate rotation, scale and translation starts to look more efficient is if you have many Spatials that share 2 properties (e.g. rotation and scale) with their parent.

Someone will have to show me some foolproof math for the matrices though.

irrisor · July 25, 2005, 7:50pm

llama said:

[...] This means we can keep most of the current code. [...]

That's a good point. But let's make them package visible then - a vector does not (and shouldn't) have a subclass.

llama · July 25, 2005, 8:43pm

irrisor said:

llama said:

[...] This means we can keep most of the current code. [...]

That's a good point. But let's make them package visible then - a vector does not (and shouldn't) have a subclass.

Yes, I was thinking of adding a new constructor, something along the lines of Vector3f(float x, float y, float z, final boolean mutable)

Of course we could be talking about a Matrix now instead of a Vector.
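One way the mutable-flag constructor could look, as a sketch only. GuardedVector3f is a hypothetical class, not jME's Vector3f; the method names set, multLocal and mult are chosen for illustration. Fields are package visible, and methods that modify the vector in place refuse to run when the flag is false.

```java
// Hypothetical sketch of a vector with a mutability flag; not jME's actual Vector3f.
final class GuardedVector3f {
    // Package visible: classes in the same package keep direct field access.
    float x, y, z;
    private final boolean mutable;

    GuardedVector3f(float x, float y, float z, final boolean mutable) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.mutable = mutable;
    }

    boolean isMutable() { return mutable; }

    // In-place setter: refused on an immutable vector.
    GuardedVector3f set(float x, float y, float z) {
        checkMutable();
        this.x = x;
        this.y = y;
        this.z = z;
        return this;
    }

    // "Local" calculation: modifies this vector, so it is refused when immutable.
    GuardedVector3f multLocal(float scalar) {
        checkMutable();
        x *= scalar;
        y *= scalar;
        z *= scalar;
        return this;
    }

    // Non-local calculation: leaves this vector alone and returns a new mutable one.
    GuardedVector3f mult(float scalar) {
        return new GuardedVector3f(x * scalar, y * scalar, z * scalar, true);
    }

    private void checkMutable() {
        if (!mutable) throw new UnsupportedOperationException("vector is immutable");
    }
}

class GuardedVectorDemo {
    static final GuardedVector3f UNIT = new GuardedVector3f(1f, 1f, 1f, false);

    public static void main(String[] args) {
        GuardedVector3f doubled = UNIT.mult(2f); // allowed: produces a new vector
        System.out.println(doubled.x);
        try {
            UNIT.multLocal(2f); // refused: would change the shared constant
        } catch (UnsupportedOperationException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

The flag only guards the accessor methods. Code inside the same package can still write the fields directly, so this makes accidental changes hard rather than impossible.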

irrisor · July 30, 2005, 2:58pm

First I want to correct myself: for representing scale, rotation and translation we would need a 4x3 matrix - so the savings would not be as much as described…

I agree that it would be better to keep the old local behaviour - so 1/4. sounds ok - but I still doubt that memory optimization is necessary at all:

Especially because of the following: The swarm stress test allocates about 100MB of Memory - I would expect the savings discussed here (even the greatest ones where everything but local position is omitted) to result in 2000 Spatial * ( 5 Objects + 16 floats ) < 200 KB

irrisor · July 30, 2005, 3:49pm

I just profiled SwarmTest: The major amount of memory is allocated by Vector3f and Color3f.

But the Spatials themselves (including translation and such) take below 1 MB…

Most of them are in the boundings!!

renanse · July 30, 2005, 4:09pm

Hmm, those are interesting findings given the complaints we had before on the forum about attempts to implement BSP and the amount of memory used by Spatial. You mention that a lot of the memory is used by Vector3f, are you breaking that out to include Spatial's Vector3f fields in with the Spatial memory usage?

As for the methods, I'd suggest exploring #4 more. It's true we'd need to replace the 3 Vector3f objects with 1 Matrix4f for Node, but that's not just 9 floats versus 16 floats, it's also 3 objects vs 1 object. Matrices also make it possible to perform complex transformations in fewer steps. You can also eliminate 2 JNI calls if we used a matrix to set opengl. My only reservation is the whole recalc on the fly. Won't that also require constant parent lookups?

Actually, my original recommendation was to ONLY use local fields. Then simply use OpenGL's GL11.glPushMatrix(); and GL11.glMultMatrix(localMatrix); to build up the scene. OpenGL would then handle world calcs for rendering. If we needed a world field for our own computation, we could compute that on the fly at that time alone.
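To illustrate the local-only idea without any OpenGL dependency, here is a small sketch. LocalOnlyNode and its methods are hypothetical, and the matrix is a plain column-major float[16]. The world matrix is never stored; it is rebuilt on request by multiplying local matrices down the parent chain, which is the same product a nested sequence of glPushMatrix/glMultMatrix calls would accumulate while rendering.

```java
// Hypothetical node that stores only a local 4x4 matrix (column-major float[16]).
class LocalOnlyNode {
    final LocalOnlyNode parent;
    final float[] local = identity();

    LocalOnlyNode(LocalOnlyNode parent) { this.parent = parent; }

    static float[] identity() {
        float[] m = new float[16];
        m[0] = m[5] = m[10] = m[15] = 1f;
        return m;
    }

    // Fills the local matrix from a uniform scale and a translation.
    void setLocal(float scale, float tx, float ty, float tz) {
        float[] m = identity();
        m[0] = m[5] = m[10] = scale;
        m[12] = tx;
        m[13] = ty;
        m[14] = tz;
        System.arraycopy(m, 0, local, 0, 16);
    }

    // Returns a * b for column-major matrices; element (row, col) lives at col * 4 + row.
    static float[] multiply(float[] a, float[] b) {
        float[] r = new float[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += a[k * 4 + row] * b[col * 4 + k];
                }
                r[col * 4 + row] = sum;
            }
        }
        return r;
    }

    // World matrix computed on the fly: one parent lookup per ancestor, nothing cached.
    float[] computeWorld() {
        return parent == null ? local.clone() : multiply(parent.computeWorld(), local);
    }
}

class LocalOnlyDemo {
    public static void main(String[] args) {
        LocalOnlyNode root = new LocalOnlyNode(null);
        LocalOnlyNode child = new LocalOnlyNode(root);
        root.setLocal(2f, 1f, 0f, 0f);
        child.setLocal(1f, 3f, 0f, 0f);
        float[] world = child.computeWorld();
        // World x translation of the child: parent scale * 3 + parent translation 1.
        System.out.println(world[12]);
    }
}
```

The recursion in computeWorld() makes the cost visible: every request walks all ancestors, so code that needs world data often would have to cache the result.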

Unfortunately, the RenderQueue kicks that idea in the butt, or requires extra JNI calls for each ancestor of an item in the queue. Maybe that's not too bad though if we were still planning to eliminate opaque items from the queue and draw them in tree order.

Ok, I realize I'm rambling now.

llama · July 30, 2005, 4:48pm

Well, as I posted (or mailed, I think?), the fact that every spatial "has its spatial" (the bounding box), with its own vertex, color, etc. data stored twice and an ArrayList of renderstates, is the biggest problem.

However, we have on the list of things to do:

not storing vertex data and such twice
only store renderstates when needed
reuse as much vertex etc. data as possible for boundingbox or sphere or whatever.

This will reduce the space boundings use so significantly that the most memory boundings will take up is actually in local/world and VBO data (this is why VBO, I think, should be extracted to a separate class, so it can be "null" for geometry that doesn't use VBO. Should I open a topic for this?). I'll post a screenshot or a link of profile data to show what I mean.

irrisor · July 30, 2005, 5:01pm

Well, reducing memory footprint of bounding spheres (used in the swarm test) is quite easy:

Change the ctor of BoundingSphere to do

super(name); this.center = center; this.radius = radius; initCheckPlanes();

instead of initializing Sphere with (name, center, 10, 10, radius) --> SwarmTest uses 30MB instead of 100MB memory

(When you choose to show boundings this climbs up again - but this can be avoided, too, like Llama said)

llama · July 30, 2005, 5:04pm

Alright here is the link I promised:

Class tree

the "benchmark" is based on this topic:

http://www.jmonkeyengine.com/jmeforum/index.php?topic=1877.0

with 2500 SharedMeshes used

Of course this is a very artificial benchmark, however for profiling memory usage it's not so bad, since a scene with many shared meshes in it is not that unrealistic! Keeping all other factors out just makes the picture more clear.

I'll post my comments on it in a bit, so you two can look at it right now.

irrisor · July 30, 2005, 5:24pm

Hey, nice profiling output! So you knew the stuff I posted before

llama · July 30, 2005, 5:32pm

As you can see, it's SharedMesh and BoundingBox that take up practically all the memory. The interesting part is how they take it up of course, and what can reduce it.

Look at BoundingBox first.

By far the most goes into the arrays for color, vertex, normal, texture and index data. These will be gone completely.

Then it's the controllers and renderstates. Using "lazy" init, and considering boundings have no use for these, they will also be completely gone.

Then we come to the buffers. Most of these are not important (textures for a bounding box?), others are the same for each and every box so could be shared in any case, others depend on what kind of box it is but could be emulated with scaling, and since every box in this scene is the same they could be gotten rid of as well.
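The "lazy" init mentioned here could be sketched roughly as follows. LazyControllers is a hypothetical class standing in for the controller list of a Spatial, and hasAllocatedList() exists only to make the effect observable. The list is allocated on the first add, so objects that never get a controller (such as boundings) carry only a null reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for a spatial's controllers; not jME's actual Spatial.
class LazyControllers {
    // Stays null until the first controller is added.
    private ArrayList<Object> controllers;

    void addController(Object controller) {
        if (controllers == null) {
            controllers = new ArrayList<>(1); // small initial capacity, allocated on demand
        }
        controllers.add(controller);
    }

    // Returns a shared empty list instead of allocating one just to answer a query.
    List<Object> getControllers() {
        return controllers == null ? Collections.<Object>emptyList() : controllers;
    }

    // Illustration only: reports whether the backing list exists yet.
    boolean hasAllocatedList() {
        return controllers != null;
    }
}

class LazyControllersDemo {
    public static void main(String[] args) {
        LazyControllers bounding = new LazyControllers();
        System.out.println(bounding.getControllers().isEmpty()); // queried, nothing allocated
        System.out.println(bounding.hasAllocatedList());
        bounding.addController("spin");
        System.out.println(bounding.getControllers().size());
    }
}
```

The same pattern would apply to the renderstate list.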

The top memory users are now: checkplanes (the first one we don't have to be ashamed about ;)) and the world/local stuff, and the box properties (I assume those could be nulled as well), and the vbo thing. Well, let's not forget that the shallow size is still the biggest of all and that we can optimize here too (like with vbo)!

Then a quick look at the SharedMeshes. Again: Shallow size, followed by the arrays and arraylist from states and controllers, followed by something related to texture (doubt a SharedMesh needs it), names (will be rid of!) and our world/local stuff again

So while right now it seems like we're talking about a tiny optimization in Spatial, in the long run (hopefully not too long) we could have a situation where this will actually make a big impact!

llama · August 7, 2005, 6:30am

Well, did some more coding on this. The memory reduction is feasible; however, finding out what will be faster with the renderqueue and LWJGL/OpenGL in general will require a lot more testing, and probably some learning on my part.

As irrisor pointed out as well, right now the impact of this would be small in most cases since we have bigger problems so I'll focus on something else first, while keeping this in mind. Is anyone working on properly replacing the arrays with buffers yet, or on the render delegates?

renanse · August 7, 2005, 6:20pm

If you are interested in replacing bounds, that seems like a related and very useful next fix (as described above). It's a big job… Bounds calculation has been a real pain in the ass in the past for some reason. Perhaps though, making it pure math and using a delegate for rendering will clean things up quite nicely.

My wife is heading out of town towards the end of the week so I should be getting in some quality code time. :) I'll be hitting Geometry arrays.

llama · August 8, 2005, 2:28am

Well, I'll take a closer look at it and see what I can do.

renanse · September 9, 2005, 12:01am

fyi, please see: http://www.jmonkeyengine.com/jmeforum/index.php?topic=2152.0