Android Performance Tip

The_Leo · November 15, 2014, 12:39am

Hi Guys,

First of all, thank you for developing such an amazing engine, jme3!

I have been developing a 3D runner game for some time now, and finally managed to optimize it. I decided to share what I did to get 60+ fps for a ~300k-vertex, ~800-object scene with a 2.5k-vertex animated character.

First of all, I used the Unshaded material with baked textures, so there is no lighting in the scene.
I used GeometryBatchFactory.optimize on nodes containing static objects close to each other.
Next, I set CullHint.Always on nodes far away to hide them and CullHint.Dynamic on the ones close by.

Together, these brought the vertex count down to at most ~17k and the object count to 30-80.
The resulting fps was 20-35.

After using the NetBeans profiler, I found that, since my Main app extends SimpleApplication, a significant part of the update loop was spent in these methods: updateLogicalState, runControlUpdate, SafeArrayList.isEmpty().

I found that SimpleApplication calls updateLogicalState on rootNode and guiNode, and then the Node class calls this method on its children, and so on. I also found that this method is responsible for updating the Controls of spatials.

Thus I commented out those two calls and changed the update method of SimpleApplication instead:

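// Spatials that actually need a logical update (the ones with controls)
SafeArrayList<Spatial> logical = new SafeArrayList<Spatial>(Spatial.class);

@Override
public void update() {
    //... (tpf is computed here, as in SimpleApplication.update())
    //    rootNode.updateLogicalState(tpf);
    //    guiNode.updateLogicalState(tpf);
    Spatial[] s = logical.getArray();
    for (int i = 0; i < s.length; i++) {
        s[i].updateLogicalState(tpf);
    }
    //...
}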

To this list I manually add the spatials I want updated (e.g. ChaseCamera, BitmapFont, etc.).
I could hardly believe the result: when tested on an Android phone, this brought the fps from 20-30 to 60+.
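To make the saving concrete, here is a minimal sketch using a toy model rather than jME classes. ToyNode, UpdateCostDemo and the figure of 8 spatials with controls are assumptions made up for the illustration; it only counts how many updateLogicalState calls a full traversal makes compared with updating a hand-maintained list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a scene-graph node; this is not jME's Node class. */
class ToyNode {
    final List<ToyNode> children = new ArrayList<>();
    boolean hasControl;      // true if this node has something to update
    static int visits = 0;   // counts updateLogicalState calls

    /** Recursive update that visits every node, as a full traversal does. */
    void updateLogicalState(float tpf) {
        visits++;
        for (ToyNode c : children) {
            c.updateLogicalState(tpf);
        }
    }
}

class UpdateCostDemo {
    /** Builds a flat scene of `total` children; the first `withControls` carry a control. */
    static ToyNode buildScene(int total, int withControls, List<ToyNode> logical) {
        ToyNode root = new ToyNode();
        for (int i = 0; i < total; i++) {
            ToyNode n = new ToyNode();
            n.hasControl = i < withControls;
            root.children.add(n);
            if (n.hasControl) {
                logical.add(n);   // the manually maintained list
            }
        }
        return root;
    }

    /** Number of calls made when updating from the root. */
    static int fullTraversal(ToyNode root) {
        ToyNode.visits = 0;
        root.updateLogicalState(0.016f);
        return ToyNode.visits;
    }

    /** Number of calls made when updating only the listed nodes. */
    static int listOnly(List<ToyNode> logical) {
        ToyNode.visits = 0;
        for (ToyNode n : logical) {
            n.updateLogicalState(0.016f);
        }
        return ToyNode.visits;
    }
}
```

With 800 children of which 8 have controls, the traversal touches 801 nodes per frame and the list touches 8. The price is that anything not in the list never gets its controls updated.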

That’s all I wanted to share. Thanks for reading and if you are struggling with performance on Android feel free to give it a try.

pspeed · November 15, 2014, 1:13am

@The Leo said: After using NetBeans profiler, and since my Main app extended SimpleApplication I found that significant part of the update loop was spent on methods: updateLogicalState, runControlUpdate, SafeArrayList.isEmpty().

This is very suspicious, though… since SafeArrayList.isEmpty() is just size == 0 in a final method.

Do note: even if you have culling turned off for some node, its updateLogicalState() will still be run. So it sounds like you might have a lot of objects in your scene even if only a few are getting rendered. You might do even better to use a paging scheme rather than just hiding the ones you don’t want shown… like, actually remove them and re-add them as needed.

Momoko_Fan · November 15, 2014, 2:26am

Would there be a point if scene graph branches with no controls were skipped? I am not sure how common it is to have such branches … Is it dynamic objects on which you’re using CullHint.Always?

The_Leo · November 15, 2014, 10:39am

@pspeed said: This is very suspicious, though... since SafeArrayList.isEmpty() is just size == 0 in a final method.
Do note: even if you have culling turned off for some node, its updateLogicalState() will still be run. So it sounds like you might have a lot of objects in your scene even if only a few are getting rendered. You might do even better to use a paging scheme rather than just hiding the ones you don’t want shown… like, actually remove them and re-add them as needed.

Well, yes, it seems isEmpty does not do much, but for some 800 objects at 60 fps, it's 48,000 calls a second. On the other hand, in updateLogicalState it is called in a condition, if(…isEmpty) return; that's 48,000 conditions, for which hopefully branch prediction works fine.

Yes, updateLogicalState ran for all of them, even the culled ones. But since I only needed to call this method for some 6-10 objects, I added these to an array and called updateLogicalState not on rootNode and guiNode, but only on the objects in the array.

Regarding attaching and detaching, I was going to try this if I didn't hit 60+ fps, but I read on the forums that people had lag issues when attaching and detaching nodes.

@Momoko_Fan said: Would there be a point if scene graph branches with no controls were skipped? I am not sure how common it is to have such branches .. Is it dynamic objects on which you're using CullHint.Always?

Well, I use CullHint.Always on both static and dynamic objects. I pretty much don't use the Control class except for Camera, Skeleton, Animation and BitmapText. And yes, by just adding these to the array and updating only them, I skipped all the empty ones, in a sense.

Momoko_Fan · November 15, 2014, 6:16pm

So you’re rendering only a small subset of those 800 objects … the rest you set the cull hint to always.
jME3 has to go through the entire scene graph 3 times in a frame in order to render it. The cost becomes much higher for big scenes if for example all objects cannot fit in the processor’s cache, etc so we have to go to slow memory to get it each time.

Also, there is a cost to adding and removing nodes from the scene, since once added back, the bounding volume and transforms are invalid and have to be recomputed again.

I don’t really see any solution here except just having less objects … Unless anybody has other ideas.

pspeed · November 15, 2014, 6:54pm

Adding/removing objects from the scene does have costs but you can mitigate that by doing it rarely. Proper spatial organization can help here as you can wholesale add/remove entire portions of the scene based on whatever requirements you desire… generally ‘nearness’. This is why I suggested paging. A grid could keep track of the roots of the grid cells and then only attach them when they are near, ie: the x number of closest grid cells. When crossing a grid boundary, reassess.
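A minimal sketch of the grid idea, written without jME classes so it stands alone. GridPager, the cell size and the radius are hypothetical names and values; a real version would attach and detach the root node of each cell where the comment indicates.

```java
import java.util.HashSet;
import java.util.Set;

/** Tracks which grid cells should be attached around the player (illustrative only). */
class GridPager {
    private final float cellSize;
    private final int radius;        // how many cells to keep attached in each direction
    private boolean hasCenter = false;
    private int centerX;
    private int centerZ;
    private Set<Long> attached = new HashSet<>();

    GridPager(float cellSize, int radius) {
        this.cellSize = cellSize;
        this.radius = radius;
    }

    /** Packs two cell coordinates into one map key. */
    static long key(int cx, int cz) {
        return ((long) cx << 32) | (cz & 0xffffffffL);
    }

    /** Returns true only when a grid boundary was crossed and the attached set changed. */
    boolean update(float x, float z) {
        int cx = (int) Math.floor(x / cellSize);
        int cz = (int) Math.floor(z / cellSize);
        if (hasCenter && cx == centerX && cz == centerZ) {
            return false;            // still in the same cell: nothing to reassess
        }
        hasCenter = true;
        centerX = cx;
        centerZ = cz;
        Set<Long> wanted = new HashSet<>();
        for (int dx = -radius; dx <= radius; dx++) {
            for (int dz = -radius; dz <= radius; dz++) {
                wanted.add(key(cx + dx, cz + dz));
            }
        }
        // A real scene would detach the roots of cells in (attached minus wanted)
        // and attach the roots of cells in (wanted minus attached) here.
        attached = wanted;
        return true;
    }

    boolean isAttached(int cx, int cz) {
        return attached.contains(key(cx, cz));
    }

    int attachedCount() {
        return attached.size();
    }
}
```

Calling update(x, z) every frame is cheap, because all the work happens only on the frames where the player enters a new cell.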

JME is not optimized for large scenes and we generally discourage users from having them. “Large” being a relative thing based on platform. (1000-2000 objects is fine on desktop, I’d probably keep 100-200 on Android I guess). @Momoko_Fan, if you are considering optimizing for large scenes then there are quite a few things we could do to handle more objects (I’ve thought about this many times)… it was always my impression that we opted for simplicity (though some things are not too much harder). It seemed like a conscious design decision. Even if we up the max recommended count, users will always be hitting it and have to fallback to classic scene optimization techniques.

The_Leo · November 15, 2014, 11:22pm

In the game I have to detach about 8 nodes and attach 8 nodes on average, roughly every 10 seconds. I tried this instead of setting CullHints, and it resulted in 40-60 fps. Thus, I’m sticking with CullHints plus not calling updateLogicalState, for 60+ fps.

@Momoko_Fan said: I don't really see any solution here except just having less objects ... Unless anybody has other ideas.

You do know that I have presented a solution, right? By not calling the updateLogicalState on rootNode.

@pspeed yes, I noticed that for large scenes updateLogicalState and updateGeometricState cause an overhead that makes a huge difference on Android. But by selectively calling updateLogicalState on just the Spatials with a Control and using the DynNode below instead of Node, jme3 suddenly seems to work pretty fast for large scenes as well.

When I profiled again, I found that a lot of time is spent in updateGeometricState. I significantly reduced that overhead by using this class instead of Node:

public class DynNode extends Node {
    public DynNode() {
        super();
    }
    public DynNode(String name) {
        super(name);
    }
    public void updateGeometricState() {
        if(refreshFlags == 0) return;
        super.updateGeometricState();
    }
}

This seems to eliminate many of the updateGeometricState calls. My rootNode and guiNode are now DynNodes as well.

Here is a screenshot taken on Android, while also playing background music, and using accelerometer. The fps is 63.
Taken on phone

nehon · November 16, 2014, 9:46am

@The Leo said: You do know that I have presented a solution, right? By not calling the updateLogicalState on rootNode.

Well, that's a solution for your game, but that can hardly be a generic solution for the engine. Kirill wants a generic solution. updateLogicalState updates controls, if a control moves an object, it has to be called even if the object is out of the screen, else this object will move only if it's in the view frustum, which is not an acceptable solution.

@The Leo said:

public class DynNode extends Node {
    public DynNode() {
        super();
    }
    public DynNode(String name) {
        super(name);
    }
    public void updateGeometricState() {
        if(refreshFlags == 0) return;
        super.updateGeometricState();
    }
}

You realize that if a child of this node is transformed you won't update it. Refresh flags are not updated upward; refreshFlags == 0 does not guarantee that the subgraph didn't change. Doing fewer things obviously gives better performance... but that's doing fewer things...

You may think those are good solutions because they give good numbers, but they may get you in trouble later, if you want, for example, to have moving platforms that go in and out of the view…

Your analysis is interesting, because it clearly shows that those methods are the weak point when you have a big scene. But your solutions only work for very particular cases and cannot be general.
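The moving-platform problem can be shown with a small stand-alone sketch. Platform and OffscreenUpdateDemo are hypothetical toy classes, not jME types, and the view range [0, 10] is an arbitrary assumption: a platform that is only updated while visible never moves into view if it starts outside it.

```java
/** Hypothetical moving platform: its "control" advances x by speed * tpf on each update. */
class Platform {
    float x;
    final float speed;

    Platform(float x, float speed) {
        this.x = x;
        this.speed = speed;
    }

    void update(float tpf) {
        x += speed * tpf;
    }

    boolean visible(float viewMin, float viewMax) {
        return x >= viewMin && x <= viewMax;
    }
}

class OffscreenUpdateDemo {
    /** Runs `frames` updates and returns the final position of the platform. */
    static float simulate(boolean onlyWhenVisible, int frames) {
        Platform p = new Platform(-5f, 1f);   // starts to the left of the view [0, 10]
        for (int i = 0; i < frames; i++) {
            // Skipping the update for off-screen objects freezes them in place.
            if (!onlyWhenVisible || p.visible(0f, 10f)) {
                p.update(1f);                 // tpf of 1 keeps the arithmetic exact
            }
        }
        return p.x;
    }
}
```

After 8 frames the always-updated platform has reached x = 3 and is on screen, while the one updated only when visible is still stuck at x = -5.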

On a side note, your character looks very nice, I like the cartoonish style.

The_Leo · November 16, 2014, 2:01pm

@nehon said: Refresh flags are not updated upward, refreshFlags == 0 does not guarantee that the subgraph didn’t change.

I'm not really sure about this. From my observation:

1) setTransformRefresh() always calls setBoundRefresh(), which updates flags upward to the parent.
2) Adding a child to a node calls setTransformRefresh(), which calls setBoundRefresh(), which updates flags upward to the parent.
3) Detaching also calls setTransformRefresh(), and so on.

I do agree with your statement insofar as setLightListRefresh() does not update upward to the parent. I assumed I could ignore this because I do not use lights.

Thus I assume DynNode can be used for scenes with no lighting, which is common on Android. I said ‘assume’ because so far I have not encountered any problems with it yet. Maybe it should have been called NoLightNode or something.
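The flag behaviour described in this post can be modelled in a few lines. This is a toy sketch of the observation, not a copy of the engine source: FlagNode and its constants are made-up names, and whether the real Spatial behaves exactly like this depends on the jME version.

```java
/** Toy model of the refresh-flag behaviour described above; this is not jME's Spatial. */
class FlagNode {
    static final int TRANSFORM = 0x01;
    static final int BOUND = 0x02;
    static final int LIGHTLIST = 0x04;

    final FlagNode parent;
    int refreshFlags;

    FlagNode(FlagNode parent) {
        this.parent = parent;
    }

    /** A transform change also dirties the bound, so it is reported upward. */
    void setTransformRefresh() {
        refreshFlags |= TRANSFORM;
        setBoundRefresh();
    }

    /** Marks this node and every ancestor as needing a bound update. */
    void setBoundRefresh() {
        refreshFlags |= BOUND;
        for (FlagNode p = parent; p != null; p = p.parent) {
            p.refreshFlags |= BOUND;
        }
    }

    /** In this model a light-list change stays local and is NOT reported upward. */
    void setLightListRefresh() {
        refreshFlags |= LIGHTLIST;
    }
}
```

In this model a root-level check of refreshFlags == 0 catches transform changes deep in the tree, but it misses light-list changes, which is why the shortcut would only be safe in unlit scenes.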

Regarding a general-case solution for updating Controls, what about one of these:

1) I guess one can create two root nodes: one for objects with controls, on which updateLogicalState is called, and another for objects without controls, on which it is not. That really seems to be an easy way to do things.
2) What if there was one list of Controls? When attaching an object to rootNode or one of its sub-nodes, all of the object's controls would be added to this list, and when detaching, removed. Also, when adding a Control to an object that is attached to the rootNode, that Control would be added. This would definitely eliminate the updateLogicalState overhead. The following sample code shows how this could be done:

For convenience I extended the Node and Spatial classes; thus NewNode would become Node, and NewSpatial would become Spatial. RootNode would be a special kind of node used as the rootNode.

public abstract class NewSpatial extends Spatial {
    public void addControl(Control control) {
        controls.add(control);
        control.setSpatial(this);
        /*Only required to call this method now, since if list had
         controls it would already be in added in the list*/
        if(controls.size() == 1) addControlsToList(controls);
    }
    
   public void removeControl(Class<? extends Control> controlType) {
        for (int i = 0; i < controls.size(); i++) {
            if (controlType.isAssignableFrom(controls.get(i).getClass())) {
                Control control = controls.remove(i);
                control.setSpatial(null);
            }
        }
        if(controls.isEmpty()) removeControlsFromList(controls);
    }
    public boolean removeControl(Control control) {
        boolean result = controls.remove(control);
        if (result) {
            control.setSpatial(null);
        }
        if(controls.isEmpty()) removeControlsFromList(controls);
        return result;
    }
    
    protected void addControlsToList(SafeArrayList<Control> list) {
        if(parent != null)
            ((NewNode)parent).addControlsToList(list);
    }
    protected void removeControlsFromList(SafeArrayList<Control> list) {
        if(parent != null)
            ((NewNode)parent).removeControlsFromList(list);
    }
}

public class NewNode extends Node {
    private static final Logger logger = Logger.getLogger(NewNode.class.getName());

    public int attachChild(Spatial child) {
        return attachChildAt(child, children.size());
    }
    public int attachChildAt(Spatial child, int index) {
        if (child == null)
            throw new NullPointerException();
        if (child.getParent() != this && child != this) {
            if (child.getParent() != null) {
                child.getParent().detachChild(child);
            }
            child.setParent(this);
            children.add(index, child);
            if(parent != null || this instanceof RootNode) {
                if(child instanceof Node) {
                    Node n = (Node)child;
                    n.depthFirstTraversal(new SceneGraphVisitor() {
                        public void visit(Spatial s) {
                            if(!s.controls.isEmpty())
                                addControlsToList(s.controls);
                        }
                    });
                }
                else {
                    if(!child.controls.isEmpty())
                        addControlsToList(child.controls);
                }
            }
            
            child.setTransformRefresh();
            child.setLightListRefresh();
            if (logger.isLoggable(Level.FINE)) {
                logger.log(Level.FINE,"Child ({0}) attached to this node ({1})",
                        new Object[]{child.getName(), getName()});
            }
        }
        return children.size();
    }
    public Spatial detachChildAt(int index) {
        Spatial child =  children.remove(index);
        if ( child != null ) {
            child.setParent( null );
            logger.log(Level.FINE, "{0}: Child removed.", this.toString());

            if(parent != null || this instanceof RootNode) {
                if(child instanceof Node) {
                    Node n = (Node)child;
                    n.depthFirstTraversal(new SceneGraphVisitor() {
                        public void visit(Spatial s) {
                            if(!s.controls.isEmpty())
                                removeControlsFromList(s.controls);
                        }
                    });
                }
                else {
                    if(!child.controls.isEmpty())
                        removeControlsFromList(child.controls);
                }
            }
            // since a child with a bound was detached;
            // our own bound will probably change.
            setBoundRefresh();

            // our world transform no longer influences the child.
            // XXX: Not neccessary? Since child will have transform updated
            // when attached anyway.
            child.setTransformRefresh();
            // lights are also inherited from parent
            child.setLightListRefresh();
        }
        return child;
    }
    
    protected void addControlsToList(SafeArrayList<Control> list) {
        if(parent != null)
            ((NewNode)parent).addControlsToList(list);
    }
    protected void removeControlsFromList(SafeArrayList<Control> list) {
        if(parent != null)
            ((NewNode)parent).removeControlsFromList(list);
    }
}

public class RootNode extends NewNode {
    SafeArrayList<SafeArrayList<Control>> controlsList;
    protected void addControlsToList(SafeArrayList<Control> list) {
        controlsList.add(list);
    }
    protected void removeControlsFromList(SafeArrayList<Control> list) {
        controlsList.remove(list);
    }
    public void updateLogicalState(float tpf) {
        SafeArrayList<Control> cList[];
        Control[] c;
        cList = controlsList.getArray();
        for(int i = 0; i < cList.length; i++) {
            c = cList[i].getArray();
            for(int j = 0; j < c.length; j++)
                c[j].update(tpf);   // index with j, the inner loop variable
        }
    }
}

I hope I did not miss anything important. If not, this should alleviate the updateLogicalState overhead for spatials with no controls. The price to pay for this change is traversing sub-nodes when attaching and detaching nodes.

Tell me what you think!

pspeed · November 16, 2014, 4:41pm

There are general solutions we could employ that complicate the engine a little and have side-effects that we have to deal with. Your solution is not an optimal one because it does a lot of potentially unnecessary work on attach/detach, requires a special root node class (ugh), and some other distasteful things. There is a way to do it better without all of that but as a team we’d need to decide that we want to handle what has previously been considered oversized scenes. This is why I’ve not proposed any solutions before now.

Your solution is probably the best you can do if you don’t want to modify the core engine but we are talking about potentially modifying the core engine. We can get away with a few different things that way (such as keeping the list of spatials with controls instead of a list of lists because we can call runControlUpdate() directly if made protected, etc.)

The advice in the past has always been “make smaller scenes”… so I kept these ideas to myself. Because it’s not really bad advice either. For example, one reason that control updates rank so highly in your scene is because you have “too many nodes” and your game isn’t otherwise doing very much. The strategy that you’re pursuing has a scalability limit because if you get the engine to handle 800 spatials ok then add 10 more tomorrow you might now be over the hump again. Alternatively, there are ways to better organize your scene so that you can add/remove entire sections based on distance or whatever. Not “several times a second” but “when the player crosses a threshold”. A solution like that could be scaled far higher. Attaching three nodes and detaching three others when crossing a grid boundary is not going to have a high cost but it opens up scalability to the point that you will hit other limits before you are plagued with update issues again.

Do I think the engine should avoid traversing the whole scene graph three times per frame? Yes. Do I think you should look at reorganizing your scene? Also yes.

nehon · November 16, 2014, 5:46pm

Oh you’re right about the updateBounds. Forgot about that…I’m not sure why it’s needed though since we traverse top to bottom on update…anyway.

I’m not fond either of the idea of a special RootNode.

@pspeed we should talk about it then. Even if @The Leo should have some partitioning scheme in his scene, he does have a point.
I’m not sure this would have a lot of benefit on desktop though, since there is more chance that you are GPU bound than CPU. Android is special in that matter…

pspeed · November 16, 2014, 6:00pm

@nehon said: Oh you're right about the updateBounds. Forgot about that...I'm not sure why it's needed though since we traverse top to bottom on update...anyway.
I’m not fond either of the idea of a special RootNode.

@pspeed we should talk about it then. Even if @The Leo should have some partitioning scheme in his scene, he does have a point.
I’m not sure this would have a lot of benefit on desktop though, since there is more chance that you are GPU bound than CPU. Android is special in that matter…

Maybe I’ll write something up and e-mail it to the group.

zzuegg · November 16, 2014, 6:29pm

Since you are on this track, would it be possible to disable frustum culling completely?

As for most 2d games you would already have code that manages the scene, removing/reusing every single object out there. In most cases everything attached to the scenegraph is drawn too, making the culling checks only a waste of computing time.

Might only be a small benefit, but on android every nanosecond counts

Add: to be more clear, it would just be nice if there were an off switch somewhere. With culling enabled by default, of course.

pspeed · November 16, 2014, 7:18pm

@zzuegg said: Since you are on this track, would it be possible to disable frustum culling completely?
As for most 2d games you would already have code that manages the scene, removing/reusing every single object out there. In most cases everything attached to the scenegraph is drawn too, making the culling checks only a waste of computing time.

Might only be a small benefit, but on android every nanosecond counts

Add: to be more clear, i would just be nice if there would be a off switch somewhere. Of course enabled by default.

For 2D, use the GUI node and you will not have normal frustum culling but just a simple screen bounds check. Unless you are really talking about 2.5d.

zzuegg · November 17, 2014, 9:53am

In my case, I am working on a 3D game, but I am 100% sure that every object on the scene graph is in the frustum too. Otherwise I would have to check my object-reusing code.

AFAIK 2.5D games are quite common too. Having perspective really helps, and it should not cost that much performance compared to flat 2D.

Rutar · July 5, 2015, 1:51pm

800 objects and 60 fps - impossible!
Test code gives 17 fps - 343 objects


package classes;

import com.jme3.app.SimpleApplication;
import com.jme3.material.Material;
import com.jme3.math.Vector3f;
import com.jme3.scene.Geometry;
import com.jme3.scene.Spatial;
import com.jme3.scene.shape.Box;

public class Java_SE extends SimpleApplication {

    float size = 7;

    public static void main(String[] args) { new Java_SE().start(); }

    @Override
    public void simpleInitApp() {

        flyCam.setMoveSpeed(15);
        cam.setLocation(new Vector3f(0, 0, 18f));

        Geometry geometry = new Geometry("", new Box(0.3f, 0.3f, 0.3f));
        Material mat = new Material(assetManager, "Common/MatDefs/Misc/ShowNormals.j3md");
        geometry.setMaterial(mat);

        // 7 x 7 x 7 = 343 cloned boxes, centered on the origin
        for (float z = 0; z < size; z++) {
            for (float y = 0; y < size; y++) {
                for (float x = 0; x < size; x++) {
                    Spatial s = geometry.clone();
                    s.setLocalTranslation(((-size+1)/2f)+x,
                                          ((-size+1)/2f)+y,
                                          ((-size+1)/2f)+z);
                    rootNode.attachChild(s);
                }
            }
        }

    }

    @Override
    public void simpleUpdate(float tpf) { rootNode.rotate(tpf/5, tpf/5, tpf/5); }

    //@Override
    //public void update() {
    //
    //super.update();
    //
    //if (this.speed == 0.0f || this.paused) { return; }
    //
    //float tpf = this.timer.getTimePerFrame() * this.speed;
    //
    //this.stateManager.update(tpf);
    //this.simpleUpdate(tpf);
    //
    //this.rootNode.updateLogicalState(tpf);
    //this.guiNode.updateLogicalState(tpf);
    //
    //this.rootNode.updateGeometricState();
    //this.guiNode.updateGeometricState();
    //
    //this.stateManager.render(this.renderManager);
    //this.renderManager.render(tpf, this.context.isRenderable());
    //this.simpleRender(this.renderManager);
    //this.stateManager.postRender();
    //
    //}

}

If I comment out updateLogicalState(tpf), the game doesn’t stop.

pspeed · July 5, 2015, 3:19pm

Since this thread got necro’d, I thought I’d follow up on this…

I did more than write something up for the core devs, I actually implemented it. In the latest 3.1 code, updateLogicalState() no longer traverses the scene graph. It collects the spatials that are detected to need updateLogicalState() in a list and just updates those. (Presuming you follow best practices and don’t extend spatial then you get this for free automatically… else you have to override a method on Spatial to avoid backwards-compatibility mode.)
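A rough sketch of that idea, for readers who want to see the shape of it. It is a toy model with made-up names (USpatial, UNode, invalidateUpdateList and so on) and should not be read as the engine's actual code: the root caches a list of spatials that need updates, and any attach or control change below it invalidates the cache.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy spatial; names are hypothetical and do not mirror the engine source. */
class USpatial {
    UNode parent;
    int controlCount;
    int updates;   // how many times updateLogicalState ran on this spatial

    boolean requiresUpdates() {
        return controlCount > 0;
    }

    void addControl() {
        controlCount++;
        if (parent != null) {
            parent.invalidateUpdateList();
        }
    }

    void updateLogicalState(float tpf) {
        updates++;
    }
}

class UNode extends USpatial {
    final List<USpatial> children = new ArrayList<>();
    private List<USpatial> updateList;   // null means the cache must be rebuilt

    void attachChild(USpatial child) {
        child.parent = this;
        children.add(child);
        invalidateUpdateList();
    }

    /** Drops the cached list here and in every ancestor. */
    void invalidateUpdateList() {
        updateList = null;
        if (parent != null) {
            parent.invalidateUpdateList();
        }
    }

    private void collect(List<USpatial> out) {
        for (USpatial c : children) {
            if (c.requiresUpdates()) {
                out.add(c);
            }
            if (c instanceof UNode) {
                ((UNode) c).collect(out);
            }
        }
    }

    /** Rebuilds the list only when something changed since the last frame. */
    List<USpatial> getUpdateList() {
        if (updateList == null) {
            updateList = new ArrayList<>();
            collect(updateList);
        }
        return updateList;
    }

    /** Called on the root once per frame instead of a full traversal. */
    void updateRoot(float tpf) {
        if (requiresUpdates()) {
            updateLogicalState(tpf);
        }
        for (USpatial s : getUpdateList()) {
            s.updateLogicalState(tpf);
        }
    }
}
```

The full traversal then only happens on frames where the graph actually changed; on all other frames the root walks a short cached list.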

The_Leo · July 5, 2015, 5:36pm

@pspeed: thanks for implementing that

@Rutar: I think you didn’t read the part where I cull far-away objects, so at any moment only 30-80 are visible. If you’re struggling with Android performance, I suggest using the NetBeans Profiler.

atomix · July 6, 2015, 3:04pm

@The_Leo Cool game you got there.

I’ve never asked myself whether anything bad can happen in updateLogicalState(), but if it has been improved now, so much the better…

Regarding optimization for such a runner game: I think you need a pool of the spatials that make up the scene. In the past, I was involved in making Despicable Me 2 (the Minions game, at the top of the Android game charts for a while).

Besides a very well optimized engine, we used several tricks to make sure the game was “ready” for every upcoming model loaded down the road the Minions run along. Those tricks can also be applied to a JME game for Android and mobile, for example:

1) Simplify the physics collision of the scenes. We had just a few faces and boxes near the player’s position. Simplify interactions too, but this can affect game design, so consider optimizing interactions later…
2) Pool everything and reuse it as much as possible. In your game, I see that walls can be made of tiles, trees can be rotated for variety, and you can even reuse particle emitters, shaders, lights… Remember to count every single thing that you use!
3) For paging, I think that in general toggling a spatial should take the form of showing/hiding or moving spatials in/out, not adding/removing them in the scene graph.
4) On low-memory devices, keep the number of spatials to be updated to a minimum. In my other JME games I usually keep a list of updatables (outside the scene graph) with fewer than 50 objects. This list can be called the EntityManager and works like what the update loop does with spatials and their controls. Also use Controls where needed and try to share controls between instances. A small trick is to let a Control “operate” Spatials through their user data.
5) Make smaller, lower-triangle versions of models from the start (if your team has artists), then switch between them by quality. If you have good LOD code, it will gain you some more cycles per second. If the device is too weak, try a lower quality or even consider remodelling if needed.
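The pooling tip can be sketched as a small generic class. Pool and its method names are hypothetical; in a JME game the pooled type might be a Spatial or a particle emitter, which is an assumption and is not shown here so that the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Minimal object pool: hands out released instances before creating new ones. */
class Pool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created;   // lets you count every single thing you allocate

    Pool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Reuses a released instance if one is available, otherwise creates one. */
    T acquire() {
        T item = free.poll();
        if (item == null) {
            item = factory.get();
            created++;
        }
        return item;
    }

    /** Returns an instance to the pool; the caller must reset its state first. */
    void release(T item) {
        free.push(item);
    }

    int createdCount() {
        return created;
    }
}
```

A runner game would typically release a track tile once it falls behind the player and acquire it again for the next section ahead, so the number of created objects stays flat during play.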

Those are some general tips and nothing special, but for a low GPU budget such as on Android phones, they are all you can do even after genius coding.

Cheers,

Rutar · July 8, 2015, 11:28am

@The_Leo: Thank you for your reply, I really did miss that point. I am having trouble implementing your approach, as I write for JME 3.0, while the code on GitHub is different (probably 3.1). Could you share a sample of your code (your own class implementations), completely or even partially? I only recently started programming for Android and cannot implement by myself the parts of the code that you posted. I think your experience will be very useful for many novice programmers.
P.S. Please excuse my English.