How does jMonkeyEngine do Geometry Instancing?

pspeed · February 15, 2022, 12:59pm

Instancing will use a single draw call. It's not as good as one big mesh, because the driver/GPU still has to do a bunch of separate internal draws, but it's definitely better than a bunch of separate draw calls.

If it were me, I’d probably use a custom mesh and update the vertexes myself. It’s going to be waaaaaay more efficient than JME’s automatic batching. But the automatic instancing stuff might get you where you want to go.
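To make the "one big mesh" option concrete, here is a minimal sketch in plain Java with no engine classes. It merges many copies of a small quad into one position array and one index array, so the whole set can be drawn as a single ordinary mesh. The names (`MergedQuads`, `mergePositions`, `mergeIndices`) are hypothetical, and this is not JME's batching code. In a real app you would copy these arrays into the mesh's vertex buffers and rewrite the positions yourself when things move.

```java
import java.util.Arrays;

class MergedQuads {
    // Unit quad in the XY plane: 4 corners (x, y, z), 2 triangles.
    static final float[] QUAD_POS = {
        -0.5f, -0.5f, 0f,
         0.5f, -0.5f, 0f,
         0.5f,  0.5f, 0f,
        -0.5f,  0.5f, 0f
    };
    static final int[] QUAD_IDX = {0, 1, 2, 0, 2, 3};

    /** Positions for all copies; copy c is translated by (offsets[3c], offsets[3c+1], offsets[3c+2]). */
    static float[] mergePositions(float[] offsets) {
        int copies = offsets.length / 3;
        int vertsPerQuad = QUAD_POS.length / 3;
        float[] out = new float[copies * QUAD_POS.length];
        for (int c = 0; c < copies; c++) {
            for (int v = 0; v < vertsPerQuad; v++) {
                int dst = (c * vertsPerQuad + v) * 3;
                out[dst]     = QUAD_POS[v * 3]     + offsets[c * 3];
                out[dst + 1] = QUAD_POS[v * 3 + 1] + offsets[c * 3 + 1];
                out[dst + 2] = QUAD_POS[v * 3 + 2] + offsets[c * 3 + 2];
            }
        }
        return out;
    }

    /** Index buffer for all copies: each copy's indices are shifted by its first vertex. */
    static int[] mergeIndices(int copies) {
        int vertsPerQuad = QUAD_POS.length / 3;
        int[] out = new int[copies * QUAD_IDX.length];
        for (int c = 0; c < copies; c++) {
            for (int k = 0; k < QUAD_IDX.length; k++) {
                out[c * QUAD_IDX.length + k] = QUAD_IDX[k] + c * vertsPerQuad;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] offsets = {0f, 0f, 0f, 2f, 0f, 0f}; // two quads, the second shifted by 2 on X
        float[] pos = mergePositions(offsets);
        int[] idx = mergeIndices(2);
        System.out.println(pos.length + " floats, indices " + Arrays.toString(idx));
    }
}
```

The trade-off is the one described above: one draw call and no per-object overhead, at the cost of rewriting the shared arrays whenever any copy moves.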

Kroart · February 15, 2022, 1:36pm

That's too hard for me :) It's my first experience with 3D, so I'll pick the easiest solutions.

Can you please clarify what you mean by automatic instancing? As I understand it, it's not implemented in the engine right now?

tonihele · February 15, 2022, 2:01pm

In very short: use InstancedNode.

Ali_RS · February 15, 2022, 2:12pm

You can take a look at the instancing examples here

You need to create an InstancedNode, attach all the instances to it, and then call the instance() method on it. Note that every time you add a new object to the node, you should call the instance() method again.

Note that instances must have the same material.
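A small model of why the shared-material rule matters: the sketch below buckets geometries by the identity (`==`) of their mesh and then of their material, where each bucket stands for one instanced draw call. It is an assumption-level illustration of the rule described in this thread, not `InstancedNode`'s actual implementation; `Mesh`, `Material` and `Geom` here are hypothetical stand-ins for the engine classes. Note that a material copy with identical settings still lands in its own bucket.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class InstanceGrouping {
    // Hypothetical stand-ins for a mesh, a material and a geometry that uses both.
    static final class Mesh { }

    static final class Material {
        final String name;
        Material(String name) { this.name = name; }
        // A deep copy: same settings, but a different object reference.
        Material copy() { return new Material(name); }
    }

    static final class Geom {
        final Mesh mesh;
        final Material material;
        Geom(Mesh mesh, Material material) { this.mesh = mesh; this.material = material; }
    }

    /** Buckets geometries by the identity (==) of their mesh, then of their material. */
    static Map<Mesh, Map<Material, List<Geom>>> group(List<Geom> geoms) {
        Map<Mesh, Map<Material, List<Geom>>> byMesh = new IdentityHashMap<>();
        for (Geom g : geoms) {
            byMesh.computeIfAbsent(g.mesh, m -> new IdentityHashMap<>())
                  .computeIfAbsent(g.material, m -> new ArrayList<>())
                  .add(g);
        }
        return byMesh;
    }

    /** One instanced draw call per (mesh, material) bucket. */
    static int drawCalls(List<Geom> geoms) {
        int calls = 0;
        for (Map<Material, List<Geom>> byMaterial : group(geoms).values()) {
            calls += byMaterial.size();
        }
        return calls;
    }

    public static void main(String[] args) {
        Mesh treeMesh = new Mesh();
        Material bark = new Material("bark");
        List<Geom> shared = List.of(new Geom(treeMesh, bark), new Geom(treeMesh, bark), new Geom(treeMesh, bark));
        List<Geom> copied = List.of(new Geom(treeMesh, bark), new Geom(treeMesh, bark.copy()));
        System.out.println(drawCalls(shared) + " " + drawCalls(copied)); // prints "1 2"
    }
}
```

The same logic also explains why a scene that randomly assigns one of several materials produces several smaller batches instead of one big one.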

Kroart · February 15, 2022, 2:15pm

Wow! I didn't know about it! That's great news for me! Thanks! It does all the magic :)

It seems to cause a lag spike when adding a large number of geometries to the InstancedNode; I'll try adding them in smaller chunks.

Thanks again!

Ali_RS · February 15, 2022, 2:28pm

To clarify: I mean the exact same reference (==), not a clone. So if you are cloning the tree parts, make sure to use clone(false) so the material is not cloned.

The same goes for meshes as well.

Ali_RS · February 15, 2022, 5:21pm

Also, it might be worth mentioning that JME culls the instances that are outside the camera view (does not render them) when sending the instance data buffer to the GPU.
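Roughly what per-instance culling before upload looks like: the sketch below tests each instance's bounding sphere against a set of planes and packs only the survivors into the instance buffer, returning the instance count to draw. The plane representation and the `packVisible` name are assumptions for illustration; this is not JME's culling code, and a real camera frustum has six planes rather than the two used in the usage example.

```java
import java.nio.FloatBuffer;

class CulledInstancePacking {
    /** A plane n.p + d = 0; points with n.p + d >= 0 are on the inside. */
    static final class Plane {
        final float nx, ny, nz, d;
        Plane(float nx, float ny, float nz, float d) { this.nx = nx; this.ny = ny; this.nz = nz; this.d = d; }
        float distance(float x, float y, float z) { return nx * x + ny * y + nz * z + d; }
    }

    /**
     * Writes (x, y, z, 1) for every instance whose bounding sphere is not completely
     * outside any plane. Returns the number of instances written (the count to draw).
     */
    static int packVisible(float[] centers, float radius, Plane[] frustum, FloatBuffer out) {
        out.clear();
        int visible = 0;
        for (int i = 0; i < centers.length / 3; i++) {
            float x = centers[i * 3], y = centers[i * 3 + 1], z = centers[i * 3 + 2];
            boolean inside = true;
            for (Plane p : frustum) {
                if (p.distance(x, y, z) < -radius) { // sphere entirely behind this plane
                    inside = false;
                    break;
                }
            }
            if (inside) {
                out.put(x).put(y).put(z).put(1f);
                visible++;
            }
        }
        out.flip(); // limit = visible * 4 floats, ready for upload
        return visible;
    }

    public static void main(String[] args) {
        // Simplified "frustum": only the slab -10 <= x <= 10.
        Plane[] slab = { new Plane(1, 0, 0, 10), new Plane(-1, 0, 0, 10) };
        float[] centers = { 0, 0, 0,   9.5f, 0, 0,   50, 0, 0 };
        FloatBuffer buf = FloatBuffer.allocate(centers.length / 3 * 4);
        int n = packVisible(centers, 1f, slab, buf);
        System.out.println(n + " instances, " + buf.remaining() + " floats");
    }
}
```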

Kroart · February 15, 2022, 7:00pm

Yeah, I get this. Thanks!

kevinba99 · February 16, 2022, 5:00am

Does InstancedNode use glDrawElementsInstanced calls, or what does it use?

I miss instanced drawing. For simple particle instances it is far better: I could push out over 100k meshes with no performance issues, but with JME not doing it, I couldn't do 20% of that without it affecting performance.

In my own engine I was doing the following for instancing batching.

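    // Assumes LWJGL 3 + JOML. GameItem, Transformation, Texture, the *Buffer / *VBO fields,
    // MATRIX_SIZE_FLOATS, i_id and draw_count are this engine's own classes and members.
    import java.util.List;
    import org.joml.Matrix4f;
    import org.joml.Vector2f;
    import org.joml.Vector4f;
    import static org.lwjgl.opengl.GL11.GL_TRIANGLES;
    import static org.lwjgl.opengl.GL11.GL_UNSIGNED_INT;
    import static org.lwjgl.opengl.GL15.*;
    import static org.lwjgl.opengl.GL31.glDrawElementsInstanced;

    // Fills the per-instance buffers for one chunk and draws it with a single instanced call.
    private void renderChunkInstanced(List<GameItem> gameItems, Transformation transformation, Matrix4f viewMatrix, boolean view3d)
    {
        modelViewBuffer.clear();
        colorMatrixBuffer.clear();
        modelPosBuffer.clear();
        textureAtlasBuffer.clear();

        Texture text = gameItems.get(0).getTexture();
        int i = 0;
        for (GameItem gameItem : gameItems) {
            // Per-instance model-view matrix (3D) or orthographic model matrix (2D)
            Matrix4f modelViewMatrix = view3d
                    ? transformation.getModelViewMatrix(gameItem, viewMatrix)
                    : transformation.getOrtoProjModelMatrix(gameItem, viewMatrix);
            modelViewMatrix.get(MATRIX_SIZE_FLOATS * i, modelViewBuffer);

            // Per-instance RGBA color: 4 floats per instance (a float index, not a byte offset)
            Vector4f color = new Vector4f(gameItem.getColor().x, gameItem.getColor().y, gameItem.getColor().z, gameItem.getTranslusentLevel());
            color.get(4 * i, colorMatrixBuffer);

            // Per-instance position: 4 floats per instance
            Vector4f pos = new Vector4f(gameItem.getPosition().x, gameItem.getPosition().y, gameItem.getPosition().z, 1.0f);
            pos.get(4 * i, modelPosBuffer);

            // Per-instance texture-atlas offset: 2 floats per instance
            Vector2f atlasOffset;
            if (text != null) {
                int col = gameItem.getTextPos() % text.getNumCols();
                int row = gameItem.getTextPos() / text.getNumCols();
                atlasOffset = new Vector2f((float) col / text.getNumCols(), (float) row / text.getNumRows());
            } else {
                atlasOffset = new Vector2f(1.0f, 1.0f);
            }
            atlasOffset.get(2 * i, textureAtlasBuffer);

            i++; // advance only after every buffer has been written for this instance
        }

        // The get(index, buffer) calls write at absolute indices and leave the position at 0,
        // so set the limits to upload only the floats actually written.
        modelViewBuffer.limit(MATRIX_SIZE_FLOATS * i);
        colorMatrixBuffer.limit(4 * i);
        modelPosBuffer.limit(4 * i);
        textureAtlasBuffer.limit(2 * i);

        glBindBuffer(GL_ARRAY_BUFFER, modelViewVBO);
        glBufferData(GL_ARRAY_BUFFER, modelViewBuffer, GL_DYNAMIC_DRAW);

        glBindBuffer(GL_ARRAY_BUFFER, colorMatrixVBO);
        glBufferData(GL_ARRAY_BUFFER, colorMatrixBuffer, GL_DYNAMIC_DRAW);

        glBindBuffer(GL_ARRAY_BUFFER, modelPosVBO);
        glBufferData(GL_ARRAY_BUFFER, modelPosBuffer, GL_DYNAMIC_DRAW);

        glBindBuffer(GL_ARRAY_BUFFER, textureAtlasVBO);
        glBufferData(GL_ARRAY_BUFFER, textureAtlasBuffer, GL_DYNAMIC_DRAW);

        // One call: draw_count indices per instance, gameItems.size() instances
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, i_id);
        glDrawElementsInstanced(GL_TRIANGLES, draw_count, GL_UNSIGNED_INT, 0, gameItems.size());

        glBindBuffer(GL_ARRAY_BUFFER, 0);
    }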

It would allow 100k+ instances without performance hits. If InstancedNode used glDrawElementsInstanced, it would be worth converting the particle emitter to use it.

kevinba99 · February 16, 2022, 5:11am

I just tried “TestInstanceNode”.

It is set to 30 instances. If you change it to 300, it basically crashes JME: the frame rate drops below 1 FPS.

Even 100 instances really affect the FPS, bringing it down to around 20 FPS.

It must not be using glDrawElementsInstanced.

pspeed · February 16, 2022, 6:22am

github.com/jMonkeyEngine/jmonkeyengine/blob/master/jme3-core/src/main/java/com/jme3/renderer/opengl/GLRenderer.java#L3089

    int curOffset = 0;
    for (int i = 0; i < elementLengths.length; i++) {
        if (i == stripStart) {
            elMode = convertElementMode(Mode.TriangleStrip);
        } else if (i == fanStart) {
            elMode = convertElementMode(Mode.TriangleFan);
        }
        int elementLength = elementLengths[i];

        if (useInstancing) {
            glext.glDrawElementsInstancedARB(elMode,
                    elementLength,
                    fmt,
                    curOffset,
                    count);
        } else {
            gl.glDrawRangeElements(elMode,
                    0,
                    vertCount,
                    elementLength,
                    fmt,

I can't speak to the efficiency of TestInstanceNode, but at the mesh level it works fine.

Ali_RS · February 16, 2022, 6:28am

It uses glDrawArraysInstancedARB

github.com/jMonkeyEngine/jmonkeyengine/blob/e7086c28a871e89c063d9ea026179de5288e4794/jme3-core/src/main/java/com/jme3/renderer/opengl/GLRenderer.java#L3023

        }
    }

    public void setVertexAttrib(VertexBuffer vb) {
        setVertexAttrib(vb, null);
    }

    public void drawTriangleArray(Mesh.Mode mode, int count, int vertCount) {
        boolean useInstancing = count > 1 && caps.contains(Caps.MeshInstancing);
        if (useInstancing) {
            glext.glDrawArraysInstancedARB(convertElementMode(mode), 0,
                    vertCount, count);
        } else {
            gl.glDrawArrays(convertElementMode(mode), 0, vertCount);
        }
    }

    public void drawTriangleList(VertexBuffer indexBuf, Mesh mesh, int count) {
        if (indexBuf.getBufferType() != VertexBuffer.Type.Index) {
            throw new IllegalArgumentException("Only index buffers are allowed as triangle lists.");
        }

github.com/LWJGL/lwjgl3/blob/e4a6cc863f469ea8acfe3c2158f2c77d0c0aa95d/modules/lwjgl/opengl/src/generated/java/org/lwjgl/opengl/ARBDrawInstanced.java#L50

    // --- [ glDrawArraysInstancedARB ] ---

    /**
     * Draw multiple instances of a range of elements.
     *
     * @param mode      the kind of primitives to render. One of:<br><table><tr><td>{@link GL11#GL_POINTS POINTS}</td><td>{@link GL11#GL_LINE_STRIP LINE_STRIP}</td><td>{@link GL11#GL_LINE_LOOP LINE_LOOP}</td><td>{@link GL11#GL_LINES LINES}</td><td>{@link GL11#GL_TRIANGLE_STRIP TRIANGLE_STRIP}</td><td>{@link GL11#GL_TRIANGLE_FAN TRIANGLE_FAN}</td><td>{@link GL11#GL_TRIANGLES TRIANGLES}</td></tr><tr><td>{@link GL32#GL_LINES_ADJACENCY LINES_ADJACENCY}</td><td>{@link GL32#GL_LINE_STRIP_ADJACENCY LINE_STRIP_ADJACENCY}</td><td>{@link GL32#GL_TRIANGLES_ADJACENCY TRIANGLES_ADJACENCY}</td><td>{@link GL32#GL_TRIANGLE_STRIP_ADJACENCY TRIANGLE_STRIP_ADJACENCY}</td><td>{@link GL40#GL_PATCHES PATCHES}</td><td>{@link GL11#GL_POLYGON POLYGON}</td><td>{@link GL11#GL_QUADS QUADS}</td></tr><tr><td>{@link GL11#GL_QUAD_STRIP QUAD_STRIP}</td></tr></table>
     * @param first     the starting index in the enabled arrays
     * @param count     the number of indices to be rendered
     * @param primcount the number of instances of the specified range of indices to be rendered
     */
    public static native void glDrawArraysInstancedARB(@NativeType("GLenum") int mode, @NativeType("GLint") int first, @NativeType("GLsizei") int count, @NativeType("GLsizei") int primcount);

    // --- [ glDrawElementsInstancedARB ] ---

    /**
     * Unsafe version of: {@link #glDrawElementsInstancedARB DrawElementsInstancedARB}
     *
     * @param count the number of elements to be rendered
     * @param type  the type of the values in {@code indices}. One of:<br><table><tr><td>{@link GL11#GL_UNSIGNED_BYTE UNSIGNED_BYTE}</td><td>{@link GL11#GL_UNSIGNED_SHORT UNSIGNED_SHORT}</td><td>{@link GL11#GL_UNSIGNED_INT UNSIGNED_INT}</td></tr></table>
     */
    public static native void nglDrawElementsInstancedARB(int mode, int count, int type, long indices, int primcount);

Note that 30 is not the number of instances; it creates 3600 instances. Setting it to 300 will create 360,000 instances.
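The arithmetic behind those counts, as a tiny sketch. It assumes each of the test's two nested loops runs from `-n` to `n - 1`, which is what "3600 instances for 30" implies; the class and method names are hypothetical. The count is then `(2n)^2`.

```java
class InstanceCount {
    /** Assumes two nested loops, each running from -n to n - 1 (2n iterations each). */
    static long instances(int n) {
        long perAxis = 2L * n;
        return perAxis * perAxis;
    }

    public static void main(String[] args) {
        for (int n : new int[] {30, 90, 300}) {
            System.out.println(n + " -> " + instances(n)); // 3600, 32400, 360000
        }
    }
}
```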

kevinba99 · February 16, 2022, 10:35pm

Thanks for the count. I didn't notice the looping and the math with the negative range, so everything is doubled and multiplied by the inner loop's count.

Do you know what controls the number of objects drawn at a time? I can't locate it. On my machine it draws roughly 2,000 objects at a time, which is very low.

Can you control this, i.e. increase the batch draw size?

With 90, that's 32,400 instances.

kevinba99 · February 16, 2022, 11:06pm

Sorry, I just noticed there is no limit; it was limited because of the random material usage.
If you use one material, it renders all of them at the same time.

With 300, it made a single draw call for 90,000+ geometries.

The FPS was very poor: 3 FPS.

But I can see that every frame loops through all the items, and that takes a huge hit.

pspeed · February 17, 2022, 1:21am

One issue with having the scene graph do auto-instancing is that there will always be scene graph overhead. For a super-large number of objects, that’s going to be significant… especially if all of them are moving and stuff. It’s convenient but it comes with those “everything is duplicated and we still have to manage a million objects” limitations.

But the real instancing support is down in the Mesh, and it's pretty easy to construct a raw Mesh that works with JME's materials and uses instancing. Then it's always just one object in the scene graph, and you just have to edit the transform buffers yourself. It's a trade-off between performance and simpler code. Though for me, I've never found the raw buffer updates to be particularly onerous.
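A rough sketch of "edit the transform buffers yourself": one flat buffer holding a 4x4 column-major matrix per instance, updated in place with absolute puts, plus a dirty flag so the data is only re-uploaded when something changed. The class and method names are hypothetical, and the layout is a generic per-instance matrix (the same kind as the model-view buffer in the custom-engine code earlier in this thread). It is not necessarily the layout JME's instancing shaders expect, so check the engine's material definitions before wiring anything like this to a real Mesh.

```java
import java.nio.FloatBuffer;

class InstanceTransformBuffer {
    static final int FLOATS_PER_INSTANCE = 16; // one 4x4 matrix, column-major
    private final FloatBuffer data;
    private final int capacity;
    private boolean dirty = true;

    InstanceTransformBuffer(int capacity) {
        this.capacity = capacity;
        this.data = FloatBuffer.allocate(capacity * FLOATS_PER_INSTANCE);
    }

    /** Writes a uniform-scale-then-translate matrix for one instance, in place (absolute puts). */
    void set(int index, float x, float y, float z, float scale) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("instance " + index);
        }
        int base = index * FLOATS_PER_INSTANCE;
        for (int k = 0; k < FLOATS_PER_INSTANCE; k++) {
            data.put(base + k, 0f);
        }
        data.put(base, scale);      // column 0, row 0
        data.put(base + 5, scale);  // column 1, row 1
        data.put(base + 10, scale); // column 2, row 2
        data.put(base + 12, x);     // column 3 holds the translation
        data.put(base + 13, y);
        data.put(base + 14, z);
        data.put(base + 15, 1f);
        dirty = true;
    }

    float get(int index, int element) {
        return data.get(index * FLOATS_PER_INSTANCE + element);
    }

    /** Returns true once after any change; the caller would then re-upload the buffer. */
    boolean consumeDirty() {
        boolean wasDirty = dirty;
        dirty = false;
        return wasDirty;
    }

    public static void main(String[] args) {
        InstanceTransformBuffer buf = new InstanceTransformBuffer(3);
        buf.set(1, 4f, 5f, 6f, 2f);
        System.out.println(buf.get(1, 12) + " " + buf.get(1, 0) + " " + buf.consumeDirty() + " " + buf.consumeDirty());
    }
}
```

Only the instances that actually moved need a `set` call each frame; there is no per-object scene graph traversal.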

Ali_RS · February 17, 2022, 4:35am

One thing I noticed in TestInstanceNode is that if the objects do not move, the FPS goes up: in my case from 170 FPS (objects moving) to 230 FPS (objects not moving). I can't explain why.

kevinba99 · February 17, 2022, 5:14am

Thanks, I found an instancing example in Shaderblow that draws over 90k items at 200+ FPS.

I'm going to use that technique for my rain/snow particle emitter.

Thanks for helping. This is basically like what I was doing in my old engine.

Ali_RS · February 17, 2022, 5:24am

Can you share the link to that example?

kevinba99 · February 17, 2022, 5:26am

Sure no problem.

He is doing his own instancing, outside of JME's architecture. He just builds his own vertex buffers, updates the vertex data in simpleUpdate, and then uses a simple shader to draw them.
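The per-frame vertex update can be as simple as the following sketch, written for the rain use case mentioned in this thread: each drop's `y` is moved down by `speed * tpf`, and drops that fall below the ground are put back at the top. The names and parameters are hypothetical and this is not the Shaderblow code; re-uploading the buffer to the GPU after the update is engine-specific and not shown.

```java
import java.nio.FloatBuffer;
import java.util.Random;

class RainUpdate {
    /**
     * Moves every drop down by speed * tpf and puts drops that fall below the ground
     * back at the top. Positions are packed as (x, y, z) per drop; absolute indices only.
     */
    static void update(FloatBuffer positions, float tpf, float speed, float groundY, float topY) {
        for (int yIndex = 1; yIndex < positions.limit(); yIndex += 3) {
            float y = positions.get(yIndex) - speed * tpf;
            if (y < groundY) {
                y = topY; // reuse the drop instead of allocating a new one
            }
            positions.put(yIndex, y);
        }
    }

    /** Fills a buffer with drops scattered in a box; a stand-in for real spawn logic. */
    static FloatBuffer spawn(int drops, float halfWidth, float groundY, float topY, long seed) {
        Random rnd = new Random(seed);
        FloatBuffer buf = FloatBuffer.allocate(drops * 3);
        for (int i = 0; i < drops; i++) {
            buf.put((rnd.nextFloat() * 2f - 1f) * halfWidth);
            buf.put(groundY + rnd.nextFloat() * (topY - groundY));
            buf.put((rnd.nextFloat() * 2f - 1f) * halfWidth);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        FloatBuffer rain = spawn(80_000, 50f, 0f, 40f, 42L);
        for (int frame = 0; frame < 600; frame++) {
            update(rain, 1f / 60f, 30f, 0f, 40f); // ten simulated seconds at 60 FPS
        }
        float min = Float.MAX_VALUE;
        for (int i = 1; i < rain.limit(); i += 3) {
            min = Math.min(min, rain.get(i));
        }
        System.out.println("lowest drop y = " + min);
    }
}
```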

kevinba99 · February 17, 2022, 5:42am

This is with 80k instances, doing a simple update to move the raindrops to the ground and then reposition them at the top again to simulate rain.

Still getting 500+ FPS on this. This is a SIMPLE QUAD using a sprite sheet, so I altered the texture coords.
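Altering the texture coordinates for a sprite sheet usually means mapping the quad's 0..1 UVs into one cell of the sheet. A minimal sketch, assuming frames are numbered row by row starting at `v = 0` (whether that is the top or bottom row depends on your texture's orientation). It uses the same column/row arithmetic as the atlas-offset code earlier in the thread, and the names are hypothetical.

```java
import java.util.Arrays;

class SpriteSheetUv {
    /** {u0, v0, u1, v1} of a frame in a cols x rows sheet, frames numbered row by row from v = 0. */
    static float[] frameRect(int frame, int cols, int rows) {
        int col = frame % cols;
        int row = frame / cols;
        float w = 1f / cols;
        float h = 1f / rows;
        return new float[] { col * w, row * h, (col + 1) * w, (row + 1) * h };
    }

    /** Maps a quad's 0..1 texture coordinates (u, v pairs) into the frame's sub-rectangle. */
    static float[] remap(float[] unitTexCoords, int frame, int cols, int rows) {
        float[] r = frameRect(frame, cols, rows);
        float w = r[2] - r[0];
        float h = r[3] - r[1];
        float[] out = new float[unitTexCoords.length];
        for (int i = 0; i < unitTexCoords.length; i += 2) {
            out[i] = r[0] + unitTexCoords[i] * w;
            out[i + 1] = r[1] + unitTexCoords[i + 1] * h;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] quad = { 0f, 0f, 1f, 0f, 1f, 1f, 0f, 1f };
        // Frame 5 of a 4x4 sheet is column 1, row 1.
        System.out.println(Arrays.toString(remap(quad, 5, 4, 4)));
    }
}
```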