Geometry morphing / interpolating

Hey everyone, thought I’d share my latest personal ‘achievement’ if you will.

I suspected it should be possible to morph a Geometry's Mesh into another shape based on its vertex position data, but I hadn't really seen much of that online. So I gave it a shot, and here's the result:

[video]http://www.youtube.com/watch?v=4TyKBrE9YZc[/video]

(note: these are exactly the same, except one has its material's wireframe property set)

I made eight different shapes, all based on the same 10x10x10 faceted cube. I then use the Position data to interpolate between the different shapes. I've cleaned up the code a bit and wrapped it in a util class. At this point the code I've attached below works quite well, but I highly suspect things might turn out really weird if the shapes don't have the same number of vertices - which is why I exclude those beforehand. I've also added support (I think) for IntBuffer, but the other formats I haven't used so far, so I can't really test that part.

In any case, I’m quite curious what others think of this, and if there’s anything I can improve.

[java]
package nl.pixelsmash.utils;

import com.jme3.math.FastMath;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.VertexBuffer;
import com.jme3.scene.VertexBuffer.Type;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

/**
 * @author PixelSmash
 */
public class GeometryUtils {

    public static boolean interpolate(float scale, Geometry target, Geometry startGeom, Geometry endGeom, Type type) {
        return interpolate(scale, target, startGeom.getMesh(), endGeom.getMesh(), type);
    }

    public static boolean interpolate(float scale, Geometry target, Mesh startMesh, Mesh endMesh, Type type) {
        VertexBuffer startBuffer = startMesh.getBuffer(type);
        VertexBuffer endBuffer = endMesh.getBuffer(type);
        VertexBuffer buffer = target.getMesh().getBuffer(type);

        if (startMesh.getVertexCount() != endMesh.getVertexCount() || startBuffer == null || endBuffer == null || buffer == null) {
            return false;
        }

        if (startBuffer.getFormat() == VertexBuffer.Format.Float) {
            interpolateBuffer(scale, (FloatBuffer) buffer.getData(), (FloatBuffer) startBuffer.getData(), (FloatBuffer) endBuffer.getData());
        } else if (startBuffer.getFormat() == VertexBuffer.Format.Int) {
            interpolateBuffer(scale, (IntBuffer) buffer.getData(), (IntBuffer) startBuffer.getData(), (IntBuffer) endBuffer.getData());
        } else {
            return false;
        }

        target.getMesh().getBuffer(type).setUpdateNeeded();
        target.updateModelBound();

        return true;
    }

    private static void interpolateBuffer(float scale, FloatBuffer buffer, FloatBuffer startBuffer, FloatBuffer endBuffer) {
        for (int i = 0; i < startBuffer.capacity(); i++) {
            buffer.put(i, FastMath.interpolateLinear(scale, startBuffer.get(i), endBuffer.get(i)));
        }
    }

    private static void interpolateBuffer(float scale, IntBuffer buffer, IntBuffer startBuffer, IntBuffer endBuffer) {
        for (int i = 0; i < startBuffer.capacity(); i++) {
            buffer.put(i, (int) FastMath.interpolateLinear(scale, startBuffer.get(i), endBuffer.get(i)));
        }
    }
}[/java]


The concept looks good. The last version of herodex (in fact the version currently online) does a few similar things with meshes…

Execution-wise, you are constantly recreating buffers. You would be better off modifying the existing buffers in the mesh based on the two meshes being interpolated between.


Good point, updated the first post to reflect that - now it uses buffer.put as opposed to creating a list of floats/ints and converting those to new Buffers. So in effect your suggestion helped remove two unneeded object creations - both the number array and the buffer. Thanks for that! I've also added an updateModelBound() call at the end, something I accidentally removed while cleaning up the code.

I've been thinking about whether there's a way to replace the if/else block that checks the buffer's format with something more elegant, but so far I've got nothing - it seems the base Buffer class doesn't define a put method.

Why do you need to reset local transforms? And why in such a strange way - you are resetting everything over and over. setTransform would be enough, even if it is really needed (which I doubt).

It's a double win since you're doing less data copying as well as less allocation.

One more change - put(i, x) does a seek to i each time, which is actually quite slow in itself with native buffers.

Rewind all the buffers before entering the for loop and you can just use
[java]
buffer.put(FastMath.interpolateLinear(scale, startBuffer.get(), endBuffer.get()));
[/java]

I don’t think you can do much about the if/else - the buffer provides no generic way to do it so there is no way for you to do it generically.
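To make that suggestion concrete, here's a minimal, self-contained sketch of the rewind-then-relative-put pattern using plain java.nio only - no jME types, and an inline lerp standing in for FastMath.interpolateLinear (names here are my own, not from the code above):

```java
import java.nio.FloatBuffer;

public class RelativePutDemo {

    // Interpolate two buffers into dst using relative get()/put().
    // All three buffers are rewound first, so no per-element seek to an index.
    public static void interpolate(float scale, FloatBuffer dst, FloatBuffer a, FloatBuffer b) {
        dst.rewind();
        a.rewind();
        b.rewind();
        while (a.hasRemaining()) {
            float start = a.get();
            float end = b.get();
            dst.put(start + scale * (end - start)); // linear interpolation
        }
    }

    public static void main(String[] args) {
        FloatBuffer a = FloatBuffer.wrap(new float[]{0f, 0f});
        FloatBuffer b = FloatBuffer.wrap(new float[]{2f, 4f});
        FloatBuffer dst = FloatBuffer.allocate(2);
        interpolate(0.5f, dst, a, b);
        System.out.println(dst.get(0) + " " + dst.get(1)); // prints 1.0 2.0
    }
}
```

The key point is that each cursor advances implicitly, so the JVM never has to translate an explicit index into a native buffer offset per element.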

nice :) Reminds me of Adobe Flash's shape tweening ^^. I guess the next step would be to have "easing", so it's not a completely linear transformation

<cite>@abies said:</cite> Why do you need to reset local transforms? And why in so strange way, you are resetting everything over and over - setTransform would be enough, even if it is really needed (which I doubt).
I did this because of an issue I encountered. I'll try to reproduce the steps:

- create a standard geometry, with mesh, material etc.
- scale the geometry (to, say, 0.1)
- run the interpolation on the geometry

The result was that the geometry appeared as having scale 1, as opposed to scale 0.1 - the only way to reset this was to reapply the localScale. I figured since the issue was present with localScale, it would be present with the other transforms as well. However, I reran the interpolation without those calls, and for some reason it looks fine now - no scaling issues whatsoever - which leads me to think I messed something else up before, and this was a fix for my own error rather than something weird about jME. Fixed this in the first post; if not for you I probably wouldn't have looked at this twice, so thanks!

I've looked at your latest suggestion as well, zarch. I did a little benchmark to see what kind of performance increase your suggestion meant, by running 100,000 calls to each version. Surprisingly, using put(i, x) the time taken was 5453 ms, and using put(x) it was 5735 ms - multiple runs consistently showed put(i, x) faster than put(x).

<cite>@wezrule said:</cite> nice :) reminds me of Adobe Flash's Shape tweening ^^. I guess next steps would be to have "easing", so its not completely a linear transformation
Haha yes, including the weird in-between shapes if you didn't set the guides correctly. Easing would indeed be a nice next step; I'm guessing I'll need to take a look at FastMath's interpolateBezier for that. Another nice addition could be support for meshes that have different vertex counts.
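For what it's worth, a simple non-linear feel can be had without Béziers by remapping the scale value before handing it to the interpolation. Here's a smoothstep sketch - my own helper, not a jME API - that eases in and out while still mapping 0→0 and 1→1:

```java
public class Easing {

    // Smoothstep: remaps a linear t in [0,1] so motion starts and ends slowly.
    // The eased value can then be passed as the 'scale' argument of interpolate().
    public static float smoothstep(float t) {
        t = Math.max(0f, Math.min(1f, t)); // clamp to [0,1]
        return t * t * (3f - 2f * t);
    }

    public static void main(String[] args) {
        System.out.println(smoothstep(0.0f)); // prints 0.0
        System.out.println(smoothstep(0.5f)); // prints 0.5 (midpoint unchanged)
        System.out.println(smoothstep(1.0f)); // prints 1.0
    }
}
```

Because only the parameter is remapped, the buffer-interpolation code itself stays untouched; any easing curve (quadratic, cubic, Bézier) can be dropped in the same way.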

Did you warm up the relevant methods in the jvm before doing the tests?

As you say your results are…surprising…

I’m afraid I simply have no idea how to do that - I’m not even sure I’ve ever heard of it. What does it entail?

Make sure every code path has been executed a good number of times (at least 50 but within reason more is better) before doing any timed tests.

That causes hotspot to do any compiling/optimising it might need to do during the warm up rather than during the timed tests.

So before doing your two loops of 100k calls do a loop of each of 1k and throw away those results…then run your test for 100k and time that.
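The warm-up-then-time pattern described above can be sketched as a small generic harness - this is just an illustration of the idea, not code from the thread:

```java
public class WarmupBench {

    // Run the workload 'warmup' times untimed (so hotspot compiles/optimises it),
    // then time 'timed' iterations with nanoTime and return the elapsed nanos.
    static long timeIt(Runnable work, int warmup, int timed) {
        for (int i = 0; i < warmup; i++) {
            work.run(); // results thrown away
        }
        long start = System.nanoTime();
        for (int i = 0; i < timed; i++) {
            work.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Runnable work = () -> Math.sqrt(42.0); // placeholder workload
        long nanos = timeIt(work, 1000, 100000);
        System.out.println("took " + nanos / 1_000_000.0 + " ms");
    }
}
```

The same `work` must be used in both phases; warming up a different code path tells hotspot nothing about the loop you actually time.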

put(i, x) will always be slower than put(x)… and if it weren’t then put(x) would just use it.

So, if you are seeing it the other way around then your test is flawed somehow (as zarch suggests).

I’d have to see your test code to know for sure. Warming up, using nanoTime properly, etc… are all necessary to make a non-useless microbenchmark.

As I said, surprising - I had the idea as well that Buffer was more akin to a LinkedList than an ArrayList (I know it's not exactly like those, but it's the best way I can explain it - fast on next(), slow on index-based access).

I’ve added the code I use to test below. Additionally, the models I’ve used can be downloaded from here
GeometryUtils2 is exactly the same as the GeometryUtils from the first post, except I call rewind on all buffers and use put(x). I use currentTimeMillis, but the results (put(i,x) faster than put(x)) remain the same when using nanoTime().
[java]
package nl.pixelsmash.test;

import com.jme3.asset.AssetManager;
import com.jme3.material.Material;
import com.jme3.material.RenderState.FaceCullMode;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.Node;
import com.jme3.scene.VertexBuffer.Type;
import com.jme3.util.BufferUtils;
import nl.pixelsmash.utils.GeometryUtils;
import nl.pixelsmash.utils.GeometryUtils2;

/**
 * @author PixelSmash
 */
public class TestNode extends Node {

    protected final AssetManager assetManager;
    protected final boolean wireframe;
    protected float update = 0;
    private int currentMesh = 0;
    private int targetMesh = 1;
    private final int numMeshes = 8;
    private Geometry child;
    private Material mat;
    private Mesh mesh = new Mesh();
    private Mesh[] meshes = new Mesh[numMeshes];

    public TestNode(AssetManager assetManager, boolean wireframe) {
        super("testNode");
        this.assetManager = assetManager;
        this.wireframe = wireframe;
        init();
    }

    private void init() {
        for (int i = 0; i < numMeshes; i++) {
            child = (Geometry) assetManager.loadModel("Models/0" + (i + 1) + ".j3o");
            meshes[i] = child.getMesh();
        }

        mat = child.getMaterial();
        if (wireframe) {
            mat.getAdditionalRenderState().setWireframe(true);
        }
        mat.getAdditionalRenderState().setFaceCullMode(FaceCullMode.Off);

        mesh.setBuffer(Type.Position, 3, BufferUtils.clone(meshes[0].getFloatBuffer(Type.Position)));
        mesh.setBuffer(Type.TexCoord, 2, BufferUtils.clone(meshes[0].getFloatBuffer(Type.TexCoord)));
        mesh.setBuffer(Type.Index, 3, BufferUtils.clone(meshes[0].getShortBuffer(Type.Index)));

        child.setMesh(mesh);
        child.setLocalScale(0.05f);
        attachChild(child);

        if (wireframe) {
            // warmup
            for (int i = 0; i < 1000; i++) {
                GeometryUtils.interpolate((i / 100f) % 1f, child, meshes[currentMesh], meshes[targetMesh], Type.Position);
            }
            // timing
            long time = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                GeometryUtils.interpolate((i / 100f) % 1f, child, meshes[currentMesh], meshes[targetMesh], Type.Position);
            }
            System.out.println("Time passed for put(i, x): " + (System.currentTimeMillis() - time));

            // warmup
            for (int i = 0; i < 1000; i++) {
                GeometryUtils2.interpolate((i / 100f) % 1f, child, meshes[currentMesh], meshes[targetMesh], Type.Position);
            }
            // timing
            time = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                GeometryUtils2.interpolate((i / 100f) % 1f, child, meshes[currentMesh], meshes[targetMesh], Type.Position);
            }
            System.out.println("Time passed for put(x): " + (System.currentTimeMillis() - time));
            System.out.println();
        }
    }

    public final void simpleUpdate(float tpf) {
        update += tpf * 0.25f;

        if (update >= 1.5f) {
            update -= 1.5f;
            currentMesh = targetMesh;
            targetMesh++;
            targetMesh %= meshes.length;
        }
        GeometryUtils.interpolate(Math.min(update, 1), child, meshes[currentMesh], meshes[targetMesh], Type.Position);
    }
}[/java]


You should warm up at least 3 times… sometimes 5 to 10 depending on what’s being run. And go ahead and take the timings in the warm ups and print them so that you can see if the times are dropping/converging.

Also, use nanoTime() instead of currentTimeMillis() and then divide by 1000000.0 to get milliseconds.

Also, can we see the interpolate() methods? They may be doing other things that are skewing the results. For example, the fact that you have to call it 1000 times to get significant results means that the setup inside the method is also taking a lot of the time. For a more accurate benchmark, you should use a larger mesh if possible… then you will be timing the internal loop more and the setup code less. For example, make the mesh 100x bigger and then only run the loop 10 times.

@pixelsmash said: As I said, surprising - I had the idea as well that Buffer was more akin to a LinkedList than ArrayList (I know it's not like those exactly, but it's the best way I can explain - fast on next(), slow on index-based stuff)

By the way, these buffers just use an array internally… if it’s a direct buffer then the array is in native memory. Given that, I’d expect the difference to be very small anyway.

Oh, and I just noticed that you are timing 1000 different meshes. This will also mess up the micro benchmark. If you can run your test timings with just one big mesh it will be better… and if you can do it without using AssetManager somehow then you could make the benchmark even better by not having all of JME loaded and running in the background.

Getting a microbenchmark to a point of usefulness is a tricky exercise.

Actually, to follow up on my own posts again… the only good microbenchmark is probably to create some direct buffers explicitly for testing and then just shove an array of random values into them… one loop using put(x) and one using put(i, x)… otherwise there are too many varying factors.

That's a good point you made in your last post, pspeed - I hope I implemented it correctly, because the results are, once more, surprising. Using a bare Eclipse Java project (JDK 7u21) I copy/pasted the relevant bits from BufferUtils and recreated just the buffer section, using 10 cycles of 1000 warmup runs and 100,000 runs after that for benchmarking purposes. Here's the output I got:

put(i, x) warmup #1 timing: 111.993843
put(i, x) warmup #2 timing: 108.65068
put(i, x) warmup #3 timing: 111.055456
put(i, x) warmup #4 timing: 107.599988
put(i, x) warmup #5 timing: 108.442833
put(i, x) warmup #6 timing: 107.16837
put(i, x) warmup #7 timing: 107.437957
put(i, x) warmup #8 timing: 107.699722
put(i, x) warmup #9 timing: 107.32174
put(i, x) warmup #10 timing: 106.08555
put(i, x) timing: 9290.176824

put(x) warmup #1 timing: 96.137917
put(x) warmup #2 timing: 95.751555
put(x) warmup #3 timing: 95.063478
put(x) warmup #4 timing: 94.982183
put(x) warmup #5 timing: 95.278031
put(x) warmup #6 timing: 95.492583
put(x) warmup #7 timing: 95.543987
put(x) warmup #8 timing: 95.101193
put(x) warmup #9 timing: 95.415478
put(x) warmup #10 timing: 95.866374
put(x) timing: 9418.311266

Apparently the 'warmup' definitely runs slower for put(i, x), but once it gets to the big timed run, it's faster. Here's the code I used:
[java]
package nl.pixelsmash;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Random;

public class Test {

    public static void main(String[] args) {
        FloatBuffer buffer;
        float[] randomFloats = new float[1234];
        Random r = new Random();
        int warmup_iterations = 10;
        int timed = 50000;
        int amt = 10000;
        int counter = 0;
        long nanos;

        for (int i = 0; i < randomFloats.length; i++)
            randomFloats[i] = r.nextFloat();

        buffer = createFloatBuffer(new float[amt]);
        for (int j = 0; j < warmup_iterations; j++) {
            System.out.print("put(i, x) warmup #" + (j + 1) + " timing: ");
            nanos = System.nanoTime();
            for (int k = 0; k < timed; k++) {
                buffer.rewind();
                for (int i = 0; i < amt; i++) {
                    buffer.put(i, randomFloats[counter++]);
                    counter %= randomFloats.length;
                }
            }
            System.out.println((System.nanoTime() - nanos) / 1000000.0);
        }

        System.out.print("put(i, x) timing: ");
        nanos = System.nanoTime();
        for (int k = 0; k < timed; k++) {
            buffer.rewind();
            for (int i = 0; i < amt; i++) {
                buffer.put(i, randomFloats[counter++]);
                counter %= randomFloats.length;
            }
        }
        System.out.println((System.nanoTime() - nanos) / 1000000.0);
        System.out.println();

        for (int j = 0; j < warmup_iterations; j++) {
            System.out.print("put(x) warmup #" + (j + 1) + " timing: ");
            nanos = System.nanoTime();
            for (int k = 0; k < timed; k++) {
                buffer.rewind();
                for (int i = 0; i < amt; i++) {
                    buffer.put(randomFloats[counter++]);
                    counter %= randomFloats.length;
                }
            }
            System.out.println((System.nanoTime() - nanos) / 1000000.0);
        }

        System.out.print("put(x) timing: ");
        nanos = System.nanoTime();
        for (int k = 0; k < timed; k++) {
            buffer.rewind();
            for (int i = 0; i < amt; i++) {
                buffer.put(randomFloats[counter++]);
                counter %= randomFloats.length;
            }
        }
        System.out.println((System.nanoTime() - nanos) / 1000000.0);
    }

    public static FloatBuffer createFloatBuffer(float... data) {
        if (data == null) {
            return null;
        }
        FloatBuffer buff = createFloatBuffer(data.length);
        buff.clear();
        buff.put(data);
        buff.flip();
        return buff;
    }

    public static FloatBuffer createFloatBuffer(int size) {
        FloatBuffer buf = ByteBuffer.allocateDirect(4 * size)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        buf.clear();
        return buf;
    }
}[/java]

The warm-up needs to be the same as the real run, or it isn't really testing the right thing - you confuse the heck out of hotspot and the warm-up is basically useless.

…also, cache the buffer.capacity() value instead of having it in every loop iteration. That extra method call could be close to 30% of your loop time (for non-native buffers)… probably closer to 10% for native buffers but it’s just a guess.
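That hoisting suggestion looks like this - a trivial illustrative sketch, not code from the thread:

```java
import java.nio.FloatBuffer;

public class CapacityHoist {

    // Cache buffer.capacity() in a local before the loop, as suggested above:
    // same result, but one method call instead of one per iteration.
    public static void fill(FloatBuffer buffer, float value) {
        final int n = buffer.capacity(); // fetched once, outside the loop
        for (int i = 0; i < n; i++) {
            buffer.put(i, value);
        }
    }

    public static void main(String[] args) {
        FloatBuffer buf = FloatBuffer.allocate(4);
        fill(buf, 2f);
        System.out.println(buf.get(3)); // prints 2.0
    }
}
```

The JIT can often do this hoisting itself once the loop is hot, which is another reason warmed-up and cold timings diverge.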

Ok, updated the code in my last post to reflect your remarks - tossed out the varying warmup run count, and because buffer.capacity() is the same as the amt variable, I used that instead. I also decreased the run count a bit, because running loops for 5 minutes to test something could be a bit much ;)

Anyway, here’s the new output:
put(i, x) warmup #1 timing: 5391.720177
put(i, x) warmup #2 timing: 5402.808737
put(i, x) warmup #3 timing: 5393.76932
put(i, x) warmup #4 timing: 5409.782529
put(i, x) warmup #5 timing: 5405.543721
put(i, x) warmup #6 timing: 5409.633906
put(i, x) warmup #7 timing: 5388.026132
put(i, x) warmup #8 timing: 5385.17102
put(i, x) warmup #9 timing: 5492.817663
put(i, x) warmup #10 timing: 5410.235659
put(i, x) timing: 4695.805193

put(x) warmup #1 timing: 4654.496032
put(x) warmup #2 timing: 4615.957894
put(x) warmup #3 timing: 4746.227244
put(x) warmup #4 timing: 4664.664922
put(x) warmup #5 timing: 4712.442224
put(x) warmup #6 timing: 4775.737699
put(x) warmup #7 timing: 4780.725483
put(x) warmup #8 timing: 4790.265522
put(x) warmup #9 timing: 4817.99698
put(x) warmup #10 timing: 4766.748008
put(x) timing: 4736.359789

Only thing surprising me is the ‘radical’ drop for the actual timed put(i, x), since that’s almost 1 sec faster than the last warmup. Even so, the timing of the two is really close - 40ms difference on 50k iterations isn’t all that much.

What happens if you do 11 warm up loops? Something is fishy.

This is why micro-benchmarks are hard. You have to sample a LOT to eliminate all of the other things going on in your system. Every little thing affects them.

For example, it’s interesting to note that some of the put(x) warm-up timings are faster than even the fastest put(i,x) time.

Even for your real runs you will want to several and average them, total them, etc… Just to give a clearer picture of what the deltas are.

Also, for your warm-ups you might try running both sets of warm-ups first. Or always time both together in warm-up loops and real runs, i.e. interleaved. Right now your second warm-up is probably getting the benefit of the first warm-ups.

If I have time I may try to modify one of my other buffer benchmarks to test this…

But… the other thing is that as I hinted before, the difference between the two approaches will be small in this case. Small enough to be within the margin of error for a micro-benchmark like this.

Well, I separated the two runs into their own functions, using their own buffers etc. I also had two separate threads run each function - out of the three program runs I did, put(i, x) was faster twice, and put(x) once.

Either way, since for whatever reason put(i, x) seems ever so slightly faster in the morphing code, I'll leave that in. Not that it matters much - saving 250 ms over 100,000 separate calls; if you're calling this 100,000 times a second, there are probably better optimizations to look at than this ;)

Edit: might be useless to mention (or not, who knows?) but the put(i, x) time jumping down remained regardless of threading, separation, and changes to the warmup iterations.