Vector3f pooling?

Heya,

  During my coding adventures I noticed that there are quite a few "scratch" vectors declared/constructed as private instance variables. Makes sense, creating objects in the update loop should be limited. As my code got more complex, I started to create more and more of these temp variables also.

  I wondered if an object pool of scratch vectors might clean up all of my private "Vector3f temp1" nice enough so I decided to give it a shot. First I made a thread-local Queue to cache objects (thread-local queue means zero thread synchronization hits, and the pools seem to stay very small for me). I then made some additions to Vector3f to test, and updated my code that used temp vectors to see how it worked out.

  I thought it was kind of cool that all "new Vector3f(…)" code could be replaced with "Vector3f.getInstance(…)" without any ill effects. Could even make the existing Vector3f constructors private after a while, that way a Vector3f would only ever be created when necessary through getInstance. Returning temporary vectors to the pool is one simple call (not sure if I like "done()" but it was the first thing that came to mind :slight_smile: If no vectors are returned it just means the application is behaving exactly as it did before getInstance was used.



Good? Bad? Horrifying? It's 3:30am here so let me know if none of this makes sense

 





Here's the thread-local Queue:



package com.jme.util;

import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;

public class ThreadLocalQueue<E> implements Queue<E> {
   private ThreadLocal<Queue<E>> instancePool = new ThreadLocal<Queue<E>>();

   public Queue<E> getInstance() {
      Queue<E> pool = instancePool.get();
      if (pool == null) {
         pool = new LinkedList<E>();
         instancePool.set(pool);
      }
      return pool;
   }

   public boolean add(E e) {
      return getInstance().add(e);
   }

   public E element() {
      return getInstance().element();
   }

   public boolean offer(E e) {
      return getInstance().offer(e);
   }

   public E peek() {
      return getInstance().peek();
   }

   public E poll() {
      return getInstance().poll();
   }

   public E remove() {
      return getInstance().remove();
   }

   public boolean addAll(Collection<? extends E> c) {
      return getInstance().addAll(c);
   }

   public void clear() {
      getInstance().clear();
   }

   public boolean contains(Object o) {
      return getInstance().contains(o);
   }

   public boolean containsAll(Collection<?> c) {
      return getInstance().containsAll(c);
   }

   public boolean isEmpty() {
      return getInstance().isEmpty();
   }

   public Iterator<E> iterator() {
      return getInstance().iterator();
   }

   public boolean remove(Object o) {
      return getInstance().remove(o);
   }

   public boolean removeAll(Collection<?> c) {
      return getInstance().removeAll(c);
   }

   public boolean retainAll(Collection<?> c) {
      return getInstance().retainAll(c);
   }

   public int size() {
      return getInstance().size();
   }

   public Object[] toArray() {
      return getInstance().toArray();
   }

   public <T> T[] toArray(T[] a) {
      return getInstance().toArray(a);
   }
}




Here are my additions to Vector3f:


   /**
    * the pool of temporary vectors for use during calculations
    */
   private static ThreadLocalQueue<Vector3f> instancePool = new ThreadLocalQueue<Vector3f>();

   /**
    * returns a Vector3f instance from the pool, or creates a new instance if the pool is empty.
    *
    * @return the next available vector instance, set to (0, 0, 0)
    *
    * @see #done()
    */
   public static Vector3f getInstance() {
      Vector3f v = instancePool.poll();
      if (v == null) {
         v = new Vector3f();
      } else {
         v.zero();
      }
      return v;
   }

   /**
    * returns a Vector3f instance from the pool, or creates a new instance if the pool is empty.
    *
    * @param x
    *            the x value of the vector.
    * @param y
    *            the y value of the vector.
    * @param z
    *            the z value of the vector.
    * @return the next available vector instance
    *
    * @see #done()
    */
   public static Vector3f getInstance(float x, float y, float z) {
      Vector3f v = instancePool.poll();
      if (v == null) {
         v = new Vector3f(x, y, z);
      } else {
         v.set(x, y, z);
      }
      return v;
   }

   /**
    * returns a Vector3f instance from the pool, or creates a new instance if the pool is empty.
    *
    * @param copy
    *            The Vector3f to copy
    * @return the next available vector instance
    *
    * @see #done()
    */
   public static Vector3f getInstance(Vector3f copy) {
      Vector3f v = instancePool.poll();
      if (v == null) {
         v = new Vector3f(copy);
      } else {
         v.set(copy);
      }
      return v;
   }

   /**
    * returns this vector to the instance pool.
    *
    * @see #getInstance()
    * @see #getInstance(Vector3f)
    * @see #getInstance(float, float, float)
    */
   public void done() {
      instancePool.offer(this);
   }




My very simple camera code to follow a spatial, using getInstance:


      Vector3f cameraPos = Vector3f.getInstance(target.getWorldTranslation());
      
      Vector3f rotationVec = Vector3f.getInstance();   
      Quaternion targetWorldRot = target.getWorldRotation();         
      if (targetOffset.x != 0) {
         targetWorldRot.getRotationColumn(0, rotationVec);
         cameraPos.addLocal(rotationVec.normalizeLocal().multLocal(targetOffset.x));
      }
      if (targetOffset.y != 0) {
         targetWorldRot.getRotationColumn(1, rotationVec);         
         cameraPos.addLocal(rotationVec.normalizeLocal().multLocal(targetOffset.y));
      }
      if (targetOffset.z != 0) {
         targetWorldRot.getRotationColumn(2, rotationVec);
         cameraPos.addLocal(rotationVec.normalizeLocal().multLocal(targetOffset.z));
      }
         
      camera.setLocation(cameraPos);            
      target.getLocalRotation().getRotationColumn(1, rotationVec);
      camera.lookAt(target.getWorldTranslation(), rotationVec);
      camera.update();
      
      cameraPos.done();
      rotationVec.done();


My usage code isn't that great. I noticed some BoundingSphere.setSphere calls create 5-6 Vector3f instances each run, which was over 10k instances for me. I changed a few lines in the Vector3f.subtract and .cross methods so that this code:



if (result == null)
   result = new Vector3f();



is now:


if (result == null)
        result = Vector3f.getInstance();



Then I broke out some of the BoundingSphere.setSphere algorithm so I could call the "done" method on all temp vectors. 10k+ object creates went into the abyss. Here's one of the setSphere methods post-optimization:


    private void setSphere(Vector3f O, Vector3f A, Vector3f B) {
        Vector3f a = A.subtract(O);
        Vector3f b = B.subtract(O);
        Vector3f acrossB = a.cross(b);       

        float Denominator = 2.0f * acrossB.dot(acrossB);

        if (Denominator == 0) {
            center.set(0, 0, 0);
            radius = 0;
        } else {
           Vector3f acrossBA = acrossB.cross(a);
            Vector3f acrossBB = b.cross(acrossB);
            Vector3f o = acrossBA.multLocal(b.lengthSquared())
                    .addLocal(acrossBB.multLocal(a.lengthSquared()))
                    .divideLocal(Denominator);
            radius = o.length() * radiusEpsilon;
            O.add(o, center);
            acrossBA.done();
            acrossBB.done();
        }
        a.done();
        b.done();
        acrossB.done();       
    }



Of course such an approach looks much cleaner in the code. But compared to the private static final temp fields it is really slow. That's why I wouldn't prefer your solution over the static fields. It can be beneficial though for vectors returned by some method. On the other hand why should the queue be thread local? Just for thread safety?

Yeah, I think we did a test a while back at the performance ramifications of going this route.

irrisor said:

Of course such an approach looks much cleaner in the code. But compared to the private static final temp fields it is really slow. That's why I wouldn't prefer your solution over the static fields. It can be beneficial though for vectors returned by some method. On the other hand why should the queue be thread local? Just for thread safety?


Yeah unfortunately there is a speed trade-off :| I wondered how much slower it was and figured I'd post my results here in case it comes up again (or so I don't forget :P )  Here are the test results:

Time in nanoseconds, Iterations=100000000
Constructors: 1492503657  Per Iteration: 14ns
getInstance:  8630431242  Per Iteration: 86ns

and the test code:


public class Vector3fSpeedTest {
   public static final int ITERATIONS = 100000000;

   public static void main(String[] args) {
      System.out.println("Time in nanoseconds, Iterations=" + ITERATIONS);
      
      long start = System.nanoTime();
      for (int i = 0; i < ITERATIONS; i++) {         
         Vector3f newVec = new Vector3f();                  
         newVec.set(2, 3, 4);         
      }
      long timeDelta = System.nanoTime() - start;
      System.out.println("Constructors: " + timeDelta + "  Per Iteration: " + timeDelta / ITERATIONS + "ns");
      
      start = System.nanoTime();
      for (int i = 0; i < ITERATIONS; i++) {         
         Vector3f newVec = Vector3f.getInstance();
         
         newVec.set(2, 3, 4);
                  
         newVec.done();         
      }
      timeDelta = System.nanoTime() - start;
      System.out.println("getInstance:  " + timeDelta + "  Per Iteration: " + timeDelta / ITERATIONS + "ns");
   }

}



I made it thread local for safety to avoid weirdness if multiple threads call getInstance, and also using thread local avoids memory barriers so the JVM can (hopefully) optimize most of it away. This made me wonder what a synchronized version would cost, although I'm running on Sun 1.6 which has nice synchronize optimizations so YMMV. In my Vector3f I switched the instancePool line to this:


private static Queue<Vector3f> instancePool = new ConcurrentLinkedQueue<Vector3f>();



Here are the new test results:

Time in nanoseconds, Iterations=100000000
Constructors: 1524151533  Per Iteration: 15ns
getInstance:  18566884897  Per Iteration: 185ns


Ouch! Ok, now to take out thread-safety:


private static Queue<Vector3f> instancePool = new LinkedList<Vector3f>();



Time in nanoseconds, Iterations=100000000
Constructors: 1481748100  Per Iteration: 14ns
getInstance:  4211297246  Per Iteration: 42ns


3x increase in cost per iteration still isn't very good, but it's better than the rest. Using this under heavy load via multiple threads would go bad quickly. Eh well, so absolute best case it looks to be 3x the cost to use getInstance instead of constructors, but I think it was a worthwhile experiment.

darkfrog said:

Yeah, I think we did a test a while back at the performance ramifications of going this route.


Doh! Well looks like I confirmed what you guys knew, but I had fun hacking around and learned a little more about JME (so thhaaat's what the difference is with subtract and subtractLocal :)

hehe, it was a fun experiment then as well. :wink: