I've been working on a vertex shader that performs camera-aligned billboard rotations. After looking at the jME code that does a similar calculation, I found that I could contribute a change that should improve performance.
Here's the current method in jME for rotating a camera-aligned BillboardNode:
private void rotateCameraAligned(Camera camera) {
    look.set(camera.getLocation()).subtractLocal(worldTranslation);
    look.normalizeLocal();

    float el = FastMath.asin(look.y);
    float az = FastMath.atan2(look.x, look.z);
    float elCos = FastMath.cos(el);
    float azCos = FastMath.cos(az);
    float elSin = FastMath.sin(el);
    float azSin = FastMath.sin(az);

    // compute the local orientation matrix for the billboard
    orient.m00 = azCos;
    orient.m01 = azSin * -elSin;
    orient.m02 = azSin * elCos;
    orient.m10 = 0;
    orient.m11 = elCos;
    orient.m12 = elSin;
    orient.m20 = -azSin;
    orient.m21 = azCos * -elSin;
    orient.m22 = azCos * elCos;

    // The billboard must be oriented to face the camera before it is
    // transformed into the world.
    worldRotation.apply(orient);
}
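Here's a replacement method that costs one additional sqrt (the second normalizeLocal) but eliminates all 6 trig calls (asin, atan2, and two each of sin and cos). It works because the sines and cosines can be read straight off the normalized vectors: look.y is the sine of the elevation, the dot product of look with its normalized xz projection is the cosine of the elevation, and the x and z components of that projection are the sine and cosine of the azimuth. (I added Vector3f xzp as a member variable.) The calculated orientation matrix is the same (well, to about 7 decimal places).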
private void rotateCameraAligned(Camera camera) {
    look.set(camera.getLocation()).subtractLocal(worldTranslation);

    // The xzp vector is the projection of the look vector on the xz plane
    xzp.set(look.x, 0, look.z);
    look.normalizeLocal();
    xzp.normalizeLocal();

    // Calculate the cosine of the elevation angle
    // (both vectors are unit length, so the dot product is the cosine)
    float cosp = look.dot(xzp);

    // compute the local orientation matrix for the billboard
    // (xzp.x is the sine of the azimuth, xzp.z its cosine, look.y the sine of the elevation)
    orient.m00 = xzp.z;
    orient.m01 = xzp.x * -look.y;
    orient.m02 = xzp.x * cosp;
    orient.m10 = 0;
    orient.m11 = cosp;
    orient.m12 = look.y;
    orient.m20 = -xzp.x;
    orient.m21 = xzp.z * -look.y;
    orient.m22 = xzp.z * cosp;

    // The billboard must be oriented to face the camera before it is
    // transformed into the world.
    worldRotation.apply(orient);
}
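For anyone who wants to check that the two versions agree without building jME, here is a minimal standalone sketch. It is not jME code: the class and method names (BillboardOrientCheck, orientTrig, orientProjected, maxAbsDiff) are made up for illustration, it uses java.lang.Math in place of FastMath and Vector3f, and it returns the nine matrix entries as a plain row-major float array.

```java
class BillboardOrientCheck {

    // Trig-based path, mirroring the asin/atan2/sin/cos steps of the current method.
    // Returns the 3x3 orientation matrix in row-major order: m00, m01, m02, m10, ...
    static float[] orientTrig(float lx, float ly, float lz) {
        float len = (float) Math.sqrt(lx * lx + ly * ly + lz * lz);
        lx /= len; ly /= len; lz /= len;

        float el = (float) Math.asin(ly);
        float az = (float) Math.atan2(lx, lz);
        float elCos = (float) Math.cos(el);
        float azCos = (float) Math.cos(az);
        float elSin = (float) Math.sin(el);
        float azSin = (float) Math.sin(az);

        return new float[] {
            azCos,  azSin * -elSin, azSin * elCos,
            0f,     elCos,          elSin,
            -azSin, azCos * -elSin, azCos * elCos
        };
    }

    // Projection-based path, mirroring the replacement method: two square roots, no trig.
    static float[] orientProjected(float lx, float ly, float lz) {
        float len = (float) Math.sqrt(lx * lx + ly * ly + lz * lz);
        // Length of the xz projection; this is zero when the look vector is
        // vertical, a case this sketch does not handle.
        float plen = (float) Math.sqrt(lx * lx + lz * lz);
        float px = lx / plen, pz = lz / plen;
        lx /= len; ly /= len; lz /= len;

        // Dot product of two unit vectors: the cosine of the elevation angle
        float cosp = lx * px + lz * pz;

        return new float[] {
            pz,  px * -ly, px * cosp,
            0f,  cosp,     ly,
            -px, pz * -ly, pz * cosp
        };
    }

    // Largest absolute difference between corresponding entries of two matrices.
    static float maxAbsDiff(float[] a, float[] b) {
        float max = 0f;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        float[] a = orientTrig(3f, 4f, 12f);
        float[] b = orientProjected(3f, 4f, 12f);
        System.out.println("max difference: " + maxAbsDiff(a, b));
    }
}
```

Note that the sketch divides by the length of the xz projection without checking it, so it produces NaN values when the camera is directly above or below the billboard (look.x and look.z both zero). The trig version still gives a valid rotation there, since atan2(0, 0) returns 0, so the replacement method probably wants an explicit guard for that case.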