[Solved in year 2026] Optimizing a transparent mesh?

MoffKalast · September 13, 2016, 9:38pm

Okay so let me start with a small demonstration. Looking at this transparent nebula from afar renders it at 144 fps (capped).

But, getting up close and personal drops the framerate by more than half.

[Image: image_20160913_222042.png]

Disregard the few new objects created by a particle emitter.

#Question

So the question is: Why does a single transparent geometry with one material and one mesh render so badly up close and so well from afar? Is there any way to make this work better? Should I just give up and wait for 2026 when GPUs can finally render this without breaking down in tears?

I know that going up close makes the frag shader run more times than usual, but it’s unshaded.frag! Why is the difference so damn large?

#FAQ

Alpha discarding? Helps slightly, but not much.
Custom shader slowdown? The additional calculations I’m making have minuscule impact on performance and switching to an unshaded.j3md yields about the same fps.
Depth testing slowdown? Disabling it has no impact on framerate, just messes up the rendering order.
Too large textures? Nearly the same fps from 16x16 to 2048x2048.
You are rendering so many quads, you should use GeometryBatchFactory to batch them! I already told you that the nebula is a single batched mesh running one material. Read the question above the FAQ.
Are you using any filters that could slow it down? Only the bloomfilter essentially, but moving it to translucent gives little to no performance boost and the mcve still has the problem without any filters.

#MCVE

So since this is just about making me go insane in the brain, I’ve made an MCVE for you guys to mess around in, no external assets required.

It spawns a single nebula (running unshaded.j3md with vertex colors) with WAAAY too many quads, but since there isn’t anything else to render we need to do that to show the difference.

What it does is:

makes the transparent material with the default jme flame.png texture as the cloud, sets same parameters as the in-game nebula materials have
generates the quads with randomized vertex colors and places them into a node
optimizes the node using GeometryBatchFactory, extracts the batched geometry and attaches it to rootnode

The class:

import com.jme3.app.SimpleApplication;
import com.jme3.material.Material;
import com.jme3.material.RenderState.BlendMode;
import com.jme3.material.RenderState.FaceCullMode;
import com.jme3.math.ColorRGBA;
import com.jme3.math.FastMath;
import com.jme3.math.Vector3f;
import com.jme3.renderer.queue.RenderQueue.Bucket;
import com.jme3.renderer.queue.RenderQueue.ShadowMode;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.Node;
import com.jme3.scene.VertexBuffer.Type;
import com.jme3.scene.shape.Quad;
import com.jme3.util.BufferUtils;
import jme3tools.optimize.GeometryBatchFactory;

public class MCVE extends SimpleApplication{

public static void main(String[] args) {
	MCVE app = new MCVE();
	app.start();
}

@Override
public void simpleInitApp(){
	boolean batchQuads = true;
	float size = 80000;
	
	flyCam.setMoveSpeed(size*2f+1000);
	cam.setFrustumPerspective(70f, (float) cam.getWidth() / cam.getHeight(), 1f, 800000f);
	cam.setLocation(Vector3f.UNIT_Z.mult(size*4f));

	ColorRGBA template = new ColorRGBA(0.0f,0.0f,0.5f,1.2f);
	
	Quad[] plates = new Quad[4];
	
	Quad side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,0.5f,     0.5f,0.5f,     0.5f,0,    0,0});
  	plates[0]= side;
  	
	side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,1,     1,1,     1,0.5f,    0.5f,0.5f});
  	plates[1]= side;
  	
	side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,0.5f,     1,0.5f,     1,0,    0.5f,0});
  	plates[2]= side;
  	
	side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,1,     0.5f,1,     0.5f,0.5f,    0,0.5f});
  	plates[3]= side;
  	
  	
  	// Stock unshaded material definition, so the MCVE needs no external assets
  	Material tex = new Material(assetManager,"Common/MatDefs/Misc/Unshaded.j3md");
	tex.setTexture("ColorMap", assetManager.loadTexture("Effects/Explosion/flame.png"));
	tex.setTransparent(true);
	tex.setBoolean("VertexColor", true);
	tex.getAdditionalRenderState().setDepthWrite(false);
	tex.getAdditionalRenderState().setDepthTest(true);
	tex.getAdditionalRenderState().setBlendMode(BlendMode.AlphaAdditive);
	tex.getAdditionalRenderState().setFaceCullMode(FaceCullMode.Off);
	tex.setFloat("AlphaDiscardThreshold", 0.01f);
	
	Node batch = new Node();
	for (int i = 0; i < size/100; i++) {
		float rand1 = (float)(Math.random()*size);
		float rand2 = (float)(Math.random()*size);
		float rand3 = (float)(Math.random()*size);
		
		Mesh m = plates[FastMath.nextRandomInt(0, 3)].clone();
		
		ColorRGBA dif = template.clone();				
		float [] vertices = new float[16];
		for(int k = 0, j = 0; j < 4; k+=4, j++) {
			
			ColorRGBA add = new ColorRGBA(FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f,0);
			add = dif.add(add.mult(0.85f));
			
			vertices[k] = add.r;
			vertices[k+1] = add.g;
			vertices[k+2] = add.b;
			vertices[k+3] = add.a;
		}
		
		m.setBuffer(Type.Color, 4, BufferUtils.createFloatBuffer(vertices));
		
		Geometry plate = new Geometry("nb",m);
		plate.setMaterial(tex);
		plate.setQueueBucket(Bucket.Transparent);
		plate.setLocalTranslation(-size*0.5f+rand1, -size*0.5f+rand2, rand3);
		plate.setShadowMode(ShadowMode.Off);
		Node n = new Node();
		n.attachChild(plate);
		n.getLocalRotation().fromAngleAxis(i*36, randomVector3f());
		batch.attachChild(n);
	}
	
	if(batchQuads)
	{
		GeometryBatchFactory.optimize(batch);
		
		Geometry part = (Geometry) batch.getChild(batch.getChildren().size()-1);
		part.setQueueBucket(Bucket.Transparent);
		part.setShadowMode(ShadowMode.Off);
		rootNode.attachChild(part);
		
		batch.detachAllChildren();
	}
	else
		rootNode.attachChild(batch);
}

public static Vector3f randomVector3f() {
	Vector3f rand = new Vector3f(FastMath.rand.nextFloat()*2f-1f,FastMath.rand.nextFloat()*2f-1f,FastMath.rand.nextFloat()*2f-1f);
	return rand.normalizeLocal();
    }
}

Or on pastebin for your copying convenience:

#Instructions:

Run without vsync or you won’t notice the fps drop.
Adjust float size = 80000; to something that has your system running the starting scene at ~200 fps. Larger is more demanding.
Use the flycam to go inside the nebula and observe the sudden fps drop or back out to see it restored.

Just for interest:

Further on, try disabling the batching of these thousands of quads using

boolean batchQuads = false;

Note how the framerate stays pretty much the same. Why are we batching all of this stuff again?

Thanks!

Tryder · September 13, 2016, 10:17pm

I’m pretty sure it’s because of alpha blending. Typically when an object is drawn behind another object the pixels that are obscured by the foreground object are not processed to save time, but when you have multiple overlapping transparent objects the overlapping pixels are processed again and again for each object.

So with opaque objects the fragment shader runs once for each pixel, and when an object is bigger on the screen the number of times the shader is executed increases linearly. For multiple overlapping transparent objects that take up more screen space, the cost grows much faster: the number of fragment shader and alpha blending executions is the covered pixel count multiplied by the number of overlapping layers, because you’re reading from, calculating and writing to the same pixels over and over again.

If the scene contains only opaque objects then each pixel on the screen should only be processed once, without any post pro and whatnot. When transparent objects are added and overlap then pixels start to get processed more than once and the number of times depends on how many overlapping transparent faces you have.

MoffKalast · September 13, 2016, 11:07pm

So you’re saying this is as good as it gets?

Hm, but then how does stuff like this manage to run smoothly in a browser, made from what looks like millions of transparent billboards, maintaining the same framerate no matter where the camera is. (especially the 4th one)

Side note, would making quads less transparent help? Not as in making them completely opaque but using alpha values of 1.0 in some places? Or does just tagging them as transparent immediately make the engine render the pixel again?

pspeed · September 13, 2016, 11:08pm

I tried to look at the code but it popped up this huge video and started making noise so I closed it right away. Pastebin is really turning to trash.

If it’s a single class then just paste it in here in a code box.

That tends to indicate that you are somehow keeping the unbatched stuff around in the scene… else there would definitely be a difference of some kind. Though JME has gotten a lot better at object management, 1000s of Geometry are bound to be worse than one.

If you have the standard simple app config then you should be able to hit F6 to bring up the frame profiler.

MoffKalast · September 13, 2016, 11:12pm

You know, there is this thing called an adblocker that tends to be useful around the internet. These code blocks tend to mess up alignment a lot, but here you go:

import com.jme3.app.SimpleApplication;
import com.jme3.material.Material;
import com.jme3.material.RenderState.BlendMode;
import com.jme3.material.RenderState.FaceCullMode;
import com.jme3.math.ColorRGBA;
import com.jme3.math.FastMath;
import com.jme3.math.Vector3f;
import com.jme3.renderer.queue.RenderQueue.Bucket;
import com.jme3.renderer.queue.RenderQueue.ShadowMode;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.Node;
import com.jme3.scene.VertexBuffer.Type;
import com.jme3.scene.shape.Quad;
import com.jme3.util.BufferUtils;
import jme3tools.optimize.GeometryBatchFactory;
import util.Methods;
import util.Uncolidable;

public class MCVE extends SimpleApplication{

public static void main(String[] args) {
	MCVE app = new MCVE();
	app.start();
}

@Override
public void simpleInitApp(){
	boolean batchQuads = true;
	float size = 80000;
	
	flyCam.setMoveSpeed(size*2f+1000);
	cam.setFrustumPerspective(70f, (float) cam.getWidth() / cam.getHeight(), 1f, 800000f);
	cam.setLocation(Vector3f.UNIT_Z.mult(size*4f));

	ColorRGBA template = new ColorRGBA(0.0f,0.0f,0.5f,1.2f);
	
	Quad[] plates = new Quad[4];
	
	Quad side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,0.5f,     0.5f,0.5f,     0.5f,0,    0,0});
  	plates[0]= side;
  	
	side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,1,     1,1,     1,0.5f,    0.5f,0.5f});
  	plates[1]= side;
  	
	side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,0.5f,     1,0.5f,     1,0,    0.5f,0});
  	plates[2]= side;
  	
	side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,1,     0.5f,1,     0.5f,0.5f,    0,0.5f});
  	plates[3]= side;
  	
  	
  	Material tex = new Material(assetManager,"assets/shaders/rings/Unshaded.j3md");
	tex.setTexture("ColorMap", assetManager.loadTexture("Effects/Explosion/flame.png"));
	tex.setTransparent(true);
	tex.setBoolean("VertexColor", true);
	tex.getAdditionalRenderState().setDepthWrite(false);
	tex.getAdditionalRenderState().setDepthTest(true);
	tex.getAdditionalRenderState().setBlendMode(BlendMode.AlphaAdditive);
	tex.getAdditionalRenderState().setFaceCullMode(FaceCullMode.Off);
	tex.setFloat("AlphaDiscardThreshold", 0.01f);
	
	Node batch = new Node();
	for (int i = 0; i < size/55; i++) {
		float rand1 = (float)(Math.random()*size);
		float rand2 = (float)(Math.random()*size);
		float rand3 = (float)(Math.random()*size);
		
		Mesh m = plates[FastMath.nextRandomInt(0, 3)].clone();
		
		ColorRGBA dif = template.clone();				
		float [] vertices = new float[16];
		for(int k = 0, j = 0; j < 4; k+=4, j++) {
			
			ColorRGBA add = new ColorRGBA(FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f,0);
			add = dif.add(add.mult(0.85f));
			
			vertices[k] = add.r;
			vertices[k+1] = add.g;
			vertices[k+2] = add.b;
			vertices[k+3] = add.a;
		}
		
		m.setBuffer(Type.Color, 4, BufferUtils.createFloatBuffer(vertices));
		
		Uncolidable plate = new Uncolidable("nb",m);
		plate.setMaterial(tex);
		plate.setQueueBucket(Bucket.Transparent);
		plate.setLocalTranslation(-size*0.5f+rand1, -size*0.5f+rand2, rand3);
		plate.setShadowMode(ShadowMode.Off);
		Node n = new Node();
		n.attachChild(plate);
		n.getLocalRotation().fromAngleAxis(i*36, Methods.randomVector3f());
		batch.attachChild(n);
	}
	
	if(batchQuads)
	{
		GeometryBatchFactory.optimize(batch);
		
		Geometry part = (Geometry) batch.getChild(batch.getChildren().size()-1);
		part.setQueueBucket(Bucket.Transparent);
		part.setShadowMode(ShadowMode.Off);
		rootNode.attachChild(part);
		
		batch.detachAllChildren();
	}
	else
		rootNode.attachChild(batch);
    }
}

GeometryBatchFactory.optimize(batch);
		
Geometry part = (Geometry) batch.getChild(batch.getChildren().size()-1);
part.setQueueBucket(Bucket.Transparent);
part.setShadowMode(ShadowMode.Off);
rootNode.attachChild(part);
		
batch.detachAllChildren();

I do that, which pulls out the final geometry and should destroy everything else. I don’t think that any references to the quads are kept.

pspeed · September 13, 2016, 11:14pm

Or you could just make it easiest for us to help without having to click through to sites.

But good luck with your issue.

MoffKalast · September 13, 2016, 11:18pm

Thanks, I bet I’ll need lots of it.

Okay, I mean there is a slight difference: the unbatched way gives lower fps outside the mesh and higher inside, and batched it is the other way around, probably due to culling or something. It pretty much evens out on average though, and the differences are reasonably small.

pspeed · September 13, 2016, 11:50pm

When you are inside the mesh, I suspect you are paying the price of rendering all of your quads full screen… at least any that are not fully behind you.

Any quad that is intersecting the view frustum at all is likely being rendered full screen and it no longer matters that ‘only some of the quad is visible’ as each pixel on the screen needs to get rendered, regardless.

So you should be able to get similar performance degradation by just creating 80000 full screen quads and adding them to the GUI node.
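To put rough numbers on the full-screen argument, here is a minimal sketch that estimates how much of the viewport a single quad covers at a given distance. It assumes a quad that faces the camera and sits on the view axis, which the randomly rotated quads in the MCVE do not, so treat it as an upper-bound illustration only. The class and method names are made up for this example; the 70 degree vertical field of view, the quad edge and the start distance are taken from the MCVE.

```java
/** Rough estimate of how much of the screen one camera-facing quad covers. */
final class QuadCoverage {

    /**
     * Fraction of the viewport area (0..1) covered by a square quad with the
     * given edge length, centred on the view axis at the given distance.
     * Assumes a symmetric perspective frustum and ignores rotation.
     */
    static double coveredFraction(double quadEdge, double distance,
                                  double fovYDegrees, double aspect) {
        // Height and width of the visible slice of the world at that distance
        double viewHeight = 2.0 * distance * Math.tan(Math.toRadians(fovYDegrees) / 2.0);
        double viewWidth = viewHeight * aspect;
        // A quad larger than the view simply fills it, hence the clamp to 1
        double fracH = Math.min(1.0, quadEdge / viewHeight);
        double fracW = Math.min(1.0, quadEdge / viewWidth);
        return fracH * fracW;
    }

    public static void main(String[] args) {
        double edge = 80000 * 2.5;          // quad edge used in the MCVE
        double aspect = 1920.0 / 1080.0;
        double outside = 80000 * 4.0;       // MCVE start distance
        double inside = 10000;              // hypothetical distance once inside the nebula
        System.out.printf("outside: %.2f of the screen%n",
                coveredFraction(edge, outside, 70, aspect));
        System.out.printf("inside:  %.2f of the screen%n",
                coveredFraction(edge, inside, 70, aspect));
    }
}
```

From the start position a quad covers only a small part of the viewport, while from inside the cloud the clamp kicks in and every nearby quad counts as a full screen of fragments.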

Tryder · September 14, 2016, 12:15am

When rendering opaque objects in the opaque queue there is a 1:1 ratio of pixels covered by the object to fragment shader executions, unless it’s wholly or partially obscured, in which case the fragment shader runs fewer times. 10 pixels, 10 fragment executions.

For transparent objects in the transparent queue the ratio is 1:n where n is the number of overlapping objects that cover said pixel. 10 pixels, 80,000 overlapping quads, 800,000 fragment executions. There’s a big performance gap between 10 fragment executions and 800,000 and that’s just for 10 pixels. Imagine 2,073,600 pixels, 1920x1080.
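The arithmetic above is just pixels multiplied by layers. A tiny sketch, with a made-up class name, that reproduces the numbers from this post and extends them to a full 1920x1080 frame:

```java
/** Back-of-the-envelope overdraw arithmetic for blended geometry. */
final class OverdrawEstimate {

    /** Fragment shader runs when `layers` blended surfaces each cover `pixels` pixels. */
    static long fragmentExecutions(long pixels, long layers) {
        return pixels * layers;
    }

    public static void main(String[] args) {
        long fullHd = 1920L * 1080L; // 2,073,600 pixels

        // Opaque case: one execution per covered pixel
        System.out.println(fragmentExecutions(10, 1));
        // Transparent case from the post: 10 pixels under 80,000 layers
        System.out.println(fragmentExecutions(10, 80_000));
        // Worst case: the same 80,000 layers covering a whole 1080p frame
        System.out.println(fragmentExecutions(fullHd, 80_000));
    }
}
```

This is a worst case, since it assumes every layer covers every pixel, but it shows why the layer count matters far more than the cost of a single unshaded fragment.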

Tryder · September 14, 2016, 12:31am

And no it will not matter if more pixels have an alpha value of 1. Even if you put opaque objects in the transparent queue the ratio will be 1:n, if you put transparent objects in the opaque queue the ratio will be 1:1 so long as depth check and depth write are on.

The opaque queue renders from front to back, ensuring that when an object is rendered it can check the depth buffer, and if something is closer to the camera on a particular pixel the fragment shader is not executed for that pixel, alpha blended or not.

The transparent queue renders from back to front so when an object is rendered there will not be anything in front of it in the depth buffer save for whatever was previously rendered in the opaque queue because it is rendered before the transparent queue.

That’s why the two queues exist. It’s more efficient to render opaque objects front to back, but doing so for transparent objects won’t render anything behind the transparent object so you want to render them back to front.
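The two orderings can be illustrated with plain camera distances. This is only a sketch of the idea, not the engine's actual sorting code, and the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrates the draw order of the two queues using camera distances only. */
final class QueueOrder {

    /** Front to back: nearest first, so the depth test can reject hidden fragments. */
    static List<Double> opaqueOrder(List<Double> distances) {
        List<Double> sorted = new ArrayList<>(distances);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    /** Back to front: farthest first, so nearer surfaces blend over what is behind them. */
    static List<Double> transparentOrder(List<Double> distances) {
        List<Double> sorted = new ArrayList<>(distances);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        List<Double> distances = Arrays.asList(5.0, 1.0, 3.0);
        System.out.println("opaque:      " + opaqueOrder(distances));
        System.out.println("transparent: " + transparentOrder(distances));
    }
}
```

Note that this ordering applies per object. Quads inside one batched mesh are drawn in buffer order, which is harmless here because the MCVE uses additive blending with depth writes off.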

MoffKalast · September 14, 2016, 9:01am

Well I suppose that pretty much clears it up as to why this happens, now the question is how to improve it. Here are some ideas for you to shoot down and I’m open to more:

1: Using fewer quads but making them larger, since the bigger problem is the quantity, not the size. It would look less good, but I suppose I can live with that to an extent.

2: Making nebulas more spaced out so that there isn’t as much overlapping. Some of the larger ones that are assembled using an exponential distance scaler are easier to render than medium ones despite having more quads since there isn’t a “core” of quads.

3: Adding opaque meshes into the mix. Something like opaque cloud meshes to obstruct the view of other transparent quads so that they don’t need to be rendered. Would this even help?

That’s just an abstract size value by the way, the number of quads with that size is about 1500 and the ones I’m running in-game are at about 250.

pspeed · September 14, 2016, 10:18am

Four. Subdivide your quads and batch the sub-quads by location. Instead of one giant mesh you end up with a handful of smaller, cullable ones.

Really, I’m kind of surprised you are still using JME’s batching. Your code is almost ideally suited to just building up the mesh yourself and the batching is just doing lots of extra unnecessary work.
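Building the mesh by hand mostly comes down to filling two arrays. The sketch below is a simplified illustration with made-up names: it only handles axis-aligned quads in the XY plane and leaves out rotation, texture coordinates and vertex colours. In jME3 the resulting arrays would presumably be handed to `Mesh.setBuffer` with `Type.Position` and `Type.Index`, the same way the MCVE already sets its `TexCoord` and `Color` buffers.

```java
import java.util.Arrays;

/** Builds combined vertex and index arrays for many quads without any batching step. */
final class QuadMeshBuilder {

    /** Four xyz corners per quad; quads are axis-aligned in the XY plane around each centre. */
    static float[] positions(float[][] centers, float edge) {
        float h = edge / 2f;
        // Corner offsets, counter-clockwise starting at the bottom-left corner
        float[][] corners = {{-h, -h}, {h, -h}, {h, h}, {-h, h}};
        float[] out = new float[centers.length * 12];
        int p = 0;
        for (float[] c : centers) {
            for (float[] k : corners) {
                out[p++] = c[0] + k[0];
                out[p++] = c[1] + k[1];
                out[p++] = c[2];
            }
        }
        return out;
    }

    /** Two triangles per quad; each quad's indices are offset by its four vertices. */
    static int[] indices(int quadCount) {
        int[] out = new int[quadCount * 6];
        for (int q = 0; q < quadCount; q++) {
            int v = q * 4;
            int i = q * 6;
            out[i] = v;
            out[i + 1] = v + 1;
            out[i + 2] = v + 2;
            out[i + 3] = v;
            out[i + 4] = v + 2;
            out[i + 5] = v + 3;
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] centers = {{0, 0, 0}, {10, 0, 5}};
        System.out.println(Arrays.toString(positions(centers, 2f)));
        System.out.println(Arrays.toString(indices(centers.length)));
    }
}
```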

MoffKalast · September 14, 2016, 12:09pm

Okay, but are you sure that quadrupling the amount of quads while still having to essentially render the exact same thing on the screen will make this any better? It’s not like the gpu is drawing offscreen stuff, is it?

Oh, right! I was thinking about that yesterday, specifically to separate the mesh into 27 submeshes like a rubik’s cube depending on the location.
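A minimal sketch of that 27-cell split, assuming the quads live in a cube with a known minimum corner and edge length. The names are hypothetical. Each quad would go into the bucket returned for its position, and each bucket would then be batched into its own geometry so that whole cells behind the camera can be culled.

```java
/** Assigns positions inside a cube to one of 3 x 3 x 3 = 27 spatial buckets. */
final class GridBuckets {

    static final int CELLS_PER_AXIS = 3;

    /** Cell along one axis, clamped so values on or past the border stay in range. */
    static int axisCell(float value, float min, float extent) {
        int cell = (int) Math.floor((value - min) / extent * CELLS_PER_AXIS);
        return Math.max(0, Math.min(CELLS_PER_AXIS - 1, cell));
    }

    /** Bucket index in 0..26 for a position, given the cube's minimum corner and edge length. */
    static int cellIndex(float[] pos, float[] min, float extent) {
        int cx = axisCell(pos[0], min[0], extent);
        int cy = axisCell(pos[1], min[1], extent);
        int cz = axisCell(pos[2], min[2], extent);
        return cx + CELLS_PER_AXIS * (cy + CELLS_PER_AXIS * cz);
    }

    public static void main(String[] args) {
        float size = 80000f;
        // The MCVE spreads quads over [-size/2, size/2) in x and y, and [0, size) in z
        float[] min = {-size * 0.5f, -size * 0.5f, 0f};
        System.out.println(cellIndex(new float[]{0f, 0f, size * 0.5f}, min, size));
    }
}
```

Whether this helps depends on how much of the cost comes from quads that are actually off screen; the overdraw of the visible cells stays the same.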

You overestimate my power. Besides it only runs once and I’d rather use something that is proven to work right instead of adding more of my questionable code.

MegaWolf · September 14, 2016, 12:21pm

Why quadrupling?
His “Four” was the answer to your 1, 2, 3. ^^

And well, not drawing offscreen stuff but maybe calculating

MoffKalast · September 14, 2016, 12:34pm

Top right vs top left. It quadruples the number of quads and vertex shader calculations.

The nebula vert shader does dot product with the normal and camera vector so doing that 9 times instead of 4 for every cloud isn’t really ideal. And I have no idea how to connect the vertices right in code.

MegaWolf · September 14, 2016, 1:51pm

What is Uncolidable? And I guess randomVector3f is just a normalized vector with random x, y, z?

Ogli · September 14, 2016, 1:58pm

I think there’s not much you can do about this.
The devil’s name in this situation is “overdraws”. Overdraws means that you render a pixel more than once. It’s “fill rate” (or fragment operations) which is today the biggest performance eater in 3D graphics. Vertex operations are quite cheap, but fragment operations are expensive. Drawing one pixel multiple times bypasses the idea of the z-buffer which only really works for opaque objects. Hence overdrawing eats processing power.

Here is one thing that would work: Render your nebula to a “billboard” when it’s quite far away. You use render-to-texture and a Quad that displays this texture. So instead of 250 pieces you get only one piece.
The downside of this: When you come closer to the nebula or fly around it at large distance, the billboard is still a flat 2D object and the player will notice it. So for shorter distances to the nebula you must render it as a 3D nebula with 250 pieces (or see idea below) and for flying around you will need to rebuild the billboard too.
The gain of this: If you have, say, 10 nebulas and 9 of them are quite far away, then you render the far away nebulas as one quad and a billboard texture. Problem is: Generating this texture costs a render-to-texture. So you should not do this very often, but only every 100 frames or so.
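The bookkeeping for such a billboard could look like the sketch below. The 100-frame interval comes from the suggestion above; the extra check on the viewing angle is an assumption added for this example, covering the case of flying around the nebula, and all names are hypothetical. The actual render-to-texture step is left out.

```java
/** Decides when a far-away nebula's billboard texture should be re-rendered. */
final class ImpostorRefreshPolicy {

    private final int frameInterval;
    private final double maxAngleDegrees;
    private int framesSinceRefresh;
    private double[] lastDirection; // unit camera-to-nebula vector at the last refresh

    ImpostorRefreshPolicy(int frameInterval, double maxAngleDegrees) {
        this.frameInterval = frameInterval;
        this.maxAngleDegrees = maxAngleDegrees;
    }

    /** Call once per frame with the unit camera-to-nebula direction. */
    boolean shouldRefresh(double[] direction) {
        framesSinceRefresh++;
        boolean stale = lastDirection == null
                || framesSinceRefresh >= frameInterval
                || angleBetween(lastDirection, direction) > maxAngleDegrees;
        if (stale) {
            lastDirection = direction.clone();
            framesSinceRefresh = 0;
        }
        return stale;
    }

    /** Angle in degrees between two unit vectors. */
    static double angleBetween(double[] a, double[] b) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        // Clamp to guard against rounding pushing the dot product outside [-1, 1]
        return Math.toDegrees(Math.acos(Math.max(-1.0, Math.min(1.0, dot))));
    }

    public static void main(String[] args) {
        ImpostorRefreshPolicy policy = new ImpostorRefreshPolicy(100, 5.0);
        double[] view = {0, 0, 1};
        System.out.println(policy.shouldRefresh(view)); // first frame always renders
        System.out.println(policy.shouldRefresh(view)); // unchanged view, reuse the texture
    }
}
```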

Note, that this will not help you when your one nebula is very close to the camera. You only may reduce the number of pieces in that case or use some really really clever shader technique which uses e.g. Perlin noise or turbulence noise and fast raytracing to render the whole nebula using only one quad which is very close to the camera. See “volumetric fog”. The trick is to do it in as few as possible passes (i.e. only one piece, one full-screen quad).
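For a feel of what such a shader would do per pixel, here is a CPU-side sketch of ray marching in plain Java. It steps along one view ray in fixed increments and accumulates absorption from a density function. A real implementation would run this loop in the fragment shader of a single full-screen quad and sample a noise function; the soft sphere used here is only a hypothetical stand-in, as are all the names.

```java
/** CPU illustration of ray marching through a density field, one ray at a time. */
final class RayMarchSketch {

    /** Density of the medium at a point in space. */
    interface DensityField {
        double at(double x, double y, double z);
    }

    /** Opacity in [0, 1] accumulated along a ray with a unit-length direction. */
    static double march(DensityField field, double[] origin, double[] dir,
                        double maxDistance, int steps) {
        double stepLen = maxDistance / steps;
        double transmittance = 1.0; // fraction of background light still getting through
        for (int i = 0; i < steps; i++) {
            double t = (i + 0.5) * stepLen; // sample at the middle of each step
            double d = field.at(origin[0] + dir[0] * t,
                                origin[1] + dir[1] * t,
                                origin[2] + dir[2] * t);
            transmittance *= Math.exp(-d * stepLen); // Beer-Lambert absorption
            if (transmittance < 0.01) {
                break; // nearly opaque already, further samples would not be visible
            }
        }
        return 1.0 - transmittance;
    }

    /** Stand-in cloud: density 1 at the origin, fading linearly to 0 at radius 1. */
    static double sphereDensity(double x, double y, double z) {
        double r = Math.sqrt(x * x + y * y + z * z);
        return Math.max(0.0, 1.0 - r);
    }

    public static void main(String[] args) {
        double[] origin = {0, 0, -2};
        double[] forward = {0, 0, 1};
        System.out.println(march(RayMarchSketch::sphereDensity, origin, forward, 4.0, 64));
    }
}
```

The cost per pixel is fixed by the step count instead of by the number of overlapping quads, which is the reason this approach keeps a steady framerate close up.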

MoffKalast · September 14, 2016, 2:14pm

Oops, it seems that I forgot to remove some stuff from the utils I usually use. Fixed now in the main post. Sorry bout that.

The Uncolidable is a Geometry that can’t be hit by any raycasts or anything to speed up collision detection in cases where a geometry is deep inside the scene graph. Aka the collideWith method always returns 0.

This is pointless. As the first image shows, rendering nebulas from outside is reasonably easy work and not demanding at all. It’s the up close that’s the problem.

I even have cases in the game where like 3 nebulas or so may be rendered one on top of the other from afar and it doesn’t tank the fps at all in my experience, as long as you’re far away.

And I’m not that smart. I can’t even imagine how this would work at all.

I once tried to enhance the fog filter with perlin noise and it came out as just about the ugliest thing I’ve ever seen. Also how does one do raytracing in shaders? It’s not like you can do Ray r = new Ray() lol.

pspeed · September 14, 2016, 2:23pm

What’s this part about? What’s the dot product for? Are your actual quads billboarded in shader or something?

I was going from the unshaded example that was still slow.

And yeah subdividing may not have much of a benefit since it’s theoretically all about pixel overdraw… which would happen anyway.

Ogli · September 14, 2016, 2:27pm

Currently me neither. My last shader coding was years ago. But there was this one guy posting his sandstorm shader thingy in the monthly images. So it can be done somehow and hopefully faster than those giant transparent nebula textures which are overlapping each other. Maybe ask that other guy. Or pspeed did something with raytracing in shader too he said.

I know that this must be frustrating, but transparent stuff is almost always a pain - even when using a raytracer (hitting an opaque object is like “okay, let’s stop computation here”). There are concepts like the A-buffer and stuff but it stays a pain. Even games like Starcraft 2 use 1-bit transparency when rendering the beard of the hero - which essentially turns transparency into opacity. Looked quite ugly but worked.
