[Solved in year 2026] Optimizing a transparent mesh?

Okay so let me start with a small demonstration. Looking at this transparent nebula from afar renders it at 144 fps (capped).

But, getting up close and personal drops the framerate by more than half.

Disregard the few new objects created by a particle emitter.

#Question

So the question is: Why does a single transparent geometry with one material and one mesh render so badly up close and so well from afar? Is there any way to make this work better? Should I just give up and wait for 2026 when GPUs can finally render this without breaking down in tears?

I know that going up close makes the frag shader run more times than usual, but it's unshaded.frag! Why is the difference so damn large?


#FAQ

  • Alpha discarding? Helps slightly, but not much.

  • Custom shader slowdown? The additional calculations I'm making have minuscule impact on performance and switching to an unshaded.j3md yields about the same fps.

  • Depth testing slowdown? Disabling it has no impact on framerate, just messes up the rendering order.

  • Too large textures? Nearly the same fps from 16x16 to 2048x2048.

  • You are rendering so many quads, you should use GeometryBatchFactory to batch them! I already told you that the nebula is a single batched mesh running one material. Read the question above the FAQ.

  • Are you using any filters that could slow it down? Only the bloomfilter essentially, but moving it to translucent gives little to no performance boost and the mcve still has the problem without any filters.


#MCVE

So since this is just about making me go insane in the brain, I've made an MCVE for you guys to mess around in, no external assets required.

It spawns a single nebula (running unshaded.j3md with vertex colors) with WAAAY too many quads, but since there isn't anything else to render we need to do that to show the difference.

What it does is:

  • makes the transparent material with the default jme flame.png texture as the cloud, sets same parameters as the in-game nebula materials have

  • generates the quads with randomized vertex colors and places them into a node

  • optimizes the node using GeometryBatchFactory, extracts the batched geometry and attaches it to rootnode

The class:

import com.jme3.app.SimpleApplication;
import com.jme3.material.Material;
import com.jme3.material.RenderState.BlendMode;
import com.jme3.material.RenderState.FaceCullMode;
import com.jme3.math.ColorRGBA;
import com.jme3.math.FastMath;
import com.jme3.math.Vector3f;
import com.jme3.renderer.queue.RenderQueue.Bucket;
import com.jme3.renderer.queue.RenderQueue.ShadowMode;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.Node;
import com.jme3.scene.VertexBuffer.Type;
import com.jme3.scene.shape.Quad;
import com.jme3.util.BufferUtils;
import jme3tools.optimize.GeometryBatchFactory;

public class MCVE extends SimpleApplication{

public static void main(String[] args) {
	MCVE app = new MCVE();
	app.start();
}

@Override
public void simpleInitApp(){
	boolean batchQuads = true;
	float size = 80000;
	
	flyCam.setMoveSpeed(size*2f+1000);
	cam.setFrustumPerspective(70f, (float) cam.getWidth() / cam.getHeight(), 1f, 800000f);
	cam.setLocation(Vector3f.UNIT_Z.mult(size*4f));

	ColorRGBA template = new ColorRGBA(0.0f,0.0f,0.5f,1.2f);
	
	Quad[] plates = new Quad[4];
	
	Quad side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,0.5f,     0.5f,0.5f,     0.5f,0,    0,0});
  	plates[0]= side;
  	
	side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,1,     1,1,     1,0.5f,    0.5f,0.5f});
  	plates[1]= side;
  	
	side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,0.5f,     1,0.5f,     1,0,    0.5f,0});
  	plates[2]= side;
  	
	side = new Quad(size*2.5f,size*2.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,1,     0.5f,1,     0.5f,0.5f,    0,0.5f});
  	plates[3]= side;
  	
  	
  	Material tex = new Material(assetManager,"Common/MatDefs/Misc/Unshaded.j3md"); // stock jME Unshaded definition, so no external assets are needed
	tex.setTexture("ColorMap", assetManager.loadTexture("Effects/Explosion/flame.png"));
	tex.setTransparent(true);
	tex.setBoolean("VertexColor", true);
	tex.getAdditionalRenderState().setDepthWrite(false);
	tex.getAdditionalRenderState().setDepthTest(true);
	tex.getAdditionalRenderState().setBlendMode(BlendMode.AlphaAdditive);
	tex.getAdditionalRenderState().setFaceCullMode(FaceCullMode.Off);
	tex.setFloat("AlphaDiscardThreshold", 0.01f);
	
	Node batch = new Node();
	for (int i = 0; i < size/100; i++) {
		float rand1 = (float)(Math.random()*size);
		float rand2 = (float)(Math.random()*size);
		float rand3 = (float)(Math.random()*size);
		
		Mesh m = plates[FastMath.nextRandomInt(0, 3)].clone();
		
		ColorRGBA dif = template.clone();				
		float [] vertices = new float[16];
		for(int k = 0, j = 0; j < 4; k+=4, j++) {
			
			ColorRGBA add = new ColorRGBA(FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f,0);
			add = dif.add(add.mult(0.85f));
			
			vertices[k] = add.r;
			vertices[k+1] = add.g;
			vertices[k+2] = add.b;
			vertices[k+3] = add.a;
		}
		
		m.setBuffer(Type.Color, 4, BufferUtils.createFloatBuffer(vertices));
		
		Geometry plate = new Geometry("nb",m);
		plate.setMaterial(tex);
		plate.setQueueBucket(Bucket.Transparent);
		plate.setLocalTranslation(-size*0.5f+rand1, -size*0.5f+rand2, rand3);
		plate.setShadowMode(ShadowMode.Off);
		Node n = new Node();
		n.attachChild(plate);
		n.getLocalRotation().fromAngleAxis(i*36, randomVector3f());
		batch.attachChild(n);
	}
	
	if(batchQuads)
	{
		GeometryBatchFactory.optimize(batch);
		
		Geometry part = (Geometry) batch.getChild(batch.getChildren().size()-1);
		part.setQueueBucket(Bucket.Transparent);
		part.setShadowMode(ShadowMode.Off);
		rootNode.attachChild(part);
		
		batch.detachAllChildren();
	}
	else
		rootNode.attachChild(batch);
}

public static Vector3f randomVector3f() {
	Vector3f rand = new Vector3f(FastMath.rand.nextFloat()*2f-1f,FastMath.rand.nextFloat()*2f-1f,FastMath.rand.nextFloat()*2f-1f);
	return rand.normalizeLocal();
    }
}

Or on pastebin for your copying convenience:

#Instructions:

  • Run without vsync or you won't notice the fps drop.

  • Adjust float size = 80000; to something that has your system running the starting scene at ~200 fps. Larger is more demanding.

  • Use the flycam to go inside the nebula and observe the sudden fps drop or back out to see it restored.

Just for interest:

Further on, try disabling the batching of these thousands of quads using

boolean batchQuads = false;

Note how the framerate stays pretty much the same. Why are we batching all of this stuff again?

Thanks!

I'm pretty sure it's because of alpha blending. Typically, when an object is drawn behind another object, the pixels obscured by the foreground object are not processed, which saves time. But when you have multiple overlapping transparent objects, the overlapping pixels are processed again and again, once for each object.

So with opaque objects the fragment shader runs roughly once per pixel, and when an object gets bigger on screen the number of shader executions grows linearly with the area it covers. For multiple overlapping transparent objects, that cost is multiplied by the number of layers: as they take up more screen space, the fragment shader and alpha blending run once per covered pixel per overlapping object, because you're reading from, calculating and writing to the same pixels over and over again.

If the scene contains only opaque objects then each pixel on the screen should only be processed once, without any post pro and whatnot. When transparent objects are added and overlap then pixels start to get processed more than once and the number of times depends on how many overlapping transparent faces you have.

So you're saying this is as good as it gets?

Hm, but then how does stuff like this manage to run smoothly in a browser, made from what looks like millions of transparent billboards, maintaining the same framerate no matter where the camera is? (Especially the 4th one.)

Side note, would making quads less transparent help? Not as in making them completely opaque but using alpha values of 1.0 in some places? Or does just tagging them as transparent immediately make the engine render the pixel again?

I tried to look at the code but it popped up this huge video and started making noise so I closed it right away. Pastebin is really turning to trash.

If it's a single class then just paste it in here in a code box.

That tends to indicate that you are somehow keeping the unbatched stuff around in the scene... else there would definitely be a difference of some kind. Though JME has gotten a lot better at object management, 1000s of Geometry are bound to be worse than one.

If you have the standard simple app config then you should be able to hit F6 to bring up the frame profiler.

You know, there is this thing called an adblocker that tends to be useful around the internet. These code blocks tend to mess up alignment a lot, but here you go:

import com.jme3.app.SimpleApplication;
import com.jme3.material.Material;
import com.jme3.material.RenderState.BlendMode;
import com.jme3.material.RenderState.FaceCullMode;
import com.jme3.math.ColorRGBA;
import com.jme3.math.FastMath;
import com.jme3.math.Vector3f;
import com.jme3.renderer.queue.RenderQueue.Bucket;
import com.jme3.renderer.queue.RenderQueue.ShadowMode;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.Node;
import com.jme3.scene.VertexBuffer.Type;
import com.jme3.scene.shape.Quad;
import com.jme3.util.BufferUtils;
import jme3tools.optimize.GeometryBatchFactory;
import util.Methods;
import util.Uncolidable;

public class MCVE extends SimpleApplication{

public static void main(String[] args) {
	MCVE app = new MCVE();
	app.start();
}

@Override
public void simpleInitApp(){
	boolean batchQuads = true;
	float size = 80000;
	
	flyCam.setMoveSpeed(size*2f+1000);
	cam.setFrustumPerspective(70f, (float) cam.getWidth() / cam.getHeight(), 1f, 800000f);
	cam.setLocation(Vector3f.UNIT_Z.mult(size*4f));

	ColorRGBA template = new ColorRGBA(0.0f,0.0f,0.5f,1.2f);
	
	Quad[] plates = new Quad[4];
	
	Quad side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,0.5f,     0.5f,0.5f,     0.5f,0,    0,0});
  	plates[0]= side;
  	
	side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,1,     1,1,     1,0.5f,    0.5f,0.5f});
  	plates[1]= side;
  	
	side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0.5f,0.5f,     1,0.5f,     1,0,    0.5f,0});
  	plates[2]= side;
  	
	side = new Quad(size*1.5f,size*1.5f);
  	side.setBuffer(Type.TexCoord, 2, new float[]{0,1,     0.5f,1,     0.5f,0.5f,    0,0.5f});
  	plates[3]= side;
  	
  	
  	Material tex = new Material(assetManager,"assets/shaders/rings/Unshaded.j3md");
	tex.setTexture("ColorMap", assetManager.loadTexture("Effects/Explosion/flame.png"));
	tex.setTransparent(true);
	tex.setBoolean("VertexColor", true);
	tex.getAdditionalRenderState().setDepthWrite(false);
	tex.getAdditionalRenderState().setDepthTest(true);
	tex.getAdditionalRenderState().setBlendMode(BlendMode.AlphaAdditive);
	tex.getAdditionalRenderState().setFaceCullMode(FaceCullMode.Off);
	tex.setFloat("AlphaDiscardThreshold", 0.01f);
	
	Node batch = new Node();
	for (int i = 0; i < size/55; i++) {
		float rand1 = (float)(Math.random()*size);
		float rand2 = (float)(Math.random()*size);
		float rand3 = (float)(Math.random()*size);
		
		Mesh m = plates[FastMath.nextRandomInt(0, 3)].clone();
		
		ColorRGBA dif = template.clone();				
		float [] vertices = new float[16];
		for(int k = 0, j = 0; j < 4; k+=4, j++) {
			
			ColorRGBA add = new ColorRGBA(FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f, FastMath.nextRandomFloat()-0.5f,0);
			add = dif.add(add.mult(0.85f));
			
			vertices[k] = add.r;
			vertices[k+1] = add.g;
			vertices[k+2] = add.b;
			vertices[k+3] = add.a;
		}
		
		m.setBuffer(Type.Color, 4, BufferUtils.createFloatBuffer(vertices));
		
		Uncolidable plate = new Uncolidable("nb",m);
		plate.setMaterial(tex);
		plate.setQueueBucket(Bucket.Transparent);
		plate.setLocalTranslation(-size*0.5f+rand1, -size*0.5f+rand2, rand3);
		plate.setShadowMode(ShadowMode.Off);
		Node n = new Node();
		n.attachChild(plate);
		n.getLocalRotation().fromAngleAxis(i*36, Methods.randomVector3f());
		batch.attachChild(n);
	}
	
	if(batchQuads)
	{
		GeometryBatchFactory.optimize(batch);
		
		Geometry part = (Geometry) batch.getChild(batch.getChildren().size()-1);
		part.setQueueBucket(Bucket.Transparent);
		part.setShadowMode(ShadowMode.Off);
		rootNode.attachChild(part);
		
		batch.detachAllChildren();
	}
	else
		rootNode.attachChild(batch);
    }
}
GeometryBatchFactory.optimize(batch);
		
Geometry part = (Geometry) batch.getChild(batch.getChildren().size()-1);
part.setQueueBucket(Bucket.Transparent);
part.setShadowMode(ShadowMode.Off);
rootNode.attachChild(part);
		
batch.detachAllChildren();

I do that, which pulls out the final geometry and should destroy everything else. I don't think that any references to the quads are kept.

Or you could just make it easiest for us to help without having to click through to sites.

But good luck with your issue.

Thanks, I bet I'll need lots of it.

Okay, I mean there is a slight difference: the unbatched way gives less fps outside the mesh and more inside, and batched is the other way around, probably due to culling or something. It pretty much evens out on average though, and the differences are reasonably small.

When you are inside the mesh, I suspect you are paying the price of rendering all of your quads full screen... at least any that are not fully behind you.

Any quad that is intersecting the view frustum at all is likely being rendered full screen and it no longer matters that 'only some of the quad is visible' as each pixel on the screen needs to get rendered, regardless.

So you should be able to get similar performance degradation by just creating 80000 full screen quads and adding them to the GUI node.

When rendering opaque objects in the opaque queue there is a 1:1 ratio of pixels covered by the object to fragment shader executions, unless it's wholly or partially obscured, in which case the fragment shader runs fewer times. 10 pixels, 10 fragment executions.

For transparent objects in the transparent queue the ratio is 1:n, where n is the number of overlapping objects that cover said pixel. 10 pixels, 80,000 overlapping quads, 800,000 fragment executions. There's a big performance gap between 10 fragment executions and 800,000, and that's just for 10 pixels. Imagine 2,073,600 pixels, 1920x1080.
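To put rough numbers on that ratio, here is a minimal back-of-the-envelope sketch in plain Java (no jME needed). The class and method names are made up for illustration, and it assumes the worst case where every overlapping quad covers every counted pixel, which is roughly what happens once the camera sits inside the nebula.

```java
/** Back-of-the-envelope overdraw estimate. All names here are illustrative only. */
final class OverdrawEstimate {

    private OverdrawEstimate() {}

    /** Opaque, front to back with depth test: at best one shaded fragment per covered pixel. */
    static long opaqueFragments(long coveredPixels) {
        return coveredPixels;
    }

    /** Blended, back to front: every overlapping layer shades every covered pixel (worst case). */
    static long transparentFragments(long coveredPixels, long overlappingLayers) {
        return coveredPixels * overlappingLayers;
    }
}
```

For example, `OverdrawEstimate.transparentFragments(1920L * 1080L, 1500L)` gives the worst-case fragment count per frame for 1500 full-screen layers at 1080p, versus `opaqueFragments(1920L * 1080L)` for a single opaque layer.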

And no it will not matter if more pixels have an alpha value of 1. Even if you put opaque objects in the transparent queue the ratio will be 1:n, if you put transparent objects in the opaque queue the ratio will be 1:1 so long as depth check and depth write are on.

The opaque queue renders from front to back, ensuring that when an object is rendered it can check the depth buffer, and if something is closer to the camera on a particular pixel the fragment shader is not executed for that pixel, alpha blended or not.

The transparent queue renders from back to front so when an object is rendered there will not be anything in front of it in the depth buffer save for whatever was previously rendered in the opaque queue because it is rendered before the transparent queue.

That's why the two queues exist. It's more efficient to render opaque objects front to back, but doing so for transparent objects won't render anything behind the transparent object, so you want to render them back to front.
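The two orderings can be sketched in plain Java as two sorts over the same list. This is only an illustration of the idea, not how jME's RenderQueue is implemented; `Item` is a hypothetical stand-in for a geometry with a precomputed distance to the camera.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative draw-order sorting. Not jME's actual RenderQueue code. */
final class DrawOrder {

    /** Hypothetical stand-in for a scene object: a name plus its distance from the camera. */
    static final class Item {
        final String name;
        final float distance;

        Item(String name, float distance) {
            this.name = name;
            this.distance = distance;
        }
    }

    private static final Comparator<Item> BY_DISTANCE =
            Comparator.comparingDouble(item -> item.distance);

    private DrawOrder() {}

    /** Opaque-style order: nearest first, so the depth test can reject hidden fragments. */
    static List<Item> frontToBack(List<Item> items) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(BY_DISTANCE);
        return sorted;
    }

    /** Transparent-style order: farthest first, so nearer layers blend over farther ones. */
    static List<Item> backToFront(List<Item> items) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(BY_DISTANCE.reversed());
        return sorted;
    }
}
```

With back-to-front order nothing drawn earlier is ever in front of the current object, which is why the depth buffer cannot save any fragment work for the blended layers.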

Well, I suppose that pretty much clears up why this happens; now the question is how to improve it. Here are some ideas for you to shoot down, and I'm open to more:

1: Using fewer quads but making them larger, since the bigger problem is the quantity, not the size. It would look less good, but I suppose I can live with that to an extent.

2: Making nebulas more spaced out so that there isn't as much overlapping. Some of the larger ones that are assembled using an exponential distance scaler are easier to render than medium ones despite having more quads, since there isn't a "core" of quads.

3: Adding opaque meshes into the mix. Something like opaque cloud meshes to obstruct the view of other transparent quads so that they don't need to be rendered. Would this even help?

That's just an abstract size value by the way, the number of quads with that size is about 1500 and the ones I'm running in-game are at about 250.

Four. Subdivide your quads and batch the sub-quads by location. Instead of one giant mesh you end up with a handful of smaller, cullable ones.

Really, I'm kind of surprised you are still using JME's batching. Your code is almost ideally suited to just building up the mesh yourself and the batching is just doing lots of extra unnecessary work.

Okay, but are you sure that quadrupling the amount of quads while still having to essentially render the exact same thing on the screen will make this any better? It's not like the gpu is drawing offscreen stuff, is it?

Oh, right! I was thinking about that yesterday, specifically to separate the mesh into 27 submeshes like a Rubik's cube depending on the location.
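A minimal sketch of that 3x3x3 bucketing in plain Java: each quad's centre is mapped to one of 27 cell indices, and quads sharing an index would then go into the same batch. The names are hypothetical, and it assumes a cubic bounding volume described by a minimum corner and an edge length; in the MCVE the quads are also rotated by their parent nodes, so the world-space centre is the position to feed in.

```java
/** Maps positions to cells of a 3x3x3 grid. Names and layout are assumptions for illustration. */
final class NebulaCells {

    static final int CELLS_PER_AXIS = 3;

    private NebulaCells() {}

    /** Cell 0..2 along one axis; positions outside the bounds are clamped to the edge cells. */
    static int axisCell(float value, float min, float extent) {
        int cell = (int) Math.floor((value - min) / extent * CELLS_PER_AXIS);
        return Math.max(0, Math.min(CELLS_PER_AXIS - 1, cell));
    }

    /** Flattened cell index 0..26, with x varying fastest, then y, then z. */
    static int cellIndex(float x, float y, float z,
                         float minX, float minY, float minZ, float extent) {
        int cx = axisCell(x, minX, extent);
        int cy = axisCell(y, minY, extent);
        int cz = axisCell(z, minZ, extent);
        return cx + CELLS_PER_AXIS * (cy + CELLS_PER_AXIS * cz);
    }
}
```

One batch per non-empty cell gives up to 27 geometries that can be frustum-culled separately. It does not reduce overdraw for the cells that are actually on screen.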

You overestimate my power. Besides it only runs once and I'd rather use something that is proven to work right instead of adding more of my questionable code.

Why quadrupling? His "Four" was the answer to your 1, 2, 3. ^^

And well, not drawing offscreen stuff but maybe calculating

Top right vs top left. It quadruples the number of quads and vertex shader calculations.

The nebula vert shader does dot product with the normal and camera vector so doing that 9 times instead of 4 for every cloud isn't really ideal. And I have no idea how to connect the vertices right in code.
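On connecting the vertices: for a quad subdivided into an n by n grid, the index list can be generated with two nested loops. This is a plain Java sketch under an assumed layout (vertices stored row by row, bottom row first, left to right, triangles wound counter-clockwise when viewed from the front); it is not the layout jME's `Quad` uses internally.

```java
/** Index generation for a quad subdivided into n x n cells. Row-major layout is an assumption. */
final class QuadGrid {

    private QuadGrid() {}

    /** A grid of n x n cells has (n + 1) x (n + 1) vertices, e.g. 9 for n = 2. */
    static int vertexCount(int n) {
        return (n + 1) * (n + 1);
    }

    /** Two counter-clockwise triangles per cell, six indices per cell. */
    static int[] indices(int n) {
        int[] idx = new int[n * n * 6];
        int stride = n + 1; // vertices per row
        int k = 0;
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                int bottomLeft = row * stride + col;
                int bottomRight = bottomLeft + 1;
                int topLeft = bottomLeft + stride;
                int topRight = topLeft + 1;
                // First triangle: bottom-left, bottom-right, top-right.
                idx[k++] = bottomLeft;
                idx[k++] = bottomRight;
                idx[k++] = topRight;
                // Second triangle: bottom-left, top-right, top-left.
                idx[k++] = bottomLeft;
                idx[k++] = topRight;
                idx[k++] = topLeft;
            }
        }
        return idx;
    }
}
```

With n = 2 this yields the 9 vertices mentioned above and 8 triangles. The resulting array would go into the mesh's index buffer, with positions, texture coordinates and colors filled in the same row-major order.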

What is Uncolidable? And I guess the random vector is just a normalized vector with random x, y, z?

I think there's not much you can do about this.
The devil's name in this situation is "overdraw". Overdraw means that you render a pixel more than once. It's "fill rate" (or fragment operations) which is today the biggest performance eater in 3D graphics. Vertex operations are quite cheap, but fragment operations are expensive. Drawing one pixel multiple times bypasses the idea of the z-buffer, which only really works for opaque objects. Hence overdrawing eats processing power.

Here is one thing that would work: Render your nebula to a "billboard" when it's quite far away. You use render-to-texture and a Quad that displays this texture. So instead of 250 pieces you get only one piece.
The downside of this: When you come closer to the nebula or fly around it at large distance, the billboard is still a flat 2D object and the player will notice it. So for shorter distances to the nebula you must render it as a 3D nebula with 250 pieces (or see idea below) and for flying around you will need to rebuild the billboard too.
The gain of this: If you have, say, 10 nebulas and 9 of them are quite far away, then you render the far away nebulas as one quad and a billboard texture. Problem is: Generating this texture costs a render-to-texture. So you should not do this very often, but only every 100 frames or so.
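The "only every 100 frames or so, plus a rebuild when flying around" rule could be sketched as a small policy object in plain Java. Everything here is hypothetical: the class name, the angle threshold and the idea of tracking the camera-to-nebula direction are assumptions for illustration, and the actual render-to-texture step is left out.

```java
/** Decides when a far-away nebula's billboard texture should be re-rendered. Illustrative only. */
final class ImpostorRefreshPolicy {

    private final int maxFrameAge;
    private final double maxAngleRadians;
    private int framesSinceRefresh;
    private boolean hasSnapshot;
    private double lastX, lastY, lastZ; // unit view direction at the last refresh

    ImpostorRefreshPolicy(int maxFrameAge, double maxAngleRadians) {
        this.maxFrameAge = maxFrameAge;
        this.maxAngleRadians = maxAngleRadians;
    }

    /**
     * Call once per frame with the camera-to-nebula direction (any non-zero length).
     * Returns true when the billboard should be re-rendered this frame.
     */
    boolean update(double dx, double dy, double dz) {
        double length = Math.sqrt(dx * dx + dy * dy + dz * dz);
        if (length == 0.0) {
            throw new IllegalArgumentException("direction must be non-zero");
        }
        dx /= length;
        dy /= length;
        dz /= length;

        framesSinceRefresh++;
        boolean stale = !hasSnapshot
                || framesSinceRefresh >= maxFrameAge
                || angleToSnapshot(dx, dy, dz) > maxAngleRadians;
        if (stale) {
            lastX = dx;
            lastY = dy;
            lastZ = dz;
            framesSinceRefresh = 0;
            hasSnapshot = true;
        }
        return stale;
    }

    /** Angle in radians between the given unit direction and the one stored at the last refresh. */
    private double angleToSnapshot(double dx, double dy, double dz) {
        double dot = lastX * dx + lastY * dy + lastZ * dz;
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot)));
    }
}
```

A caller would keep one policy per distant nebula, for example `new ImpostorRefreshPolicy(100, 0.1)`, and only pay for the render-to-texture pass on frames where `update(...)` returns true.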

Note that this will not help you when your one nebula is very close to the camera. In that case you can only reduce the number of pieces, or use some really, really clever shader technique which uses e.g. Perlin noise or turbulence noise and fast raytracing to render the whole nebula using only one quad which is very close to the camera. See "volumetric fog". The trick is to do it in as few passes as possible (i.e. only one piece, one full-screen quad).
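To give a feel for what such a shader does per pixel, here is a CPU-side sketch in plain Java of marching along one view ray through a fog density and accumulating opacity front to back. It is a simplification under stated assumptions: the density is a function of distance along the ray only (a real shader would sample 3D noise at each step), and the names are made up. The early exit is the part that blended quads cannot do: once the pixel is nearly opaque, the loop stops.

```java
import java.util.function.DoubleUnaryOperator;

/** Single-ray fog march with early termination. A CPU illustration, not shader code. */
final class FogRayMarch {

    private FogRayMarch() {}

    /**
     * Marches from distance 0 to maxDistance in equal steps, front to back.
     *
     * @param density       fog density (extinction per unit length) at a distance along the ray
     * @param maxDistance   how far to march
     * @param steps         number of samples along the ray
     * @param opacityCutoff stop as soon as accumulated opacity reaches this value
     * @return accumulated opacity in [0, 1]
     */
    static double opacityAlongRay(DoubleUnaryOperator density, double maxDistance,
                                  int steps, double opacityCutoff) {
        double stepLength = maxDistance / steps;
        double transmittance = 1.0; // fraction of background light still getting through
        for (int i = 0; i < steps; i++) {
            double t = (i + 0.5) * stepLength; // sample at the middle of each step
            double sigma = Math.max(0.0, density.applyAsDouble(t));
            transmittance *= Math.exp(-sigma * stepLength);
            if (1.0 - transmittance >= opacityCutoff) {
                break; // pixel is effectively opaque, nothing behind it matters
            }
        }
        return 1.0 - transmittance;
    }
}
```

The cost per pixel is bounded by `steps` no matter how dense the nebula is, whereas with stacked quads the cost per pixel grows with the number of overlapping layers.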

Oops, it seems that I forgot to remove some stuff from the utils I usually use. Fixed now in the main post. Sorry about that.

The Uncolidable is a Geometry that can't be hit by any raycasts or anything, to speed up collision detection in cases where a geometry is deep inside the scene graph. Aka the collideWith method always returns 0.

This is pointless. As the first image shows, rendering nebulas from outside is reasonably easy work and not demanding at all. It's the up close that's the problem.

I even have cases in the game where like 3 nebulas or so may be rendered one on top of the other from afar and it doesn't tank the fps at all in my experience, as long as you're far away.

And I'm not that smart. I can't even imagine how this would work at all.

I once tried to enhance the fog filter with Perlin noise and it came out as just about the ugliest thing I've ever seen. Also how does one do raytracing in shaders? It's not like you can do Ray r = new Ray() lol.

What's this part about? What's the dot product for? Are your actual quads billboarded in shader or something?

I was going from the unshaded example that was still slow.

And yeah subdividing may not have much of a benefit since it's theoretically all about pixel overdraw... which would happen anyway.

Currently me neither. My last shader coding was years ago. But there was this one guy posting his sandstorm shader thingy in the monthly images. So it can be done somehow and hopefully faster than those giant transparent nebula textures which are overlapping each other. Maybe ask that other guy. Or pspeed did something with raytracing in shader too he said.

I know that this must be frustrating, but transparent stuff is almost always a pain, even when using a raytracer (hitting an opaque object is like "okay, let's stop computation here"). There are concepts like the A-buffer and stuff, but it stays a pain. Even games like Starcraft 2 use 1-bit transparency when rendering the beard of the hero, which essentially turns transparency into opacity. Looked quite ugly but worked. :(