I’ve reached a point where I’m wondering what the best practice is (or whether it is possible at all) for building big (huge) tree-like structures in JME. By tree-like I don’t mean our green blooming friends; I mean GUI-like trees, the most basic being a file browser.
To give an example of what I’d like to do, here is a video of a wonderful upcoming indie game, which is pretty close to it.
If you look at the video, the data structure is big and can get even bigger. It is also highly dynamic. The guy makes his game in C++ with OpenGL, and I can imagine that such visuals are possible there. That should mean the same is possible in JME. So how do I approach it?
Some of my thoughts:
Do I make the underlayer (the transparent lines which constitute the frame of the schematics) as a single mesh? Looking at the number of vertices in the video, I recall that such meshes will drop the framerate significantly when updated in real time.
So do I maybe make them entirely with a shader? But is it OK to pass such a large amount of data to a shader in real time? The nodes (the green circles) are clickable and must somehow be synced with the schematics…
How do I best realize the control points? Separate quads with mouse-picking can quickly hit the limit when the tree is expanding, showing more and more nodes.
Would it be better to render them with a shader too? But hey, that would also require a lot of data to be passed to the shader each frame! The coordinates, the values for each node, the hierarchy…
Will doing this totally with shaders bring some serious problems? I don’t think he uses shaders entirely. Earlier in the video there is also a similar science tree, which exhibits the graph-like behavior too and provides a robust interaction. So I doubt that that is entirely made in a shader…
So does someone have some ideas on how to better do this kind of stuff?
It looks cool… but it’s not really that many lines when you think about it.
You’d have to get the data to the GPU somehow… whether mesh or texture. And as said, compared to a half-dozen bone-animated characters, you are still sending much less data, I bet.
My guess is that on his system, it’s probably interactive without much optimization. If it were me, I’d batch sections of the hierarchy but that’s about it. I’ve rendered this many points in Swing before at interactive rates… not as smoothly, but I didn’t have the GPU helping me either.
What I saw is about 25 subnodes per node.
That’s an impressive fan-out, but most nodes don’t have subnodes; I’d estimate that the number of nodes per screen is four-digit, maybe five-digit.
That’s nothing that Java or a GPU would break into a sweat over. I bet they spent more time on the animations than on optimization.
I don’t know how large the entire graph is. If it’s a few tens of thousands, it’s easily kept in memory.
At millions, I’d start considering loading stuff on demand.
To keep the transitions smooth, you’d probably want to keep all nodes in memory that are one hop away from being displayed - i.e. the parent of the current display root and all its immediate children, and the children of all display leaves.
Not too tricky, though I’d want to spend some time shaking out bugs.
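The "one hop away" rule above can be sketched in plain Java. This is only an illustration: `TreeNode`, `ResidentSet` and the `expanded` flag are hypothetical names invented for the example, not anything from jME, and it assumes a "display leaf" is any shown node whose children are not currently shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tree node; names are illustrative, not from jME.
class TreeNode {
    final String name;
    TreeNode parent;
    final List<TreeNode> children = new ArrayList<>();
    boolean expanded; // true if this node's children are currently shown

    TreeNode(String name) { this.name = name; }

    // Attaches a child and returns this node, so calls can be chained.
    TreeNode add(TreeNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

class ResidentSet {
    /** Displayed nodes plus everything one hop away from being displayed. */
    static Set<TreeNode> compute(TreeNode displayRoot) {
        Set<TreeNode> keep = new LinkedHashSet<>();
        // One hop "up": the parent of the display root and all its immediate children.
        if (displayRoot.parent != null) {
            keep.add(displayRoot.parent);
            keep.addAll(displayRoot.parent.children);
        }
        collect(displayRoot, keep);
        return keep;
    }

    private static void collect(TreeNode node, Set<TreeNode> keep) {
        keep.add(node);
        // Children are kept either way: under an expanded node they are visible,
        // under a display leaf they are the one-hop-away nodes.
        keep.addAll(node.children);
        if (node.expanded) {
            for (TreeNode child : node.children) {
                collect(child, keep);
            }
        }
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("root");
        TreeNode a = new TreeNode("a");
        TreeNode b = new TreeNode("b");
        root.add(a).add(b);
        a.add(new TreeNode("a1")).add(new TreeNode("a2"));
        a.expanded = true;
        for (TreeNode n : compute(a)) {
            System.out.println(n.name);
        }
    }
}
```

Anything outside the returned set would be a candidate for unloading, and the set would need recomputing whenever the display root or an `expanded` flag changes.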
I wouldn’t worry about the GPU anywhere. You never have more than a few hundred nodes on the screen, simply because screen space is so limited.
I’d probably not do it as a shader, simply because Java is easier to debug and performance isn’t a big issue.
GPUs sweat over details and illumination.
This demo does neither, so I wouldn’t expect any performance problems. (Okay, you can always goof up.)
Thanks for the answers! But I have some more questions:
- Speaking of batching, you say that you’d batch “sections of the hierarchy”, but why not the whole hierarchy? Wouldn’t that be simpler?
- You’re saying that tens of thousands of objects (presumably quads) are OK, but take a look at this thread: there I was doing about 3000 objects and it was clearly struggling. Probably you mean something different, or some special technique like the “batching” mentioned by pseed?
- In this video, the whole graph is alive; it is animated. The nodes are moving and the tree-like structure made of lines is moving with them. If I am to make this structure a dynamic mesh, how many vertices can a dynamic VBO take, updated on each frame, on an average video card?
- What do you mean when you say “kept in memory”? Is it video card memory or RAM?
As far as I can see, the whole thing is not displayed at once; some of its parts get pre-calculated and then “develop” visually when they are ready.
Speaking of batching, you say that you’d batch “sections of the hierarchy”, but why not the whole hierarchy?
Because not all of the hierarchy is expanded at once and it is easier to group things into subsections.
Your 3000 object thread is the prime example of how not to do something, really. You have 3000 cull operations, 3000 round-trips to the GPU, etc… it’s the single most expensive way you can use your hardware.
But that doesn’t mean that one giant batch is better. There is a balance.
As far as what can be updated per frame, I will once again refer to bone animation… which is updating every vertex of the animated character every frame.
Well, I have seen a scene with a three-digit framerate that has millions of cubes. No exaggeration and no crazy hardware involved.
I guess you’d have to do more/different batching to make it work in your scene - yes, batching is your friend. Organize stuff that needs the same shader and texture into one mesh, for example.
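As a rough illustration of putting many quads into one mesh, here is a plain-Java sketch that builds a single position array and a single index array for a whole group of node quads. `QuadBatch`, its method names and the `halfSize` parameter are hypothetical names made up for this example; it assumes every quad in the group shares the same material and faces the camera along +Z.

```java
// Hypothetical helper: merges many node quads into one vertex/index array pair.
class QuadBatch {
    /** One position array for all quads: 4 vertices x 3 floats per quad. */
    static float[] positions(float[][] centers, float halfSize) {
        float[] out = new float[centers.length * 4 * 3];
        int i = 0;
        for (float[] c : centers) {
            float x = c[0], y = c[1], z = c[2];
            // Corners in counter-clockwise order:
            // bottom-left, bottom-right, top-right, top-left.
            out[i++] = x - halfSize; out[i++] = y - halfSize; out[i++] = z;
            out[i++] = x + halfSize; out[i++] = y - halfSize; out[i++] = z;
            out[i++] = x + halfSize; out[i++] = y + halfSize; out[i++] = z;
            out[i++] = x - halfSize; out[i++] = y + halfSize; out[i++] = z;
        }
        return out;
    }

    /** Two triangles per quad, indexing into the shared vertex array. */
    static int[] indices(int quadCount) {
        int[] out = new int[quadCount * 6];
        for (int q = 0; q < quadCount; q++) {
            int v = q * 4; // first vertex of this quad
            int i = q * 6; // first index slot of this quad
            out[i]     = v; out[i + 1] = v + 1; out[i + 2] = v + 2;
            out[i + 3] = v; out[i + 4] = v + 2; out[i + 5] = v + 3;
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] centers = { {0f, 0f, 0f}, {2f, 0f, 0f}, {0f, 2f, 0f} };
        System.out.println(positions(centers, 0.5f).length + " floats, "
                + indices(centers.length).length + " indices");
    }
}
```

In jME the two arrays would then go into one `Mesh`, as far as I recall via `setBuffer(VertexBuffer.Type.Position, 3, ...)` and `setBuffer(VertexBuffer.Type.Index, 3, ...)` with buffers from `BufferUtils`, so the whole group costs one cull check and one draw call instead of one per quad.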
Don’t worry too much about the bandwidth for resending meshes.
Meshes pack a lot into a small payload. It’s just three floats per vertex, that’s 3x4 = 12 bytes per vertex. Say you’re sending quads as two triangles and not optimizing anything, that’s 6 vertices or 72 bytes per quad; at 100 fps and with 10 MB/s of bandwidth, that’s 100,000 bytes / frame or roughly 1,400 quads that you can update.
And you typically have far more than 10 MB/s to a video card. PCI starts at 133 MB/s, PCI Express at 250 MB/s per lane - you won’t get the full nominal data transfer rate since you don’t own the bus, but you’ll get a substantial fraction of that. Oh, and high-end graphics cards can transfer up to 128 GB/s, just so you know.
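The back-of-the-envelope estimate can be written out as a small calculation. This sketch assumes position data only (three 4-byte floats per vertex) and treats the bandwidth figure as fully available, which a real bus will not give you; `UploadBudget` and `quadsPerFrame` are hypothetical names used just for this example.

```java
// Hypothetical helper for estimating how many quads can be re-sent per frame.
class UploadBudget {
    /**
     * @param bytesPerSecond  assumed usable transfer rate to the video card
     * @param fps             target frame rate
     * @param verticesPerQuad 6 for two unindexed triangles, 4 for a shared-corner quad
     */
    static long quadsPerFrame(long bytesPerSecond, int fps, int verticesPerQuad) {
        int bytesPerVertex = 3 * Float.BYTES;       // x, y, z = 12 bytes
        long bytesPerFrame = bytesPerSecond / fps;  // budget for a single frame
        return bytesPerFrame / ((long) verticesPerQuad * bytesPerVertex);
    }

    public static void main(String[] args) {
        // 10 MB/s at 100 fps leaves 100,000 bytes per frame.
        System.out.println(quadsPerFrame(10_000_000L, 100, 6)); // prints 1388
        System.out.println(quadsPerFrame(10_000_000L, 100, 4)); // prints 2083
        // Nominal PCI rate of 133 MB/s, same frame rate, unindexed quads.
        System.out.println(quadsPerFrame(133_000_000L, 100, 6));
    }
}
```

Even with the deliberately pessimistic 10 MB/s figure, over a thousand quads can be rewritten every frame, and the count scales linearly with whatever bandwidth is really available.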
If you still run into trouble, you can cut down on the data size.
Sending quads instead of triangles means you need just 4 instead of 6 vertices, cutting out 33% of required bandwidth. (I’m not sure whether JME can do quads directly; OpenGL can.)
If your particles don’t affect the depth buffer (i.e. they’re transparent), you can send just two vectors, position and orientation, and let the shader draw the particle. If they’re opaque, you can still let the shader write them into the depth buffer; it’s just more complicated.
“Kept in memory” was meant as opposed to “kept on disk and loaded on demand”.
In such scenarios, you typically have a tree in CPU RAM, and the mesh constructed in CPU RAM and transferred to the GPU. So essentially it’s both video card memory and RAM.