I’m working on a project that is less of a game and more about data visualization done in a 3D space. Does anyone have recommendations on good approaches to D3-like data visualization techniques in Java using JME3? I’d love to be able to do node-link force-directed layouts that can open up and collapse, pulling DOM data from, say, Wikipedia or Google Maps. It doesn’t seem like Bullet physics can really do magnetism/repelling between nodes, right? Are there any turn-key Java libraries that would be a good fit for this… I’ve been starting to look at DEX: http://www.dexvis.com/doku.php

Sorry if there’s an obvious implementation I’m missing, but I’ve been impressed by the power of JMonkey and think there’s a cool opportunity to use it for all sorts of education and data visualization applications. I used to do a lot of work in Processing, but then jumped to JMonkey when it was clear Processing didn’t have a lot of flexibility when it came to camera manipulation and working with a scene graph.

Directly representing data in the scene graph is probably not a good idea, as you shouldn’t go over about 1000-3000 single-geometry objects in the scene. If you’re prepared to make your own meshes and geometry, though, you can get as efficient as OpenGL gets. Add some shaders to display extended data and you’re good to go. If you think along the lines of “one data node, one box and a few other primitives as geometry” then no, jME is not for you.

If you’re thinking about a force-spring style layout, these are pretty trivial to write. It only gets even remotely complicated if you want simulated annealing… and then it’s only a little tricky because of 3D (it’s much easier to detect things like orbits in 2D).

For performance, you definitely want to generate your own mesh. As Normen says, mirroring your graph in the scene graph would limit you to 1000 nodes or so. Batching them into a single large mesh (or multiple batched meshes) could get you tens of thousands easily.
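To make the batching idea concrete, here’s a minimal sketch (plain Java, no jME classes, all names mine) of packing every node position into a single position buffer — the shape of data one batched mesh consumes instead of one Geometry per node:

```java
import java.nio.FloatBuffer;

public class NodeBatch {

    // Pack all node positions into one buffer, 3 floats (x, y, z) per node.
    // This is the kind of data a single batched mesh would consume instead
    // of thousands of individual Geometry objects.
    static FloatBuffer packPositions(float[][] nodePositions) {
        FloatBuffer buf = FloatBuffer.allocate(nodePositions.length * 3);
        for (float[] p : nodePositions) {
            buf.put(p[0]).put(p[1]).put(p[2]);
        }
        buf.flip(); // rewind so a consumer reads from the start
        return buf;
    }
}
```

In jME you’d then hand a (direct) buffer like this to something along the lines of `mesh.setBuffer(VertexBuffer.Type.Position, 3, buf)` on a Mesh in point mode, and re-upload it whenever the layout moves — but the exact wiring depends on your jME version, so treat that part as a pointer rather than a recipe.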

Thanks for the insight everyone. Working with a mesh definitely seems to make a lot of sense, and I like the particle approach too.

I’m going to try to play with some other physics libraries and see where the limits are. In particular, try to link up the verlet physics library from toxiclibs into jmonkey. It looks relatively straightforward and could give a lot of powerful tools for us to play with.

A physics engine is approximately 52% collision detection and 51% collision resolution. (over 100% to prove a point) The other parts are nearly insignificant in comparison… especially if you don’t need rotation (as I suspect you don’t). So if you are implementing a classic spring layout then you need the -3% of the physics engine. You don’t need collision detection and you don’t need collision resolution. In fact, you will likely have to implement repulsion on top of all of that… though springs should at least be covered.

But the math for a spring layout is pretty simple. Pseudo-code:
[java]
for( int i = 0; i < nodeCount; i++ ) {
    Node n1 = nodes[i];
    for( int j = i + 1; j < nodeCount; j++ ) {
        Node n2 = nodes[j];
        // calculate the repulsive force and apply it to both nodes,
        // generally something like 1 / distanceSquared
    }
}
for( int i = 0; i < linkCount; i++ ) {
    // calculate the spring force using Hooke's law, perhaps weighted by link,
    // and apply half to each endpoint
    // spring force = springConstant * linkLength
}
for( int i = 0; i < nodeCount; i++ ) {
    nodes[i].pos.addLocal( nodes[i].force );
}
// ...repeat...
[/java]

Where Node is something like:
[java]
class Node {
    Vector3f pos = new Vector3f();
    Vector3f force = new Vector3f();
}
[/java]
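Fleshed out into a self-contained runnable sketch (plain Java doubles instead of Vector3f; the constants are arbitrary picks of mine, not tuned values), one layout step might look like:

```java
public class SpringLayout {

    static class Node {
        double x, y, z;    // position
        double fx, fy, fz; // force accumulated this step
        Node(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    static class Link {
        final int a, b; // indices into the node array
        Link(int a, int b) { this.a = a; this.b = b; }
    }

    static final double REPULSION = 1.0;  // strength of the 1/d^2 repulsion (arbitrary)
    static final double SPRING_K = 0.01;  // Hooke's-law spring constant (arbitrary)

    static void step(Node[] nodes, Link[] links) {
        for (Node n : nodes) { n.fx = 0; n.fy = 0; n.fz = 0; }

        // pairwise repulsion, roughly 1 / distanceSquared, pushing both nodes apart
        for (int i = 0; i < nodes.length; i++) {
            for (int j = i + 1; j < nodes.length; j++) {
                Node n1 = nodes[i], n2 = nodes[j];
                double dx = n2.x - n1.x, dy = n2.y - n1.y, dz = n2.z - n1.z;
                double d2 = dx * dx + dy * dy + dz * dz + 1e-9; // avoid divide-by-zero
                double d = Math.sqrt(d2);
                double f = REPULSION / d2;
                n1.fx -= f * dx / d; n1.fy -= f * dy / d; n1.fz -= f * dz / d;
                n2.fx += f * dx / d; n2.fy += f * dy / d; n2.fz += f * dz / d;
            }
        }

        // spring force = springConstant * linkLength, half applied to each endpoint
        for (Link l : links) {
            Node n1 = nodes[l.a], n2 = nodes[l.b];
            double dx = n2.x - n1.x, dy = n2.y - n1.y, dz = n2.z - n1.z;
            double d = Math.sqrt(dx * dx + dy * dy + dz * dz) + 1e-9;
            double f = SPRING_K * d * 0.5;
            n1.fx += f * dx / d; n1.fy += f * dy / d; n1.fz += f * dz / d;
            n2.fx -= f * dx / d; n2.fy -= f * dy / d; n2.fz -= f * dz / d;
        }

        // integrate: pos += force (a damping factor here helps in practice)
        for (Node n : nodes) { n.x += n.fx; n.y += n.fy; n.z += n.fz; }
    }
}
```

Run `step(...)` in a loop until the layout settles; in practice you’d also damp or cap the per-step movement so the system doesn’t oscillate.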

Though I agree with pspeed and his solution is most likely what you need, it also depends on the kind of data you want to visualize. If you have truly large data sets, his approach might get you into trouble. Solving a spring system like this is a specific instance of what is generally called an n-body problem, which is really tough to do efficiently for a large number of particles. In this case, you’re probably best off just using a library that does this for you.

I’m just bringing this up because you mentioned pulling data from wikipedia or google. But if you only need to visualize a few thousand nodes at a time, you’ll be fine.

@christophtraut said:
Though I agree with pspeed and his solution is most likely what you need, it also depends on the kind of data you want to visualize. If you have truly large data sets, his approach might get you into trouble. Solving a spring system like this is a specific instance of what is generally called an n-body problem, which is really tough to do efficiently for a large number of particles. In this case, you're probably best off just using a library that does this for you.

I’m just bringing this up because you mentioned pulling data from wikipedia or google. But if you only need to visualize a few thousand nodes at a time, you’ll be fine.

Yeah, true… it works up to several tens of thousands of nodes the way I wrote it. There is no magic for larger data sets, though, short of spatial indexing. And that’s not particularly hard, either. A sparse fixed grid has almost no overhead and can let you scale up to what will fit in RAM… (and if you have days for the layout, it will even work if you want to cache to disk beyond RAM… though I don’t know how you’d display it. :))
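For the curious, a sparse fixed grid really is just a hash map keyed by cell coordinates. A minimal sketch (the bit-packing of the key and the fixed 3x3x3 neighborhood are choices of mine, not the only way to do it):

```java
import java.util.*;

public class SparseGrid {
    private final float cellSize;
    private final Map<Long, List<Integer>> cells = new HashMap<>();

    SparseGrid(float cellSize) { this.cellSize = cellSize; }

    // Pack the three integer cell coordinates into one long key
    // (21 bits each, offset so negative coordinates stay positive).
    private long key(float x, float y, float z) {
        long cx = (long) Math.floor(x / cellSize) + (1 << 20);
        long cy = (long) Math.floor(y / cellSize) + (1 << 20);
        long cz = (long) Math.floor(z / cellSize) + (1 << 20);
        return (cx << 42) | (cy << 21) | cz;
    }

    void insert(int node, float x, float y, float z) {
        cells.computeIfAbsent(key(x, y, z), k -> new ArrayList<>()).add(node);
    }

    // Nodes in the 3x3x3 block of cells around (x, y, z): with the cell size
    // chosen from the repulsion falloff, only these pairs matter and every
    // other node can be skipped entirely.
    List<Integer> near(float x, float y, float z) {
        List<Integer> out = new ArrayList<>();
        for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
                for (int dz = -1; dz <= 1; dz++) {
                    List<Integer> c = cells.get(
                        key(x + dx * cellSize, y + dy * cellSize, z + dz * cellSize));
                    if (c != null) out.addAll(c);
                }
        return out;
    }
}
```

In the layout loop, you’d rebuild (or incrementally update) the grid each step and run the repulsion pass only over `near(...)` results instead of all pairs.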

You’re right of course. In this case, a simple grid is probably all you need. But in the general case, a simple grid is not enough and solving such a system efficiently can indeed get incredibly complicated.
A fixed grid approach generally only works for relatively sparsely populated simulations, or more generally for simulations where you can guarantee an upper bound on the particle density. Otherwise, you might get situations where all particles cluster in a few cells of your grid and you get the same performance as if there were no grid at all. In that case, you need some form of adaptive spatial subdivision, which can be quite difficult to get right and still be performant.
The other thing is that you can’t always rule out far field interactions. If the forces between particles are still non-negligible over large distances in your simulation domain, then a naive grid approach will also not help since there just aren’t any particle pairs that you can ignore. In this case, things get really difficult. But there are still algorithms for solving such a system in linear time. One popular example that I know of is the fast multipole method, which essentially uses so-called multipole expansions to cluster several contributions of particles in the far-field into one contribution. And to do this efficiently, you need to impose an adaptive hierarchical subdivision on the domain and nothing is simple anymore.
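A toy illustration of the clustering idea — not a fast multipole method, just its starting point (the lowest-order “treat the far cluster as a single point at its center of mass” approximation, as in Barnes-Hut-style methods); the numbers and helper names are mine:

```java
public class FarField {

    // Exact 1/d^2 force on point p from every point in the cluster:
    // sum of (q - p) / |q - p|^3.
    static double[] exactForce(double[] p, double[][] cluster) {
        double[] f = new double[3];
        for (double[] q : cluster) {
            double dx = q[0] - p[0], dy = q[1] - p[1], dz = q[2] - p[2];
            double d = Math.sqrt(dx * dx + dy * dy + dz * dz);
            double s = 1.0 / (d * d * d);
            f[0] += s * dx; f[1] += s * dy; f[2] += s * dz;
        }
        return f;
    }

    // Approximation: treat the whole cluster as one point of weight n
    // sitting at its center of mass.
    static double[] approxForce(double[] p, double[][] cluster) {
        double[] c = new double[3];
        for (double[] q : cluster) { c[0] += q[0]; c[1] += q[1]; c[2] += q[2]; }
        for (int k = 0; k < 3; k++) c[k] /= cluster.length;
        double dx = c[0] - p[0], dy = c[1] - p[1], dz = c[2] - p[2];
        double d = Math.sqrt(dx * dx + dy * dy + dz * dz);
        double s = cluster.length / (d * d * d);
        return new double[] { s * dx, s * dy, s * dz };
    }
}
```

For a tight cluster a hundred units away, the two agree to a small fraction of a percent. Hierarchical methods apply this recursively over a tree of clusters (with higher-order correction terms, in the multipole case) to get the total cost below quadratic.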

But of course, this is not at all relevant to the original question, so sorry for hijacking the discussion. But I just can’t ignore it when people say n-body problems are not too difficult, because they really are. xD

@christophtraut said:
You're right of course. In this case, a simple grid is probably all you need. But in the general case, a simple grid is not enough and solving such a system efficiently can indeed get incredibly complicated.
A fixed grid approach generally only works for relatively sparsely populated simulations, or in general a simulation where you can guarantee an upper bound on the particle density. Otherwise, you might get situations where all particles cluster in a few cells of your grid and you get the same performance as if there were no grid at all. In this case, you need some form of adaptive spatial subdivision which can be quite difficult to get right and still be performant.
The other thing is that you can't always rule out far field interactions. If the forces between particles are still non-negligible over large distances in your simulation domain, then a naive grid approach will also not help since there just aren't any particle pairs that you can ignore. In this case, things get really difficult. But there are still algorithms for solving such a system in linear time. One popular example that I know of is the fast multipole method, which essentially uses so-called multipole expansions to cluster several contributions of particles in the far-field into one contribution. And to do this efficiently, you need to impose an adaptive hierarchical subdivision on the domain and *nothing* is simple anymore.

I have done this before and there are many downsides. The forces become an approximation and you get some strange behavior… in some cases, for certain types of data, it organizes them better; for others it does the opposite. It works very well for graphs that are nearly in hierarchy form already.

But really, the larger the node count, the worse a spring-embedder type layout works. Especially in cases where repulsion has longer-range effects, since essentially everything in the middle of the graph ends up in ‘free fall’ as all influences balance out and basically overwhelm any local influence. I’ve seen twisted-up clusters never properly resolve because they happen to be in the dead center of a really large layout.

P.S.: Generally, you can pick a good value for a sparse grid based on the properties of your data set. Lots and lots of edges? You need a smaller resolution. The best cell size is based on what the average expected repulsion will be. It’s even possible to dynamically split/merge if you mix a sparse grid with a quad tree (or octree)… but it does start to get more complicated. Still, by far the easiest is to just dynamically adjust the spring constant: basically, set it based on how many node-to-node interactions there are and what their relative strength is. Initial starting position is also important if you want to avoid some really nasty initial frames.
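As a sketch of the “dynamically adjust the spring constant” idea — the formula here is an invented placeholder of mine, not a recommendation — one simple shape is to soften the constant as the amount of repulsion in play grows, so denser graphs don’t oscillate:

```java
public class SpringTuning {
    // Invented heuristic: scale a base spring constant down as the number
    // of node-to-node interactions and their average strength grow.
    static double adaptiveSpringConstant(double base, int interactions, double avgRepulsion) {
        return base / (1.0 + interactions * avgRepulsion);
    }
}
```

You’d recompute this per step (or per few steps) from counts the repulsion pass already produces, so it costs essentially nothing.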

Yeah, the more I think about it, the more I realize what a fun topic this is, though. The basic idea is so beautifully simple and you can get results almost immediately, but there is yet so much emergent complexity. I really need to find some time to play around with this.