Node-to-GPU data flow?

I’m currently thinking about design alternatives, and since I’m not too sure about what data flows from RAM to GPU under what circumstances, I’m having a hard time shooting down idiotic ideas.
This boils down to two questions:

  1. What’s being transferred under what circumstances?
  2. What are the typical latencies and data rates that I can expect?

For (1), I have a mental model and would just like to know where I err.
My model is like this:

  • Application prepares Node objects.
  • Application also prepares Materials, which define which shader to use and which material parameters get sent.
  • JME sends meshes, transforms, shaders, and parameters.
  • JME is smart about changes and does not resend stuff that hasn’t changed. E.g. modify just one mesh from one frame to the next, and only that mesh gets resent; the parameters, shaders, transforms, and other meshes are just kept in GPU memory.

How much of that is correct, or incomplete?
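On the “does not resend stuff that hasn’t changed” point: engines typically track this with a dirty flag per buffer. A toy sketch of the idea in plain Java (this is not JME’s actual code; the `VertexBuffer` class here is a made-up stand-in):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of change tracking: each buffer remembers whether its
// CPU-side copy changed since the last upload to the GPU.
class VertexBuffer {
    private boolean dirty = true;   // never uploaded yet
    private int uploads = 0;

    void modify() { dirty = true; }     // application edited the data

    boolean upload() {                  // called once per frame
        if (!dirty) return false;       // unchanged -> nothing crosses the bus
        uploads++;                      // pretend we pushed the data to the GPU
        dirty = false;
        return true;
    }

    int uploadCount() { return uploads; }
}

public class DirtyFlagDemo {
    public static void main(String[] args) {
        List<VertexBuffer> mesh = new ArrayList<>();
        for (int i = 0; i < 3; i++) mesh.add(new VertexBuffer());

        // Frame 1: everything is new, all three buffers go over the bus.
        mesh.forEach(VertexBuffer::upload);
        // Frame 2: only buffer 0 was edited, so only it is resent.
        mesh.get(0).modify();
        mesh.forEach(VertexBuffer::upload);

        System.out.println(mesh.get(0).uploadCount()); // 2
        System.out.println(mesh.get(1).uploadCount()); // 1
    }
}
```

Under this model, unchanged meshes cost nothing per frame; only the edited buffer is re-uploaded.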

For (2), I guess the most important thing I’d need to know is what data rate the RAM->GPU connection can sustain, on a typical configuration.
For whatever set of typical configurations one would consider; e.g. I’m currently targeting mid-range PCs with decent but not-so-great 3D performance, and might start thinking about mid-range Android smartphones in a year or two (or five).
I guess latencies don’t come into play since 3D gaming is about keeping up a sustained data stream - I’d see any latency as a framerate drop.

Any takers?

It seems about right to me.

Other things to consider: JME tries to sort objects by material in the opaque bucket to avoid needless state changes (though it does send the uniforms every time).

Per-object state setup and draw dispatch seem to take a disproportionate amount of time compared to sending vertex buffers. This is why it’s often better to have one large mesh (even if it’s dynamic) than lots of smaller meshes.
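To see why, here’s a back-of-the-envelope model in plain Java. The constants (20 µs of state-setup/dispatch overhead per object, ~6 GB/s of bus bandwidth) are made-up illustrative figures, not measurements:

```java
public class DrawCallCost {
    // Assumed costs -- replace with your own measurements.
    static final double PER_DRAW_OVERHEAD_US = 20.0;    // state setup + dispatch
    static final double BUS_BYTES_PER_US     = 6_000.0; // ~6 GB/s

    // Rough CPU-side time in microseconds to submit the scene.
    static double frameTimeUs(int objects, long totalBytes) {
        return objects * PER_DRAW_OVERHEAD_US + totalBytes / BUS_BYTES_PER_US;
    }

    public static void main(String[] args) {
        long bytes = 10L * 1024 * 1024;  // 10 MiB of vertex data either way
        // 1000 small meshes vs. one merged mesh of the same total size:
        System.out.printf("1000 meshes: %.0f us%n", frameTimeUs(1000, bytes));
        System.out.printf("1 mesh:      %.0f us%n", frameTimeUs(1, bytes));
    }
}
```

With these assumptions, 1000 small meshes cost roughly ten times as much CPU-side time as one merged mesh carrying the exact same vertex data, because the per-draw overhead dominates the actual transfer.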

That being said, both of those should be reduced as much as possible, i.e. limit the number of objects and limit the amount of buffer data sent per frame. Again, where possible. This is one of the reasons hardware skinning is interesting: it sends relatively little data per frame and also reuses mesh data. But on that note, on modern systems even software skinning wasn’t so bad, and that was resending basically the whole mesh every frame.
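To put rough numbers on that skinning comparison (the vertex count, vertex layout, and bone count below are arbitrary example values, not JME internals):

```java
public class SkinningBandwidth {
    // Software skinning: CPU deforms the mesh, so the whole thing
    // is re-uploaded every frame.
    static long softwareBytesPerFrame(int vertices, int bytesPerVertex) {
        return (long) vertices * bytesPerVertex;
    }

    // Hardware skinning: the mesh stays on the GPU; only one 4x4
    // float matrix per bone travels over the bus each frame.
    static long hardwareBytesPerFrame(int bones) {
        return bones * 16L * 4;
    }

    public static void main(String[] args) {
        // Example character: 10k vertices (position + normal, 3 floats
        // each = 24 bytes), 50 bones.
        System.out.println(softwareBytesPerFrame(10_000, 24)); // 240000
        System.out.println(hardwareBytesPerFrame(50));         // 3200
    }
}
```

So for this example character, hardware skinning sends well under 2% of the per-frame data that software skinning does, on top of reusing the mesh already resident on the GPU.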

The merge of GPU and CPU memory will happen sooner across the board on phones than it will on desktops, methinks. The transfer rate depends on the bus it’s connected to; that’s mostly the PCIe (or, on older machines, AGP) bus, which hangs off the north bridge, so that would be your bandwidth ceiling.

Thanks, these answers have helped a lot.
So if I have large textures that never change, that will eat up GPU memory but not bandwidth, right?
And each time a texture is changed, that will eat CPU-to-GPU bandwidth, once per frame if there’s a change each frame, right?
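As a sanity check on those two cases, here’s the arithmetic for an assumed 1024×1024 RGBA8 texture:

```java
public class TextureBandwidth {
    // Bytes crossing the bus for one full upload of a texture.
    static long uploadBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        long once = uploadBytes(1024, 1024, 4);  // RGBA8 -> 4 MiB
        // Static texture: one upload, then it only occupies GPU memory.
        System.out.println(once);        // 4194304
        // Rewritten every frame at 60 fps: sustained bus traffic.
        System.out.println(once * 60);   // 251658240, i.e. ~240 MiB/s
    }
}
```

So a static 4 MiB texture is a one-time cost, while re-uploading it every frame turns into roughly 240 MiB/s of sustained traffic.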

I’d like to get some numbers - what’s the typical usable PCIe bandwidth that I can assume for a mid-range PC?
(I see that we may have bus contention there, networking and disk traffic go over that bus, too. Graphics can easily dwarf that, of course.)

I’m currently thinking about some things that reduce bandwidth if done on the Java side, at the cost of increasing the CPU load (as usual). If I know the available bandwidth, I know at what resolutions and scene sizes I need to start worrying about that particular trade-off.

Yeah, that’s what GPUs were originally made for: storing maps in their memory and doing parallel computations on them. It’s hard to say where the exact bottleneck will be with the variety of hardware available (controllers, memory, etc.), but in most cases it should be the raw PCIe bus bandwidth. You should be able to get info about how many “lanes” the graphics chip has available in your setup (also for on-board chips) and derive the effective bandwidth from the PCIe specs.

Ultimately you could have a bottleneck anywhere, from right at the memory stage if you put cheap (i.e. slow) memory into your PC, to at the PCIe bus if you connect the card to the wrong PCIe slot on your board. Bandwidth should be sufficient for some serious processing though; many people work with “sprites” where the whole texture data basically gets uploaded each frame, and they seem to run well. It is however much more efficient to do image processing on the GPU in general, as that’s what they were made for :slight_smile: Mostly you would only do image processing on the CPU if you can reduce the bandwidth use by that, not to take work off the GPU.

I think the biggest problem is not even the bandwidth, it’s the time necessary to prepare a transaction/copy from main to GPU memory. In my experience it rarely makes a difference (on somewhat normal desktop PCs) whether you upload a 5 MB or a 2 GB texture. We’ll see how the unified memory architecture AMD is going for will work around this.

Sure, a single to-GPU upload shouldn’t matter much. Though it’s reassuring to know that even 2 GB doesn’t make a real impression :slight_smile:

I doubt that I can upload 2 GB per frame though. PCIe is fast, but not THAT fast…

The real question now is: how many lanes at what PCIe revision can I expect on a mid-range PC?
Hmm… the article says that “typical AGP replacement slots” carry 16 lanes, and that PCIe 2.0 has been shipping since end-2007, so we can assume that, which means a data rate of 8 GB/s.
Memory bandwidth is somewhat less; the end of testberichte-239899-14.html says a bit above 6 GB/s is typical (that’s 2007 data that uses DDR2).
That means about 100 MB per frame at 60 fps. More for newer hardware - no idea how much more though, so I’m sticking with these numbers for now.
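The arithmetic behind that per-frame budget, as a tiny Java snippet (the 6 GB/s figure is the DDR2-era memory bandwidth estimate above, which is the tighter of the two limits):

```java
public class FrameBudget {
    // Per-frame transfer budget given a sustained bus/memory bandwidth.
    static double bytesPerFrame(double busBytesPerSec, int fps) {
        return busBytesPerSec / fps;
    }

    public static void main(String[] args) {
        // ~6 GB/s at 60 fps:
        System.out.printf("%.0f MB per frame%n",
                bytesPerFrame(6e9, 60) / 1e6); // 100 MB
    }
}
```

Note this is a theoretical ceiling for the whole frame; anything else competing for the bus (and the per-transfer setup cost mentioned above) eats into it.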

I guess that explains why 2 GB of static texture data is not a problem at all: that (pretty old) configuration would upload it in about a quarter of a second.
For textures that keep changing on every frame, I wouldn’t want to go much beyond 50 MB. That’s still pretty generous, though not enough to entirely forget about it all.