AI on GPU

I have looked around the web a bit, but with no real success. Nor am I proficient enough with GL and shader coding to think of an answer myself or to know where to look. So I turn to you… perhaps you can offer some links / articles or just your personal thoughts…



How feasible do you think it would be right now to use the GPU for AI calculations? Have any of you seen any demos doing it, or do you have any ideas about what direction I should research in?



My game is turn-based and much of the AI work is done between turns… when nothing graphically intensive is being rendered. So I thought I shouldn't let all that power go to waste.



All comments and thoughts would be very welcome.

Honestly, it doesn't seem to make much sense to try to use the GPU for such processing.  There's a reason that GPUs are different from CPUs: they are specifically designed for graphics-related work.



Is there a reason you don't want to use the processor to do your AI?

I have heard of several papers on this subject. The main thought is that the GPU is primarily set up to be efficient at a specific type of calculation (math), not at the general calculations and operations required by AI.



It also requires a good knowledge of multi-threaded programming, or rather of synchronization, if I understand correctly.



http://www.scientific-computing.com/hpcforscience/feature-gpu.html

http://portal.acm.org/citation.cfm?id=1242854.1243238&coll=GUIDE&dl=GUIDE

http://openvidia.sourceforge.net/





But if your AI requires heavy math calculations, it might be interesting.

For general AI programming, however, it is not.



EDIT:

Just beat me to it DarkFrog :stuck_out_tongue:

Thanks for the replies guys.



Well, just as Methius said, my interest was raised by the fact that GPUs are much faster at some types of things, and I was wondering if that could be utilized in a game. After all, there can never be enough calculation power… right?



I will probably not try to implement anything at this point, as my AI is not that heavy on math… but I still wonder how much of a benefit such an AI would bring. I will check out those links. Thanks again, Methius.


Regarding one of the latest demos from ATI…

http://ati.amd.com/developer/SIGGRAPH08Chapter03-SBOT-March_of_The_Froblins.pdf



Have a look at the chapter:

"Artificial Intelligence on GPU for Dynamic Pathfinding"



(I would say GPUs are not good only for graphics-related stuff anymore. They are already accelerating tons of other areas, from sound processing to physics, like Havok + NVIDIA… http://www.gpgpu.org/ is a good source for what's going on.)

I revoke my statement and humbly defer to MrCoder. :slight_smile:

As far as AI goes, it's probably best to use GPUs only for pathfinding. Other misc AI calculations can be multithreaded instead to take advantage of dual and quad processor systems.

One idea that I've had:

You could maybe use the GPU to calculate influence maps for the AI (render them to a texture).

For example, maps that say where the AI should build its defence, where the weakest points in the enemy's defence are, and which resources are worth taking… and so on.

Once they are rendered, you could use them to make good decisions when processing the AI on the CPU.

But I've never tried it, so I don't know if it works or is worth it.
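
To make the idea a bit more concrete, here is a rough CUDA-style sketch of what such a pass could look like. The Unit struct, the falloff formula and the grid layout are made up purely for illustration; the same thing could just as well be a fragment shader rendering to a texture, as suggested above.

```cpp
// Hypothetical sketch: each thread computes the influence value of one
// grid cell by summing distance-weighted contributions from all units.
struct Unit { float x, y, strength; };   // strength < 0 for enemy units

__global__ void influenceMap(const Unit* units, int numUnits,
                             float* outMap, int width, int height,
                             float falloff)
{
    int cx = blockIdx.x * blockDim.x + threadIdx.x;
    int cy = blockIdx.y * blockDim.y + threadIdx.y;
    if (cx >= width || cy >= height) return;

    float influence = 0.0f;
    for (int i = 0; i < numUnits; ++i) {
        float dx = units[i].x - cx;
        float dy = units[i].y - cy;
        float dist = sqrtf(dx * dx + dy * dy);
        influence += units[i].strength / (1.0f + falloff * dist);
    }
    outMap[cy * width + cx] = influence;
}
```

The CPU would then read the map back (or keep it on the card for a later pass) and make the actual decisions, exactly as described above.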

This is kind of old but still I wanted to comment :D. Don't forget Neural Networks; those are represented as graphs, which are in turn convertible to matrices. I strongly think that GPUs would rule for Neural Networks, where you have hundreds or thousands of nodes performing highly parallelizable operations on matrices. It's exactly the same kind of operation that is performed in graphics, where you have a matrix of pixels and apply a function to each. NNs have not made it much into the gaming industry that I know of, even after all the hype, but it sure is interesting.
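
To illustrate why the mapping is so natural: a fully connected layer is just a matrix-vector product followed by an activation, with one thread per output neuron. A minimal sketch, assuming row-major weights and a sigmoid activation (both my own choices for illustration):

```cpp
// Hypothetical sketch: one thread per output neuron of a fully
// connected layer, computing dot(weight_row, input) + bias, then sigmoid.
__global__ void forwardLayer(const float* weights, const float* bias,
                             const float* in, float* out,
                             int numIn, int numOut)
{
    int neuron = blockIdx.x * blockDim.x + threadIdx.x;
    if (neuron >= numOut) return;

    float sum = bias[neuron];
    for (int j = 0; j < numIn; ++j)
        sum += weights[neuron * numIn + j] * in[j];  // row-major weight matrix

    out[neuron] = 1.0f / (1.0f + expf(-sum));        // sigmoid activation
}
```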



Remember at the end of the last Matrix movie where a flock of machines formed a giant head that talked? Pretty much like particles; imagine the possibilities of managing enemies as a flock, where each individual follows some very simple rules that make them all work as a whole. Think of swarms of creatures, a battlefield with 10,000 soldiers fighting at the same time for real, where you give orders and affect the result (and try to render them at the same time on the GPU… well, whatever, premature optimization is the root of all evil, try it out first at least).
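
For the flocking part, the classic boids rules (cohesion, alignment, separation) parallelize the same way: one thread per agent. Below is a naive O(N^2) sketch, with made-up weights and no spatial partitioning, just to show the shape of it:

```cpp
// Hypothetical sketch: naive boids update, one thread per agent.
// Every agent scans all others; a real version would use a spatial grid.
struct Boid { float x, y, vx, vy; };

__global__ void updateBoids(const Boid* in, Boid* out, int n,
                            float radius, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    Boid b = in[i];
    float cx = 0, cy = 0;   // cohesion accumulator (neighbour positions)
    float ax = 0, ay = 0;   // alignment accumulator (neighbour velocities)
    float sx = 0, sy = 0;   // separation accumulator
    int count = 0;

    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = in[j].x - b.x, dy = in[j].y - b.y;
        if (dx * dx + dy * dy > radius * radius) continue;
        cx += in[j].x;  cy += in[j].y;
        ax += in[j].vx; ay += in[j].vy;
        sx -= dx;       sy -= dy;        // steer away from close neighbours
        ++count;
    }
    if (count > 0) {
        b.vx += 0.01f * (cx / count - b.x)     // move toward the group centre
              + 0.05f * (ax / count - b.vx)    // match the group velocity
              + 0.10f * sx;                    // keep some distance
        b.vy += 0.01f * (cy / count - b.y)
              + 0.05f * (ay / count - b.vy)
              + 0.10f * sy;
    }
    b.x += b.vx * dt;
    b.y += b.vy * dt;
    out[i] = b;
}
```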

I think I'm late to the party… but has anyone made anything in this area? It really sounds interesting. Also, what about CUDA? I have not tested it yet, but it seems to be more for general computing and could be used for AI.

joliver82 said:

I think I'm late to the party… but has anyone made anything in this area? It really sounds interesting. Also, what about CUDA? I have not tested it yet, but it seems to be more for general computing and could be used for AI.


I haven't seen any other work done on it around here, but CUDA may be barking up the wrong tree… While it's becoming a more mature API, CUDA is still restricted to NVIDIA hardware, which cuts out a big chunk of folks with ATI boards and an even bigger chunk with Intel's integrated solutions.

OpenCL, whose implementations are still forthcoming but whose specification has already been ratified by the Khronos Group, may prove to be the better option for processing on the GPU.  Like jME's Java roots, this is/will also be a cross-platform solution… If you're interested in giving it a shot, NVIDIA developers (you have to apply and be accepted… it's not a big deal, just a questionnaire) have access to drivers with OpenCL capability in them :D http://developer.nvidia.com/page/home.html

I think if you were to launch individual CUDA kernels, you'd need to implement something along the lines of a register map like in processors using out-of-order execution… I, like you, am no expert in AI programming or algorithms, but I'd wager that the best AI engines out there are dynamic, based on the different possibilities of user interaction and other factors (network data, time, etc.).



I've gotta wonder if the launch overhead for all those kernels wouldn't start to defeat the purpose… I've heard of people losing in excess of 50 ms on kernel initialization.  There's a good topic regarding this here… I found that interesting, as [in theory] adding 1,000 actors to your equation is only a tad slower than 100… but it's still in the 80 ms range.  Once you add that to an outside factor like network lag, I'm curious to see what kind of toll it takes.



Could be an interesting venture into some complex truth-table-style thinking about which parts of the AI computation are possible when which parts of the data are available, though…  Interesting as all hell, no?

sbook said:

I think if you were to launch individual CUDA kernels, you'd need to implement something along the lines of a register map like in processors using out-of-order execution… I, like you, am no expert in AI programming or algorithms, but I'd wager that the best AI engines out there are dynamic, based on the different possibilities of user interaction and other factors (network data, time, etc.).

I've gotta wonder if the launch overhead for all those kernels wouldn't start to defeat the purpose… I've heard of people losing in excess of 50 ms on kernel initialization.  There's a good topic regarding this here… I found that interesting, as [in theory] adding 1,000 actors to your equation is only a tad slower than 100… but it's still in the 80 ms range.  Once you add that to an outside factor like network lag, I'm curious to see what kind of toll it takes.

Could be an interesting venture into some complex truth-table-style thinking about which parts of the AI computation are possible when which parts of the data are available, though…  Interesting as all hell, no?


Well, I don't know about that guy; we've been seeing really fast kernel launches in our tests. In most tests we had a moderate sample size (e.g. 200 samples per curve, 5 Bézier curves, so 5 blocks and 200 threads per block), with three kernel launches, and the total execution time would be less than 50 ms. The computation we do per thread has O(N^2) running time, so it's not exactly fast either. The time to actually launch the kernels was tiny compared to that.

Granted, that's 5 blocks… that guy was launching a pretty large grid. With 10,000 blocks to schedule across the multiprocessors, well, that's quite a lot! Another thing to note is that it's an old thread.

While our CUDA implementation took 50 ms to complete the mentioned test, our original C code took nearly 5 seconds. If updating AI were anything comparable, I'd say that's pretty darn good. How long would it normally take to update entity AIs without hardware acceleration?

Anyways, yeah, it would be interesting to actually test this, though that's quite a lot of uncharted territory, heh.
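
If anyone wants to measure launch overhead on their own card, wrapping an empty kernel in cudaEvent timing is the quickest check. A rough sketch (error handling omitted):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void emptyKernel() {}

int main()
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    emptyKernel<<<1, 1>>>();          // warm-up; the first launch pays init costs
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < 1000; ++i)
        emptyKernel<<<1, 1>>>();      // measure steady-state launch overhead
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average launch overhead: %.3f ms\n", ms / 1000.0f);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```
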
Starnick said:

While our CUDA implementation took 50 ms to complete the mentioned test, our original C code took nearly 5 seconds. If updating AI were anything comparable, I'd say that's pretty darn good. How long would it normally take to update entity AIs without hardware acceleration?

Anyways, yeah, it would be interesting to actually test this, though that's quite a lot of uncharted territory, heh.


I think you hit the nail on the head; I got too far ahead of myself…  That's a nice speed boost :D  And yeah, this would be quite interesting…  ::Adds AI to laundry list of OpenCL & CUDA tinker toys::

Stumbled upon these while doing research on formations for boids:



http://boid.alessandrosilva.com/sbgames08-boids.pdf

http://www.plm.eecs.uni-kassel.de/plm/fileadmin/pm/publications/chuerkamp/Masterarbeiten/Jens_Breitbart_thesis.pdf

http://www.plm.eecs.uni-kassel.de/plm/fileadmin/pm/publications/breitbart/hpcs09.pdf

http://www.unibas.it/erra/Papers/egita06.pdf