Voice Recognition (Proof of Concept)

Just messing around with some voice recognition (using Sphinx-4, if that means anything to anyone).

I coded this up in a few hours just to test a proof of concept…



http://www.youtube.com/watch?v=b_rY0rh4Z5A



The recognition is pretty OK (maybe a 70-80 percent success rate), although I could do a lot more to improve that (better phrases, a headset instead of the embedded laptop mic, etc.), so I am confident it could be accurate enough to use in a game without too much frustration...

EDIT: plus, if your game somehow included the fact that your communication lines are flaky (i.e. enemy hacking, old technology, etc.), you could maintain the in-game immersion even when recognition fails :)

Hehe cool :)

Nice!

Hehe, pretty cool and impressive... How do you trigger a command, and how do you store one?

e.g.:

from a sample wave file

or

by translating sound -> text -> command on the fly (this is advanced!)

http://lmgtfy.com/?q=Sphinx-4&l=1

@atomix said:
Hehe, pretty cool and impressive... How do you trigger a command, and how do you store one?


It's actually done via a dictionary that has a load of words, and then it matches up phrases...

For example, my statement looks something like:

(unit one | unit two | all teams) (move to | halt) (alpha | bravo)

Then the recognizer listens and returns a string containing the full sentence that was recognized:

"unit one move to bravo"

This of course is all done in a separate thread, and the recognized sentences are used to update the commands given to the models.
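A minimal sketch of that separate-thread setup, assuming the recognizer can be treated as a blocking call that hands back one sentence at a time. `SpeechListener` and its `Supplier<String>` are made-up stand-ins for illustration; they are not Sphinx-4 classes and not the code from the video:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

/** Runs on its own thread and queues recognized sentences for the game loop. */
final class SpeechListener implements Runnable {
    // Hypothetical stand-in for the real recognizer: blocks until a sentence
    // is recognized, and returns null when there is no more input.
    private final Supplier<String> recognizer;
    private final ConcurrentLinkedQueue<String> recognized = new ConcurrentLinkedQueue<>();

    SpeechListener(Supplier<String> recognizer) {
        this.recognizer = recognizer;
    }

    @Override
    public void run() {
        String sentence;
        while (!Thread.currentThread().isInterrupted()
                && (sentence = recognizer.get()) != null) {
            recognized.add(sentence);
        }
    }

    /** Called from the game's update loop; returns what arrived since the last call and never blocks. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        String s;
        while ((s = recognized.poll()) != null) {
            out.add(s);
        }
        return out;
    }
}
```

The game thread would start it once with `new Thread(listener).start()` and call `drain()` every frame, so a slow recognition never stalls rendering.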

So really, don't think of it as changing your speech into words... it's more like matching your speech against phrases you supply. This is how I understand it anyway; I just imported the library, hooked it up and played around with it, so it is very much a black box to me.
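Because the recognizer only ever returns one of the supplied phrases, turning the string into a game command can be a simple prefix match. A rough sketch, assuming the three-part statement above; `VoiceCommand` and `VoiceCommandParser` are illustrative names, not part of Sphinx-4 or of the original project:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Illustrative value type: who should act, what they do, and where. */
final class VoiceCommand {
    final String unit;
    final String action;
    final String target;

    VoiceCommand(String unit, String action, String target) {
        this.unit = unit;
        this.action = action;
        this.target = target;
    }
}

final class VoiceCommandParser {
    // Same alternatives as the statement: (unit) (action) (target).
    private static final List<String> UNITS = Arrays.asList("unit one", "unit two", "all teams");
    private static final List<String> ACTIONS = Arrays.asList("move to", "halt");
    private static final List<String> TARGETS = Arrays.asList("alpha", "bravo");

    /** Returns the parsed command, or empty if the sentence does not fit the statement. */
    static Optional<VoiceCommand> parse(String sentence) {
        if (sentence == null) {
            return Optional.empty();
        }
        // Normalize case and collapse repeated whitespace before matching.
        String text = sentence.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        for (String unit : UNITS) {
            if (!text.startsWith(unit + " ")) {
                continue;
            }
            String afterUnit = text.substring(unit.length() + 1);
            for (String action : ACTIONS) {
                if (!afterUnit.startsWith(action + " ")) {
                    continue;
                }
                String target = afterUnit.substring(action.length() + 1);
                if (TARGETS.contains(target)) {
                    return Optional.of(new VoiceCommand(unit, action, target));
                }
            }
        }
        return Optional.empty();
    }
}
```

With this, `VoiceCommandParser.parse("unit one move to bravo")` yields unit "unit one", action "move to" and target "bravo", while anything outside the statement comes back empty.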

Mmm, very interesting idea. I especially like your idea of building glitches into the framework. That sort of thing can improve stuff a lot.



At the very least, an in-character “*sshhhh gchsd breaking up sdsdsa repeat command” type response from the NPCs would help keep immersion, although you’d have to be careful it didn’t get annoying fast.
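One way to keep that from getting annoying is to rate-limit the static. A small sketch, assuming the game passes in its own clock in milliseconds; `RadioStatic`, the cooldown and the reply text are all invented for illustration:

```java
import java.util.Optional;

/** Picks an in-character reply for a failed recognition, but not too often. */
final class RadioStatic {
    private final long cooldownMillis;
    private long lastPlayedAt;
    private boolean playedBefore = false;

    RadioStatic(long cooldownMillis) {
        this.cooldownMillis = cooldownMillis;
    }

    /**
     * Returns a line for the NPC to say, or empty if one was played
     * less than cooldownMillis ago (stay silent instead of nagging).
     */
    Optional<String> onRecognitionFailure(long nowMillis) {
        if (playedBefore && nowMillis - lastPlayedAt < cooldownMillis) {
            return Optional.empty();
        }
        playedBefore = true;
        lastPlayedAt = nowMillis;
        return Optional.of("*sshhhh* ...breaking up... repeat command");
    }
}
```

The game would call `onRecognitionFailure` whenever a sentence fails to map to a command, and only play the line when a value comes back.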

Haha, I remember doing that on an L2J server XD, players freaked out when they saw NPCs moving or attacking xD

@kblender88:

Is your game an open-source project? I’m curious about the voice recording part: are you using pure JMF for sound recording and then handing the audio over to the Sphinx lexicon?