Lipsync Tool. Advice needed

As part of my work to complete our SDK, I decided to take one more step in integrating Blender and JME.

Here is one of the components:

The lipsync tool


  1. JME3 and Nifty for 3D and the interactive UI
  2. Sphinx for acoustic feature extraction
  3. Musicg from Google Code to display the waveform and spectrogram
  4. YOLO from SourceForge to display the phoneme alignment and 2D presentation (print, HTML…)
  5. Groovy for scripting and GroovyGraphicsBuilder for curves and node display


    If anyone has used the Annosoft Lipsync tool, you can see that this tool looks pretty much the same.

    I decided to design this tool as a general-purpose lipsync and facial animation tool that can:
  1. Read a 3D model, display it, and manipulate it at run time
  • Head Blender file rigged with the Auto-face-rig plugin (finished)
  • Read from the Blender file into JME using BlenderLoader (finished)
  • Map poses to phonemes (need help)
  • Run-time library for manipulating the face (need help)

  2. Read the WAV file and display it as a waveform and spectrum

    - Using musicg from Google Code (can be replaced with my own code or Sphinx at any time)

  3. Produce the phonemes and text transcription, then align them again to get good results

    - Using Sphinx and YOLO (need help)

  4. Scripting

    - Using Groovy

  5. Interactive on-screen UI

    - Using Nifty
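
    To make the "map poses to phonemes" item concrete, here is a minimal sketch in plain Java of what such a mapping could look like. It is not the tool's actual code: the class name `VisemeMap`, the pose names ("closed", "open", "round", "teeth", "rest") and the small phoneme table are all assumptions for illustration, using ARPAbet-style symbols such as those found in Sphinx dictionaries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: maps phoneme symbols to named face poses (visemes). */
final class VisemeMap {
    // Pose used when a phoneme has no explicit mapping (assumed name).
    static final String REST_POSE = "rest";

    private final Map<String, String> poseByPhoneme = new HashMap<>();

    /** Registers one pose for several phonemes that share a mouth shape. */
    VisemeMap map(String pose, String... phonemes) {
        for (String p : phonemes) {
            poseByPhoneme.put(p.toUpperCase(Locale.ROOT), pose);
        }
        return this;
    }

    /** Returns the pose for a phoneme, or the rest pose if it is unmapped. */
    String poseFor(String phoneme) {
        return poseByPhoneme.getOrDefault(phoneme.toUpperCase(Locale.ROOT), REST_POSE);
    }

    /** A small illustrative table, not a complete phoneme set. */
    static VisemeMap example() {
        return new VisemeMap()
            .map("closed", "M", "B", "P")
            .map("open", "AA", "AE", "AH")
            .map("round", "OW", "UW", "W")
            .map("teeth", "F", "V");
    }

    public static void main(String[] args) {
        VisemeMap visemes = example();
        // Phonemes without an entry (HH, L here) fall back to the rest pose.
        for (String phoneme : new String[] {"HH", "AH", "L", "OW"}) {
            System.out.println(phoneme + " -> " + visemes.poseFor(phoneme));
        }
    }
}
```

    Grouping several phonemes under one pose keeps the number of poses the artist has to rig small; the real grouping would depend on the rig and the language.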

    And as always, this code will be open source under the same license as our SDK. Please comment and help me out if you are interested. Contact me for the code!

Very nice.

This is very good work. My only regret is that this could have been the greatest feature ever if you had done it as a plugin for the SDK instead of a standalone app. Too bad.



It WILL be in the SDK. You know, I’m also quite good at NetBeans plugin development, but for now I just want to make a quick prototype, the skeleton of the program. Then, when the core features:

  • multiple bone animation blending
  • run-time facial bone manipulation to synchronize with each phoneme at a specific time

    are in place, I will integrate the tool into the SDK. I think I have a clear view of how our SDK is coded, so it will not be a hard task.
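
    As a rough illustration of the timing side of those two features, the sketch below computes cross-fade weights between timed poses. It is an assumption-laden example, not the prototype's code: the class `PoseTimeline`, its methods and the linear cross-fade are all made up here, and no JME API is involved. The weights would then drive whatever bone or animation blending the engine offers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: timed poses with linear cross-fade weights. */
final class PoseTimeline {
    /** One pose that is fully active at a given time (in seconds). */
    static final class Key {
        final float time;
        final String pose;

        Key(float time, String pose) {
            this.time = time;
            this.pose = pose;
        }
    }

    private final List<Key> keys = new ArrayList<>();

    /** Adds a key; keys must be added in strictly increasing time order. */
    PoseTimeline add(float time, String pose) {
        if (!keys.isEmpty() && time <= keys.get(keys.size() - 1).time) {
            throw new IllegalArgumentException("keys must be strictly increasing in time");
        }
        keys.add(new Key(time, pose));
        return this;
    }

    int size() {
        return keys.size();
    }

    String poseAt(int i) {
        return keys.get(i).pose;
    }

    /** Weight in [0, 1] of key i at time t; neighbouring weights sum to 1. */
    float weightAt(int i, float t) {
        Key k = keys.get(i);
        if (t == k.time) {
            return 1f;
        }
        if (t < k.time) {
            if (i == 0) {
                return 1f; // before the first key: hold the first pose
            }
            Key prev = keys.get(i - 1);
            return t <= prev.time ? 0f : (t - prev.time) / (k.time - prev.time);
        }
        if (i == keys.size() - 1) {
            return 1f; // after the last key: hold the last pose
        }
        Key next = keys.get(i + 1);
        return t >= next.time ? 0f : (next.time - t) / (next.time - k.time);
    }

    public static void main(String[] args) {
        PoseTimeline timeline = new PoseTimeline()
            .add(0.0f, "closed")
            .add(0.2f, "open")
            .add(0.5f, "round");
        float t = 0.1f; // halfway between "closed" and "open"
        for (int i = 0; i < timeline.size(); i++) {
            System.out.println(timeline.poseAt(i) + " weight=" + timeline.weightAt(i, t));
        }
    }
}
```

    A linear ramp is the simplest choice; an ease-in/ease-out curve would probably look more natural on a face.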

Yes, very nice indeed.

I could use it once it is added to the SDK.

Also, Blender has an awesome way to do this, via animation curves driven by sound files. Mesh keyshapes are needed for that as well (so it’s a pity they don’t work in JME).

This is why I would like to see mesh keyshapes in JME. Then even an exporter would be possible.

Yep, would love this in the SDK. Also, from a workflow standpoint it makes more sense to have it in there. Maybe it should use Swing for most of the GUI elements then as well, though seeing stuff in the scene is also good… Great work nonetheless, thanks for sharing :)

Looks really, really good indeed. Congratulations on your work; I can clearly see how much effort you put into developing such a great tool.

@atomix said:
It WILL be in the SDK.

Then, it's perfect. Great work.
I once played with FaceFX, a commercial lipsync package that was used in Dragon Age, and it was awesome.
It looked pretty much like what you did, so once again, very good work.

Thanks a lot, all of you guys,

I know facial animation and conversation are among the most interesting things in gaming nowadays (besides good gameplay). The framework I am working on helps developers with some difficult jobs:

  • Write the conversation in a Groovy script.
  • A Groovy builder builds a decision tree, which controls the facial animation and also the camera moves in the cutscene, managed by the JME Cinematic class behind the curtain.

    So basically this tool and the CinematicEditor plugin I have done should work together, in the design phase and also in the run-time phase, to produce a result that is complex but user-friendly (for devs and artists).
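
    To give an idea of the decision tree mentioned above, here is a minimal sketch of one way to represent it. It is written in plain Java rather than as a Groovy builder, and everything in it is hypothetical: the class `DialogNode`, the camera cue names and the sample lines are made up for illustration and are not part of the framework or of the JME Cinematic class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of one node in a conversation decision tree. */
final class DialogNode {
    final String line;      // what the character says at this node
    final String cameraCue; // made-up cue name a cinematic layer could react to
    private final Map<String, DialogNode> choices = new LinkedHashMap<>();

    DialogNode(String line, String cameraCue) {
        this.line = line;
        this.cameraCue = cameraCue;
    }

    /** Adds a player answer leading to another node; returns this for chaining. */
    DialogNode choice(String answer, DialogNode next) {
        choices.put(answer, next);
        return this;
    }

    /** Follows an answer, or returns null if that answer is not offered here. */
    DialogNode follow(String answer) {
        return choices.get(answer);
    }

    /** Answers offered at this node, in insertion order. */
    Set<String> answers() {
        return choices.keySet();
    }

    /** A node without choices ends the conversation. */
    boolean isEnd() {
        return choices.isEmpty();
    }

    public static void main(String[] args) {
        DialogNode root = new DialogNode("Will you help us?", "closeUp")
            .choice("yes", new DialogNode("Thank you, friend.", "wideShot"))
            .choice("no", new DialogNode("Then leave.", "overShoulder"));
        DialogNode node = root.follow("yes");
        System.out.println(root.line + " " + root.answers());
        System.out.println(node.line + " [camera: " + node.cameraCue + "]");
    }
}
```

    Each node carries both the spoken line (which the lipsync side would animate) and a camera cue, so a single walk through the tree can drive the face and the cutscene together.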

    I will show a state-of-the-art in-game cinematic with conversation as soon as I finish the modeling. I’m also excited to hear any advice and feedback from you guys, for example on features or the interface!

Actually, what’s happening isn’t that complicated; recognizing vowels in speech isn’t exactly hard. Maybe you can reduce the number of external utilities and applications by reimplementing some stuff?

@normen: As far as I remember, you’re very good at sound, or related to sound engineering… Am I right? Yes, the number of external libs is too high, I know… The tool mainly depends on Sphinx to extract acoustic features from a trainable model, because I want to use it for conversations in languages other than English… The two other libs, YOLO and Musicg, are optional.

  • YOLO provides a good way to produce phonemes from text and align them manually, so I prefer to keep YOLO rather than write my own code for that.
  • Musicg is used for sound visualization and basic sound manipulation, such as trimming and copying sections of sound data…

    For now I have decided to keep all the libs because I don’t want to write a lot of low-level stuff on the first try! Please tell me your opinions, I’d like to hear them!

@atomix: Yeah… It’s good to do it this way and get it working in the first place, so you get an overview of which parts are actually needed. But if a piece of software that takes an audio track and outputs vowel info in a certain script format, with timing info for the audio track, could help, then I could indeed write that for the final application/SDK plugin. Still, it’s actually pretty easy to “hide” external tools in the SDK, so it might not be needed at all. I don’t have the overview, it just sounded like a lot of libs; you have to tell :)
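
To illustrate what such a "script format with timing info" could look like, here is a small sketch of a reader for one possible layout. The format is purely an assumption (one cue per line as `start end vowel`, times in seconds, `#` for comments), as are the class name `VowelCue` and its methods; nothing here was agreed on in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: one timed vowel cue from an assumed text format. */
final class VowelCue {
    final double start; // seconds from the start of the audio track
    final double end;   // seconds from the start of the audio track
    final String vowel; // vowel or phoneme symbol, e.g. "AH"

    VowelCue(double start, double end, String vowel) {
        this.start = start;
        this.end = end;
        this.vowel = vowel;
    }

    /** Parses lines of the form "start end vowel"; blank and '#' lines are skipped. */
    static List<VowelCue> parse(String script) {
        List<VowelCue> cues = new ArrayList<>();
        for (String raw : script.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected 'start end vowel': " + line);
            }
            double start = Double.parseDouble(parts[0]);
            double end = Double.parseDouble(parts[1]);
            if (end < start) {
                throw new IllegalArgumentException("end before start: " + line);
            }
            cues.add(new VowelCue(start, end, parts[2]));
        }
        return cues;
    }

    /** Writes the cue back in the same assumed format, with millisecond precision. */
    String toLine() {
        return String.format(Locale.ROOT, "%.3f %.3f %s", start, end, vowel);
    }

    public static void main(String[] args) {
        String script = "# demo track\n0.00 0.12 AH\n0.12 0.30 OW\n";
        for (VowelCue cue : parse(script)) {
            System.out.println(cue.toLine());
        }
    }
}
```

A plain text layout like this would be easy to produce from any analysis tool and easy to edit by hand when the automatic alignment needs correcting.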