Automated lip sync

Emperors Teeth · Dec 2009

I know I know... don't be lazy I hear you calling! But give me a moment to explain:

We're a very small team working on projects with massive amounts of lip-syncing. It's coming to a stage where animation is taking so long that we need to optimise to be able to maintain a good level of output. We're talking upwards of 100 animations involving lip-sync, between 60 and 300+ frames long, for each project lasting about 5 weeks. Along with the rest of our workload, we're beginning to struggle to output at the level required.

I'm all for quality over quantity, but the simple fact is, we need to produce more, faster. I've been researching automated lip-syncing casually for a while, and came up with these two products that seemed to do the job well:

LifeStudio (www.lifemi.com)
and
Annosoft (www.annosoft.com)

Currently, due to the nature of our final pieces, generating the lip sync inside Maya would be better. However, I am looking into the not-so-distant future where running it at game-time would be better (may be moving to Unreal Engine from XNA).

Does anyone have any knowledge or experience of these products and their output/ease-of-use? Please help!

Gallows · Dec 2009

Honestly, nothing can really match just keying out the animations yourself and spending time articulating each sound. But in comparison, and this is just speculation based on brief research, Annosoft seems to give the more professional quality result, especially in game. I'm currently getting into using UDK, and it's a shame FaceFX doesn't really do realtime lip syncing.

Richard Kain · Dec 2009

I've used the Annosoft command-line tool before. It's the cheapest solution, but its accuracy is a tad hit-and-miss. Also, it is as bare-bones as they come. You would almost certainly want to develop some kind of front-end for it if you are going to be using it in a production pipeline. Its biggest drawback is that it has a hard time with longer audio files. Anything longer than a single sentence will give it difficulty.

I don't know if you're already using something similar, but I really enjoy using Papagayo. It is not an auto-syncing program. But it does speed up the timing process considerably, and it can export timing sheets. With a simple script, you can import those timing sheets and have them automatically applied to pre-defined animation keys in Maya. Or you can even write a plug-in to integrate them into a game engine.
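A rough sketch of the "simple script" idea, for anyone who wants a starting point. It assumes the timing sheet is a plain text export with an optional header line followed by "frame phoneme" pairs, one per line; check what your Papagayo version actually writes. The shape names in the map are hypothetical placeholders for your own blendshape targets, and the keys are stepped (one shape fully on per cue), so expect to smooth them afterwards.

```python
def parse_timing_sheet(text):
    """Return a list of (frame, phoneme) tuples from 'frame phoneme' lines."""
    cues = []
    for raw in text.splitlines():
        parts = raw.split()
        # Skip the header, blank lines, and anything that is not "int word".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        cues.append((int(parts[0]), parts[1]))
    return cues


def cues_to_keys(cues, shape_map, default="rest"):
    """Turn cues into (frame, shape, weight) keys.

    At each cue the new shape is keyed to 1.0 and the previous shape to 0.0.
    shape_map must contain an entry for `default` (assumption of this sketch).
    """
    keys = []
    previous = None
    for frame, phoneme in cues:
        shape = shape_map.get(phoneme, shape_map[default])
        if shape == previous:
            continue  # same mouth shape, nothing to key
        if previous is not None:
            keys.append((frame, previous, 0.0))
        keys.append((frame, shape, 1.0))
        previous = shape
    return keys


# Hypothetical phoneme-to-blendshape map; replace with your own target names.
SHAPES = {"rest": "mouth_rest", "MBP": "mouth_closed", "AI": "mouth_open"}

sample = "MohoSwitch1\n1 rest\n4 MBP\n7 AI\n"
keys = cues_to_keys(parse_timing_sheet(sample), SHAPES)
for frame, shape, weight in keys:
    print(frame, shape, weight)

# Inside Maya you would replace the print with something along the lines of:
#   cmds.setKeyframe("yourBlendShapeNode." + shape, time=frame, value=weight)
# where "yourBlendShapeNode" is a placeholder for your own node name.
```

The same key list could be written out to whatever format a game engine plug-in expects, since nothing in the parsing step depends on Maya.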

Mark Dygert · Dec 2009

I agree nothing is going to give you better results than doing it all by hand but what a task that is. I do a few metric tons of lip sync every year. Pretty much the same situation it sounds like.

The two apps you're looking at, don't. They are both horrible. At work we looked into them both extensively and came to the conclusion they both suck. It's not so much the results (they suck just like every automated system) but the work flow.

LifeStudio:
- Is a pain to use, especially if you're using your own custom mesh.
- If you use one of their pre-generated "easily configurable" heads then it's a little better, but it still gives horrible results and in general is a huge pain, even using their work flow the way they designed it.
- The pre-generated heads are a mess for real time and end up looking more or less the same.
- They were pretty reluctant to let us try custom meshes with their set up, and I see why: it was crazy and turned us off the whole process.

Annosoft:
- The results are horrible and pretty much you get whatever it gives and you better be happy with it.
- As an out of the box set up you'll want to put your thumbs in your eyes and push until you feel the back of your skull.
- The work flow is crazy convoluted and is longer than it should be and not really all that flexible. This adds a lot of wasted time that could be better spent animating. Even at top speed you're probably better off animating it by hand.
- If you have a programmer that is willing to put in some extra work and build some support scripts and help with the work flow it MIGHT be manageable.
- The time between making some tweaks and viewing the final result is a big turn off and the results are sub-par.

Voice O Matic: http://www.di-o-matic.com/products/plugins/maya/VoiceOMatic/features.html
- It works with whatever mesh you have. It's blendshape driven, but they have an easy write up included in the help file on converting blendshape animation to a bone based rig. Pretty easy and something you'll probably end up doing no matter what system you go with. They all seem to run on blendshapes.
- You specify a sound file and it does a pretty good job of parsing just that, you can also feed it a text file and the results get a little better. It also has a handful of parameters to help. We normally just wing it with the sound file.
- The quality is largely dependent on the blendshapes you plug in which is great because you can tweak your shapes and reparse the sound and get instant feedback.
- Test it out for yourself, the examples they have on their site are horrible and really should be redone.
- Iteration time is great since it is integrated via max/mel script.
- You can have your bone rig in the same scene making it that much faster to check final results. You can also animate the rest of the body while you animate the head.
- It's super easy to use and is used by a lot of studios. We use it from time to time when we are pinched for time. We spend time cleaning it up, but for a first pass it's pretty good.

MattW · Dec 2009

We had a 'talking head' come in for some of our training sims that used Annosoft's lipsync tool. Not sure what you used that was so complicated, Vig, but it was pretty drag and drop. Load the sound clip, paste in the line being said and hit the sync button. Took seconds per clip, and they were usually a few paragraphs each. The results weren't always the best or most accurate, but you could tweak it if need be.

e-freak · Dec 2009

metricminds or pixomondo is working on a fullscale tool for this iirc.

Mark Dygert · Dec 2009

Annosoft isn't that easy. VOM does the same thing but allows you to keep working in your scene.

It's important to see the animation on the head with the body animation, using the blendshapes you created. You can import your own "head" but that requires exporting all the blendshapes and importing them into Annosoft with none of the body animations, which is time consuming with even a handful of characters. It makes refining the blends a total chore.

Good body language can more than make up for poor lip sync, but not the other way around. If you're going to spend time tweaking anything, it should be the body language. Also consider that your product might be localized, and that might not mean a full reprocessing of the animation, just the sound. In that case the lip sync is more than likely way off, and the extra time put into body language will really pay off in all the products.

With VOM I can parse 100-150 audio clips (8 min of sound) in 10-20 min, and that includes a few revisions. That's right out of the box and not using their batch processor. When I used Annosoft right out of the box, that same amount of work took 3-4 hrs and was not as flexible. With some scripting both can be sped up, but Annosoft would only end up functioning a lot more like VOM, and VOM would just be faster.

We personally decided to take the extra time and do it all by hand and use VOM when we're pinched for time.
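To put the figures quoted in this post into per-clip terms, here is a quick back-of-the-envelope calculation. It only uses the ranges given above (100-150 clips, 10-20 min versus 3-4 hrs); pairing the fastest time with the most clips for the "best" case is an assumption made for illustration, not something the post states.

```python
def seconds_per_clip(minutes, clips):
    """Average handling time per clip, in seconds."""
    return minutes * 60 / clips


def compare():
    # Best case: shortest time, most clips. Worst case: longest time, fewest clips.
    return {
        "vom_best": seconds_per_clip(10, 150),
        "vom_worst": seconds_per_clip(20, 100),
        "annosoft_best": seconds_per_clip(3 * 60, 150),
        "annosoft_worst": seconds_per_clip(4 * 60, 100),
        # Ratio of total times for the same batch of clips.
        "speedup_low": (3 * 60) / 20,
        "speedup_high": (4 * 60) / 10,
    }


result = compare()
for name, value in result.items():
    print(name, value)
```

On those numbers the out-of-the-box difference works out to somewhere between roughly 9x and 24x for the same batch, or a few seconds per clip against a minute or more.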

Mark Dygert · Dec 2009

http://www.image-metrics.com/
Image Metrics does some amazing facial animation, but they're service-based and EXPENSIVE. But holy balls can they produce amazing stuff.

http://usa.autodesk.com/adsk/servlet/pc/index?siteID=123112&id=13571400
Facial Robot will be included in the next version of XSI. It's a DIY facial mocap and animation set up. I honestly can't wait to try it out. Before, it was $10k just for the plug-in on top of XSI's lofty price tag, but now it's pretty affordable.

MattW · Dec 2009

99% of what it was used for didn't have a body showing. I'm not sure what version you were using, but ours didn't take much time at all. When we have full acting going on, we animate the faces by hand. Like you said, there's no replacement for that.

Vailias · Dec 2009

If you are going to Unreal you can use FaceFX. A working version of it comes with UT3 and UDK.

Emperors Teeth · Dec 2009

Holy crap Vig, that was an epic series of helpful posts there! Thanks for the heads-up about Annosoft and LifeStudio. I'll look into Voice-O-Matic; Image Metrics is out of the question unfortunately...

@Vailias: FaceFX in UDK basically looks like a rig you animate yourself inside UDK? I already made one in Maya that looks like it works pretty similarly:

edit: Having read up about FaceFX I see that it does have many useful features. If we do go into Unreal, then I'll definitely look into it. Cheers

(please try to ignore the crappy blob for hair)

I fear that with the cost, unreliability and time involved in setting up and using these automated systems, it won't become an option. I understand well how hand-made lipsync will always be better, though in a tremendous example of irony, the amount we have to throw out along with other tasks means little opportunity to apply those personal touches. It has become a routine task, done in a very production-line manner. :whyme:

Thanks again all, especially Vig. I'll post back when I've done a little more research into the other suggestions here, and come to a decision.

Mark Dygert · Dec 2009

I think you'll find VOM pretty quick and easy to use right out of the box unlike the other options. It's pretty cheap too.

Depending on what version of Max you're using you should be able to keep all the sound files and animations in the same scene.
- 3dsmax 2008 and lower requires a 3rd party plug-in called SoundTrax to have more than one audio file active in the scene.
- 2009 + Creative Extensions included SoundTrax as ProSound.
- 2010 it was standard.

If you do go that route I have a pretty quick work flow for plowing through the sounds using VOM and ProSound without any scripting. The next version of VOM will make better use of ProSound and probably have a batch function for sounds added to a list picked from sounds in the scene.

I like that control board layout. We use one made out of splines that drives mostly morphs through Reaction Manager. There is also a bone rig, which you can't see, that is used to wrangle the face into the right shape.

Constructing the control board manually and rigging it up.
http://www.3dtotal.com/team/Tutorials_3/video_facecontrol/facecontrol.php

Same method only not as manual.
http://www.3dtotal.com/team/Tutorials/face_rig/face_rig.php
There is a script linked to in this tutorial that automates the creation of the control board pieces.
There are a few things that bug me about this script, but it's really quick and fast so I forgive it =P
- The circle's local up axis is Y.
- The piece stops at the edge of the spline box, but if you keep dragging it, technically it keeps moving. This really isn't the script's fault but the way it's rigged up. So if the box is 5x5 and you drag it up 7 units, it visually stops at 5, but when you drag it back down, it won't move for the first 2 units. So be careful not to over-drag it.
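The over-drag behaviour in the last bullet is easier to see in a toy model. This is not how the linked script or Max's limit controllers are actually implemented; it is just a minimal sketch assuming the control stores an unclamped raw value and only clamps what it shows, with a 0 to 5 range picked to match the example numbers above.

```python
class LimitedControl:
    """Toy control: the raw value keeps moving, the visible value is clamped."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper
        self.raw = 0.0  # what the drag actually accumulates

    @property
    def visible(self):
        # What you see in the viewport: raw value clamped to the box.
        return max(self.lower, min(self.upper, self.raw))

    def drag(self, delta):
        self.raw += delta
        return self.visible


control = LimitedControl(0, 5)
after_up = control.drag(7)         # dragged 7 units, visually stops at the edge
after_down_2 = control.drag(-2)    # raw goes 7 -> 5, nothing visibly moves
after_down_1 = control.drag(-1)    # raw goes 5 -> 4, now it moves
print(after_up, after_down_2, after_down_1)
```

The first two units of the downward drag are spent getting the raw value back inside the box, which is the dead zone described above.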

To control the speech morphs instead of rigging up more control board pieces I use this script:
http://www.aaachooo.com/Maxscript_Details/maxscripts_FacialMorphControls.html
It has some nice features, like zero all, and you can click drag in the visual color bar to adjust the morph. You could use the script to control all the morphs but I use it just for speech. It is name dependent so I tend to group my morphs to follow its naming conventions.

I use this instead of control board pieces because it leaves the morphs open to be controlled by other things like VOM, whereas if you lock the morphs down with Reaction Manager or wire parameters, it won't allow access any other way. So the script is great at assisting you while you clean up automated sync.
