For the last few weeks I have once again been trying to improve my workflow for fast facial animation, and I have now reached a quality I believe is sufficient for my project, especially given my time limits.
Over these weeks I have remade the facial animation structure and bones dozens of times from scratch and tried many different animation methods, yet only now have I found a quality I like.
My current project has swayed between various rendering styles throughout, but because of "reasons" I now want to bring it into full 3D rather than any prerendered style.
The only limitation I have is that I am not a programmer, so any animation must be simple and bone driven. I do not care much for in-engine animation mixing, lip-sync addons and other cheap solutions. I want to bring the whole of an actor's performance into an animation pipeline, bake it out to pure bone-driven animation, and play it in-engine on demand. Let's see how it goes.
First Testing
It started with the idea of using Blender to capture markers from video in 2D and automatically translate them to 3D. Everything I found on YouTube turned out to be awful and useless.
The first step was to analyze how each marker (a point that represents a focal point of deformation in the face) moved in its particular area. Each marker was to be linked to the tip of a bone rooted deeper in the skull, and the rotation produced when the tip chased the marker should be good enough to deform the face properly. Wrong!
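For reference, here is a minimal sketch of how that setup could be wired in Blender's Python API; the clip, track and bone names are made up for illustration. An empty follows a 2D motion track, and a face bone aims its tip at the empty, rotating around its root.

```python
import bpy

# Assumed names for illustration; adjust to your own scene.
clip = bpy.data.movieclips["actor_take01.mp4"]
rig = bpy.data.objects["FaceRig"]
camera = bpy.data.objects["Camera"]

# An empty that follows a single 2D tracking marker from the clip.
empty = bpy.data.objects.new("track_brow_L", None)
bpy.context.collection.objects.link(empty)
follow = empty.constraints.new('FOLLOW_TRACK')
follow.clip = clip
follow.track = "brow_L"        # name of the 2D track inside the clip
follow.camera = camera

# The face bone aims its tip at the empty, rotating around its root.
bone = rig.pose.bones["brow.L"]
aim = bone.constraints.new('DAMPED_TRACK')
aim.target = empty
aim.track_axis = 'TRACK_Y'     # a bone's local Y axis runs along its length
```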
Not only is this very lazy, it also handles the translation from the actor to the character geometry poorly, and tons of calibration were needed for each shot of video.
The end result left the face either stiff or with an unnatural, rubbery feel.
And I found a real problem. The mouth.
The corners of the mouth and the thickness of the lips vary greatly when a person speaks. This translated poorly.
The tracker rig
A simple facial rig was created and then duplicated for 2D tracking.
The tracker rig had bones that directly tracked the markers from the video. To this rig I parented 3D "balls".
The tracker duplicate did not follow the markers from the video but followed the balls instead. This meant I could move the balls to calibrate the tracked animation to the character and compensate for bad symmetry and proportions.
The character rig copied the rotation from the tracker duplicate rig, and I could lower or raise the influence.
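As a minimal sketch (again with made-up names), that last link can be a plain Copy Rotation constraint whose influence acts as the calibration knob:

```python
import bpy

# Assumed names for illustration.
char_rig = bpy.data.objects["CharacterRig"]
tracker_dup = bpy.data.objects["TrackerDuplicate"]

# A character bone copies rotation from the calibrated duplicate rig;
# influence dials the effect up or down per shot.
pb = char_rig.pose.bones["brow.L"]
copy_rot = pb.constraints.new('COPY_ROTATION')
copy_rot.target = tracker_dup
copy_rot.subtarget = "brow.L"   # matching bone on the duplicate rig
copy_rot.influence = 0.7        # 0.0-1.0, adjusted while calibrating
```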
I kept this tech for animating the eyebrows, the skin below the eyes, the upper cheeks and the nose-wing area.
Second Testing
While searching for a way to deform the mouth I (again) looked up the FACS system to learn more about facial muscles. It turns out this system relies on special analysis software that recognizes facial poses on video and produces a graph of muscle-group activity, which is then fed to another special solving program. Expensive, to say the least. It also needs facial scans of combined FACS poses for reference, so that the character can be deformed properly with "blend shapes", also known as Shape Keys in Blender. Making a dozen facial scans of an actor, cleaning them up and then building blend shapes on the character is an extremely time-consuming job only AAA developers can afford. And yes, the final result is of high quality and realism. I simply could not go there.
FACS tip
If you are considering the FACS route, there is a program called "Agisoft Stereo Scan" that lets you skip renting a photo dome to scan your actors.
Simply take two parallel pictures of a pose and Agisoft will make a simple textured 3D mesh in a few clicks. It isn't detailed down to the pores, but it gives a general idea of the shape of the face in that pose, which can be useful. It is also a great way to make faces for background characters only seen from a distance, without any modeling or texturing at all.
Final Testing
I realised FACS was a little too high on the shelf for me. Besides, I only needed mouth deformation; the rest I could track from video.
I searched the web for a Blender addon or other solution that could blend "pose keys" freely. That did not exist, even though many point to an addon that lets you mix two poses. I needed any number of mixes.
So I created the Piano Rig.
This is a rig (armature) with simple rows of bones, where each bone drives constraints on the character rig to deform the face.
Learning from FACS, I made a list of poses needed for all kinds of mouth deformations.
The bones are not limited to a single, exclusive use either: any number of poses can be mixed in a non-linear fashion.
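A minimal sketch of one such link, with made-up names: a driver reads a piano bone's local rotation and converts it into the influence of a pose constraint on a character bone.

```python
import bpy

# Assumed names for illustration.
char_rig = bpy.data.objects["CharacterRig"]
piano_rig = bpy.data.objects["PianoRig"]

# Drive the influence of the "Smile" constraint on one face bone.
path = 'pose.bones["lip_corner.L"].constraints["Smile"].influence'
drv = char_rig.driver_add(path).driver
drv.type = 'SCRIPTED'

var = drv.variables.new()
var.name = "rot"
var.type = 'TRANSFORMS'
target = var.targets[0]
target.id = piano_rig
target.bone_target = "key_smile"     # one key on the piano rig
target.transform_type = 'ROT_X'
target.transform_space = 'LOCAL_SPACE'

# 0 degrees of key rotation = neutral, 90 degrees = full pose (clamped).
drv.expression = "min(max(rot / radians(90), 0.0), 1.0)"
```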
Mini discovery
The sound of the Norwegian letter "Æ", heard in the American English word "at", looks on the mouth like a blend of the letters A and E. To build this deformer, the piano bone that drives the facial pose "Æ" also influences the pose bones for the letters A and E.
This created a perfect Æ appearance. It didn't work for all poses, but it was very helpful in limiting the number of piano bones needed. The phoneme list is as follows:
A
E
I
OU
Y
Æ/AE
Ø/OE
Å/AA
MBP
FV
KGDTS
Sticky Lips
Pucker tight
Pucker Kiss
Lip press
Jaw drop
Pout
Smile
Anger
(there might be more poses needed in the future but this is a good start)
So each bone in the character's face that contributes to a smile has a constraint listening to the smile bone, mapping 0-90 degrees of key rotation to its influence, 90 being the strongest expression from neutral.
This has the following advantages:
1.- The character does not need any facial keyframe animation. Everything is controlled by the piano rig.
2.- If an animation is made and a pose seems off, the pose constraints can be adjusted on the character rig, and ALL occurrences of that pose throughout the animation will change. Totally non-destructive.
3.- ALL mouth poses can be mixed at varying strengths (see the sketch below). With the rotation of a single bone, the character can shift gradually from normal speech to shouting or smiling while still forming words. Combining A with MBP is rather useless, though.
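To illustrate the mixing, and the Æ trick from earlier: with the same driver setup as in the previous sketch, a pose constraint can simply listen to more than one piano bone. The names and the 0.5 blend weight below are assumptions for illustration.

```python
import bpy

# Same assumed names as in the earlier sketch.
char_rig = bpy.data.objects["CharacterRig"]
piano_rig = bpy.data.objects["PianoRig"]

# The "A" pose constraint on a lip bone listens to BOTH the A key and the
# AE key, so rotating the AE key partially engages the A pose (the E pose
# gets the same treatment from its own constraint).
path = 'pose.bones["lip_corner.L"].constraints["A"].influence'
drv = char_rig.driver_add(path).driver
drv.type = 'SCRIPTED'
for var_name, key_bone in (("a_rot", "key_A"), ("ae_rot", "key_AE")):
    var = drv.variables.new()
    var.name = var_name
    var.type = 'TRANSFORMS'
    t = var.targets[0]
    t.id = piano_rig
    t.bone_target = key_bone
    t.transform_type = 'ROT_X'
    t.transform_space = 'LOCAL_SPACE'

# The AE key contributes at half strength; the weight is a guess.
drv.expression = "min((a_rot + 0.5 * ae_rot) / radians(90), 1.0)"
```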
In addition to the piano rig, the character's jaw and the muscles around the eyes can be tracked directly from video, as can head movement. The eyes are rigged with smoothly following eyelids, but the look direction needs to be keyframed by hand. Still simpler than keyframing a full face.
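As mentioned in the intro, everything gets baked down to pure bone-driven animation for the engine. In Blender that could look something like the following; the frame range is an assumption, and the character rig is assumed to be selected with its driven bones active.

```python
import bpy

# Bake constraint- and driver-produced motion into plain keyframes,
# so the engine only ever sees ordinary bone animation.
bpy.ops.nla.bake(
    frame_start=1,
    frame_end=250,           # assumed shot length
    only_selected=True,
    visual_keying=True,      # key what the constraints/drivers actually produce
    clear_constraints=True,  # strip the rig logic after baking
    bake_types={'POSE'},
)
```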
This is how the timeline scrubbing looks. The bones on the left are animated and the armature on the right follows.
Complexity
After all that hard work, writing up a list or tutorial on how to set the constraints on each mouth bone is a job in itself, and wasted development time. A single facial bone has up to 11 constraints for rotation and translation, driven by various other bones. It would take too long to explain how each of these works; I want to make a game!
That is the end of testing for now. There is still much to improve, but it is a proof of concept at least.