
Architecture of an Animation System for Human Characters

T. Pejša* and I.S. Pandžić*
* University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
(tomislav.pejsa, igor.pandzic)@fer.hr

Abstract—Virtual human characters are found in a broad range of applications, from movies, games and networked virtual environments to teleconferencing and tutoring applications. Such applications are available on a variety of platforms, from desktop and web to mobile devices. High-quality animation is an essential prerequisite for realistic and believable virtual characters. Though researchers and application developers have ample animation techniques for virtual characters at their disposal, implementing these techniques in an existing application tends to be a daunting and time-consuming task. In this paper we present visage|SDK, a versatile framework for real-time character animation based on the MPEG-4 FBA standard. It offers a wide spectrum of features, including animation playback, lip synchronization and facial motion tracking, while facilitating rapid production of art assets and easy integration with existing graphics engines.

I. INTRODUCTION

Virtual characters have long been a staple of the entertainment industry – namely, motion pictures and electronic games – but in more recent times they have also found application in numerous other areas, such as education, communications, healthcare and business, where they appear in the roles of avatars, virtual tutors, assistants, companions etc. A category of virtual characters that has been an exceptionally active topic of research is embodied conversational agents (ECAs), characters that interact with real humans in direct, face-to-face conversations.

Virtual character applications are of great potential interest to the field of telecommunications. Well-articulated human characters are a common feature in networked virtual environments such as Second Life, Google Lively and World of Warcraft, where they appear in the roles of user avatars and non-player characters (NPCs). A potential use of virtual characters is in video conferences, where digital avatars can be used to replace video streams of human participants and thus conserve bandwidth. Until recently virtual characters have been almost exclusive to desktop and browser-based network applications, but the growing processing power of mobile platforms now allows their use in mobile applications as well.

These developments have resulted in increasing demand for high-quality visual simulation of virtual humans. This visual simulation consists of two aspects – the graphical model and animation. The latter encompasses body animation (locomotion, gestures) and facial animation (expressions, lip movements, facial gestures). While many open-source and proprietary rendering solutions deliver excellent graphical quality, their animation functionality, particularly facial animation, is often limited. Moreover, they often offer limited or no tools for production of characters and animations, requiring the user to invest a great deal of effort into setting up a suitable art pipeline.

Our system seeks to address this by delivering greater animation capabilities, while being general enough to work with any 3D engine, thus facilitating development of applications with cutting-edge visuals. Our principal contributions are:
- the design of a character animation system architecture that supports advanced animation features and provides tools for production of new character animation assets with minimal expenditure of time and effort
- a model for decoupling animation, asset production and rendering to enable fast and easy integration of the system with different graphics engines and application frameworks

Facial motion tracking, lip synchronization and other advanced features make visage|SDK especially suited for applications such as ECAs and low-bandwidth video communications. Due to the simplicity of art asset production, our system is ideal for researchers with limited resources at their disposal.

We begin with a brief summary of related work and continue with an overview of our system's features, followed by a description of the underlying architecture. Finally, we discuss our future work and planned improvements to the system.

II. RELATED WORK

Though virtual characters have been a highly active area of research for years, little effort has been made to propose a system which would integrate various aspects of their visual simulation and be easily usable in combination with different graphics engines and for a broad range of applications.

The most recent and ambitious effort is SmartBody, a modular system for animation and behavior modeling of ECAs [1]. SmartBody sports more advanced low-level animation than visage|SDK, featuring hierarchies of customizable, scheduled controllers. SmartBody also supports behavior modeling through Behavior Markup Language (BML) scripts [2]. However, SmartBody lacks some of visage|SDK's integrated functionality, namely face tracking, lip sync and visual text-to-speech, and has no built-in capabilities for character model production. It also features a less common method of interfacing with the renderer – namely, via TCP – whereas visage|SDK is statically or dynamically linked with the main engine.

The new visage|SDK system builds upon the earlier visage framework for facial animation [3], introducing new features such as body animation support and facial motion tracking. It also greatly enhances integration capabilities by enabling easy integration into other graphics engines.

Engines for simulations and electronic games typically have modular and extensible architectures, and it is common for such engines to feature third-party components. Companies such as Havok and NaturalMotion even specialize in developing modular animation and physics systems intended to be integrated into existing architectures. These architectural concepts are commonly found in non-academic literature on graphics engine design, and we found such resources to be very suitable references during the development of our system [13] [14] [15].

III. FEATURES

visage|SDK includes the following core features:
- animation playback
- lip synchronization
- visual text-to-speech (VTTS) conversion
- facial motion tracking from video

In addition to these, visage|SDK also includes functionality for automatic off-line production of character models and their preparation for real-time animation:
- face model generation from photographs
- morph target cloning

This functionality can be integrated into the user's own applications, and it is also available as full-featured stand-alone tools or plug-ins for 3D modeling software.

A. Animation playback

The visage|SDK animation system is based on the MPEG-4 Face and Body Animation (FBA) standard [4] [5], which defines a set of animation parameters (FBAPs) needed for detailed and efficient animation of virtual humans. These parameters can be divided into the following categories:
- body animation parameters (BAPs) – these parameters control individual degrees of freedom (DOFs) of the character's skeleton (e.g., r_shoulder_abduct)
- low-level facial animation parameters (FAPs) – these control movements of individual facial features (e.g., open_jaw or raise_l_i_eyebrow; see Fig. 1)
- expression – a high-level FAP which controls the facial expression (e.g., joy or sadness)
- viseme – a high-level FAP which controls the shape of the lips during speech (e.g., TH or aa)

An animation in MPEG-4 FBA is nothing more than a temporal sequence of FBAP value sets. Our system is capable of loading FBA animations from the MPEG-4 standard file format and applying them, frame by frame, to the character model. How each FBAP value is applied to the model depends on the graphics engine – visage|SDK doesn't concern itself with the details of FBAP implementation.
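To make the playback model concrete, the sketch below shows how such a temporal sequence of FBAP value sets might be represented and stepped through in C++. The types FbapFrame and FbaAnimation and their members are illustrative stand-ins, not the actual visage|SDK API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative in-memory form of an MPEG-4 FBA animation:
// a temporal sequence of FBAP value sets, one per frame.
struct FbapFrame {
    std::vector<int> baps;   // body animation parameters (skeletal DOFs)
    std::vector<int> faps;   // low-level facial animation parameters
    int expression = 0;      // high-level FAP: facial expression
    int viseme = 0;          // high-level FAP: lip shape during speech
};

struct FbaAnimation {
    std::vector<FbapFrame> frames;  // e.g. loaded from an .fba file
    double frameRate = 25.0;

    // Frame to apply at time t (seconds); assumes at least one frame
    // and clamps to the last one when t runs past the end.
    const FbapFrame& frameAt(double t) const {
        std::size_t i = static_cast<std::size_t>(t * frameRate);
        return frames[std::min(i, frames.size() - 1)];
    }
};
```

How the values of such a frame are turned into bone rotations and morph target weights is left to the engine-specific wrapper described in Section IV.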

Figure 1: MPEG-4 FBA face, marked with facial definition parameters (FDPs)

Figure 2: Face model imported from FaceGen and animated in visage|SDK

B. Lip synchronization

visage|SDK features a lip sync component for both on-line and off-line applications. The speech signal is analyzed and classified into visemes using neural networks (NNs). A genetic algorithm (GA) is used to automatically train the NNs [6] [8].

Our lip sync implementation is language-independent and has been successfully used with a number of different languages, including English, Croatian, Swedish and Japanese [7].

C. Visual text-to-speech

visage|SDK features a simple visual text-to-speech (VTTS) system based on Microsoft SAPI. It converts the SAPI output into a sequence of FBA visemes [9].

D. Facial motion tracking

The facial motion tracker tracks the facial movements of a real person from a recorded or live video stream. The motion tracking algorithm is based on active appearance models (AAMs) and doesn't require markers or special cameras – a simple, low-cost webcam is sufficient. Tracked motion is encoded as a sequence of FAP values and applied to the virtual character in real time. In addition to this functionality, the facial motion tracker also supports automatic feature detection in static 2D images, which can be used to further automate the process of face model generation from photographs (see next section) [10].

Potential applications of the system include human-computer interaction and teleconferencing, where it can be used to drive 3D avatars with the purpose of replacing video streams of human participants.

E. Face model generation from photos

The face model generator can be used to rapidly generate 3D face models. It takes a collection of orthogonal photographs of the head as input and uses them to deform a generic template face, producing a face model that matches the individual in the photographs [11]. Since the resulting models always have the same topology, the cloner can automatically generate morph targets for facial animation.

F. Facial motion cloning

The cloner copies morph targets from a source face model onto a target model [12]. For arbitrary models it requires the user to map a set of feature points (FDPs) to vertices of the model, though this step can be bypassed if the target model and the source model have identical topologies. The cloner also supports fully automated processing of face models generated by the Singular Inversions FaceGen application (Fig. 2).

IV. ARCHITECTURE

A. Components

visage|SDK has a multi-layered architecture and is composed of the following key components:
- Scene wrapper
- Animation player
- High-level components – lip sync, TTS, face tracker, character model production libraries (face model generator, facial motion cloner)

Figure 3: visage|SDK architecture

The scene wrapper provides a common, renderer-independent interface to the character model in the scene. Its main task is to interpret animation parameter values and apply them to the model. Furthermore, it aggregates information about the character model pertinent to MPEG-4 FBA – most notably mappings of FBAPs to skeleton joint transformations and mesh morph targets. This high-level model data can be loaded from and serialized to an XML-based file format called VCM (Visage Character Model). Finally, the scene wrapper also provides direct access to the model's geometry (meshes and morph targets) and joint transformations, permitting the model production components to work with any model irrespective of the underlying renderer.
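As a rough illustration of what such a renderer-independent interface could look like, the following sketch defines a hypothetical wrapper base class. The method names are illustrative rather than the actual visage|SDK API, and the geometry accessors are deliberately optional, mirroring the fact that they are only needed by the off-line production tools.

```cpp
#include <cstddef>
#include <string>

// Hypothetical renderer-independent character wrapper; a concrete
// subclass translates these calls into the target engine's API.
class SceneWrapper {
public:
    virtual ~SceneWrapper() = default;

    // Apply a joint rotation derived from BAP values to a skeleton bone.
    virtual void setJointRotation(const std::string& joint,
                                  float pitch, float yaw, float roll) = 0;

    // Apply a morph target blend weight derived from FAP values.
    virtual void setMorphWeight(const std::string& target, float weight) = 0;

    // Optional geometry access used by the off-line production tools
    // (cloner, face model generator); may be left unimplemented.
    virtual std::size_t vertexCount(const std::string& mesh) const { return 0; }
    virtual void getVertex(const std::string& mesh, std::size_t index,
                           float& x, float& y, float& z) const {}
    virtual void setVertex(const std::string& mesh, std::size_t index,
                           float x, float y, float z) {}
};
```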

The animation player is the core runtime component of the system, tasked with playing generalized FBA actions. These actions can be animations loaded from MPEG-4 .fba files, but also procedural actions such as gaze following. The animation player can play the actions in its own thread, or it can be updated manually in every frame.

High-level components include lip sync, text-to-speech and the facial motion tracker. They are implemented as FBA actions and are therefore driven by the animation player.
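The sketch below illustrates the idea of a generalized FBA action as a small interface that the player can poll every frame; it reuses the illustrative FbapFrame type from the earlier sketch and is not the actual visage|SDK class.

```cpp
// Hypothetical interface for a generalized FBA action; animation
// playback, lip sync, TTS and the face tracker would implement it.
class FbaAction {
public:
    virtual ~FbaAction() = default;
    // FBAP values this action contributes at time t (seconds).
    virtual FbapFrame evaluate(double t) = 0;
    // True once the action has nothing more to contribute.
    virtual bool finished(double t) const = 0;
};
```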

Character model production components are meant to be used off-line and so they don't interface with the animation player. They access the model's geometry via the common scene wrapper.

B. Integration with a graphics engine

When it comes to integration with graphics engines, visage|SDK is highly flexible and places only minimal requirements on the target engine. The engine should support basic character animation techniques – skeletal animation and mesh morphing – and the engine's API should provide the ability to manually set joint transformations and morph target blend weights. Animation is possible even if some of these requirements aren't met – for example, in the absence of morph target support, a facial bone rig can be used for facial animation.

Minimal integration of the system is a trivial endeavor, amounting to subclassing and implementing a single wrapper class representing the character model. Depending on the desired functionality, certain parts of the wrapper can be left unimplemented – e.g., there is no need to provide geometry access if the developer doesn't plan to use the cloner or face model generation features in their application. The 3D model itself is loaded and handled by the engine, while FBAP mappings and other information pertaining to MPEG-4 FBA are loaded from VCM files. VCM files are tied to visage|SDK rather than the graphics engine, which means they are portable and can be reused for a character model – or even different models with a similar structure – regardless of the underlying renderer. This greatly simplifies model production and reduces the interdependence of the art pipelines.
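The sketch below illustrates what such a minimal integration could look like for a hypothetical engine, reusing the illustrative SceneWrapper base class from the earlier sketch. The engine types and calls shown are placeholders, not a real engine API.

```cpp
#include <string>

// Placeholder types standing in for a real engine's scene objects.
struct EngineBone   { void setRotation(float p, float y, float r) {} };
struct EngineMesh   { void setMorphWeight(const std::string& n, float w) {} };
struct EngineEntity {
    EngineBone* findBone(const std::string& name) { return nullptr; }
    EngineMesh* mesh() { return nullptr; }
};

// Minimal integration: one subclass that forwards wrapper calls to the engine.
class MyEngineWrapper : public SceneWrapper {
public:
    explicit MyEngineWrapper(EngineEntity* entity) : entity_(entity) {}

    void setJointRotation(const std::string& joint,
                          float pitch, float yaw, float roll) override {
        if (EngineBone* bone = entity_->findBone(joint))
            bone->setRotation(pitch, yaw, roll);
    }

    void setMorphWeight(const std::string& target, float weight) override {
        if (EngineMesh* mesh = entity_->mesh())
            mesh->setMorphWeight(target, weight);
    }

    // Geometry access is omitted: it is only needed if the cloner or
    // face model generator are used in the application.
private:
    EngineEntity* entity_;
};
```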

C. Component interactions

A simplified overview of runtime component interactions is illustrated in Fig. 3. The animation process flows in the following manner:
- The application adds actions to the animation player – for example, lip sync coupled with gaze following and a set of simple repeating facial gestures (e.g. blinking).
- The animation player executes the animation loop. From each action it obtains the current frame of animation as a set of FBAP values, blends all the sets together and applies them to the character model via the wrapper.
- The scene wrapper receives the FBAP value set and interprets the values depending on the character's FBAP mappings. Typically, BAPs are converted to Euler angles and applied to bone transformations, while FAPs are interpreted as morph target blend weights.

For the cloner and the face model generator the interactions are even more straightforward, amounting to obtaining and updating the model's geometry via the model wrapper.
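Putting the pieces together, one iteration of the animation loop described above might look roughly as follows, reusing the illustrative FbapFrame, FbaAction and SceneWrapper types from the earlier sketches. The naive averaging blend and the unit conversions are placeholders for whatever the real player does, and the FbapMappings structure is a hypothetical stand-in for the data loaded from a VCM file.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Illustrative FBAP mappings, as could be loaded from a VCM file:
// which joint or morph target each parameter index drives.
struct FbapMappings {
    std::map<int, std::string> bapToJoint;  // BAP index -> skeleton joint
    std::map<int, std::string> fapToMorph;  // FAP index -> morph target
};

// One iteration of the animation loop.
void updateFrame(std::vector<std::unique_ptr<FbaAction>>& actions,
                 const FbapMappings& mappings, SceneWrapper& wrapper, double t) {
    // 1. Obtain the current FBAP value set from each active action and
    //    blend them (a naive average; the real blending may differ).
    FbapFrame blended;
    int active = 0;
    for (auto& action : actions) {
        if (action->finished(t)) continue;
        FbapFrame f = action->evaluate(t);
        blended.baps.resize(std::max(blended.baps.size(), f.baps.size()), 0);
        blended.faps.resize(std::max(blended.faps.size(), f.faps.size()), 0);
        for (std::size_t i = 0; i < f.baps.size(); ++i) blended.baps[i] += f.baps[i];
        for (std::size_t i = 0; i < f.faps.size(); ++i) blended.faps[i] += f.faps[i];
        ++active;
    }
    if (active == 0) return;
    for (int& v : blended.baps) v /= active;
    for (int& v : blended.faps) v /= active;

    // 2. Interpret the blended values through the model's FBAP mappings:
    //    BAPs become joint rotations, FAPs become morph target weights.
    //    The numeric conversion factors below are placeholders.
    for (const auto& [bap, joint] : mappings.bapToJoint)
        if (bap >= 0 && bap < static_cast<int>(blended.baps.size()))
            wrapper.setJointRotation(joint, blended.baps[bap] * 1e-5f, 0.0f, 0.0f);
    for (const auto& [fap, morph] : mappings.fapToMorph)
        if (fap >= 0 && fap < static_cast<int>(blended.faps.size()))
            wrapper.setMorphWeight(morph, blended.faps[fap] * 1e-3f);
}
```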

Figure 4: FBAPMapper – an OGRE-based application for mapping animation parameters

D. Art pipeline

As previously indicated, the art pipeline is very flexible. Characters are modeled in 3D modeling applications and exported into the target engine. Naturally, FBAPs need to be mapped to joints and morph targets of the model. This is done using a special plug-in for the 3D modeling application if one is available; otherwise it needs to be handled by a stand-alone application with appropriate 3D format support. For animations the pipeline is similar, and again a plug-in is used for export and import.

We also provide stand-alone face model and morph target production applications that use our production libraries. These applications rely on intermediate file formats (currently VRML or OGRE formats, though support for others will be added in the future) to obtain the model, while results are output via the intermediate format in combination with VCM. Fig. 4 shows a screenshot of a simple application for mapping and testing animation parameters.

V. EXAMPLES

We have so far successfully integrated our system with two open-source rendering engines, with more implementations on the way. The results are presented in this section.

A. OGRE

OGRE [16] is one of the most popular open-source, cross-platform rendering engines. Its features include a powerful object-oriented interface, support for both the OpenGL and Direct3D graphics APIs, a shader-driven architecture, material scripts, hardware-accelerated skeletal animation with manual bone control, hardware-accelerated morph target animation etc. Despite challenges encountered in implementing a wrapper around certain features, we have achieved both face and body animation in OGRE (Fig. 5 and 6).


OGRE is also notable for its extensive art pipeline, supported by exporters for nearly every modeling suite in existence. We initially encountered difficulties in loading complex character models composed of multiple meshes, because basic OGRE doesn't support file formats capable of storing entire scenes. However, this shortcoming is rectified by the community-made DotScene loader plug-in, and a COLLADA loader is also under development by the OGRE community.

B. Irrlicht

Though Irrlicht [17] doesn't boast OGRE's power, it is nonetheless popular for its small size and ease of use. Its main shortcoming with regard to our system is its lack of support for morph target animation. However, we were able to alleviate this by creating a face model with a bone rig and parametrizing it over MPEG-4 FBAPs, with very promising results (see Fig. 8).

Unlike OGRE's art pipeline, which is based on exporter plug-ins for 3D modeling applications, Irrlicht's art pipeline relies on a large number of loaders for various file formats. We found the loader for the Microsoft .x format to be the most suited to our needs and were able to successfully import several character models, both with body and face rigs (Fig. 7).

C. Upcoming implementations

We are concurrently working on integrating visage|SDK with several other engines. These include:
- StudierStube (StbES) [19] – a commercial augmented reality (AR) kit with a 3D renderer and support for character animation
- Horde3D [18] – a lightweight, open-source renderer
- Panda3D – an open-source game engine known for its intuitive Python-based API

Of these we find StbES to be the most promising, as it will enable us to deliver the power of the visage|SDK animation system to mobile platforms and combine it with StbES's extensive AR features.

Figure 5: Lip sync in OGRE

Figure 6: Body animation in OGRE

VI. CONCLUSIONS AND FUTURE WORK

Our system supports a variety of character animation features and facilitates rapid application development and art asset production. Its feature set makes it suitable for research and commercial applications such as embodied agents and avatars in networked virtual environments and telecommunications, while the flexibility of its architecture means it can be used on a variety of platforms, including mobile devices. We have successfully integrated it with popular graphics engines and plan to provide more implementations in the near future, while simultaneously striving to make integration even easier.

Furthermore, we are continually working on enhancing our system with new features. An upcoming major upgrade will introduce a new system for interactive motion control based on parametric motion graphs and add character behavior modeling capabilities via BML. Our goal is to develop a universal and modular system for powerful yet intuitive modeling of character behavior and to continue using it as the backbone of our research into high-level character control and applications involving virtual humans. We plan to release a substantial portion of our system under an open-source license.

ACKNOWLEDGMENT

This work was partly carried out within the research project "Embodied Conversational Agents as interface for networked and mobile services" supported by the Ministry of Science, Education and Sports of the Republic of Croatia. It was also partly supported by Visage Technologies. Integration of visage|SDK with OGRE, Irrlicht and other engines was done by Mile Dogan, Danijel Pobi, Nikola Banko, Luka Šverko and Mario Medvedec, undergraduate students at the Faculty of Electrical Engineering and Computing in Zagreb, Croatia.

Figure 7: Body animation in Irrlicht

Figure 8: Facial animation in Irrlicht

REFERENCES

[1] M. Thiebaux, A.N. Marshall, S. Marsella, M. Kallmann, "SmartBody: behavior realization for embodied conversational agents," in International Conference on Autonomous Agents, 2008, vol. 1, pp. 151-158.
[2] S. Kopp et al., "Towards a common framework for multimodal generation: The behavior markup language," in Intelligent Virtual Agents, 2006, pp. 205-217.
[3] I.S. Pandžić, J. Ahlberg, M. Wzorek, P. Rudol, M. Mošmondor, "Faces everywhere: towards ubiquitous production and delivery of face animation," in International Conference on Mobile and Ubiquitous Multimedia, 2003, pp. 49-55.
[4] I.S. Pandžić, R. Forchheimer, Eds., MPEG-4 Facial Animation – The Standard, Implementations and Applications, John Wiley & Sons, 2002.
[5] ISO/IEC 14496 – MPEG-4 International Standard, Moving Picture Experts Group, www.cselt.it/mpeg
[6] G. Zorić, I.S. Pandžić, "Real-time language independent lip synchronization method using a genetic algorithm," Signal Processing, special issue on Multimodal Human-Computer Interfaces, vol. 86, issue 12, pp. 3644-3656, 2006.
[7] A. Čereković et al., "Towards an embodied conversational agent talking in Croatian," in International Conference on Telecommunications, 2007, pp. 41-47.
[8] G. Zorić, I.S. Pandžić, "A real-time lip sync system using a genetic algorithm for automatic neural network configuration," in IEEE International Conference on Multimedia & Expo, 2005, vol. 6, pp. 1366-1369.
[9] C. Pelachaud, "Visual Text-to-Speech," in MPEG-4 Facial Animation – The Standard, Implementations and Applications, I.S. Pandžić, R. Forchheimer, Eds., John Wiley & Sons, 2002.
[10] G. Fanelli, M. Fratarcangeli, "A non-invasive approach for driving virtual talking heads from real facial movements," in 3DTV Conference, 2007, pp. 1-4.
[11] M. Fratarcangeli, M. Andolfi, K. Stanković, I.S. Pandžić, "Animatable face models from uncalibrated input features," unpublished.
[12] I.S. Pandžić, "Facial Motion Cloning," Graphical Models, vol. 65, issue 6, pp. 385-404, 2003.
[13] D. Eberly, 3D Game Engine Architecture, Morgan Kaufmann, Elsevier, 2005.
[14] S. Zerbst, O. Duvel, 3D Game Engine Programming, Course Technology PTR, 2004.
[15] Havok Physics Animation 6.00 User Guide, Havok, 2008.
[16] OGRE Manual v1.6, 2008, http://www.ogre3d.org/docs/manual/

[17] Nikolaus Gebhardt, Irrlicht Engine 1.5 API Documentation, 2008, http://irrlicht.sourceforge.net/docu/index.html
[18] Nicolas Schulz, Horde3D Documentation, 2009, http://www.horde3d.org/docs/manual.html
[19] Christian Doppler Laboratory, Graz University of Technology, "Handheld augmented reality," 2008, http://studierstube.icg.tugraz.ac.at/handheld_ar/
