Speaker: Louis-Philippe Morency, Research Scientist, Institute for Creative Technologies, University of Southern California
Date: Monday, May 5 2008
Time: 2:00PM to 3:00PM
Refreshments: 1:45PM
Location: 32-D507
Host: C. Mario Christoudias, Gerald Dalley, MIT CSAIL
Contact: C. Mario Christoudias, Gerald Dalley, 3-4278, 3-6095, cmch@csail.mit.edu , dalleyg@mit.edu
During face-to-face interactions, listeners use backchannel feedback such as head nods to signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this talk I will show how sequential probabilistic models (e.g., Hidden Markov Models (HMMs) or Conditional Random Fields (CRFs)) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker's multimodal output features (e.g., prosody, spoken words, and eye gaze). The main challenges addressed in this talk are (1) automatic selection of the relevant features and (2) optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
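To give a flavor of the sequential-model framing described above, here is a minimal, illustrative sketch: a two-state HMM decodes a stream of discretized speaker cues into "backchannel opportunity" vs. "no backchannel" frames via Viterbi decoding. All states, observation codes, and probabilities below are invented for illustration; they are not the talk's actual model, features, or parameters.

```python
import numpy as np

# Hypothetical sketch: state 0 = no backchannel, state 1 = backchannel
# opportunity. Observations are discretized speaker cues, e.g.
# 0 = neutral speech, 1 = falling pitch, 2 = falling pitch + pause.
# All probabilities are illustrative placeholders.
states = 2
start = np.log([0.9, 0.1])                 # listeners are mostly idle
trans = np.log([[0.95, 0.05],              # P(next state | current state)
                [0.40, 0.60]])
emit = np.log([[0.7, 0.2, 0.1],            # P(observation | state 0)
               [0.1, 0.3, 0.6]])           # P(observation | state 1)

def viterbi(obs):
    """Return the most likely state sequence for a list of observations."""
    T = len(obs)
    dp = np.full((T, states), -np.inf)     # best log-prob ending in each state
    back = np.zeros((T, states), dtype=int)
    dp[0] = start + emit[:, obs[0]]
    for t in range(1, T):
        for s in range(states):
            scores = dp[t - 1] + trans[:, s]
            back[t, s] = np.argmax(scores)
            dp[t, s] = scores[back[t, s]] + emit[s, obs[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# A run of "falling pitch + pause" cues flips the decoder into state 1.
print(viterbi([0, 0, 2, 2, 2, 0, 0]))      # → [0, 0, 1, 1, 1, 0, 0]
```

A CRF-based predictor, as mentioned in the abstract, would replace the generative emission probabilities with discriminatively trained feature weights over the multimodal input, but the sequence-decoding idea is the same.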