I am a second-year master’s student in the Electronic Media Lab. My research focuses on audio-visual automatic speech recognition systems. By using both the audio and the video of a speaker, we hope to improve the performance of automatic speech recognition systems. The system I’m working on is based on active appearance models (AAMs) and dynamic Bayesian networks (DBNs). AAMs are used to track the motion of facial features, from which visual speech features are extracted. DBNs allow us to model key properties of audio-visual speech, such as the asynchrony between the audio and visual streams. I’m also investigating the performance of variational Bayesian learning methods for DBN models.
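As an illustration of one common front-end step in audio-visual ASR (a sketch of the general technique, not necessarily the system described above), visual features can be upsampled to the audio frame rate and concatenated with the audio features frame by frame. The function name, feature dimensions, and frame rates below are illustrative assumptions:

```python
import numpy as np

def fuse_features(audio_feats, visual_feats):
    """Feature-level audio-visual fusion (illustrative sketch).

    audio_feats:  (T_a, D_a) array, e.g. MFCCs at ~100 frames/s
    visual_feats: (T_v, D_v) array, e.g. AAM parameters at ~25 frames/s

    The visual stream is upsampled to the audio frame rate by
    nearest-frame repetition, then the two are concatenated per frame.
    """
    T_a = audio_feats.shape[0]
    T_v = visual_feats.shape[0]
    # Map each audio frame index to the nearest visual frame index.
    idx = np.minimum((np.arange(T_a) * T_v) // T_a, T_v - 1)
    upsampled_visual = visual_feats[idx]
    return np.concatenate([audio_feats, upsampled_visual], axis=1)

# 100 audio frames of 13-d MFCCs, 25 video frames of 10 AAM parameters
audio = np.random.randn(100, 13)
visual = np.random.randn(25, 10)
fused = fuse_features(audio, visual)
print(fused.shape)  # (100, 23)
```

This simple concatenation treats the two streams as synchronous; part of the appeal of DBN-based models is precisely that they can relax this assumption and allow the streams to de-synchronize within limits.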
The technology has a wide range of applications in video sharing, human-computer interaction, web conferencing, eLearning, eCommerce, gaming, and other immersive and collaborative online environments where speech is a natural component.
With the rapid growth of multimedia content available on the Internet, both professionally produced and user-generated, there is an increasing need for semantic analysis of video. Speech is a central component of much of this content.
My supervisor is Professor Ben Herbst.
[Figure: Overview of an audio-visual automatic speech recognition system]
Twitter: @hreikeras
My contact details: firstname.lastname@example.org