
Audio-Visual Speech Processing

Speech production and perception are inherently bimodal. Interest has recently grown in combining the visual modality with the conventionally used acoustic modality to improve speech processing, a field of study known as audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but that performance depends heavily on a match between training and test conditions. Under mismatched conditions (e.g. acoustic noise), the performance of acoustic speech processing applications can degrade markedly. The visual speech modality is independent of most degradations in the acoustic modality. This independence, together with the bimodal nature of speech, allows the visual speech modality to complement the acoustic speech modality. It is hoped that integrating these two speech modalities will aid the creation of more robust and effective speech processing applications in the future.

 

Our research effort is concentrated on speech and speaker recognition. In particular, we have been actively researching feature extraction and classifier design for the visual speech modality. The problem of audio-visual integration is also an active component of our effort.


(Integration strategies in AVSP)

Related papers:

  • S. Lucey, T. Chen, S. Sridharan, and V. Chandran, "Integration strategies for audio-visual speech processing: Applied to text dependent speaker recognition," IEEE Trans. on Multimedia, 2004. [similar technical report]
  • S. Lucey, "An evaluation of visual speech features for the tasks of speech and speaker recognition," presented at International Conference on Audio- and Video-Based Person Authentication (AVBPA), pp. 260-267, Guildford, U.K., 2003. [similar technical report]
  • S. Lucey and T. Chen, "Improved audio-visual speaker recognition via the use of a hybrid combination strategy," presented at International Conference on Audio- and Video-Based Person Authentication (AVBPA), pp. 929-936, Guildford, U.K., 2003. [similar technical report]
  • S. Lucey, "Audio-visual speech processing," Ph.D. thesis, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, Brisbane, 2002, 243 pp. [thesis]
