ISCA Archive Interspeech 2009

Improving detection of acoustic events using audiovisual data and feature level fusion

T. Butko, C. Canton-Ferrer, C. Segura, X. Giró, C. Nadeu, J. Hernando, J. R. Casas

The detection of acoustic events (AEs) that are naturally produced in a meeting room may help to describe the human and social activity taking place in it. When applied to spontaneous recordings, acoustic event detection (AED) based only on audio information yields a large number of errors, mostly due to temporal overlapping of sounds. In this paper, a system that detects and recognizes AEs using both audio and video information is presented. A feature-level fusion strategy is used, and the HMM-GMM based system models each class separately, using a one-against-all strategy for training. Experimental AED results on a new and rather spontaneous dataset are presented, showing the advantage of the proposed approach.
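The feature-level fusion mentioned in the abstract can be sketched as concatenating frame-synchronous audio and video feature vectors into a single observation vector before HMM-GMM modeling. The sketch below is a minimal illustration under assumed feature dimensions (13 audio coefficients and 2 video features per frame are placeholders, not values from the paper):

```python
import numpy as np

# Hypothetical frame-synchronous features (names and dimensions are
# illustrative, not taken from the paper):
#   - 13 audio coefficients per frame (e.g., MFCC-like features)
#   - 2 video features per frame (e.g., a coarse localization estimate)
n_frames = 100
audio_feats = np.random.randn(n_frames, 13)  # audio stream
video_feats = np.random.randn(n_frames, 2)   # video stream

# Feature-level fusion: concatenate the per-frame vectors of both
# modalities into one observation vector, which a single HMM-GMM
# classifier would then model.
fused = np.concatenate([audio_feats, video_feats], axis=1)

print(fused.shape)  # (100, 15)
```

In a one-against-all setup, one such fused feature stream would be used to train a separate detector per AE class against all remaining classes.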