ISCA Archive Interspeech 2010

The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms

David Dean, Sridha Sridharan, Robert Vogt, Michael Mason

The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. To construct the final mixed-speech database, over 10 hours of background noise was first collected across 10 unique locations covering 5 common noise scenarios, forming the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. Five baseline VAD systems are evaluated on the QUT-NOISE-TIMIT corpus to validate the corpus and to show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.
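
As a rough illustration of what mixing speech into noise at a chosen SNR involves, the following Python sketch scales a speech segment so that its mean power sits a target number of decibels above that of a noise segment, then adds the two. It is not the authors' mixing procedure: the function names (mean_power, mix_at_snr, measure_snr_db) are hypothetical, and measuring power as a plain mean over the whole segment is an assumption made here for simplicity.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def measure_snr_db(speech, noise):
    """SNR in dB, as the ratio of mean speech power to mean noise power."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale speech to reach target_snr_db against noise, then add them.

    Assumption: both segments have equal length and non-zero power.
    Returns (mixed, scaled_speech).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve 10*log10(gain^2 * p_speech / p_noise) = target_snr_db for gain.
    gain = math.sqrt(p_noise / p_speech * 10 ** (target_snr_db / 10))
    scaled = [gain * s for s in speech]
    mixed = [s + n for s, n in zip(scaled, noise)]
    return mixed, scaled


if __name__ == "__main__":
    # Toy samples, purely illustrative.
    speech = [0.5, -0.5, 0.5, -0.5]
    noise = [0.1, -0.2, 0.1, -0.2]
    mixed, scaled = mix_at_snr(speech, noise, 5.0)
    print(round(measure_snr_db(scaled, noise), 6))
```

Scaling the speech rather than the noise keeps the background level unchanged across SNR conditions; scaling the noise instead would be an equally valid choice for a sketch like this.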