Towards automatic transcription of spontaneous presentations

Shinozaki, Takahiro; Hori, Chiori; Furui, Sadaoki

doi:10.21437/Eurospeech.2001-129

This paper reports various investigations on recognizing spontaneous presentation speech, carried out in connection with the "Spontaneous Speech" national project, which started in 1999. Approximately 4.5 hours of presentation speech uttered by 10 male speakers has been recognized. Experimental results show that acoustic and language modeling based on an actual spontaneous speech corpus is far more effective than conventional modeling based on read speech. The recognition accuracy varies widely from speaker to speaker, depending on the speaking rate, the number of fillers, the number of repairs, etc. It was confirmed that unsupervised speaker adaptation of acoustic models was effective in improving the recognition accuracy. The recognition accuracy for spontaneous speech is, however, still rather low, and a large number of research issues remain.
