An online incremental speaker adaptation method using speaker-clustered initial models

Zhang, Zhipeng; Furui, Sadaoki

doi:10.21437/ICSLP.2000-630

We previously proposed an incremental speaker adaptation method combined with automatic speaker-change detection for broadcast news transcription, where speakers change frequently and each of them utters a series of several sentences. In this method, the speaker change is detected using speaker-independent and speaker-adaptive Gaussian mixture models (GMMs). Both phone HMMs and GMMs are incrementally adapted to each speaker by a combination of the MLLR, MAP and VFS methods, using speaker-independent (SI) models as initial models. This paper proposes an improvement in which the initial model for speaker adaptation is selected from a set of models made by speaker clustering. Either cluster-dependent phone HMMs or GMMs are used to calculate the likelihood for selecting the best initial model. In a broadcast news transcription task, the proposed method significantly reduces the word error rate compared with the method using the SI-HMM as the initial model. Online incremental speaker adaptation results show that the word error rate is reduced by 11.6% relative to the baseline system with no speaker adaptation. The method using GMMs for cluster selection requires significantly fewer computations than the one using HMMs.
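
The GMM-based cluster selection described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs and picks the cluster whose GMM gives the highest total log-likelihood on the incoming speech frames. The function names, the parameter layout (weights, means, variances) and the use of NumPy are assumptions made for illustration only; the abstract does not specify the model topology, the features, or how the likelihood is accumulated.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (K,); means, variances: (K, D).
    The diagonal-covariance form is an assumption, not taken from the paper.
    """
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    # Log of the Gaussian normalisation term for each component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )                                                                # (T, K)
    log_weighted = np.log(weights)[None, :] + log_comp
    # Log-sum-exp over components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(per_frame.sum())


def select_initial_cluster(frames, cluster_gmms):
    """Return the index of the best-scoring cluster GMM and all scores.

    cluster_gmms: list of (weights, means, variances) tuples, one per
    speaker cluster (hypothetical container for this sketch).
    """
    scores = [gmm_log_likelihood(frames, *gmm) for gmm in cluster_gmms]
    return int(np.argmax(scores)), scores
```

In such a scheme, the returned index would identify which cluster-dependent model to use as the starting point for adaptation. Scoring with GMMs requires no decoding or phone alignment, which is consistent with the abstract's remark that GMM-based selection needs fewer computations than HMM-based selection.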
