Abstract
“Listening to several things at once” is people’s dream and one goal of AI and robot audition, because psychophysical observations indicate that people can listen to at most two things at once. Current noise reduction techniques cannot achieve this goal because they assume quasi-stationary noise, not interfering speech signals. Since robots are used in various environments, robot audition systems should require minimal a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces sound source separation (SSS) with automatic speech recognition (ASR). Its essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic generation of missing feature masks. The recognition accuracy for two simultaneous speech signals improved to an average of 67.8% for ICA and 88.0% for GSS.
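The abstract does not state the rule used to estimate feature reliability, so the following is only a minimal sketch of the general missing-feature idea under stated assumptions. It assumes a hypothetical per-feature estimate of how much energy leaked in from other sources (`leak_estimate`) and a hypothetical `threshold`, and turns them into a binary mask. It then scores a frame against a diagonal-covariance Gaussian using only the reliable features. Unreliable dimensions are marginalized out, which for a diagonal Gaussian simply means dropping them, since the density integrates to 1 over each dropped dimension. The function names are illustrative and are not the authors' implementation.

```python
import math
from typing import Sequence


def reliability_mask(separated: Sequence[float],
                     leak_estimate: Sequence[float],
                     threshold: float = 0.5) -> list[int]:
    """Hypothetical hard mask: 1 = reliable, 0 = unreliable.

    A feature is treated as reliable when the estimated leakage from other
    sources is a small fraction of the separated feature's energy.
    """
    if len(separated) != len(leak_estimate):
        raise ValueError("feature and leak vectors must have equal length")
    mask = []
    for s, leak in zip(separated, leak_estimate):
        # Fraction of the feature attributed to interference; a zero-energy
        # feature carries no usable information, so treat it as unreliable.
        ratio = leak / s if s > 0 else 1.0
        mask.append(1 if ratio < threshold else 0)
    return mask


def marginal_log_likelihood(features: Sequence[float],
                            mask: Sequence[int],
                            means: Sequence[float],
                            variances: Sequence[float]) -> float:
    """Diagonal-covariance Gaussian log-likelihood over reliable dims only.

    Masked-out dimensions contribute log(1) = 0, i.e. they are marginalized.
    """
    if not (len(features) == len(mask) == len(means) == len(variances)):
        raise ValueError("all input vectors must have equal length")
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


if __name__ == "__main__":
    feats = [1.0, 5.0, 2.0]
    mask = reliability_mask(feats, [0.1, 4.0, 0.5])
    print(mask)
    print(marginal_log_likelihood(feats, mask, [1.0, 0.0, 2.0], [1.0, 1.0, 1.0]))
```

A binary mask is the simplest choice. A soft mask, which weights each feature by a reliability value in [0, 1] instead of keeping or dropping it, is a common alternative in missing-feature recognition.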
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yamamoto, S. et al. (2006). Recognition of Simultaneous Speech by Estimating Reliability of Separated Signals for Robot Audition. In: Yang, Q., Webb, G. (eds) PRICAI 2006: Trends in Artificial Intelligence. PRICAI 2006. Lecture Notes in Computer Science, vol 4099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36668-3_52
DOI: https://doi.org/10.1007/978-3-540-36668-3_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36667-6
Online ISBN: 978-3-540-36668-3
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science
Keywords
- Independent Component Analysis
- Speech Recognition
- Speech Signal
- Blind Source Separation
