Recognition of Simultaneous Speech by Estimating Reliability of Separated Signals for Robot Audition

Yamamoto, Shun’ichi; Takeda, Ryu; Nakadai, Kazuhiro; Nakano, Mikio; Tsujino, Hiroshi; Valin, Jean-Marc; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

doi:10.1007/978-3-540-36668-3_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4099)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

“Listening to several things at once” is a human dream and one goal of AI and robot audition, because psychophysical observations indicate that people can listen to at most two things at once. Current noise reduction techniques cannot achieve this goal because they assume quasi-stationary noise rather than interfering speech signals. Since robots are used in various environments, robot audition systems should require only minimal a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces between sound source separation (SSS) and automatic speech recognition. The essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic missing feature mask generation. The recognition accuracy for two simultaneous speech signals improved to an average of 67.8% and 88.0% for ICA and GSS, respectively.
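
The abstract does not say how reliability is estimated or how the masks enter the recognizer, so the following is only a generic sketch of one common missing-feature technique (marginalization with a hard mask), not the authors' method. All names here (`threshold_mask`, `masked_log_likelihood`, the 0.5 threshold, and the toy values) are hypothetical. It assumes each separated feature comes with a reliability score in [0, 1] and that the acoustic model scores a frame with a diagonal-covariance Gaussian.

```python
import math

def threshold_mask(reliabilities, threshold=0.5):
    """Turn per-feature reliability scores into a hard 0/1 mask.
    The threshold value is an illustrative assumption."""
    return [1 if r >= threshold else 0 for r in reliabilities]

def masked_log_likelihood(features, mask, means, variances):
    """Log-likelihood of one feature vector under a diagonal Gaussian,
    marginalizing out features whose mask entry is 0."""
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            # Reliable feature: contributes its usual Gaussian log-density.
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        # Unreliable feature: its density integrates to 1, adding 0 in the log domain.
    return total

# Toy frame: the second feature is far from the model mean,
# as it might be if corrupted by an interfering speaker.
features = [1.0, 5.0]
means = [1.0, 0.0]
variances = [1.0, 1.0]
mask = threshold_mask([0.9, 0.1])

masked = masked_log_likelihood(features, mask, means, variances)
unmasked = masked_log_likelihood(features, [1, 1], means, variances)
```

In this toy case the masked score ignores the corrupted second feature, so it is higher than the unmasked score, which is penalized heavily for the mismatch.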

Author information

Authors and Affiliations

Graduate School of Informatics, Kyoto University, Japan
Shun’ichi Yamamoto, Ryu Takeda, Kazunori Komatani, Tetsuya Ogata & Hiroshi G. Okuno
HONDA Research Institute Japan Co., Ltd., Japan
Kazuhiro Nakadai, Mikio Nakano & Hiroshi Tsujino
CSIRO ICT Centre, Australia
Jean-Marc Valin

Editor information

Editors and Affiliations

The Hong Kong University of Science and Technology, Hong Kong
Qiang Yang
Clayton School of Information Technology, Monash University, P.O. Box, Australia
Geoff Webb

About this paper

Cite this paper

Yamamoto, S. et al. (2006). Recognition of Simultaneous Speech by Estimating Reliability of Separated Signals for Robot Audition. In: Yang, Q., Webb, G. (eds) PRICAI 2006: Trends in Artificial Intelligence. PRICAI 2006. Lecture Notes in Computer Science, vol 4099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36668-3_52

DOI: https://doi.org/10.1007/978-3-540-36668-3_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36667-6
Online ISBN: 978-3-540-36668-3
eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science
