Recognition of Simultaneous Speech by Estimating Reliability of Separated Signals for Robot Audition

  • Conference paper
PRICAI 2006: Trends in Artificial Intelligence (PRICAI 2006)

Abstract

“Listening to several things at once” is a long-standing human dream and one goal of AI and robot audition, because psychophysical observations show that people can listen to at most two things at once. Current noise reduction techniques cannot help achieve this goal because they assume quasi-stationary noise, not interfering speech signals. Since robots are used in various environments, robot audition systems should require only minimal a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces sound source separation (SSS) with automatic speech recognition. The essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic missing feature mask generation. The recognition accuracy for two simultaneous speech signals improved to an average of 67.8% and 88.0% for ICA and GSS, respectively.
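The general idea of a missing feature mask can be illustrated with a minimal sketch. It is not the mask generation developed in the paper, which the abstract does not describe. The sketch assumes that each feature of a separated signal already carries a reliability score in [0, 1], that a fixed threshold turns scores into a binary mask, and that the recognizer scores features with a diagonal Gaussian. Unreliable features are then marginalized out, meaning they are skipped when the log-likelihood is summed. All function names, the threshold, and the numbers are hypothetical.

```python
import math

def binary_mask(reliability, threshold=0.5):
    """Mark a feature reliable (1) when its score exceeds the assumed threshold."""
    return [1 if r > threshold else 0 for r in reliability]

def masked_log_likelihood(features, mask, means, variances):
    """Diagonal-Gaussian log-likelihood summed over reliable features only.

    Features whose mask entry is 0 are marginalized out: they add nothing.
    """
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total

# Hypothetical example: the second feature is corrupted by an interfering speaker.
features = [1.0, 5.0, 0.9]
reliability = [0.9, 0.1, 0.8]   # assumed per-feature reliability estimates
means = [1.0, 1.0, 1.0]         # assumed clean-speech model parameters
variances = [1.0, 1.0, 1.0]

mask = binary_mask(reliability)
masked = masked_log_likelihood(features, mask, means, variances)
unmasked = masked_log_likelihood(features, [1, 1, 1], means, variances)
```

In this toy case the corrupted feature lies far from the model mean, so including it lowers the score sharply, while the masked score depends only on the two features judged reliable.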

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yamamoto, S. et al. (2006). Recognition of Simultaneous Speech by Estimating Reliability of Separated Signals for Robot Audition. In: Yang, Q., Webb, G. (eds) PRICAI 2006: Trends in Artificial Intelligence. PRICAI 2006. Lecture Notes in Computer Science, vol 4099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36668-3_52
