Abstract
This work is devoted to capturing an Emirati-accented speech database (Arabic United Arab Emirates database) in both neutral and shouted talking environments, in order to study and enhance text-independent Emirati-accented speaker identification performance in the shouted environment based on first-order, second-order, and third-order circular suprasegmental hidden Markov models (CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively) as classifiers. The database was collected from 50 native Emirati speakers (25 per gender), each uttering eight common Emirati sentences in both neutral and shouted talking environments. Mel-frequency cepstral coefficients (MFCCs) were extracted from the collected database as features. Our results show that the average Emirati-accented speaker identification performance in the neutral environment is 94.0%, 95.2%, and 95.9% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively, while the average performance in the shouted environment is 51.3%, 55.5%, and 59.3%, respectively. The average speaker identification performance achieved in the shouted environment based on CSPHMM3s is very close to that obtained in a subjective assessment by human listeners.
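The abstract names MFCCs as the extracted features. As a rough illustration only (the paper does not publish its exact extraction settings, so the frame size, hop, filter count, and coefficient count below are assumed, commonly used values, not the authors' configuration), the standard MFCC pipeline — pre-emphasis, framing and windowing, power spectrum, mel filterbank, log, and DCT — can be sketched in plain NumPy:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=12):
    """Minimal MFCC sketch; all parameter values are illustrative defaults."""
    # Pre-emphasis: boost high frequencies before spectral analysis
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(sig) - n_fft) // hop
    frames = np.stack([sig[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hamming(n_fft)
    # Periodogram estimate of the power spectrum per frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # Log filterbank energies, then DCT-II to decorrelate; keep n_ceps coefficients
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T

# Example: one second of a synthetic 440 Hz tone at 16 kHz
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(x)
print(feats.shape)  # one row of n_ceps coefficients per frame
```

Each utterance thus becomes a sequence of short feature vectors, which is the form consumed by HMM-based classifiers such as the CSPHMMs studied here.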



Acknowledgements
The authors of this work wish to thank the University of Sharjah for funding their work through the competitive research project entitled "Capturing, Studying, and Analyzing Arabic Emirati-Accented Speech Database in Stressful and Emotional Talking Environments for Different Applications", No. 1602040349-P. The authors also wish to thank engineers Merah Al Suwaidi, Deema Al Rais, and Hannah Saud for capturing the Emirati-accented speech database.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Research involving animal participants
This study does not involve any animal participants.
Cite this article
Shahin, I., Nassif, A.B. & Bahutair, M. Emirati-accented speaker identification in each of neutral and shouted talking environments. Int J Speech Technol 21, 265–278 (2018). https://doi.org/10.1007/s10772-018-9502-0
