{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:27:08Z","timestamp":1773844028784,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2023,7,16]],"date-time":"2023-07-16T00:00:00Z","timestamp":1689465600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Analysis and Investigation of Emirati-Accented Corpus in Emotional and Stressful Talking Environments for Speaker and Emotion Recognition based on Capsule Networks","award":["23020403251"],"award-info":[{"award-number":["23020403251"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The Arabic language has always been an immense source of attraction to various people from different ethnicities by virtue of the significant linguistic legacy that it possesses. Consequently, a multitude of people from all over the world are yearning to learn it. However, people from different mother tongues and cultural backgrounds might experience some hardships regarding articulation due to the absence of some particular letters only available in the Arabic language, which could hinder the learning process. As a result, a speaker-independent and text-dependent efficient system that aims to detect articulation disorders was implemented. In the proposed system, we emphasize the prominence of \u201cspeech signal processing\u201d in diagnosing Arabic mispronunciation using the Mel-frequency cepstral coefficients (MFCCs) as the optimum extracted features. In addition, long short-term memory (LSTM) was also utilized for the classification process. Furthermore, the analytical framework was incorporated with a gender recognition model to perform two-level classification. Our results show that the LSTM network significantly enhances mispronunciation detection along with gender recognition. The LSTM models attained an average accuracy of 81.52% in the proposed system, reflecting a high performance compared to previous mispronunciation detection systems.<\/jats:p>","DOI":"10.3390\/info14070413","type":"journal-article","created":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T00:56:47Z","timestamp":1689555407000},"page":"413","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Arabic Mispronunciation Recognition System Using LSTM Network"],"prefix":"10.3390","volume":"14","author":[{"given":"Abdelfatah","family":"Ahmed","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Eng., Khalifa University of Science Technology and Research, Abu Dhabi 127788, United Arab Emirates"}]},{"given":"Mohamed","family":"Bader","sequence":"additional","affiliation":[{"name":"Department of Electrical Eng., University of Sharjah, Sharjah 27272, United Arab Emirates"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7856-9342","authenticated-orcid":false,"given":"Ismail","family":"Shahin","sequence":"additional","affiliation":[{"name":"Department of Electrical Eng., University of Sharjah, Sharjah 27272, United Arab Emirates"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1570-0897","authenticated-orcid":false,"given":"Ali Bou","family":"Nassif","sequence":"additional","affiliation":[{"name":"Department of Computer Eng., University of Sharjah, Sharjah 27272, United Arab Emirates"}]},{"given":"Naoufel","family":"Werghi","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Eng., Khalifa University of Science Technology and Research, Abu Dhabi 127788, United Arab Emirates"}]},{"given":"Mohammad","family":"Basel","sequence":"additional","affiliation":[{"name":"Department of Computer Eng., University of Sharjah, Sharjah 27272, United Arab Emirates"}]}],"member":"1968","published-online":{"date-parts":[[2023,7,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Calik, S.S., Kucukmanisa, A., and Kilimci, Z.H. (2023). An ensemble-based framework for mispronunciation detection of Arabic phonemes. arXiv.","DOI":"10.1109\/INISTA55318.2022.9894215"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Fu, P., Liu, D., and Yang, H. (2022). LAS-Transformer: An Enhanced Transformer Based on the Local Attention Mechanism for Speech Recognition. Information, 13.","DOI":"10.3390\/info13050250"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ye, W., Mao, S., Soong, F., Wu, W., Xia, Y., Tien, J., and Wu, Z. (2022, January 23\u201327). An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (Apl) Embeddings. Proceedings of the ICASSP 2022\u20142022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746604"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1109\/TASLP.2016.2621675","article-title":"Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks","volume":"25","author":"Li","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.specom.2019.06.003","article-title":"Anomaly detection based pronunciation verification approach using speech attribute features","volume":"111","author":"Shahin","year":"2019","journal-title":"Speech Commun."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"31","DOI":"10.5815\/ijigsp.2018.04.04","article-title":"A Dataset for Speech Recognition to Support Arabic Phoneme Pronunciation","volume":"10","author":"Arafa","year":"2018","journal-title":"Int. J. Image Graph. Signal Process."},{"key":"ref_7","first-page":"190","article-title":"Comparison between Features Extraction Techniques for Impairments Arabic Speech","volume":"27","author":"Shareef","year":"2022","journal-title":"Al-Rafidain Eng. J."},{"key":"ref_8","first-page":"818","article-title":"On preprocessing of speech signals","volume":"35","author":"Keerio","year":"2009","journal-title":"World Acad. Sci. Eng. Technol."},{"key":"ref_9","first-page":"186","article-title":"Preprocessing technique in automatic speech recognition for human computer interaction: An overview","volume":"15","author":"Ibrahim","year":"2017","journal-title":"Ann. Comput. Sci. Ser."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kaur, M., and Mohta, A. (2019, January 27\u201329). A Review of Deep Learning with Recurrent Neural Network. Proceedings of the 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India.","DOI":"10.1109\/ICSSIT46314.2019.8987837"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Hassan, A., Shahin, I., and Alsabek, M.B. (2020, January 3\u20135). COVID-19 Detection System using Recurrent Neural Networks. Proceedings of the 2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), Sharjah, United Arab Emirates.","DOI":"10.1109\/CCCI49893.2020.9256562"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"19143","DOI":"10.1109\/ACCESS.2019.2896880","article-title":"Speech Recognition Using Deep Neural Networks: A Systematic Review","volume":"7","author":"Nassif","year":"2019","journal-title":"IEEE Access"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"235","DOI":"10.2478\/jaiscr-2019-0006","article-title":"Performance Evaluation of Deep neural networks Applied to Speech Recognition: Rnn, LSTM and GRU","volume":"9","author":"Shewalkar","year":"2019","journal-title":"J. Artif. Intell. Soft Comput. Res."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Amberkar, A., Awasarmol, P., Deshmukh, G., and Dave, P. (2018, January 1\u20133). Speech Recognition using Recurrent Neural Networks. Proceedings of the 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), Coimbatore, India.","DOI":"10.1109\/ICCTCT.2018.8551185"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Geiger, J.T., Zhang, Z., Weninger, F., Schuller, B., and Rigoll, G. (2014, January 14\u201318). Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling. Proceedings of the Annual Conference on the International Speech Communication Association (Interspeech 2014), Singapore.","DOI":"10.21437\/Interspeech.2014-151"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1016\/j.dsp.2012.10.008","article-title":"Acoustic classification and segmentation using modified spectral roll-off and variance-based features","volume":"23","author":"Kos","year":"2013","journal-title":"Digit. Signal Process."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1007\/s10772-018-9502-0","article-title":"Emirati-accented speaker identification in each of neutral and shouted talking environments","volume":"21","author":"Shahin","year":"2018","journal-title":"Int. J. Speech Technol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1016\/j.engappai.2014.07.006","article-title":"Novel third-order hidden Markov models for speaker identification in shouted talking environments","volume":"35","author":"Shahin","year":"2014","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_19","unstructured":"Shahin, I. (2008, January 18\u201320). Using emotions to identify speakers. Proceedings of the 5th International Workshop on Signal Processing and Its Applications (WoSPA 2008), Sharjah, United Arab Emirates."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/s10772-011-9089-1","article-title":"Identifying Speakers Using Their Emotion Cues","volume":"14","author":"Shahin","year":"2011","journal-title":"Int. J. Speech Technol."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"2575","DOI":"10.1007\/s00521-018-3760-2","article-title":"Novel cascaded Gaussian mixture model-deep neural network classifier for speaker identification in emotional talking environments","volume":"32","author":"Shahin","year":"2020","journal-title":"Neural Comput. Appl."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Alsabek, M.B., Shahin, I., and Hassan, A. (2020, January 3\u20135). Studying the Similarity of COVID-19 Sounds based on Correlation Analysis of MFCC. Proceedings of the 2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), Sharjah, United Arab Emirates.","DOI":"10.1109\/CCCI49893.2020.9256700"},{"key":"ref_23","first-page":"197","article-title":"Analysis of feature extraction techniques for speech recognition system","volume":"8","author":"Ranjan","year":"2019","journal-title":"Int. J. Innov. Technol. Explor. Eng."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.specom.2009.08.009","article-title":"An overview of text-independent speaker recognition: From features to supervectors","volume":"52","author":"Kinnunen","year":"2010","journal-title":"Speech Commun."},{"key":"ref_25","unstructured":"Atrey, P.K., Maddage, N.C., and Kankanhalli, M.S. (2006, January 14\u201319). Audio based event detection for multimedia surveillance. Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Toulouse, France."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ayoub, B., Jamal, K., and Arsalane, Z. (April, January 30). Gammatone frequency cepstral coefficients for speaker identification over VoIP networks. Proceedings of the 2016 International Conference on Information Technology for Organizations Development (IT4OD), Fez, Morocco.","DOI":"10.1109\/IT4OD.2016.7479293"},{"key":"ref_27","unstructured":"Liashchynskyi, P., and Liashchynskyi, P. (2019). Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS. arXiv."},{"key":"ref_28","unstructured":"Sokolova, M., Japkowicz, N., and Szpakowicz, S. (2006, January 4\u20138). Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. Proceedings of the 19th Australian Joint Conference on Artificial Intelligence, Hobart, Australia. WS-06-06."},{"key":"ref_29","unstructured":"Bahador, M., and Ahmed, W. (2018). The Accuracy of the LSTM Model for Predicting the S&P 500 Index and the Difference between Prediction and Backtesting. [Bachelor\u2019s Thesis, KTH Royal Institute of Technology]."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Azzouni, A., and Pujolle, G. (2017). A long short-term memory recurrent neural network framework for network traffic matrix prediction. arXiv.","DOI":"10.1109\/NOMS.2018.8406199"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/7\/413\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:13:02Z","timestamp":1760127182000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/7\/413"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,16]]},"references-count":30,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["info14070413"],"URL":"https:\/\/doi.org\/10.3390\/info14070413","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,16]]}}}