Abstract
Emotions are a vital semantic component of human communication. They are significant not only for communication between people but also essential for human–computer interaction. Effective communication between humans is achieved only when both the meaning and the emotion of the message are perceived by all parties involved. Understanding the meaning of language has traditionally been studied in natural language processing (NLP) as semantic analysis, in which text is processed appropriately for classification. Emotion detection from facial expressions is a subfield of social signal processing applied in a wide variety of areas, particularly human–computer interaction. Many researchers have proposed approaches to this problem, generally based on machine learning. Automatic emotion recognition (AER) is important for enabling seamless interaction between a person and a smart device, a step toward fully realizing an intelligent society. Several researchers have examined cross-lingual and multilingual speech emotion recognition as a stage toward language-independent emotion recognition in natural speech. In the present work, we propose a deep learning-based AER system evaluated on four publicly available datasets: the Basic Arabic Vocal Emotions Dataset (BAVED), the Acted Emotional Speech Dynamic Database (AESDD), the URDU dataset (Urdu speech written in Latin/Roman script), and the Toronto Emotional Speech Set (TESS). The system is implemented in a Jupyter notebook using Librosa, a Python library for music and audio analysis. The experimental results show that the proposed approach outperforms existing approaches: the accuracy of the proposed system is 96.24% on the URDU dataset, 99.10% on TESS, 65.97% on AESDD, and 73.12% on BAVED.
Change history
23 April 2026
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s11042-026-21620-z
Funding
No funding has been received for this work.
Ethics declarations
Competing interests
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sujatha, R., Chatterjee, J.M., Pathy, B. et al. RETRACTED ARTICLE: Automatic emotion recognition using deep neural network. Multimed Tools Appl 84, 33633–33662 (2025). https://doi.org/10.1007/s11042-024-20590-4

