HAEPF: hybrid approach for estimating pitch frequency in the presence of reverberation

Hassan, Emad S.; Neyazi, Badawi; Seddeq, H. S.; Mahmoud, Adel Zaghloul; Oshaba, Ahmed S.; El-Emary, Atef; Abd El‑Samie, Fathi E.

doi:10.1007/s11042-024-18231-x

Published: 24 February 2024

Volume 83, pages 77489–77508, (2024)

Multimedia Tools and Applications

Emad S. Hassan ORCID: orcid.org/0000-0002-1840-4244¹,
Badawi Neyazi²,
H. S. Seddeq³,
Adel Zaghloul Mahmoud⁴,
Ahmed S. Oshaba¹,
Atef El-Emary¹ &
…
Fathi E. Abd El‑Samie ORCID: orcid.org/0000-0001-8749-9518⁵,⁶

Abstract

In speaker identification, pitch frequency is a fundamental feature. When speech is recorded in a closed room, however, reverberation distorts the signal and compromises this feature. The distortion reduces the effectiveness of speaker identification systems and also opens the door to deception by attackers who exploit reverberation effects in closed rooms. Correcting the estimated pitch frequencies is therefore an essential step for reliable speaker identification. This paper presents a Hybrid Approach for Estimating Pitch Frequency (HAEPF) that integrates the Zero Crossing Rate (ZCR) and Auto-Correlation Function (ACF) methods. The paper also models reverberant speech using comb filtering, showing how multiple reflections affect the accuracy of pitch frequency estimation. Several simulation experiments were conducted to assess pitch frequency estimation for speech signals, both with and without reverberation, and the estimation errors were calculated for three reverberation scenarios (mild, moderate, and severe). The results indicate that as the degree of reverberation, characterized by the comb filter order, increases, the pitch frequency estimation error also increases. The estimation accuracy of the proposed approach is evaluated in terms of Pitch Frequency Estimation Error (PFEE), Gross Pitch Error (GPE), and Octave Error (OER), and is compared with that of several established pitch frequency estimation methods. The proposed approach shows a notable improvement even in noisy environments, reducing PFEE by 43% and achieving GPE and OER of less than 0.3 and 0.12, respectively, at a Signal-to-Noise Ratio (SNR) of 0 dB.
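
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a feedforward comb filter as the reverberation model (the filter order sets the number of reflections), a plain autocorrelation peak search, and a zero-crossing count as a coarse pitch guess. All function names, the synthetic test tone, the 60 to 400 Hz search range, and the rule in `hybrid_pitch` (ZCR narrows the ACF lag search) are hypothetical choices made for illustration; the abstract does not specify how HAEPF combines the two methods.

```python
import math


def synth_voiced(f0, fs, n):
    """Three-harmonic tone standing in for a voiced frame (illustrative assumption)."""
    return [sum(math.sin(2 * math.pi * k * f0 * t / fs) / k for k in (1, 2, 3))
            for t in range(n)]


def comb_reverb(x, delay, gain, order):
    """Assumed comb model: y[n] = sum_{k=0..order} gain**k * x[n - k*delay]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(order + 1):
            idx = n - k * delay
            if idx < 0:
                break  # later reflections have not arrived yet
            acc += (gain ** k) * x[idx]
        y.append(acc)
    return y


def pitch_acf(x, fs, fmin=60.0, fmax=400.0):
    """Pitch from the lag that maximizes the (unnormalized) autocorrelation."""
    lo = max(1, int(fs / fmax))
    hi = min(len(x) - 1, int(fs / fmin))

    def acf(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

    best_lag = max(range(lo, hi + 1), key=acf)
    return fs / best_lag


def pitch_zcr(x, fs):
    """Coarse pitch guess: two sign changes per period for a simple waveform."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings * fs / (2 * len(x))


def hybrid_pitch(x, fs, tol=0.3):
    """Hypothetical hybrid rule: ZCR guess restricts the ACF search band."""
    f_zcr = pitch_zcr(x, fs)
    if f_zcr <= 0:
        return pitch_acf(x, fs)  # no crossings: fall back to the full range
    return pitch_acf(x, fs, fmin=f_zcr * (1 - tol), fmax=f_zcr * (1 + tol))


if __name__ == "__main__":
    fs, f0 = 8000, 200.0
    clean = synth_voiced(f0, fs, 800)
    for order in (0, 1, 3, 5):  # order 0 leaves the signal unchanged
        y = comb_reverb(clean, delay=13, gain=0.6, order=order)
        est = hybrid_pitch(y, fs)
        print(order, round(est, 1), round(abs(est - f0) / f0, 3))
```

A stationary synthetic tone remains periodic after comb filtering, so this toy signal is only a way to exercise the functions; the error growth with comb filter order reported in the abstract concerns real speech, whose pitch varies over time.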

Data availability

Not applicable.

Acknowledgements

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number ISP23-56.

Funding

This research was funded by the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, grant number ISP23-56.

Author information

Authors and Affiliations

Department of Electrical Engineering, College of Engineering, Jazan University, 45142, Jizan, Saudi Arabia
Emad S. Hassan, Ahmed S. Oshaba & Atef El-Emary
Productivity and Vocational Training Department, Ministry of Industry, Cairo, Egypt
Badawi Neyazi
Acoustic Laboratory, Housing and Building National Research Center, Giza, Egypt
H. S. Seddeq
Electronics and Communications Department, Faculty of Engineering, Zagazig University, Zagazig, Egypt
Adel Zaghloul Mahmoud
Department of Electronics and Electrical Communication, Faculty of Electronic Engineering, Menoufia University, 32952, Menouf, Egypt
Fathi E. Abd El‑Samie
Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
Fathi E. Abd El‑Samie

Contributions

All authors contributed to writing and reviewing this paper.

Corresponding author

Correspondence to Emad S. Hassan.

Ethics declarations

Conflicts of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hassan, E.S., Neyazi, B., Seddeq, H.S. et al. HAEPF: hybrid approach for estimating pitch frequency in the presence of reverberation. Multimed Tools Appl 83, 77489–77508 (2024). https://doi.org/10.1007/s11042-024-18231-x

Received: 08 September 2023
Revised: 17 December 2023
Accepted: 08 January 2024
Published: 24 February 2024
Version of record: 24 February 2024
Issue date: September 2024
DOI: https://doi.org/10.1007/s11042-024-18231-x
