HAEPF: hybrid approach for estimating pitch frequency in the presence of reverberation

Multimedia Tools and Applications

Abstract

In speaker identification, pitch frequency is a fundamental feature. However, this feature can be distorted when speech is recorded in a closed room, where reverberation alters the signal. This distortion not only reduces the effectiveness of speaker identification systems but also opens the door to deception by hackers who exploit reverberation effects in closed rooms. Correcting the estimated pitch frequency is therefore an essential step for reliable speaker identification. This paper presents a Hybrid Approach for Estimating Pitch Frequency (HAEPF) that integrates the Zero Crossing Rate (ZCR) and Auto-Correlation Function (ACF) methods. The paper also models reverberant speech using comb filtering, showing how multiple reflections affect the accuracy of pitch frequency estimation. Several simulation experiments were conducted to assess pitch frequency estimation for speech signals with and without reverberation. Estimation errors were calculated for three reverberation scenarios (mild, moderate, and severe). The results indicate that as the degree of reverberation, characterized by the comb filter order, increases, the pitch frequency estimation error also increases. The estimation accuracy of the proposed approach is measured in terms of Pitch Frequency Estimation Error (PFEE), Gross Pitch Error (GPE), and Octave Error (OER) and is compared with that of several established pitch frequency estimation methods. The proposed approach shows a notable improvement even in noisy environments, reducing PFEE by 43% and achieving GPE and OER of less than 0.3 and 0.12, respectively, at a Signal-to-Noise Ratio (SNR) of 0 dB.
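To make the ingredients named in the abstract concrete, the following is a minimal sketch of a comb-filter reverberation model together with textbook ZCR-based and ACF-based pitch estimators. It is not the authors' HAEPF implementation: the abstract does not specify how the two estimates are combined, so no fusion step is shown. The function names, the feed-forward form of the comb filter, the 60 to 400 Hz search range, and the test-tone parameters are all assumptions chosen for illustration.

```python
import numpy as np

def comb_reverb(x, delay, gain, order):
    """Assumed reverberation model: a feed-forward comb filter,
    y[n] = sum over k = 0..order of gain**k * x[n - k*delay].
    A larger order adds more reflections."""
    y = np.zeros(len(x) + order * delay)
    for k in range(order + 1):
        y[k * delay : k * delay + len(x)] += (gain ** k) * x
    return y[: len(x)]  # truncate the tail to the input length

def pitch_zcr(frame, fs):
    """ZCR-based estimate: a simple periodic waveform crosses zero
    twice per period, so f0 is roughly crossings * fs / (2 * N)."""
    signs = np.signbit(frame)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings * fs / (2.0 * len(frame))

def pitch_acf(frame, fs, fmin=60.0, fmax=400.0):
    """ACF-based estimate: pick the lag with the largest autocorrelation
    inside the assumed pitch range [fmin, fmax] and convert it to Hz."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + int(np.argmax(acf[lag_min : lag_max + 1]))
    return fs / lag

if __name__ == "__main__":
    fs = 8000                                  # assumed sampling rate in Hz
    t = np.arange(400) / fs                    # one 50 ms frame
    clean = np.sin(2 * np.pi * 200.0 * t + 0.3)  # hypothetical 200 Hz test tone
    for order in (0, 1, 3, 5):                 # order 0 means no reverberation
        reverberant = comb_reverb(clean, delay=100, gain=0.6, order=order)
        print(order, pitch_zcr(reverberant, fs), pitch_acf(reverberant, fs))
```

A pure tone is used only to keep the sketch self-contained; real speech frames contain harmonics and noise, which is where the two estimators are expected to disagree and where a hybrid scheme becomes relevant.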

Data availability

Not applicable.

Acknowledgements

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number ISP23-56.

Funding

This research was funded by the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, grant number ISP23-56.

Author information

Contributions

All authors contributed to writing and reviewing this paper.

Corresponding author

Correspondence to Emad S. Hassan.

Ethics declarations

Conflicts of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hassan, E.S., Neyazi, B., Seddeq, H.S. et al. HAEPF: hybrid approach for estimating pitch frequency in the presence of reverberation. Multimed Tools Appl 83, 77489–77508 (2024). https://doi.org/10.1007/s11042-024-18231-x

  • DOI: https://doi.org/10.1007/s11042-024-18231-x
