Integrating prior knowledge and data-driven approaches for improving grapheme-to-phoneme conversion in Korean language

  • Application of soft computing
Soft Computing

Abstract

Grapheme-to-phoneme (G2P) conversion is currently dominated by two methodologies: knowledge-based and data-driven approaches. Knowledge-based methods scale poorly to large datasets, while data-driven methods depend heavily on high-quality data and require careful feature selection during model construction. To address these limitations, this work proposes an integrated approach that combines prior knowledge with data-driven techniques for automatic G2P conversion in Korean. We extract attributes from Korean pronunciation rules and from the phonetic transformations that occur between Korean words, and use them to construct a decision tree; the model is then trained in a data-driven manner for automated phonetic transcription. The integrated model aligns input graphemes with output phonemes more accurately, captures phonological variation in continuous Korean speech, and determines the phoneme corresponding to each grapheme. Under cross-validation, it achieves an average G2P accuracy of 94.63%, outperforming existing methods. The approach can also be applied to low-resource or endangered languages that already have some foundation of linguistic research, in order to improve the accuracy of their pronunciation lexicons.
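To make the idea of rule-derived attributes feeding a decision tree concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single, well-known Korean rule (obstruent codas ㄱ, ㄷ, ㅂ become nasals before ㄴ or ㅁ), a hand-written toy training set, and a small ID3-style learner written from scratch. The names `decompose`, `coda_features`, `build_tree`, `predict`, and the feature `next_is_nasal` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Jamo tables in the order used by precomposed Hangul syllables (U+AC00..U+D7A3).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
NASALS = {"ㄴ", "ㅁ"}  # prior knowledge: these onsets trigger coda nasalization


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return INITIALS[index // 588], MEDIALS[(index % 588) // 28], FINALS[index % 28]


def coda_features(word, i):
    """Rule-inspired attributes for the coda of syllable i (assumed feature set)."""
    coda = decompose(word[i])[2]
    next_initial = decompose(word[i + 1])[0] if i + 1 < len(word) else "#"
    return {"coda": coda, "next_is_nasal": next_initial in NASALS}


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """ID3-style tree over categorical features; leaves are phoneme labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def remainder(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(features, key=remainder)  # lowest remaining entropy = highest gain
    rest = [f for f in features if f != best]
    node = {"feature": best, "children": {}, "default": majority}
    for value in {row[best] for row in rows}:
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        node["children"][value] = build_tree(
            [r for r, _ in subset], [y for _, y in subset], rest
        )
    return node


def predict(node, row):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["feature"]], node["default"])
    return node


# Toy training data (illustrative only): word and surface coda of its first syllable.
TRAIN = [("국물", "ㅇ"), ("학년", "ㅇ"), ("국가", "ㄱ"), ("학교", "ㄱ"),
         ("입니다", "ㅁ"), ("입구", "ㅂ"), ("닫는", "ㄴ"), ("받다", "ㄷ")]
rows = [coda_features(word, 0) for word, _ in TRAIN]
labels = [label for _, label in TRAIN]
tree = build_tree(rows, labels, ["coda", "next_is_nasal"])

print(predict(tree, coda_features("먹는", 0)))  # ㅇ (먹는 is pronounced 멍는)
print(predict(tree, coda_features("먹다", 0)))  # ㄱ (no nasal follows, coda unchanged)
```

Neither 먹는 nor 먹다 appears in the toy training set; the tree generalizes to them because the features encode the rule's conditions rather than the raw word. A realistic system would need many more attributes, a full pronunciation lexicon, and held-out evaluation such as the cross-validation reported in the abstract.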

Availability of data and materials

Enquiries about data availability should be directed to the authors.

Funding

This work was supported by the following funds: (1) National Natural Science Foundation of China (NSFC), “Research on end-to-end multi-task learning-based multi-dialect speech recognition method for Tibetan” (61976236); (2) National Natural Science Foundation of China (NSFC), “Research on key technology of footplate water strider robot and its prototype development” (61773416); (3) Minzu University of China, “Research on Sino-Tibetan cross-language speech recognition method based on big data migration learning” (2020MDJC06).

Author information

Corresponding author

Correspondence to Yue Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Our research work is original and free of plagiarism. All authors have contributed significantly to the research, and any conflicts of interest have been disclosed. Ethical approval, when applicable, has been obtained, and data integrity is ensured. The research complies with relevant laws and publication standards. We acknowledge any support received for this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, D., Zhao, Y. & Wu, L. Integrating prior knowledge and data-driven approaches for improving grapheme-to-phoneme conversion in Korean language. Soft Comput 28, 12269–12280 (2024). https://doi.org/10.1007/s00500-024-09934-2

  • DOI: https://doi.org/10.1007/s00500-024-09934-2
