Abstract
Grapheme-to-phoneme (G2P) conversion is currently dominated by two methodologies: knowledge-based and data-driven approaches. Knowledge-based methods struggle to scale to large datasets, while data-driven methods rely heavily on high-quality data and require careful feature selection for model construction. To address these challenges, we propose an integrated approach that combines prior knowledge with data-driven techniques for automatic G2P conversion in Korean. We extract attributes from pronunciation rules and phonetic transformations between Korean words to construct a decision tree, and then train the model in a data-driven manner for automated phonetic transcription. The integrated model aligns input and output variables more accurately, captures phonological variation in continuous Korean speech, and determines the phoneme corresponding to each grapheme. In cross-validation it achieves an average G2P accuracy of 94.63%, outperforming existing methods. The approach can also be applied to low-resource or endangered languages that already have some linguistic research foundation, to improve the accuracy of their pronunciation lexicons.
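To make the idea of combining rule-derived attributes with a data-driven decision tree concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy, hand-aligned training set, a single knowledge-based attribute (`next_is_nasal`, motivated by the standard Korean obstruent nasalization rule), and a small ID3-style tree written from scratch. All function names, the feature set, and the training words are hypothetical choices made for illustration; the sketch predicts only the surface form of syllable-final consonants.

```python
from collections import Counter
import math

# Standard Unicode Hangul jamo tables (initial, medial, final).
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
        "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

def decompose(syllable):
    """Split one precomposed Hangul syllable into (onset, vowel, coda)."""
    code = ord(syllable) - 0xAC00
    return CHO[code // 588], JUNG[(code % 588) // 28], JONG[code % 28]

def coda_features(word):
    """One feature row per non-final syllable: its coda plus a rule-derived flag."""
    parts = [decompose(s) for s in word]
    return [
        # next_is_nasal encodes prior knowledge (hypothetical attribute):
        # obstruent codas nasalize before the nasal onsets ㄴ and ㅁ.
        {"coda": parts[i][2], "next_is_nasal": parts[i + 1][0] in "ㄴㅁ"}
        for i in range(len(parts) - 1)
    ]

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def build_tree(rows, labels, names):
    """ID3-style induction: split on the feature with lowest weighted entropy."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not names:
        return majority  # leaf: a surface coda (possibly the empty string)

    def split_cost(name):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[name], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(names, key=split_cost)
    rest = [n for n in names if n != best]
    children = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree([r for r, _ in pairs], [l for _, l in pairs], rest)
    return {"feature": best, "children": children, "default": majority}

def predict(tree, row):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

# Toy aligned data (illustrative only): word -> surface coda of each non-final syllable.
TRAIN = [("국물", ["ㅇ"]), ("학년", ["ㅇ"]), ("입니다", ["ㅁ", ""]), ("닫는", ["ㄴ"]),
         ("학교", ["ㄱ"]), ("입구", ["ㅂ"]), ("받다", ["ㄷ"]), ("감사", ["ㅁ"])]

rows, labels = [], []
for word, codas in TRAIN:
    rows.extend(coda_features(word))
    labels.extend(codas)

tree = build_tree(rows, labels, ["coda", "next_is_nasal"])

def predict_codas(word):
    return [predict(tree, row) for row in coda_features(word)]

# 십만 is not in the training set; the coda ㅂ before ㅁ is predicted as ㅁ ([심만]).
print(predict_codas("십만"))
```

The rule-derived flag is what lets the tree generalize: the pair (ㅂ, ㅁ) never occurs in the toy training data, but (ㅂ, ㄴ) does, and both map to the same `next_is_nasal` value. A tree trained on the raw next onset alone would have no branch for the unseen consonant.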




Availability of data and materials
Enquiries about data availability should be directed to the authors.
Funding
This work was supported by the following funds: (1) National Natural Science Foundation of China (NSFC), “Research on end-to-end multi-task learning-based multi-dialect speech recognition method for Tibetan” (61976236); (2) National Natural Science Foundation of China (NSFC), “Research on key technology of footplate water strider robot and its prototype development” (61773416); (3) Minzu University of China, “Research on Sino-Tibetan cross-language speech recognition method based on big data migration learning” (2020MDJC06).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Our research work is original and free of plagiarism. All authors have contributed significantly to the research, and any conflicts of interest have been disclosed. Ethical approval, when applicable, has been obtained, and data integrity is ensured. The research complies with relevant laws and publication standards. We acknowledge any support received for this work.
About this article
Cite this article
Cao, D., Zhao, Y. & Wu, L. Integrating prior knowledge and data-driven approaches for improving grapheme-to-phoneme conversion in Korean language. Soft Comput 28, 12269–12280 (2024). https://doi.org/10.1007/s00500-024-09934-2
DOI: https://doi.org/10.1007/s00500-024-09934-2
