Integrating prior knowledge and data-driven approaches for improving grapheme-to-phoneme conversion in Korean language

Cao, Dezhi; Zhao, Yue; Wu, Licheng

doi:10.1007/s00500-024-09934-2

Application of soft computing
Published: 19 August 2024

Soft Computing, Volume 28, pages 12269–12280 (2024)

Dezhi Cao¹,³, Yue Zhao¹ & Licheng Wu²

Abstract

Grapheme-to-phoneme (G2P) conversion is currently dominated by two methodologies: knowledge-driven and data-driven approaches. Knowledge-driven methods struggle to adapt to extensive datasets, while data-driven methods rely heavily on high-quality data and require careful feature selection for model construction. To address these challenges, we propose an integrated approach that combines prior knowledge with data-driven techniques for automatic G2P conversion in Korean. We extract attributes based on pronunciation rules and phonetic transformations between Korean words and use them to construct a decision tree, which is then trained in a data-driven manner for automated phonetic transcription. The integrated model achieves more accurate alignment between input and output variables, captures phonological variation in continuous Korean speech, and determines the phoneme corresponding to each grapheme. Cross-validation yields an average G2P conversion accuracy of 94.63%, outperforming existing methodologies. The approach can also be applied to low-resource or endangered languages that already have some linguistic research foundation, to improve the accuracy of their pronunciation lexicons.
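
The abstract does not specify the attribute set or the tree learner, so the following is only a minimal sketch of the general idea, under stated assumptions. It decomposes Hangul syllables into jamo, derives two rule-motivated attributes for each syllable (its written coda and the onset of the next syllable, the context in which Korean coda nasalization applies), and fits a small ID3-style decision tree on a handful of aligned spelling/pronunciation pairs. The names decompose, coda_examples, build_tree and predict, the toy PAIRS list, and the choice of ID3 are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2

# Standard Unicode ordering of Hangul jamo inside precomposed syllables.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
        "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

def decompose(syllable):
    """Split one precomposed Hangul syllable into (onset, vowel, coda)."""
    idx = ord(syllable) - 0xAC00
    if not 0 <= idx < 11172:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    return CHO[idx // 588], JUNG[idx % 588 // 28], JONG[idx % 28]

def coda_examples(spelling, pronunciation):
    """Yield (attributes, pronounced_coda) per syllable of an aligned pair."""
    if len(spelling) != len(pronunciation):
        raise ValueError("pair must have the same number of syllables")
    graphemes = [decompose(s) for s in spelling]
    phonemes = [decompose(s) for s in pronunciation]
    for i, (g, p) in enumerate(zip(graphemes, phonemes)):
        # "#" marks the word boundary (hypothetical convention).
        next_onset = graphemes[i + 1][0] if i + 1 < len(graphemes) else "#"
        yield {"coda": g[2], "next_onset": next_onset}, p[2]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, features):
    """ID3-style tree: a leaf is a label, a node is (feature, children, majority)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split(feature):
        groups = {}
        for x, y in rows:
            groups.setdefault(x[feature], []).append((x, y))
        return groups

    def split_entropy(feature):
        return sum(len(g) / len(rows) * entropy([y for _, y in g])
                   for g in split(feature).values())

    best = min(features, key=split_entropy)
    rest = [f for f in features if f != best]
    children = {v: build_tree(g, rest) for v, g in split(best).items()}
    return best, children, majority

def predict(tree, x):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, tuple):
        feature, children, majority = tree
        if x[feature] not in children:
            return majority
        tree = children[x[feature]]
    return tree

# Toy aligned pairs (spelling, standard pronunciation); illustrative only.
PAIRS = [("국물", "궁물"), ("먹는", "멍는"), ("밥물", "밤물"), ("입니다", "임니다"),
         ("닫는", "단는"), ("국가", "국까"), ("밥상", "밥쌍"), ("나라", "나라")]

rows = [ex for s, p in PAIRS for ex in coda_examples(s, p)]
tree = build_tree(rows, ["coda", "next_onset"])

# Coda ㄱ before onset ㅁ, as in 국민 → [궁민]: the tree predicts the nasal ㅇ.
print(predict(tree, {"coda": "ㄱ", "next_onset": "ㅁ"}))
```

In this sketch the linguistic knowledge enters only through the choice of attributes, while the mapping itself is learned from the aligned pairs. A realistic system would need many more attributes, a proper grapheme-phoneme aligner, and held-out evaluation such as the cross-validation reported in the abstract.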

Availability of data and materials

Enquiries about data availability should be directed to the authors.

Funding

This work was supported by the following funds: (1) National Natural Science Foundation of China (NSFC), “Research on end-to-end multi-task learning-based multi-dialect speech recognition method for Tibetan” (61976236); (2) National Natural Science Foundation of China (NSFC), “Research on key technology of footplate water strider robot and its prototype development” (61773416); (3) Minzu University of China, “Research on Sino-Tibetan cross-language speech recognition method based on big data migration learning” (2020MDJC06).

Author information

Authors and Affiliations

Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China
Dezhi Cao & Yue Zhao
School of Information Engineering, Minzu University of China, Beijing, 100081, China
Licheng Wu
School of Chinese Ethnic Languages and Literature, Minzu University of China, Beijing, 100081, China
Dezhi Cao

Corresponding author

Correspondence to Yue Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Our research work is original and free of plagiarism. All authors have contributed significantly to the research, and any conflicts of interest have been disclosed. Ethical approval, when applicable, has been obtained, and data integrity is ensured. The research complies with relevant laws and publication standards. We acknowledge any support received for this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, D., Zhao, Y. & Wu, L. Integrating prior knowledge and data-driven approaches for improving grapheme-to-phoneme conversion in Korean language. Soft Comput 28, 12269–12280 (2024). https://doi.org/10.1007/s00500-024-09934-2

Accepted: 07 May 2024
Published: 19 August 2024
Version of record: 19 August 2024
Issue date: October 2024
DOI: https://doi.org/10.1007/s00500-024-09934-2
