An Approach to Acquire Word Translations from Non-parallel Texts

Otero, Pablo Gamallo; Campos, José Ramom Pichel

doi:10.1007/11595014_59


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3808)

Included in the following conference series:

Portuguese Conference on Artificial Intelligence


Abstract

Few approaches to extracting word translations from non-parallel texts have been proposed so far. Researchers have had little incentive to work on this topic because extracting information from non-parallel corpora is difficult and has produced poor results. Whereas word translation extraction from parallel texts can reach an accuracy of about 99%, accuracy for non-parallel texts has until now been around 72%. The approach presented here, which relies on first extracting bilingual pairs of lexico-syntactic templates from parallel corpora, yields a significant improvement: about 89% of word translations are identified correctly.
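The abstract only outlines the method, so the following is a rough sketch of the general idea, not the authors' implementation. It assumes that bilingual template pairs learned from a parallel corpus act as shared dimensions, and that each word in the two non-parallel corpora is described by how often it fills each template. The template names, toy counts and English/Spanish words are hypothetical, and cosine similarity is an assumed stand-in because the abstract does not say which weighting or similarity measure is used.

```python
from collections import Counter
from math import sqrt

# Hypothetical seed mappings: each source and target template that belongs to a
# bilingual template pair (assumed learned from a parallel corpus) gets the same ID.
SRC_TEMPLATE_IDS = {"dobj_of:read": 0, "dobj_of:buy": 1, "dobj_of:drink": 2}
TGT_TEMPLATE_IDS = {"dobj_of:leer": 0, "dobj_of:comprar": 1, "dobj_of:beber": 2}


def to_shared_space(contexts, template_ids):
    """Map a word's template counts onto shared pair IDs; unpaired templates are dropped."""
    vec = Counter()
    for template, count in contexts.items():
        if template in template_ids:
            vec[template_ids[template]] += count
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_translations(src_contexts, tgt_lexicon):
    """Rank target-language candidates by similarity in the shared template space."""
    src_vec = to_shared_space(src_contexts, SRC_TEMPLATE_IDS)
    scores = {
        word: cosine(src_vec, to_shared_space(ctx, TGT_TEMPLATE_IDS))
        for word, ctx in tgt_lexicon.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy counts standing in for the two non-parallel corpora (hypothetical data).
book = {"dobj_of:read": 5, "dobj_of:buy": 3}
spanish_nouns = {
    "libro": {"dobj_of:leer": 4, "dobj_of:comprar": 2},
    "vino": {"dobj_of:beber": 6, "dobj_of:comprar": 1},
}
ranking = rank_translations(book, spanish_nouns)
print(ranking[0][0])  # best-scoring candidate for "book"
```

Because only templates with a known bilingual counterpart become dimensions, words from unrelated corpora can be compared directly without a bilingual seed lexicon of content words.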

This work has been supported by Ministerio de Educación y Ciencia of Spain, within the project GARI-COTERM, ref: HUM2004-05658-D02-02.



Author information

Authors and Affiliations

Department of Spanish Language, Faculty of Philology, Universidade de Santiago de Compostela, Galiza, Spain
Pablo Gamallo Otero
Department of Language Technology, Imaxin Software, Santiago de Compostela, Galiza, Spain
José Ramom Pichel Campos


Editor information

Editors and Affiliations

Portugal Telecom Inovação (PTI), Centro de Informática e Sistemas da Universidade de Coimbra (CISUC)
Carlos Bento
Department of Informatics Engineering, Coimbra University, Portugal
Amílcar Cardoso
Centre of Human Language Technology and Bioinformatics, University of Beira Interior, 6201-001, Covilhã, Portugal
Gaël Dias


About this paper

Cite this paper

Otero, P.G., Campos, J.R.P. (2005). An Approach to Acquire Word Translations from Non-parallel Texts. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_59


DOI: https://doi.org/10.1007/11595014_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30737-2
Online ISBN: 978-3-540-31646-6
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science
