An Approach to Acquire Word Translations from Non-parallel Texts

  • Conference paper
Progress in Artificial Intelligence (EPIA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3808)

Abstract

Few approaches to extracting word translations from non-parallel texts have been proposed so far. Researchers have been discouraged from working on this topic because extracting information from non-parallel corpora is a difficult task that produces poor results. Whereas word translation extraction from parallel texts can reach an accuracy of about 99%, the accuracy for non-parallel texts has been around 72% up to now. The approach presented here, which relies on the prior extraction of bilingual pairs of lexico-syntactic templates from parallel corpora, yields a significant improvement: about 89% of word translations are identified correctly.

This work has been supported by Ministerio de Educación y Ciencia of Spain, within the project GARI-COTERM, ref: HUM2004-05658-D02-02.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Otero, P.G., Campos, J.R.P. (2005). An Approach to Acquire Word Translations from Non-parallel Texts. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_59
