Automatic Keyword Extraction from Historical Document Images

  • Conference paper
  • pp 413–424
Document Analysis Systems VII (DAS 2006)
  • Kengo Terasawa,
  • Takeshi Nagasaki &
  • Toshio Kawashima

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3872)

Included in the following conference series:

  • International Workshop on Document Analysis Systems

Abstract

This paper presents a method for automatically extracting keywords from historical document images. The proposed method is language independent because it is purely appearance based: it requires neither lexical information nor any statistical language model. Moreover, since it does not need word segmentation, it can be applied to Eastern languages that do not put clear spacing between words. The first half of the paper describes the algorithm that retrieves document image regions similar in appearance to a given query image. The algorithm was evaluated in terms of recall and precision and achieved an average precision of over 80–90%. The second half of the paper describes the keyword extraction method, which works even when no query word is explicitly specified. Because efficient pruning techniques reduce the computational cost, the system could extract keywords successfully from relatively large documents.
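
The abstract reports retrieval quality as average precision but does not spell out the evaluation protocol. As a minimal sketch, assuming the standard definition (the mean of the precision values at each rank where a relevant region is retrieved, divided by the total number of relevant regions), the measure could be computed as below. The function name `average_precision` and its arguments are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Average precision for a single query (standard definition, assumed here).

    ranked_relevance: relevance flags of the retrieved regions, best match first.
    total_relevant:   number of relevant regions in the whole collection,
                      including any that were never retrieved.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant hits so far / results so far.
            precision_sum += hits / rank

    # Relevant regions that were never retrieved contribute zero precision.
    return precision_sum / total_relevant


if __name__ == "__main__":
    # Hypothetical ranking: relevant hits at ranks 1 and 3, two relevant in total.
    print(average_precision([True, False, True, False], total_relevant=2))
```

Dividing by the total number of relevant regions, rather than by the number retrieved, penalizes a ranking that misses relevant regions entirely.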

Author information

Authors and Affiliations

  1. School of Systems Information Science, Future University-Hakodate, 116–2 Kamedanakano-cho, Hakodate-shi, Hokkaido, 041–8655, Japan

    Kengo Terasawa, Takeshi Nagasaki & Toshio Kawashima

Editor information

Editors and Affiliations

  1. Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012, Bern, Switzerland

    Horst Bunke

  2. DocRec Ltd, 34 Strathaven Place, 7001, Atawhai, Nelson, New Zealand

    A. Lawrence Spitz

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Terasawa, K., Nagasaki, T., Kawashima, T. (2006). Automatic Keyword Extraction from Historical Document Images. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_37

  • DOI: https://doi.org/10.1007/11669487_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32140-8

  • Online ISBN: 978-3-540-32157-6

  • eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science

Keywords

  • Query Image
  • Document Image
  • Word Segmentation
  • Matching Cost
  • Query Word

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
