Automatic recognition of handwritten medical forms for search engines

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

We present a new paradigm that models the relationship between handwriting and topic categories in the context of medical forms. The ultimate goals are (1) a robust method that assigns medical forms to specified categories, and (2) the use of that information in practical applications, such as improved recognition of medical handwriting or search-engine-style retrieval of medical forms. Medical forms draw on diverse, complex and large lexicons that combine English, medical and pharmacology corpora. Our technique shows that a few recognized characters, returned by handwriting recognition, can be used to construct a linguistic model capable of representing a medical topic category. This allows (1) a reduced lexicon to be constructed, thereby improving handwriting recognition performance, and (2) PCR (Pre-Hospital Care Report) forms to be tagged with a topic category and subsequently searched by information retrieval systems. We report an improvement of over 7% in raw recognition rate and a mean average precision of 0.28 over a set of 1,175 queries on a data set of unconstrained handwritten medical forms filled out in emergency environments.
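
For readers unfamiliar with the retrieval metric quoted in the abstract, the following minimal Python sketch shows how mean average precision is conventionally computed from ranked result lists. It illustrates the standard definition only and is not the authors' evaluation code; the function names, form identifiers and toy queries are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    Averages precision@k over every rank k at which a relevant item
    appears, normalised by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, form_id in enumerate(ranked, start=1):
        if form_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries over made-up form identifiers.
toy_runs = [
    (["f3", "f1", "f7", "f2"], {"f1", "f2"}),  # relevant at ranks 2 and 4
    (["f5", "f9"], {"f5"}),                    # relevant at rank 1
]

if __name__ == "__main__":
    print(mean_average_precision(toy_runs))
```

In the toy data the first query scores (1/2 + 2/4) / 2 = 0.5 and the second scores 1.0, so the mean is 0.75. A value of 0.28 over 1,175 queries is the same average taken over a much larger and noisier query set.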

Author information

Corresponding author

Correspondence to Robert Jay Milewski.

Additional information

This work was supported by the National Science Foundation.

About this article

Cite this article

Milewski, R.J., Govindaraju, V. & Bhardwaj, A. Automatic recognition of handwritten medical forms for search engines. IJDAR 11, 203–218 (2009). https://doi.org/10.1007/s10032-008-0077-1

  • DOI: https://doi.org/10.1007/s10032-008-0077-1
