Abstract
A new paradigm is presented that models the relationships between handwriting and topic categories in the context of medical forms. The ultimate goals are (1) a robust method that categorizes medical forms into specified categories, and (2) the use of such information in practical applications, such as improved recognition of medical handwriting or retrieval of medical forms through a search engine. Medical forms have large, diverse, and complex lexicons drawn from English, medical, and pharmacological corpora. Our technique shows that a few characters returned by a handwriting recognizer can be used to construct a linguistic model capable of representing a medical topic category. This allows (1) a reduced lexicon to be constructed, thereby improving handwriting recognition performance, and (2) Pre-Hospital Care Report (PCR) forms to be tagged with a topic category and subsequently searched by information retrieval systems. We report an improvement of over 7% in raw recognition rate and a mean average precision of 0.28 over a set of 1,175 queries on a data set of unconstrained handwritten medical forms filled out in emergency environments.
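The abstract does not specify how the category model is built, so the following is only a toy sketch of the general idea: a few recognized characters per word are matched against per-category vocabularies, the best-supported category is chosen, and its vocabulary becomes the reduced lexicon. It is not the authors' method. The categories, their vocabularies, the rule that recognized characters must appear in order within a word (a subsequence match), and the helper names `is_subsequence`, `score_categories` and `reduce_lexicon` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each topic category maps to a small vocabulary.
# These categories and words are illustrative only, not taken from the paper.
CATEGORY_LEXICONS = {
    "cardiac": {"chest", "pain", "aspirin", "nitroglycerin", "tachycardia"},
    "respiratory": {"dyspnea", "albuterol", "wheezing", "oxygen", "asthma"},
    "trauma": {"fracture", "laceration", "bleeding", "splint", "collar"},
}


def is_subsequence(chars: str, word: str) -> bool:
    """True if the recognized characters occur in `word` in the same order.

    Assumption: the recognizer returns a few reliable characters per word,
    in reading order, with the unrecognized ones simply missing.
    """
    it = iter(word)
    # `c in it` advances the iterator, so order is enforced.
    return all(c in it for c in chars)


def score_categories(fragments, lexicons) -> Counter:
    """Count, per category, how many fragments match at least one of its words."""
    scores = Counter()
    for frag in fragments:
        for category, words in lexicons.items():
            if any(is_subsequence(frag, w) for w in words):
                scores[category] += 1
    return scores


def reduce_lexicon(fragments, lexicons):
    """Return (best_category, reduced_lexicon) for one form.

    With no evidence at all, fall back to the full (union) lexicon.
    """
    scores = score_categories(fragments, lexicons)
    if not scores:
        return None, set().union(*lexicons.values())
    best, _ = scores.most_common(1)[0]
    return best, set(lexicons[best])


if __name__ == "__main__":
    # Characters a recognizer might return for three words on one form.
    fragments = ["asp", "nitro", "chst"]
    category, reduced = reduce_lexicon(fragments, CATEGORY_LEXICONS)
    total = len(set().union(*CATEGORY_LEXICONS.values()))
    print(category, f"{len(reduced)} of {total} words kept")
    # prints: cardiac 5 of 15 words kept
```

Here the form is tagged "cardiac" and the word recognizer would then search 5 candidate words instead of 15. The same category label could also serve as an index term for retrieving the form later. A real system would need a scoring model that tolerates recognition errors rather than this all-or-nothing subsequence test.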
This work was supported by the National Science Foundation.
Milewski, R.J., Govindaraju, V. & Bhardwaj, A. Automatic recognition of handwritten medical forms for search engines. IJDAR 11, 203–218 (2009). https://doi.org/10.1007/s10032-008-0077-1
DOI: https://doi.org/10.1007/s10032-008-0077-1

