{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T18:20:13Z","timestamp":1764872413779,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2015,5,6]],"date-time":"2015-05-06T00:00:00Z","timestamp":1430870400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2015,5,6]]},"abstract":"<jats:p>\n            An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this article, we develop a foundational framework where the central construct is what we call a\n            <jats:italic>document spanner<\/jats:italic>\n            (or just\n            <jats:italic>spanner<\/jats:italic>\n            for short). A spanner maps an input string into a relation over the spans (intervals specified by bounding indices) of the string. The focus of this article is on the representation of spanners. Conceptually, there are two kinds of such representations. Spanners defined in a\n            <jats:italic>primitive<\/jats:italic>\n            representation extract relations directly from the input string; those defined in an\n            <jats:italic>algebra<\/jats:italic>\n            apply algebraic operations to the primitively represented spanners. This framework is driven by SystemT, an IBM commercial product for text analysis, where the primitive representation is that of regular expressions with capture variables.\n          <\/jats:p>\n          <jats:p>\n            We define additional types of primitive spanner representations by means of two kinds of automata that assign spans to variables. We prove that the first kind has the same expressive power as regular expressions with capture variables; the second kind expresses precisely the algebra of the\n            <jats:italic>regular<\/jats:italic>\n            spanners\u2014the closure of the first kind under standard relational operators. The\n            <jats:italic>core<\/jats:italic>\n            spanners extend the regular ones by string-equality selection (an extension used in SystemT). We give some fundamental results on the expressiveness of regular and core spanners. As an example, we prove that regular spanners are closed under difference (and complement), but core spanners are not. Finally, we establish connections with related notions in the literature.\n          <\/jats:p>","DOI":"10.1145\/2699442","type":"journal-article","created":{"date-parts":[[2015,5,11]],"date-time":"2015-05-11T16:30:57Z","timestamp":1431361857000},"page":"1-51","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":64,"title":["Document Spanners"],"prefix":"10.1145","volume":"62","author":[{"given":"Ronald","family":"Fagin","sequence":"first","affiliation":[{"name":"IBM Research -- Almaden, San Jose, CA"}]},{"given":"Benny","family":"Kimelfeld","sequence":"additional","affiliation":[{"name":"IBM Research -- Almaden"}]},{"given":"Frederick","family":"Reiss","sequence":"additional","affiliation":[{"name":"IBM Research -- Almaden, San Jose, CA"}]},{"given":"Stijn","family":"Vansummeren","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Libre de Bruxelles (ULB), Brussels, Belgium"}]}],"member":"320","published-online":{"date-parts":[[2015,5,6]]},"reference":[{"volume-title":"Handbook of Theoretical Computer Science","author":"Aho Alfred V.","key":"e_1_2_1_1_1","unstructured":"Alfred V. Aho . 1990. Algorithms for finding patterns in strings . In Handbook of Theoretical Computer Science , Volume A: Algorithms and Complexity (A). North Holland, 255-- 300 . Alfred V. Aho. 1990. Algorithms for finding patterns in strings. In Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (A). North Holland, 255--300."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/182.358434"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/1119089.1119095"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/303976.303983"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/LICS.2012.23"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2389241.2389250"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2012.12.036"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/876638.876642"},{"volume-title":"Transductions and Context-Free Languages. Teubner Studienb\u00fccher","author":"Berstel Jean","key":"e_1_2_1_9_1","unstructured":"Jean Berstel . 1979. Transductions and Context-Free Languages. Teubner Studienb\u00fccher , Stuttgart . Jean Berstel. 1979. Transductions and Context-Free Languages. Teubner Studienb\u00fccher, Stuttgart."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1998.1562"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of KR","author":"Calvanese Diego","year":"2000","unstructured":"Diego Calvanese , Giuseppe De Giacomo , Maurizio Lenzerini , and Moshe Y. Vardi . 2000a. Containment of conjunctive regular path queries with inverse . In Proceedings of KR 2000 . 176--185. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2000a. Containment of conjunctive regular path queries with inverse. In Proceedings of KR 2000. 176--185."},{"key":"e_1_2_1_12_1","volume-title":"Maurizio Lenzerini, and Moshe Y. Vardi.","author":"Calvanese Diego","year":"2000","unstructured":"Diego Calvanese , Giuseppe De Giacomo , Maurizio Lenzerini, and Moshe Y. Vardi. 2000 b. View-based query processing and constraint satisfaction. In Proceedings of LICS. 361--371. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2000b. View-based query processing and constraint satisfaction. In Proceedings of LICS. 361--371."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1142\/S012905410300214X"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2009.02.022"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00982-2_24"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 48th Annual\/Meeting of the Association for Computer Linguisties (ACL'10)","author":"Chiticariu Laura","year":"2010","unstructured":"Laura Chiticariu , Rajasekar Krishnamurthy , Yunyao Li , Sriram Raghavan , Frederick Reiss , and Shivakumar Vaithyanathan . 2010 . SystemT: An algebraic approach to declarative information extraction . In Proceedings of the 48th Annual\/Meeting of the Association for Computer Linguisties (ACL'10) . 128--137. Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, and Shivakumar Vaithyanathan. 2010. SystemT: An algebraic approach to declarative information extraction. In Proceedings of the 48th Annual\/Meeting of the Association for Computer Linguisties (ACL'10). 128--137."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/298514.298591"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/38713.38749"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1014348124664"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of DBPL. 21--39","author":"Deutsch Alin","year":"2001","unstructured":"Alin Deutsch and Val Tannen . 2001 . Optimization properties for classes of conjunctive regular path queries . In Proceedings of DBPL. 21--39 . Alin Deutsch and Val Tannen. 2001. Optimization properties for classes of conjunctive regular path queries. In Proceedings of DBPL. 21--39."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.91.0047"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463664.2463665"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594538.2594540"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/275487.275503"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.3115\/980845.980914"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of STACS (LIPIcs).","volume":"9","author":"Freydenberger Dominik D.","year":"2011","unstructured":"Dominik D. Freydenberger . 2011 . Extended regular expressions: Succinctness and decidability . In Proceedings of STACS (LIPIcs). Vol. 9 , Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik, 507--518. Dominik D. Freydenberger. 2011. Extended regular expressions: Succinctness and decidability. In Proceedings of STACS (LIPIcs). Vol. 9, Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik, 507--518."},{"volume-title":"Mastering Regular Expressions","author":"Friedl Jeffrey","key":"e_1_2_1_27_1","unstructured":"Jeffrey Friedl . 2006. Mastering Regular Expressions . O'Reilly Media . Jeffrey Friedl. 2006. Mastering Regular Expressions. O'Reilly Media."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1514"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1999.1633"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.3115\/992628.992709"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13089-2_47"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01692511"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01702865"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1519103.1519105"},{"key":"e_1_2_1_35_1","volume-title":"Pereira","author":"Lafferty John D.","year":"2001","unstructured":"John D. Lafferty , Andrew McCallum , and Fernando C. N . Pereira . 2001 . Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML. Morgan Kaufmann , 282--289. John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML. Morgan Kaufmann, 282--289."},{"volume-title":"Information extraction using hidden Markov models. Master's thesis University of California","author":"Leek T. R.","key":"e_1_2_1_36_1","unstructured":"T. R. Leek . 1997. Information extraction using hidden Markov models. Master's thesis University of California , San Diego . T. R. Leek. 1997. Information extraction using hidden Markov models. Master's thesis University of California, San Diego."},{"key":"e_1_2_1_37_1","volume-title":"An Introduction to Formal Languages and Automata","author":"Linz Peter","unstructured":"Peter Linz . 2001. An Introduction to Formal Languages and Automata 3 rd Ed. Jones and Bartlett Publishers, Inc. , Sudbury, M. A. Peter Linz. 2001. An Introduction to Formal Languages and Automata 3rd Ed. Jones and Bartlett Publishers, Inc., Sudbury, M. A.","edition":"3"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920916"},{"key":"e_1_2_1_39_1","volume-title":"Pereira","author":"McCallum Andrew","year":"2000","unstructured":"Andrew McCallum , Dayne Freitag , and Fernando C. N . Pereira . 2000 . Maximum entropy Markov models for information extraction and segmentation. In Proceedings of ICML. Morgan Kaufmann , 591--598. Andrew McCallum, Dayne Freitag, and Fernando C. N. Pereira. 2000. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of ICML. Morgan Kaufmann, 591--598."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-3975(01)00301-2"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/505241.505245"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5802\/aif.287"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497502"},{"volume-title":"Automatically constructing a dictionary for information extraction tasks","author":"Riloff Ellen","key":"e_1_2_1_44_1","unstructured":"Ellen Riloff . 1993. Automatically constructing a dictionary for information extraction tasks . In Proceedings of AAAI. AAAI Press\/The MIT Press , 811--816. Ellen Riloff. 1993. Automatically constructing a dictionary for information extraction tasks. In Proceedings of AAAI. AAAI Press\/The MIT Press, 811--816."},{"key":"e_1_2_1_45_1","volume-title":"Lehnert","author":"Soderland Stephen","year":"1995","unstructured":"Stephen Soderland , David Fisher , Jonathan Aseltine , and Wendy G . Lehnert . 1995 . CRYSTAL : Inducing a conceptual dictionary. In Proceedings of IJCAI. Morgan Kaufmann , 1314--1321. Stephen Soderland, David Fisher, Jonathan Aseltine, and Wendy G. Lehnert. 1995. CRYSTAL: Inducing a conceptual dictionary. In Proceedings of IJCAI. Morgan Kaufmann, 1314--1321."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10472-012-9288-8"},{"volume-title":"Handbook of Formal Languages","author":"Sheng Yu.","key":"e_1_2_1_47_1","unstructured":"Sheng Yu. 1997. Regular Languages . In Handbook of Formal Languages , Grzegorz Rozenberg and Arto Salomaa (Eds.), vol. 1 , Springer , Chapter 2. Sheng Yu. 1997. Regular Languages. In Handbook of Formal Languages, Grzegorz Rozenberg and Arto Salomaa (Eds.), vol. 1, Springer, Chapter 2."}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2699442","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2699442","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:16:58Z","timestamp":1750227418000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2699442"}},"subtitle":["A Formal Approach to Information Extraction"],"short-title":[],"issued":{"date-parts":[[2015,5,6]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,5,6]]}},"alternative-id":["10.1145\/2699442"],"URL":"https:\/\/doi.org\/10.1145\/2699442","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"type":"print","value":"0004-5411"},{"type":"electronic","value":"1557-735X"}],"subject":[],"published":{"date-parts":[[2015,5,6]]},"assertion":[{"value":"2013-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-05-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}