OpenMethods

FactGrid – a database for historians

Author on Source — Fri, 08 Mar 2024 13:51:46 +0000

“Introduction by OpenMethods guest editors Lisa Eggert, Kevin Kuck, Melanie Seltmann (DHd2024, Passau)”

FactGrid is both a database and a wiki. The project is operated by the Gotha Research Centre and the data lab of the University of Erfurt. It uses MediaWiki and Wikidata’s Wikibase extension to collect data from historical research. With FactGrid you can build a knowledge graph that records information as triple statements (subject, property, value). This knowledge graph can be queried with SPARQL. All data provided by FactGrid is published under a CC0 license.
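To make "triple statements" and "querying" concrete, here is a minimal sketch of an in-memory knowledge graph queried with SPARQL-style triple patterns. It is a toy stand-in for a real SPARQL engine, not FactGrid's software, and the `ex:` identifiers are hypothetical placeholders rather than actual FactGrid item or property IDs.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical statements; the "ex:" identifiers are placeholders, not FactGrid IDs.
TRIPLES: List[Triple] = [
    ("ex:goethe", "ex:instanceOf", "ex:Human"),
    ("ex:goethe", "ex:bornIn", "ex:Frankfurt"),
    ("ex:schiller", "ex:instanceOf", "ex:Human"),
    ("ex:schiller", "ex:bornIn", "ex:Marbach"),
    ("ex:Frankfurt", "ex:instanceOf", "ex:Place"),
]


def is_var(term: str) -> bool:
    """Treat terms starting with '?' as query variables, as SPARQL does."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # fixed term does not match
    return extended


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Join triple patterns left to right, like a SPARQL basic graph pattern."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?person ?place WHERE { ?person ex:instanceOf ex:Human . ?person ex:bornIn ?place . }
results = query([("?person", "ex:instanceOf", "ex:Human"), ("?person", "ex:bornIn", "?place")], TRIPLES)
for row in results:
    print(row["?person"], row["?place"])
```

Each pattern narrows the set of variable bindings produced by the previous one, which is the basic idea behind joining the triple patterns of a SPARQL `WHERE` clause.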

Advantages of FactGrid:

Highly flexible in working with different languages and existing data.
Lets users work in their own language while other users translate entries into theirs; supports collaboration in teams.
Provides a range of tools such as network analysis, map representations, complex linked searches as well as timeline representations.
Long-term availability of data, thanks to an active community and FactGrid’s membership in NFDI4Memory, a consortium of the German National Research Data Infrastructure (NFDI).

One of the project’s aims is to bring the platform, as a resource for research data, into the emerging consortium of federated Wikibase instances through a joint venture with Wikimedia Germany and the German National Library’s GND.

The results of any search can be downloaded in various data formats, so that FactGrid data can be explored in other software environments, and searches can be visualised with various tools on the FactGrid site.
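SPARQL endpoints typically offer their results in the W3C SPARQL 1.1 Query Results JSON format. As a sketch of moving such a download into another environment, the function below flattens that format into CSV. The sample response is invented for illustration and is not an actual FactGrid result.

```python
import csv
import io
import json

# Invented sample in the W3C SPARQL 1.1 Query Results JSON format.
SAMPLE = """
{
  "head": {"vars": ["person", "personLabel", "born"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "https://example.org/item/1"},
     "personLabel": {"type": "literal", "value": "Example Person", "xml:lang": "en"},
     "born": {"type": "literal", "value": "1700-01-01",
              "datatype": "http://www.w3.org/2001/XMLSchema#date"}},
    {"person": {"type": "uri", "value": "https://example.org/item/2"},
     "personLabel": {"type": "literal", "value": "Another Person"}}
  ]}
}
"""


def sparql_json_to_csv(raw: str) -> str:
    """Flatten SPARQL JSON results into CSV text, one column per projected variable."""
    data = json.loads(raw)
    variables = data["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(variables)
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a row (e.g. via OPTIONAL); write an empty cell.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()


csv_text = sparql_json_to_csv(SAMPLE)
print(csv_text)
```

Only the `value` of each term is kept here; a fuller converter might also preserve language tags and datatypes in extra columns.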

Source: FactGrid

Linked Data from TEI (LIFT): A Teaching Tool for TEI to Linked Data Transformation

Author on Source — Mon, 04 Mar 2024 16:34:17 +0000

“Introduction by OpenMethods guest editors Cristian Santini and Sebastian Still (DHd2024, Passau)”

TEI is among the tools most widely used by scholarly editors to produce digital editions in various literary fields. LIFT is a Python-based tool that programmatically extracts information from digital texts annotated in TEI, modelling the annotated persons, places, events and relations as a Knowledge Graph that reuses ontologies and controlled vocabularies from the Digital Humanities domain.

Retrieving such a collection of interconnected entities is of great value for reusing the information hidden in any well-structured XML edition, especially when connecting a digital edition with authority data or with other digital projects from similar domains. Not only from a humanities perspective but particularly from a technical one, this greatly increases the possibilities of building on such data.

While the TEI standard is long established in the DH domain, there is a gap between the work carried out in scholarly editing and the work related to Linked Open Data and Knowledge Graphs. The main challenge in this respect is the transition from a document-centric approach (TEI) to a data-centric approach (RDF).

The LIFT package, available on GitHub and published with thorough documentation, provides a series of scripts based on libraries such as lxml and RDFLib that parse an XML document and convert it to an RDF graph. To illustrate how the software works, the authors provide a step-by-step Jupyter Notebook. They emphasize the tool’s accessibility, not least because it was developed as a teaching tool for the degree programme in Digital Humanities and Digital Knowledge at the University of Bologna, in which students have to master the principles of both scholarly editing and LOD and the Semantic Web.
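The core idea, walking a TEI document and emitting RDF triples for the entities it annotates, can be sketched with the Python standard library alone. This is not LIFT's code or API: it swaps in `xml.etree.ElementTree` for lxml and writes N-Triples lines by hand instead of using RDFLib. The base URI `https://example.org/edition/`, the use of FOAF terms and the sample TEI fragment are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id
BASE = "https://example.org/edition/"  # hypothetical base URI for minted entities
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <listPerson>
      <person xml:id="dante"><persName>Dante Alighieri</persName></person>
      <person xml:id="beatrice"><persName>Beatrice</persName></person>
    </listPerson>
  </body></text>
</TEI>"""


def literal(text):
    """Escape a string for use as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def persons_to_ntriples(tei_xml):
    """Map each tei:person with an xml:id to a foaf:Person carrying its foaf:name."""
    root = ET.fromstring(tei_xml)
    lines = []
    for person in root.iterfind(".//tei:person", TEI_NS):
        pid = person.get(XML_ID)
        name_el = person.find("tei:persName", TEI_NS)
        if pid is None or name_el is None:
            continue  # skip entries we cannot identify or label
        subject = f"<{BASE}person/{pid}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF}Person> .")
        # Normalise whitespace inside the name, including text in nested elements.
        name = " ".join("".join(name_el.itertext()).split())
        lines.append(f"{subject} <{FOAF}name> {literal(name)} .")
    return lines


triples = persons_to_ntriples(SAMPLE_TEI)
print("\n".join(triples))
```

Reusing the edition's own `xml:id` values as the last segment of the entity URIs keeps the RDF traceable back to the TEI source; places and events could be handled the same way with further element paths.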

In conclusion, this work by the DHarc at the University of Bologna is a first step towards integrating the practices of scholarly editing and LOD, and towards new technologies and methodologies in the DH domain that provide high interoperability, machine-readability and explorability by leveraging the LOD cloud.

With LIFT, we […] aim to encourage further research into the development of open-source, user-friendly tools aiding the mutual integration of digital scholarly editions and the cultural heritage linked open data cloud. Such tools have the potential to make digital humanities, and especially knowledge representation, a more inclusive field of study and research.

Linked to Research Article: https://www.digitalhumanities.org/dhq/vol/16/2/000605/000605.html

Source: LIFT

“Creating specialized corpora from digitized historical newspaper archives: An iterative bootstrapping approach”

Marinella Testori — Mon, 08 Jan 2024 12:31:02 +0000

Every scholar in the digital humanities and/or social sciences has probably already faced the challenge of consulting large digital newspaper archives in order to extract detailed information about a topic. There is no doubt that currently available computational methods and tools can make a great contribution; however, applying them can pose several difficulties, especially when dealing with large collections of items.

In his contribution, Joshua W. Black presents an innovative approach that overcomes the shortcomings of the traditional keyword-based technique as well as those of text mining, especially where data are hard to detect and process because of OCR issues.

Developed on a portion of the ‘Papers Past’ newspaper archive held by the National Library of New Zealand, the method described by Black follows a multi-step path: it begins with a preprocessing stage carried out on a dataset in METS/ALTO XML format, followed by corpus exploration and, finally, a labelling stage. As the article demonstrates, iterating this loop makes it possible to pinpoint and select the items relevant to any given research topic with increasing precision.
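In ALTO XML, OCR output is stored word by word as `String` elements, whose `CONTENT` attribute holds the recognised text, grouped into `TextLine` elements. A minimal sketch of turning such a page into plain text for the preprocessing stage might look as follows. It matches local tag names so that it works regardless of which ALTO namespace version a file declares, ignores hyphenation markup, and is an illustration rather than Black's actual pipeline; the sample page is invented.

```python
import xml.etree.ElementTree as ET

# Invented ALTO fragment; real files carry coordinates, styles and confidence scores.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="b1">
      <TextLine><String CONTENT="THE"/><SP/><String CONTENT="PHILOSOPHICAL"/></TextLine>
      <TextLine><String CONTENT="SOCIETY."/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def local(tag):
    """Strip a '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml):
    """Return one text line per ALTO TextLine, joining its String/@CONTENT values."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local(element.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in element if local(child.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)


text = alto_to_text(SAMPLE_ALTO)
print(text)
```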

“In the case study, three iterations of the methods were sufficient to generate a specialized corpus of philosophical writing in early colonial New Zealand newspapers. After three iterations, the method achieved a balance of both selectiveness and accuracy […]. This project enables both model construction and model criticism” (p. 792).
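The iterative logic can be illustrated with a deliberately simple bootstrapping loop: rank documents against a growing set of terms, let a judge label the top hits, expand the terms from the documents labelled relevant, and repeat. This is a simplified stand-in, not the method from Black's article; the keyword scoring, the `label` callback standing in for the human labelling stage, the gold labels and the toy documents are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; real OCR text would need far more cleaning."""
    return re.findall(r"[a-z]+", text.lower())


def score(tokens, terms):
    """Number of tokens in a document that belong to the current term set."""
    return sum(1 for t in tokens if t in terms)


def bootstrap(docs, seed_terms, label, iterations=3, top_k=2, new_terms=2, stopwords=frozenset()):
    """Grow a set of relevant documents and a term set over several iterations."""
    terms = set(seed_terms)
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    relevant, rejected = set(), set()
    for _ in range(iterations):
        # Exploration: rank the documents not yet labelled by overlap with the terms.
        unlabelled = [d for d in docs if d not in relevant and d not in rejected]
        ranked = sorted(unlabelled, key=lambda d: score(tokens[d], terms), reverse=True)
        # Labelling: the judge (here a callback) decides on the top-ranked hits.
        for doc_id in ranked[:top_k]:
            if score(tokens[doc_id], terms) == 0:
                continue  # nothing matched, so there is nothing to judge
            (relevant if label(doc_id) else rejected).add(doc_id)
        # Expansion: add the most frequent unseen words from relevant documents.
        counts = Counter(
            t for d in sorted(relevant) for t in tokens[d] if t not in terms and t not in stopwords
        )
        terms.update(word for word, _ in counts.most_common(new_terms))
    return relevant, terms


DOCS = {
    "d1": "Lecture on philosophy and metaphysics",
    "d2": "Metaphysics and the mind debated at the institute",
    "d3": "Wool prices at the market",
    "d4": "The mind and morals: a philosophy lecture",
    "d5": "Shipping news from the harbour",
}
STOPWORDS = {"a", "and", "at", "from", "on", "the"}
GOLD = {"d1", "d2", "d4"}  # hypothetical judgements standing in for a human labeller

relevant, terms = bootstrap(DOCS, {"philosophy"}, label=lambda d: d in GOLD, stopwords=STOPWORDS)
print(sorted(relevant), sorted(terms))
```

Here "d2" contains neither seed term nor the word "philosophy"; it is only found in the second round because the first round's relevant documents added new terms, which is the point of bootstrapping.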

Already well known in statistics, as well as in computational linguistics, physics and other fields, the concept of bootstrapping is highlighted by Susan Carey as a “metaphor” (p. 59) applied by many to the learning process, with particular regard to languages and counting, and she argues that “in thinking about how bootstrapping might work, we are led to a fuller appreciation of the role of language in supporting the cultural transmission of knowledge” (p. 68).

References

Black, Joshua Wilson. “Creating specialized corpora from digitized historical newspaper archives: An iterative bootstrapping approach.” Digital Scholarship in the Humanities 38, no. 2 (June 2023): 779–797. https://doi.org/10.1093/llc/fqac079.

Carey, Susan. “Bootstrapping & the Origin of Concepts.” Daedalus 133, no. 1 (2004): 59–68. http://www.jstor.org/stable/2002789.

Propyläen: Goethe’s Biographica – between print replacement and “treasure of data”

Author on Source — Fri, 15 Dec 2023 16:17:02 +0000

The research platform “Propyläen. Goethes Biographica” (https://goethe-biographica.de/) brings together four previously independent edition projects as part of an academy project (Akademienvorhaben) of the Klassik Stiftung Weimar, the Saxon Academy of Sciences and Humanities in Leipzig, the Academy of Sciences and Literature in Mainz and the Freies Deutsches Hochstift in Frankfurt am Main. It contains Goethe’s Letters, Letters to Goethe, Goethe’s Diaries, and Encounters and Conversations. The presentation by Dr. Christian Thomas provides insights into the ongoing digital transformation of this long-standing academy project, focusing on the necessary paradigm shift away from print orientation and towards research data as the central result of editorial work. It illustrates several challenges of digitally transforming editions and publication processes, such as the curation and continuous maintenance of a large amount of ‘legacy’ data from preliminary print stages, and the efficient, connectable and sustainable provision of an equally large amount of ‘born digital’ data, which naturally has a higher information density and is based on international standards and best practices. It discusses these challenges on both a practical and a theoretical level.


Source: https://www.youtube.com/watch?v=3XKDWyXm9t4

Spanish Paleography Digital Teaching and Learning Tool

Ulrike Wuttke — Tue, 26 Sep 2023 06:53:20 +0000

Introduction by OpenMethods guest editors (DH2023, Graz) Josie Bready, Ahac Meden, Tine Reeh

Paleography is a difficult skill to learn, especially when teaching oneself. It requires not only mastery of the target language but also knowledge of handwritten scripts, which vary greatly. Without a visual guide, it can be hard to identify words or even individual letters.

The Spanish Paleography tool (http://spanishpaleographytool.org) helps bridge this gap for those interested in learning the paleography of the early modern Spanish period, from the late 15th to the 18th century. It is intended to teach users how to decipher and read handwriting in documents of this era. Full transcriptions of the documents can be viewed in a facing-page format, or users can highlight individual words. The tool could also be used to introduce students to paleography in the classroom.

The available documents are archival sources from La Española (modern-day Dominican Republic). Some of these scripts were used throughout the Spanish world, so scholars working on other regions can also use the tool. Alphabets of the scripts can be downloaded for use with documents written in the same handwriting styles.

The tool was developed by a team at the City University of New York and first launched in March 2013. However, the project no longer appears to be updated, which points to a broader DH problem rooted in funding. The tool survives as an archival website whose interface offers a user-friendly and intuitive didactic approach to reading Spanish handwriting from a specific time period and geographical area, but many of the explanatory categories for the texts remain empty. This gives the impression that work on it has stopped, not because it has been completed (as is the case, for instance, with the detailed translations of canonical texts in Slovenian at https://nl.ijs.si/e-zrc/bs/index-en.html), but because of a lack of continuing institutional funding. Similar unfinished archival DH projects abound; one only has to look at the projects listed on the EADH website.

Working with the Spanish Paleography Tool (http://spanishpaleographytool.org/)

The Spanish Paleography Digital Teaching and Learning Tool is an online interactive resource to assist users in the learning of the deciphering and reading of manuscripts written in Spanish during the early modern period, roughly from the late 15th to the 18th century.
Source: http://spanishpaleographytool.org/

Original content: http://spanishpaleographytool.org/

Mediate: A Collaborative Time-Based Media Annotation Tool for the Web

Ulrike Wuttke — Tue, 26 Sep 2023 06:52:36 +0000

Introduction by OpenMethods guest editors (DH2023, Graz) Joanne Bernardi; Vera Burrows; Dylan Palmer

Mediate is a collaborative time-based media annotation tool for the web that can be used individually or collaboratively for synchronous and asynchronous digital annotation. Its key features are accessibility and customization, i.e. the ability to tailor the schema that forms the basis of the analysis to the purpose of the project. Mediate can be used with any kind of audio or visual media: users can upload film and television clips, video games, music videos, social media, music, podcasts and multimodal content for analysis. After uploading the content, users can generate automated markers, annotate the content on the basis of a customizable schema, take real-time notes, and export their data to generate visualizations. The tool is used for research and teaching at the University of Rochester and Bowdoin College, but it is still in development. You can sign up for updates on its release and for more sample projects on the Mediate website.
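To illustrate what a customizable schema and time-based annotation involve, here is a minimal data model: a project-specific schema restricts which categories an annotation may use, each annotation is anchored to start and end times in seconds, and the result is exported as CSV for visualization. The class names, fields and export layout are hypothetical and do not reflect Mediate's internal design or its actual export format.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Annotation:
    start: float  # seconds from the beginning of the media file
    end: float
    category: str
    note: str = ""


@dataclass
class AnnotatedMedia:
    title: str
    schema: Set[str]  # user-defined categories for this project
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, start, end, category, note=""):
        """Add an annotation after checking it against the schema and the time range."""
        if category not in self.schema:
            raise ValueError(f"category {category!r} is not in the project schema")
        if not 0 <= start < end:
            raise ValueError("annotation needs 0 <= start < end")
        self.annotations.append(Annotation(start, end, category, note))

    def to_csv(self):
        """Export annotations sorted by start time, e.g. for a timeline visualization."""
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["title", "start", "end", "category", "note"])
        for a in sorted(self.annotations, key=lambda a: a.start):
            writer.writerow([self.title, a.start, a.end, a.category, a.note])
        return out.getvalue()


clip = AnnotatedMedia("Sample clip", schema={"shot", "dialogue", "music"})
clip.annotate(12.0, 18.5, "dialogue", "first exchange")
clip.annotate(0.0, 12.0, "shot", "establishing shot")
print(clip.to_csv())
```

Validating against the schema at annotation time keeps collaboratively produced data consistent, which matters when several users annotate the same clip.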

From film and television to video games, music videos, social media, music, and podcasts, multimodal content is ubiquitous in our everyday lives. Yet education still focuses primarily on text-based literacies. Mediate, a web-based platform that allows users to annotate multimedia content, tackles this problem by providing a means for individual or collective inquiry into time-based media. Users can upload video or audio, generate automated markers, annotate their content on the basis of customizable schema, produce real-time notes, and export their data to generate visualizations.
Source: https://www.library.rochester.edu/about/digital-scholarship/projects/mediate

Mediate interface with the schema button (highlighted in red) and an annotation (picture provided by the guest editors of this introduction)

Link to original resource: https://www.library.rochester.edu/about/digital-scholarship/projects/mediate

An Engaging Environment for Ancient Chinese Texts: An Introduction to ctext.org

Ulrike Wuttke — Tue, 26 Sep 2023 06:49:43 +0000

Introduction by OpenMethods guest editors (DH2023, Graz) Yaming Fu, Radu Tulai, Viktor J. Illmer

The Chinese Text Project is a well-established resource in Sinology, providing open access to a large number of ancient Chinese texts. As a digital medium, it uses crowdsourcing, linked data, knowledge graphs and other computational technologies to provide an interactive interface for users interested in ancient Chinese texts. Beyond its main aim of providing open access to Chinese literature and philosophy texts, the project offers an integrated Chinese character dictionary, images of the scanned source texts, a search function for parallel passages, and much more. In terms of structured data, the project’s data wiki contains a wealth of records on entities such as persons, locations and works.
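A parallel-passage search rests on detecting text reuse. As a rough illustration only, and not the technique described in the Sturgeon paper cited in this entry, one can index the character n-grams of each text (classical Chinese has no word spacing, so characters are a natural unit) and merge runs of shared n-grams into passages. The value n = 4 is an assumption; the first string is the opening of Analects 1.1 with punctuation removed, and the second is an invented sentence quoting it.

```python
def ngrams(text, n):
    """Map each character n-gram to the positions where it starts."""
    index = {}
    for i in range(len(text) - n + 1):
        index.setdefault(text[i:i + n], []).append(i)
    return index


def shared_passages(a, b, n=4):
    """Return substrings of `a` formed by consecutive n-grams that also occur in `b`."""
    b_grams = set(ngrams(b, n))
    starts = sorted(i for gram, positions in ngrams(a, n).items() if gram in b_grams for i in positions)
    passages = []
    run_start = prev = None
    for i in starts:
        if prev is not None and i == prev + 1:
            prev = i  # the next n-gram overlaps the previous one: extend the run
            continue
        if run_start is not None:
            passages.append(a[run_start:prev + n])  # close the finished run
        run_start = prev = i
    if run_start is not None:
        passages.append(a[run_start:prev + n])
    return passages


text_a = "子曰學而時習之不亦說乎"  # Analects 1.1 opening, punctuation removed
text_b = "故曰學而時習之此之謂也"  # invented sentence quoting it
print(shared_passages(text_a, text_b))
```

Real text-reuse detection also has to cope with variant characters, small insertions and very common phrases, which exact n-gram matching like this does not handle.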

The strength of the tool lies in providing a friendly environment for interacting with ancient Chinese texts and a comprehensive filtering system based on diverse categories.

“The Chinese Text Project (中國哲學書電子化計劃) is an online open-access digital library that provides scholars in China and abroad with transmitted Chinese texts from all periods. It seeks to go beyond the limitations of print media and to use digital technology to explore new ways of engaging with ancient texts. The collection comprises more than 30,000 works and as many as 5 billion characters, making it the largest database of pre-modern Chinese texts.” Source: https://ctext.org/zh (translated from Chinese)

“The goal of this site is to present accurate and accessible copies of ancient (in particular pre-Qin and Han dynasty) Chinese texts in an organized and searchable format, and to make the best possible use of modern technology to aid in the study and research of these texts, so making them accessible to the widest possible audience.” Source: https://ctext.org/introduction

On ctext.org, the summary data used to create the parallel-passage visualization can be downloaded as a GraphViz (.gv) file suitable for use with various network visualization tools, including Gephi.
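GraphViz files are plain text in the DOT language, which is why network tools such as Gephi can import them. The sketch below writes a small undirected graph in which nodes are texts and edge weights count shared passages. The pair counts are invented, and the exact structure of ctext.org's own .gv export is not described here and may differ.

```python
def to_dot(edges, name="text_reuse"):
    """Serialize {(text_a, text_b): count} as an undirected DOT graph with weighted edges."""
    def quote(label):
        # In DOT quoted IDs, only embedded double quotes need escaping.
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"graph {name} {{"]
    for (a, b), count in sorted(edges.items()):
        lines.append(f"  {quote(a)} -- {quote(b)} [weight={count}];")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Invented counts of shared passages between pairs of texts.
reuse_counts = {("Analects", "Mencius"): 3, ("Analects", "Xunzi"): 5}
dot = to_dot(reuse_counts)
print(dot)
```

Saving the returned string to a file with a `.gv` extension yields a file that GraphViz can render and Gephi can open.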
Credits
The parallel passage function required a considerable amount of time and effort to create – please appropriately acknowledge any use of this data in your research. The techniques used to create this data are described and evaluated in the following paper, which you may wish to cite when making use of the data:
Donald Sturgeon. 2017. Unsupervised Identification of Text Reuse in Early Chinese Literature. Digital Scholarship in the Humanities.

Link to Ctext tool: https://ctext.org/

Tutorials: https://dsturgeon.net/tutorials/

Closing the Gap in Non-Latin-Script Data: A tool for building and navigating collections of DH research projects

Ulrike Wuttke — Wed, 09 Aug 2023 09:54:48 +0000

Introduction by OpenMethods guest editors (DH2023, Graz) Jacob Hart, Till Grallert, Jose Hernandez

The Closing the Gap in Non-Latin-Script Data project aims to map the field of digital humanities projects outside and beyond the anglosphere, with a particular focus on non-Latin scripts such as Arabic or Chinese, in both machine-actionable and human-readable form. The urgency and value of such a survey have been highlighted in recent discussions around global, decolonial and multilingual digital humanities. The project follows minimal computing principles: it stores the data as one JSON file per project, from which it generates a static website hosted on GitHub Pages. Beyond the team’s own data collection, anyone on the internet can submit data, either through a basic form or via GitHub issues and pull requests.
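The one-file-per-project setup can be sketched as a small build step that reads every JSON file in a data directory and merges the records into one sorted list, for example to feed a search index on a static site. The directory layout and the field names (`title`, `languages`, `keywords`, `start`, `end`) are assumptions for illustration, not the project's actual schema; temporary sample files stand in for the repository.

```python
import json
import tempfile
from pathlib import Path


def load_projects(data_dir):
    """Read one JSON object per *.json file and return them sorted by title."""
    projects = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            record = json.load(fh)
        record["_source"] = path.name  # remember which file each entry came from
        projects.append(record)
    return sorted(projects, key=lambda p: p.get("title", "").casefold())


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical project files standing in for the repository's data folder.
    samples = {
        "p1.json": {"title": "Sample Persian manuscripts project", "languages": ["Persian"],
                    "keywords": ["manuscripts"], "start": 2015, "end": None},
        "p2.json": {"title": "Sample Arabic periodicals project", "languages": ["Arabic"],
                    "keywords": ["periodicals", "OCR"], "start": 2019, "end": 2022},
    }
    for filename, record in samples.items():
        Path(tmp, filename).write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
    projects = load_projects(tmp)
    # A single merged file like this could be published alongside the static pages.
    index_json = json.dumps(projects, ensure_ascii=False, indent=2)

print([p["title"] for p in projects])
```

Because each project lives in its own file, contributions arrive as small, reviewable diffs, and the merged index can be regenerated from scratch on every build.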

The dataset includes information on project titles, aims, time span, disciplines, and, most importantly, project languages. The website provides multiple ways of accessing the data:

Text search allows users to find a specific project or to search projects by metadata, and it renders the underlying JSON data in human-readable form.
Faceted browsing allows users to select projects by language and keyword based on a custom tagging scheme.
A map gives a geographical overview of all the projects in the database. This can be useful for assessing diversity and identifying research hotspots in the field. Note: the location reflects where the research is conducted, not where the language of study is spoken.
A timeline also offers a composite view of the projects in the database, showing how projects relate to each other in time and, notably, which projects are still active.
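Faceted browsing of this kind comes down to counting the values of a field across all records and keeping only the records that contain the selected values; a "still active" view on a timeline can be as simple as checking for a missing end date. A minimal sketch, assuming hypothetical field names (`languages`, `keywords`, `end`) and invented sample records rather than the project's actual JSON schema:

```python
from collections import Counter

# Invented sample records with hypothetical field names.
PROJECTS = [
    {"title": "Sample A", "languages": ["Arabic", "Persian"], "keywords": ["manuscripts"], "end": None},
    {"title": "Sample B", "languages": ["Arabic"], "keywords": ["OCR", "periodicals"], "end": 2022},
    {"title": "Sample C", "languages": ["Chinese"], "keywords": ["OCR"], "end": None},
]


def facet_counts(projects, field):
    """Count how many projects carry each value of a list-valued field."""
    return Counter(value for p in projects for value in set(p.get(field, [])))


def filter_projects(projects, **selected):
    """Keep projects containing every selected value, e.g. languages="Arabic"."""
    return [p for p in projects if all(value in p.get(field, []) for field, value in selected.items())]


def active(projects):
    """Treat projects without an end date as still active."""
    return [p for p in projects if p.get("end") is None]


print(facet_counts(PROJECTS, "languages"))
print([p["title"] for p in filter_projects(PROJECTS, languages="Arabic", keywords="OCR")])
print([p["title"] for p in active(PROJECTS)])
```

On a static site, the same logic would typically run in the browser over a pre-built index, so no server-side database is needed.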

This database is the result of community action and, as such, has all the strengths and drawbacks of community-dependent projects. Its strength lies in a very intuitive way of browsing the data that gives users a quick overview of the state of the field. On the data level, however, the project depends on contributions from the community through GitHub issues and pull requests. This might not deter the tech-savvy from contributing, but it may prove a step too far for many humanists and for the general public. For now, entries can be created through the GUI, but this only produces a JSON file that is downloaded to the user’s computer and must then be uploaded by opening a GitHub issue. Users who wish to modify an entry must interact with the git repository directly (creating pull requests, opening issues, etc.). On the upside, the simple data structure prevents lock-in to a specific technology stack and allows an easy transition to different infrastructures.

Even with these technological hurdles and concerns in mind, the current iteration of the website and the database itself performs an essential service for those in the digital humanities who work with non-Latin scripts. By raising awareness of current projects, it allows more researchers to engage not only with these projects’ results but also with the particular challenges they face in their research.

This system has great potential to serve as a template for many other use cases, be it further collections of research projects or collections of other digital objects, since the JSON data format is flexible enough to describe a wide range of records. There is still work to be done to make the integration with the underlying git repository more user-friendly; however, Closing the Gap is a great resource for researchers and teams looking for a streamlined and simple solution for maintaining information about their field.

As we began gathering data on digital projects dealing with Arabic or similar languages, we thought about how to provide this data in a way that commits to OpenScience principles. So we chose a public Git repository as our main data store, offering the data as JSON in a way that should be as straightforward as possible. Everyone who is interested should be able to contribute without having to deal with too much of a technology stack.
Source: https://m-l-d-h.github.io/Closing-The-Gap-In-Non-Latin-Script-Data/about/

Screenshot from https://m-l-d-h.github.io/Closing-The-Gap-In-Non-Latin-Script-Data/map/

Original content: Closing the Gap Database interactive website; Capture of the site on Internet Archive (14.07.2022)

Website of parent project: Closing the Gap in Non-Latin Script Data • Berlin University Alliance; Capture of the site on Internet Archive