This is part of a developer tutorial for creating and consuming
Research Objects (RO).
This tutorial is programming language-agnostic, but assumes some
general JSON and Linux/UNIX shell knowledge.
(Translating the shell commands to their Windows PowerShell equivalents is left as an exercise for the reader.)
Status: DRAFT: As of 2015-06-23, this document is a draft in progress. Feel free to help improve by providing bugs/wishes/suggestions and changes.
License: BSD 2-clause - see LICENSE for details.
Authors: Stian Soiland-Reyes, Norman Morrison, Finn Bacall
Research Objects aim to improve reuse and reproducibility in academic scholarship by capturing not just the publication, but also the data and code that support it, in addition to metadata, provenance and detailed annotations about the constituent resources. This extends beyond the traditional "Supplementary resources" as it makes all of those resources first-class citizens and connects them to each other structurally.
The core Research Object principles are:
Identity: Use globally unique identifiers as names for things
Aggregation: Use some mechanism of aggregation to associate things that are related or part of the broader investigation, study, etc.
Annotation: Provide additional metadata about those things: how they relate to each other, where they came from, when, etc.
In this tutorial we'll walk through how to make a simple Research Object, and hopefully along the way show how to achieve each of these principles.
Our use case for the purpose of this tutorial is to publish a Research Object that captures the data and analysis scripts that support an accepted academic paper. We believe this kind of use case occurs in many sciences and research fields, obviously with domain-specific variations and additional requirements.
In this use case, the purpose of the research object is to provide evidence for the claims in the article, but also to provide a direct starting point for someone else who wants to reuse the algorithm or raw data.
Conceptually, this particular research object should therefore aggregate at least these resources:
- The accepted article
- The raw data
- The analysis script that used the data
A software tool or researcher that picks up the produced research object should be able to understand that:
- The script performs a particular analysis
- The data was consumed by the script
- The paper is supported by the data and running of the script
At the core, Research Object (RO) is a model and vocabulary for describing an aggregation of resources that form part of a larger whole. To realize this model, however, some technology choices also need to be made.
While the RO model can in theory be implemented by anything from an Excel spreadsheet to a virtual machine image (http://www.researchobject.org/initiative/docker/), in practice the choice stands between two approaches:
- Linked Data on the web - a series of HTTP accessible resources with links that relate them to each other
- Research Object Bundle - a self-contained research object as a ZIP-file
Each of these has its strengths and weaknesses, which we'll try to cover in detail below.
At the core of a Research Object is the aggregation of the related resources. In this example, the three resources to aggregate are available as individual files:
- paper3.pdf
- rawdata5.csv
- analyse2.py
In the RO Bundle approach, we can add these three files to a ZIP file with our chosen filenames. The RO Bundle specification has one additional requirement: a special file named mimetype must be the first file in the ZIP file, to indicate that it is a Research Object. In the shell we can create such a ZIP file like this:
echo -n application/vnd.wf4ever.robundle+zip > mimetype
zip -0 -X example.bundle.zip mimetype
Alternatively you may use the empty.bundle.zip as a starting point:
cp empty.bundle.zip example.bundle.zip
Adding the files to aggregate to the ZIP:
zip example.bundle.zip rawdata5.csv paper3.pdf analyse2.py
A Research Object Bundle must also include a manifest that declares the aggregated
resources and optionally their metadata. The manifest is named .ro/manifest.json, and is in
JSON format.
A minimal manifest for our example would be:
{ "@id": "/",
"@context": ["https://w3id.org/bundle/context"],
"aggregates": [
"/paper3.pdf",
"/rawdata5.csv",
"/analyse2.py"
]
}
Do not change the @id and @context from the above values.
Note: The filenames in aggregates are listed as relative URIs within the ZIP file. They should start with /, and any special characters, such as spaces, must be %-escaped appropriately in the manifest.
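As a rough illustration of the escaping rule in the note above, the following Python sketch builds the same minimal manifest from a list of filenames. The helper names bundle_path and make_manifest are hypothetical and not part of any RO tooling, and the sketch assumes the filenames are given relative to the root of the ZIP file with / as the path separator.

```python
import json
from urllib.parse import quote


def bundle_path(filename):
    """Return a filename as a %-escaped relative URI that starts with '/'."""
    # quote() keeps '/' unescaped by default, so nested paths keep their structure.
    return "/" + quote(filename.lstrip("/"))


def make_manifest(filenames):
    """Build the minimal manifest structure for the given filenames."""
    return {
        "@id": "/",  # fixed value, as required for RO Bundle manifests
        "@context": ["https://w3id.org/bundle/context"],
        "aggregates": [bundle_path(name) for name in filenames],
    }


if __name__ == "__main__":
    manifest = make_manifest(["paper3.pdf", "rawdata5.csv", "analyse2.py"])
    print(json.dumps(manifest, indent=2))
```

With this sketch, a hypothetical file called "raw data 5.csv" would be listed as "/raw%20data%205.csv". The printed JSON still has to be saved as .ro/manifest.json before it can be added to the bundle.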
You can now add the manifest to the RO bundle:
zip example.bundle.zip .ro/manifest.json
example.bundle.zip is now a complete minimal
Research Object Bundle of the above resources. The later sections will show how
we can augment this with additional metadata to differentiate
it from a plain ZIP file.
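For readers who would rather script the bundle creation, the shell steps above can be approximated with Python's standard zipfile module. This is only a sketch under stated assumptions: the function names create_bundle and check_bundle are hypothetical, every file is stored under its base name at the root of the ZIP file, and the check covers only the points mentioned in this tutorial (mimetype first and uncompressed, manifest present), not the full RO Bundle specification.

```python
import zipfile
from pathlib import Path

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def create_bundle(bundle_path, files, manifest_json):
    """Write a bundle with the mimetype entry, the given files and the manifest.

    files: paths of existing files; each is stored under its base name.
    manifest_json: the manifest content as a string.
    """
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The mimetype entry goes first and is stored uncompressed,
        # mirroring `zip -0 -X example.bundle.zip mimetype`.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path in files:
            bundle.write(path, arcname=Path(path).name)
        bundle.writestr(".ro/manifest.json", manifest_json)


def check_bundle(bundle_path):
    """Return True if the ZIP file starts with the expected mimetype entry
    and contains .ro/manifest.json."""
    with zipfile.ZipFile(bundle_path) as bundle:
        entries = bundle.infolist()
        if not entries or entries[0].filename != "mimetype":
            return False
        first = entries[0]
        return (
            first.compress_type == zipfile.ZIP_STORED
            and bundle.read(first) == MIMETYPE.encode("ascii")
            and ".ro/manifest.json" in bundle.namelist()
        )
```

A call such as create_bundle("example.bundle.zip", ["rawdata5.csv", "paper3.pdf", "analyse2.py"], manifest_text) would then produce a file with the same layout as the one built in the shell, and check_bundle("example.bundle.zip") gives a quick sanity check when consuming a bundle.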
In the alternative Linked Data approach there is no single file that contains the complete Research Object. Instead
the manifest will have to link to resources that can be addressed with a URI, typically starting with http:// or https://,
and itself be published on the web.
So the first step is to ensure we have made our resources available on the web. For simplicity in this tutorial, we naively use the URIs at GitHub, but any accessible URI would be valid (see the identity section).
- https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/paper3.pdf
- https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/rawdata5.csv
- https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/analyse2.py
A minimal Research Object manifest in JSON-LD that aggregates these would look like this:
{ "@id": "#ro",
"@context": ["https://w3id.org/bundle/context"],
"aggregates": [
"https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/paper3.pdf",
"https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/rawdata5.csv",
"https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/analyse2.py"
]
}
If we provide such a JSON file on the web, and ideally serve it with the Content-Type application/ld+json, we have created
Linked Data. The above example has been published as
https://rawgit.com/ResearchObject/ro-tutorials/master/01-creating/ro.jsonld#ro, which is a valid Research Object as Linked Data, and thus its manifest can also be converted to other RDF formats, if so desired.
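To show how the "#ro" identifier relates to the address where the manifest is published, here is a small Python sketch that resolves the manifest's @id and aggregated URIs against a base URI. It treats the manifest as plain JSON rather than doing full JSON-LD processing, assumes that every entry in aggregates is a plain string, and uses the hypothetical function name resolve_manifest. It does not fetch anything from the web.

```python
import json
from urllib.parse import urljoin


def resolve_manifest(manifest_text, base_uri):
    """Return (absolute RO identifier, list of absolute aggregated URIs).

    base_uri is the address the manifest was retrieved from.
    """
    manifest = json.loads(manifest_text)
    # A relative @id such as "#ro" is resolved against the manifest's own URI.
    ro_id = urljoin(base_uri, manifest["@id"])
    # Absolute URIs pass through unchanged; relative ones are resolved.
    aggregated = [urljoin(base_uri, uri) for uri in manifest.get("aggregates", [])]
    return ro_id, aggregated
```

Resolving the manifest above against https://rawgit.com/ResearchObject/ro-tutorials/master/01-creating/ro.jsonld gives the Research Object the identifier ending in ro.jsonld#ro, while the three GitHub URIs are returned as they are.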
The next tutorial on RO identity details how to provide and find identifiers for the Research Object and its resources.