Tutorial #1: Creating a Research Object

This is part of a developer tutorial for creating and consuming Research Objects (RO). This tutorial is programming language-agnostic, but assumes some general JSON and Linux/UNIX shell knowledge.
(Translating the shell commands to their Windows PowerShell equivalents is left as an exercise for the reader.)

Status: DRAFT: As of 2015-06-23, this document is a draft in progress. Feel free to help improve it by reporting bugs, wishes and suggestions, or by contributing changes.

License: BSD 2-clause - see LICENSE for details.

Authors: Stian Soiland-Reyes, Norman Morrison, Finn Bacall

Overview

Research Objects aim to improve reuse and reproducibility in academic scholarship by capturing not just the publication, but also the data and code that support it, in addition to metadata, provenance and detailed annotations about the constituent resources. This extends beyond the traditional "Supplementary resources", as it makes all of those resources first-class citizens and connects them to each other structurally.

The core Research Object principles are:

Identity: Use globally unique identifiers as names for things

Aggregation: Use some mechanism of aggregation to associate things that are related or part of the broader investigation, study, etc.

Annotation: Provide additional metadata about those things: how they relate to each other, where they came from, when they were made, etc.

In this tutorial we'll walk through how to make a simple Research Object, and hopefully along the way show how to achieve each of these principles.

Use case: Publishing data and analysis script

Our use case for this tutorial is to publish a Research Object that captures the data and analysis scripts that support an accepted academic paper. We believe this kind of use case occurs in many sciences and research fields, with domain-specific variations and additional requirements.

In this use case, the purpose of the research object is to provide evidence for the claims in the article, but also to provide a direct starting point for someone else who wants to reuse the algorithm or raw data.

Conceptually, this particular research object should therefore aggregate at least these resources:

  • The accepted article
  • The raw data
  • The analysis script that used the data

A software tool or researcher that picks up the produced research object should be able to understand that:

  • The script performs a particular analysis
  • The data was consumed by the script
  • The paper is supported by the data and running of the script

Implementation choices

At its core, Research Object (RO) is a model and vocabulary for describing an aggregation of resources that form part of a larger whole. To realize this model, however, some technology choices also need to be made.

While the RO model can in theory be implemented by anything from an Excel spreadsheet to a virtual machine image (see http://www.researchobject.org/initiative/docker/), in practice the choice is between two approaches:

  • Linked Data on the web - a series of HTTP-accessible resources with links that relate them to each other
  • Research Object Bundle - a self-contained research object as a ZIP-file

Each of these has its strengths and weaknesses, which we'll try to cover in detail below.

Aggregation

At the core of a Research Object is the aggregation of the related resources. In this example, the three resources to aggregate are available as individual files: paper3.pdf, rawdata5.csv and analyse2.py.

Aggregating in an RO Bundle

In the RO Bundle approach, we add these three files to a ZIP file under our chosen filenames. The RO Bundle specification additionally requires a special file named mimetype, which must be the first file in the ZIP file, to indicate that it is a Research Object. In the shell we can create such a ZIP file like this:

echo -n application/vnd.wf4ever.robundle+zip > mimetype
zip -0 -X example.bundle.zip mimetype

Alternatively you may use the empty.bundle.zip as a starting point:

cp empty.bundle.zip example.bundle.zip

Adding the files to aggregate to the ZIP:

zip example.bundle.zip rawdata5.csv paper3.pdf analyse2.py
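For readers who prefer a scripting language over the shell, the same two steps can be sketched in Python with the standard zipfile module. This is a minimal sketch, not part of the RO Bundle tooling: the function name create_bundle is made up for illustration, and it assumes that writing mimetype first and uncompressed is an adequate equivalent of `zip -0 -X`.

```python
import zipfile
from pathlib import Path

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def create_bundle(bundle_path, files):
    """Create a ZIP whose first entry is an uncompressed 'mimetype' file.

    'create_bundle' is a hypothetical helper name used only in this sketch.
    """
    with zipfile.ZipFile(bundle_path, "w") as bundle:
        # A bare ZipInfo defaults to ZIP_STORED and carries no extra
        # fields, which mirrors `zip -0 -X mimetype`.
        bundle.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE)
        for f in files:
            # Store each file under its base name, compressed as usual.
            bundle.write(f, arcname=Path(f).name,
                         compress_type=zipfile.ZIP_DEFLATED)
```

Calling create_bundle("example.bundle.zip", ["rawdata5.csv", "paper3.pdf", "analyse2.py"]) from the directory holding the three files should give a ZIP equivalent to the one built with the shell commands above.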

A Research Object Bundle must also include a manifest that declares the aggregated resources and optionally their metadata. The manifest is named .ro/manifest.json, and is in JSON format.

A minimal manifest for our example would be:

{ "@id": "/",
  "@context": ["https://w3id.org/bundle/context"],
  "aggregates": [
    "/paper3.pdf",
    "/rawdata5.csv",
    "/analyse2.py"
  ]
}

Do not change the @id and @context from the above values.

Note: The filenames under aggregates are listed as relative URIs within the ZIP file, and should start with /. Any special characters, such as spaces, must be %-escaped appropriately in the manifest.
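The escaping rule can be illustrated by generating the manifest rather than typing it by hand. The following is a sketch only: the helper names to_aggregate_uri and build_manifest and the file extra notes.txt are hypothetical, and it assumes that urllib.parse.quote with its default settings (which leave / unescaped) gives suitable %-escaping for paths inside the ZIP.

```python
import json
from urllib.parse import quote

BUNDLE_CONTEXT = "https://w3id.org/bundle/context"


def to_aggregate_uri(zip_path):
    """Turn a path inside the ZIP into a relative URI starting with '/'."""
    # quote() %-escapes spaces and other special characters but keeps '/'.
    return "/" + quote(zip_path.lstrip("/"))


def build_manifest(zip_paths):
    """Build the minimal manifest structure shown in this tutorial."""
    return {
        "@id": "/",
        "@context": [BUNDLE_CONTEXT],
        "aggregates": [to_aggregate_uri(p) for p in zip_paths],
    }


# 'extra notes.txt' is a made-up file name, included to show the escaping.
manifest = build_manifest(
    ["paper3.pdf", "rawdata5.csv", "analyse2.py", "extra notes.txt"]
)
print(json.dumps(manifest, indent=2))
```

The space in the made-up file name ends up as %20 in the aggregates list, while the three tutorial files are listed unchanged apart from the leading /.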

You can now add the manifest to the RO bundle:

zip example.bundle.zip .ro/manifest.json
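If the manifest is produced by a program instead of being saved as a file first, it can be written straight into the existing ZIP. A minimal Python sketch, assuming the bundle already exists and does not yet contain a manifest (add_manifest is a hypothetical name):

```python
import json
import zipfile


def add_manifest(bundle_path, manifest):
    """Append .ro/manifest.json to an existing bundle.

    'add_manifest' is a hypothetical helper name used only in this sketch.
    """
    # Mode "a" appends to the existing ZIP, so 'mimetype' stays first.
    with zipfile.ZipFile(bundle_path, "a",
                         compression=zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
```

Note that calling this twice on the same bundle would add a second entry with the same name, so it is best run once on a bundle that has no manifest yet.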

example.bundle.zip is now a complete minimal Research Object Bundle of the above resources. The later sections will show how we can augment this with additional metadata to differentiate it from a plain ZIP file.
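The requirements described so far can be checked mechanically. The sketch below is not a full validator for the RO Bundle specification; it only tests the points made in this tutorial (mimetype first with the expected content, a manifest at .ro/manifest.json, and each aggregated path present in the ZIP). The function name check_bundle is hypothetical, and entries in aggregates are assumed to be plain strings as in the minimal manifest above.

```python
import json
import zipfile
from urllib.parse import unquote

MIMETYPE = b"application/vnd.wf4ever.robundle+zip"
MANIFEST = ".ro/manifest.json"


def check_bundle(bundle_path):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    with zipfile.ZipFile(bundle_path) as bundle:
        names = bundle.namelist()
        if not names or names[0] != "mimetype":
            problems.append("'mimetype' is not the first entry")
        elif bundle.read("mimetype") != MIMETYPE:
            problems.append("'mimetype' has unexpected content")
        if MANIFEST not in names:
            problems.append("missing " + MANIFEST)
            return problems
        manifest = json.loads(bundle.read(MANIFEST))
        for uri in manifest.get("aggregates", []):
            # Assumption: only plain string entries starting with "/"
            # refer to files inside the ZIP; anything else is skipped.
            if isinstance(uri, str) and uri.startswith("/"):
                if unquote(uri[1:]) not in names:
                    problems.append("aggregated resource not in ZIP: " + uri)
    return problems
```

For the bundle built in this section, check_bundle("example.bundle.zip") would be expected to return an empty list.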

Aggregating as Linked Data

In the alternative Linked Data approach, there is no single file that can be downloaded to get the complete Research Object. Instead, the manifest has to link to resources that can be addressed with a URI, typically starting with http:// or https://, and must itself be published on the web.

So the first step is to ensure we have made our resources available on the web. To keep this tutorial simple, we naively use the URIs at GitHub, but any accessible URI would be valid (see the identity section).

A minimal Research Object manifest in JSON-LD that aggregates these would look like this:

{ "@id": "#ro",
  "@context": ["https://w3id.org/bundle/context"],
  "aggregates": [
    "https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/paper3.pdf",
    "https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/rawdata5.csv",
    "https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/analyse2.py"
  ]
}

If we provide such a JSON file on the web, and ideally make its Content-Type application/ld+json, we have created Linked Data. The above example has been published as https://rawgit.com/ResearchObject/ro-tutorials/master/01-creating/ro.jsonld#ro which is a valid Research Object as Linked Data, and thus its manifest can also be converted to other RDF formats, if so desired.
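The Linked Data manifest above can also be generated from a base URL and a list of file names. This is a sketch under assumptions: build_linked_data_manifest is a hypothetical helper, and rejecting anything that is not an http or https URI is a simplification of "any accessible URI" made for this example.

```python
import json
from urllib.parse import urljoin, urlsplit

BUNDLE_CONTEXT = "https://w3id.org/bundle/context"


def build_linked_data_manifest(base_url, names):
    """Build a manifest whose aggregates are absolute web URIs."""
    aggregates = []
    for name in names:
        uri = urljoin(base_url, name)
        # Simplifying assumption: only http(s) URIs count as web-addressable.
        if urlsplit(uri).scheme not in ("http", "https"):
            raise ValueError("not a web-addressable URI: " + uri)
        aggregates.append(uri)
    return {"@id": "#ro", "@context": [BUNDLE_CONTEXT], "aggregates": aggregates}


BASE = "https://github.com/ResearchObject/ro-tutorials/blob/master/01-creating/"
manifest = build_linked_data_manifest(
    BASE, ["paper3.pdf", "rawdata5.csv", "analyse2.py"]
)
print(json.dumps(manifest, indent=2))
```

The printed JSON would then be saved (for example as ro.jsonld) and published on a web server, ideally one configured to serve it as application/ld+json.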

Next steps

The next tutorial on RO identity details how to provide and find identifiers for the Research Object and its resources.