
planning for CWLProv in toil-cwl-runner #2390

@mr-c

  • Refactor CWLJob.run() to return (outputs, metadata) instead of only outputs, where metadata is a dictionary holding the information needed to generate CWLProv.

  • Propagate the metadata through the .run() calls up to the root of the computation.

  • Try to reuse Toil's jobstore IDs (see Accessing the Jobstoreid corresponding to a job #2449): for each CWLJob, record its jobstore ID and the ID of its parent.

  • Fill metadata with a data structure containing runtime information about the tasks (a tree, or a dict keyed by jobstore ID).

  • Generate a ProvenanceProfile per task, and a ResearchObject once all the metadata has been gathered.

  • Refactor cwltool/provenance.py so that the recorded time (when an event happened) is decoupled from the time of recording (when it is written to the provenance profile).

  • Move ProvenanceProfile:prospective_prov out of the class into a standalone function that creates all the ProvenanceProfiles and relates them in a tree-like structure.

  • Refactor cwltool/provenance.py so that file movements can be deferred until the end of the run.

  • Update Toil to use a cwltool version that includes these fixes (see Update cwltool version to the latest #2469).
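
The first four steps (return (outputs, metadata), propagate it to the root, key records by jobstore ID with a parent ID, rebuild a tree) can be sketched as below. This is a minimal illustration under stated assumptions, not Toil's code: SketchJob, execute, and build_tree are hypothetical names, and job_id merely stands in for the Toil jobstore ID. The recursive child.run() call is a simplification; in Toil, children run as separately scheduled jobs, so the merge would happen wherever their results are collected.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical shape: jobstore ID -> runtime info about that task.
Metadata = Dict[str, Dict[str, Any]]


@dataclass
class SketchJob:
    """Hypothetical stand-in for CWLJob; this is not Toil's Job API."""
    name: str
    job_id: str  # stands in for the Toil jobstore ID (see #2449)
    children: List["SketchJob"] = field(default_factory=list)

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder for running the actual CWL tool.
        return {f"{self.name}_out": dict(inputs)}

    def run(
        self, inputs: Dict[str, Any], parent_id: Optional[str] = None
    ) -> Tuple[Dict[str, Any], Metadata]:
        """Return (outputs, metadata) instead of just outputs."""
        outputs = self.execute(inputs)
        # Record this job's ID together with its parent's ID.
        metadata: Metadata = {self.job_id: {"name": self.name, "parent": parent_id}}
        for child in self.children:
            child_outputs, child_metadata = child.run(inputs, parent_id=self.job_id)
            outputs.update(child_outputs)
            # Merge the child's records so they propagate up to the root.
            metadata.update(child_metadata)
        return outputs, metadata


def build_tree(metadata: Metadata) -> Dict[str, List[str]]:
    """Rebuild parent -> children links from the flat, ID-keyed dict."""
    tree: Dict[str, List[str]] = {}
    for job_id, record in metadata.items():
        parent = record["parent"]
        if parent is not None:
            tree.setdefault(parent, []).append(job_id)
    return tree
```

Keeping the metadata flat and keyed by ID makes merging trivial at each level; the tree view needed to relate one ProvenanceProfile per task can be derived afterwards from the parent links.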
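
The two cwltool/provenance.py refactors (decoupling recorded time from time of recording, and deferring file movements) might look roughly like this. It is a hypothetical sketch and does not reproduce cwltool's actual API: ProvEvent, DeferredRecorder, note_event, note_file, and finalize are illustrative names. Event times are captured while a step runs; the recording timestamp and the file copies only happen in finalize(), at the end of the run.

```python
import os
import shutil
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class ProvEvent:
    """Hypothetical record separating when something happened from when it was written."""
    activity: str
    event_time: datetime   # captured while the step ran
    recorded_at: datetime  # set later, when the profile is assembled


class DeferredRecorder:
    """Hypothetical helper that queues events and file moves until the run ends."""

    def __init__(self) -> None:
        self._pending_events: List[Tuple[str, datetime]] = []
        self._pending_files: List[Tuple[str, str]] = []

    def note_event(self, activity: str, event_time: datetime) -> None:
        # Keep the original event time; do not write anything yet.
        self._pending_events.append((activity, event_time))

    def note_file(self, src: str, dst: str) -> None:
        # Queue the copy instead of performing it immediately.
        self._pending_files.append((src, dst))

    def finalize(self) -> List[ProvEvent]:
        """Perform all deferred copies and stamp every event with one recording time."""
        now = datetime.now(timezone.utc)
        for src, dst in self._pending_files:
            parent = os.path.dirname(dst)
            if parent:
                os.makedirs(parent, exist_ok=True)
            shutil.copy2(src, dst)
        self._pending_files.clear()
        events = [ProvEvent(a, t, now) for a, t in self._pending_events]
        self._pending_events.clear()
        return events
```

Because nothing touches the research object until finalize(), workers can report metadata as the run progresses, and a single process can assemble the profile once everything has been gathered.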

Most of the progress can be found on the wip-prov branch: https://github.com/DataBiosphere/toil/tree/wip-prov

Issue is synchronized with this Jira Story
Issue Number: TOIL-280
