
Allow API import/export of format 2 workflows. #6776

Merged
mvdbeek merged 8 commits into galaxyproject:dev from jmchilton:gxformat2_integration
Oct 26, 2018

@jmchilton commented Sep 26, 2018

This builds on a set of Galaxy PRs #6746, #6807, #6811 and a series of gxformat2 releases.

In addition to bringing in the latest gxformat2 changes that allow the syntax changes described below, and updating hundreds of lines of test workflows, this PR finally enables such workflows to work directly with the Galaxy API (switched on with enable_beta_workflow_format). When enabled, workflows imported from dictionaries are checked, and if they look like a format 2 workflow they are pre-converted to the native format before import. Uploading YAML wrapped in JSON is also allowed, which avoids changes to dictionary handling. Likewise, there are a couple of new download styles that attempt to extract format 2 workflows from the native representation (using functionality added in recent versions of gxformat2). These can be downloaded as JSON, or as YAML string content wrapped in JSON in order to preserve the pretty YAML formatting and dictionary ordering implemented in gxformat2.
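
To make the import check and the YAML-in-JSON wrapping concrete, here is a minimal sketch using only the standard library. It is not the code in this PR: the `yaml_content` key name, the function names, and the exact heuristics are assumptions made for illustration.

```python
import json

# Hypothetical key for YAML-in-JSON payloads; the key Galaxy actually uses
# is an assumption here.
YAML_WRAPPER_KEY = "yaml_content"


def looks_like_format2(workflow: dict) -> bool:
    """Heuristic: does this dictionary resemble a format 2 workflow?"""
    if YAML_WRAPPER_KEY in workflow:
        # YAML text wrapped in a JSON object.
        return True
    if workflow.get("class") == "GalaxyWorkflow":
        # A single format 2 workflow document.
        return True
    # A CWL-style $graph document holding several workflows.
    return "$graph" in workflow


def wrap_yaml(yaml_text: str) -> str:
    """Wrap YAML text in JSON so comments, formatting and key order survive."""
    return json.dumps({YAML_WRAPPER_KEY: yaml_text})


def unwrap_yaml(payload: str) -> str:
    """Recover the original YAML text from a wrapped payload."""
    return json.loads(payload)[YAML_WRAPPER_KEY]


example_yaml = "# my comment\nclass: GalaxyWorkflow\ninputs:\n  reference: data\n"
round_tripped = unwrap_yaml(wrap_yaml(example_yaml))
```

Because the YAML travels as an opaque string, the round trip returns it byte for byte, which a parse-to-dictionary step would not guarantee for comments or key order.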

If the JSON-wrapped YAML-in-strings approach seems odd, consider that an important future direction of this work is likely to be storing the originally supplied YAML alongside the native representation of the workflow and then allowing the workflow editor to apply a series of deltas to both in parallel. This can be done with a round-trip-aware YAML parser/writer such as ruamel.yaml, so that user comments, formatting, and extraneous data in the YAML are preserved.

This PR continues to refine the workflow syntax toward a more concise, CWL-compatible syntax. The following two code blocks are a before-and-after example demonstrating the syntax changes.

Before this series of PRs:

class: GalaxyWorkflow
inputs:
  - id: input_fastqs
    type: collection
  - id: reference
outputs:
  - id: pileup_output
    source: pileup#out_file1
steps:
  - label: map_over_mapper
    tool_id: mapper
    state:
      input1:
        $link: input_fastqs
      reference:
        $link: reference
  - label: pileup
    tool_id: pileup
    state:
      input1:
        $link: map_over_mapper#out_file1
      reference:
        $link: reference

My recommended best practice after this PR is merged:

class: GalaxyWorkflow
inputs:
  input_fastqs: collection
  reference: data
outputs:
  pileup_output:
    outputSource: pileup/out_file1
steps:
  map_over_mapper:
    tool_id: mapper
    in:
      input1: input_fastqs
      reference: reference
  pileup:
    tool_id: pileup
    in:
      input1: map_over_mapper/out_file1
      reference: reference
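
The mapping between the two step styles above can be sketched in a few lines of Python. This is an illustrative helper, not gxformat2's converter: `convert_source` and `convert_step` are hypothetical names, and only top-level `$link` entries in `state` are handled.

```python
def convert_source(ref: str) -> str:
    """Rewrite an old 'step#output' reference as 'step/output'."""
    return ref.replace("#", "/", 1)


def convert_step(step: dict) -> tuple:
    """Turn an old list-style step into (label, new-style step body)."""
    # Everything except the label and state carries over unchanged.
    body = {k: v for k, v in step.items() if k not in ("label", "state")}
    connections = {}
    remaining_state = {}
    for name, value in step.get("state", {}).items():
        if isinstance(value, dict) and "$link" in value:
            # A connection: move it into the `in` mapping.
            connections[name] = convert_source(value["$link"])
        else:
            # A plain parameter value: keep it in `state`.
            remaining_state[name] = value
    if connections:
        body["in"] = connections
    if remaining_state:
        body["state"] = remaining_state
    return step["label"], body


old_step = {
    "label": "pileup",
    "tool_id": "pileup",
    "state": {
        "input1": {"$link": "map_over_mapper#out_file1"},
        "reference": {"$link": "reference"},
    },
}
label, new_step = convert_step(old_step)
```

Here `label` becomes the key under `steps`, and `new_step` is the mapping stored beneath it.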

Prior to this pull request, all subworkflows (format-version 0.1 or 2) would be imported repeatedly, once per step that referenced them. This PR introduces a CWL-derived syntax for repeatedly referencing the same workflow and updates the Galaxy import functionality to properly resolve these references and import such workflows only once instead of once per step.

$graph:
- id: nested
  class: GalaxyWorkflow
  inputs:
    inner_input: data
  outputs:
    inner_output:
      outputSource: inner_cat/out_file1
  steps:
    inner_cat:
      tool_id: cat
      in:
        input1: inner_input
        queries_0|input2: inner_input
- id: main
  class: GalaxyWorkflow
  inputs:
    outer_input: data
  steps:
    outer_cat:
      tool_id: cat
      in:
        input1: outer_input
    nested_workflow_1:
      run: '#nested'
      in:
        inner_input: outer_cat/out_file1
    nested_workflow_2:
      run: '#nested'
      in:
        inner_input: nested_workflow_1/inner_output
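
A rough sketch of the "import once" resolution for a `$graph` document like the one above follows. It is not Galaxy's importer: `import_graph` is a hypothetical name, treating `main` as the entry point is an assumed convention, and cyclic references are not handled.

```python
def import_graph(document: dict, main_id: str = "main") -> list:
    """Return workflow ids in import order, each imported exactly once."""
    workflows = {wf["id"]: wf for wf in document["$graph"]}
    import_order = []

    def import_once(workflow_id: str) -> None:
        if workflow_id in import_order:
            # Already imported: later steps reuse the existing workflow.
            return
        for step in workflows[workflow_id].get("steps", {}).values():
            run = step.get("run")
            if isinstance(run, str) and run.startswith("#"):
                # '#nested' refers to the workflow whose id is 'nested'.
                import_once(run[1:])
        # Subworkflows are imported before the workflow that uses them.
        import_order.append(workflow_id)

    import_once(main_id)
    return import_order


# Dictionary form of the steps that matter in the YAML example above.
document = {
    "$graph": [
        {"id": "nested", "steps": {"inner_cat": {"tool_id": "cat"}}},
        {
            "id": "main",
            "steps": {
                "outer_cat": {"tool_id": "cat"},
                "nested_workflow_1": {"run": "#nested"},
                "nested_workflow_2": {"run": "#nested"},
            },
        },
    ]
}
order = import_graph(document)
```

Both steps reference `#nested`, yet it appears only once in `order`, ahead of `main`.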

@jmchilton jmchilton force-pushed the gxformat2_integration branch 3 times, most recently from a90464c to bc9717b Compare September 30, 2018 20:11
@jmchilton jmchilton force-pushed the gxformat2_integration branch 6 times, most recently from 50bdfd5 to eb0ddac Compare October 6, 2018 13:41
@jmchilton jmchilton force-pushed the gxformat2_integration branch 2 times, most recently from 540706a to 3be0747 Compare October 9, 2018 14:16
@jmchilton jmchilton changed the title [WIP] gxformat2 workflow integration with Galaxy API Allow API import/export of format 2 workflows. Oct 9, 2018
@jmchilton jmchilton force-pushed the gxformat2_integration branch from 3be0747 to 397f237 Compare October 10, 2018 12:44
@jmchilton jmchilton changed the title Allow API import/export of format 2 workflows. [WIP] Allow API import/export of format 2 workflows. Oct 10, 2018
@jmchilton jmchilton force-pushed the gxformat2_integration branch from 397f237 to 2fac59a Compare October 11, 2018 00:13
@jmchilton jmchilton force-pushed the gxformat2_integration branch from 2fac59a to f7415cb Compare October 12, 2018 17:41
@jmchilton jmchilton changed the title [WIP] Allow API import/export of format 2 workflows. Allow API import/export of format 2 workflows. Oct 12, 2018
@mvdbeek mvdbeek merged commit dda297c into galaxyproject:dev Oct 26, 2018
@jmchilton
Member Author

Thanks a million, @mvdbeek!

@nsoranzo nsoranzo deleted the gxformat2_integration branch November 30, 2018 16:04
nsoranzo added a commit to nsoranzo/galaxy that referenced this pull request Nov 30, 2018
with `make config-rebuild` .
Follow-up on galaxyproject#6776 .