Manifest-driven shed repository definitions #143

Merged
jmchilton merged 5 commits into galaxyproject:master from jmchilton:shed_realizations
Apr 28, 2015

@jmchilton
Member

Currently there exists a tension between what is best for developers (storing all tools in a single repository - e.g. ncbi_blast_plus or bedtools) and what is best for Galaxy users (storing a single repository per tool and collecting them together with a suite - e.g. samtools or gatk). More discussion here.

This pull request extends the semantics of .shed.yml in an attempt to resolve this tension and make the best practice for Galaxy users trivial for developers to manage. Previously each .shed.yml could only correspond to a single Tool Shed repository, and it would collect all files in a directory (except an optional list of ignored files). This pull request extends the shed_create and shed_upload commands to allow .shed.yml files to correspond to any number of actual Tool Shed repositories, each with fully customizable file includes and excludes.

While there is a great deal of customization allowed, two new keys, auto_tool_repositories and suite, provide shortcuts to quickly and implicitly define repositories for each individual tool in the directory and build a suite for those. Consider the following (admittedly idealized) samtools example:

owner: "devteam"
remote_repository_url: "https://github.com/galaxyproject/tools-devteam/tool_collections/samtools"
homepage_url: "https://github.com/galaxyproject/tools-devteam/"
categories:
  - "SAM"
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for samtools application {{ tool_name }}."
suite:
  name: "suite_samtools_1_2"
  description: "A suite of Galaxy tools designed to work with version 1.2 of the SAMtools package."
  long_description: >
    SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence
    alignments. This repository suite associates selected repositories containing Galaxy utilities that require
    version 1.2 of the SAMTools package.  These associated Galaxy utilities consist of a Galaxy Data
    Manager contained in the repository named data_manager_sam_fasta_index_builder and Galaxy tools
    contained in several separate repositories.

This example assumes the .shed.yml file is placed in a "flat" directory alongside each samtools tool wrapper. Planemo will create and update a repository for each individual tool using the templates specified in auto_tool_repositories. The suite key will auto-generate a suite repository for all of these tools and will automatically create the corresponding repository_dependencies.xml to populate it (this file is generated during shed_upload and never needs to exist in your repository).
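
To make the template expansion concrete, here is a minimal Python sketch of how auto_tool_repositories could be turned into one repository definition per tool. It is not planemo's implementation: the helper names, the simple `{{ key }}` substitution, and the tool ids, names, and file names in the sample data are all assumptions for illustration.

```python
import re


def render_template(template, values):
    """Replace each ``{{ key }}`` placeholder with ``values[key]``."""
    def substitute(match):
        return str(values[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def auto_repositories(config, tools):
    """Build one repository definition per tool (hypothetical helper).

    ``tools`` is a list of ``(tool_id, tool_name, filename)`` tuples that
    would normally be read from the tool XML files in the directory.
    """
    auto = config["auto_tool_repositories"]
    repos = {}
    for tool_id, tool_name, filename in tools:
        values = {"tool_id": tool_id, "tool_name": tool_name}
        name = render_template(auto["name_template"], values)
        repos[name] = {
            "description": render_template(auto["description_template"], values),
            "include": [filename],
        }
    return repos


# Sample data: the templates come from the example above, the tools are made up.
config = {
    "auto_tool_repositories": {
        "name_template": "{{ tool_id }}",
        "description_template": "Wrapper for samtools application {{ tool_name }}.",
    }
}
tools = [
    ("samtools_bedcov", "BedCov", "samtools_bedcov.xml"),
    ("sam_to_bam", "SAM-to-BAM", "sam_to_bam.xml"),
]

repos = auto_repositories(config, tools)
for name, repo in sorted(repos.items()):
    print(name, "->", repo["description"])
```

Each tool ends up as its own repository entry, which is the same shape as an explicit repositories definition written by hand.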

Again, this example is admittedly idealized. If auto_tool_repositories is not specified, a repositories list can be specified instead. There are some examples of this in the test data included with this pull request:

  • This .shed.yml is a simple example of specifying custom repositories for individual tools.
  • This .shed.yml demonstrates complex inclusion of files from sub-directories and renaming.

The test data also includes some more advanced usages of the suite key - specifically, using it without auto_tool_repositories as a generic replacement for repository_dependencies.xml, and adding further dependent repositories beyond the ones defined by the .shed.yml file.

Implements #26.

```
repositories:
  cs-cat1:
    include:
      - cat1.xml
      - macros.xml
      - test-data
  cs-cat2:
    include:
      - cat2.xml
      - macros.xml
      - test-data
```

Adding tests for tar ball and repository creation to verify this.
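
As a rough illustration of how explicit include lists could split one flat directory into several repositories, the Python sketch below selects files per repository. The matching rules (exact name, glob, or anything beneath a named directory) and the file name test-data/1.bed are assumptions, not planemo's actual logic.

```python
from fnmatch import fnmatchcase


def matches(path, pattern):
    """True if ``path`` equals or globs ``pattern``, or lies under that directory."""
    return fnmatchcase(path, pattern) or path.startswith(pattern.rstrip("/") + "/")


def select_files(all_files, include):
    """Return the files from ``all_files`` selected by the ``include`` patterns."""
    return [
        path for path in all_files
        if any(matches(path, pattern) for pattern in include)
    ]


# Hypothetical contents of the directory holding the .shed.yml file.
all_files = ["cat1.xml", "cat2.xml", "macros.xml", "test-data/1.bed"]

repositories = {
    "cs-cat1": {"include": ["cat1.xml", "macros.xml", "test-data"]},
    "cs-cat2": {"include": ["cat2.xml", "macros.xml", "test-data"]},
}

for name, repo in repositories.items():
    print(name, select_files(all_files, repo["include"]))
```

Shared files such as macros.xml and test-data are listed under both repositories, so each resulting tarball is self-contained.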

``exclude`` now works in addition to ``ignore`` in ``.shed.yml`` for consistency with ``include``.
An example might look like:

```
owner: "iuc"
remote_repository_url: "https://github.com/galaxyproject/planemo/tree/master/tests/data/repos/multi_repos_flat_flag"
homepage_url: "http://planemo.readthedocs.org/en/latest/"
categories:
  - "Text Manipulation"
auto_tool_repositories:
  name_template: "cs-{{ tool_id }}"
  description_template: "The tool {{ tool_name }} from the cat tool suite."
```
An example of creating a .shed.yml that produces just a single repository with one suite in it might be:

```
owner: devteam
suite:
  name: suite_1
  description: "A suite of Galaxy tools designed to work with version 1.2 of the SAMtools package."
  include_repositories:
  - name: data_manager_sam_fasta_index_builder
    owner: devteam
  - name: bam_to_sam
    owner: devteam
  - name: sam_to_bam
    owner: devteam
  - name: samtools_bedcov
    owner: devteam
```

In this case we are defining explicit dependent repositories, but it can also be used with .shed.yml files that define other repositories. For instance, if used with ``auto_tool_repositories``, these will automatically be included in the suite.

```
owner: "iuc"
remote_repository_url: "https://github.com/galaxyproject/planemo/tree/master/tests/data/repos/multi_repos_flat_flag"
homepage_url: "http://planemo.readthedocs.org/en/latest/"
categories:
  - "Text Manipulation"
auto_tool_repositories:
  name_template: "cs-{{ tool_id }}"
  description_template: "The tool {{ tool_name }} from the cat tool suite."
suite:
  name: "suite_cat"
  description: "A suite of Cat tools."
  long_description: "A longer description of all the cat tools."
```
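
For readers wondering what the generated repository_dependencies.xml might look like, here is a small Python sketch that builds one from a list of (owner, name) pairs. The function name is hypothetical, and the element and attribute names reflect my understanding of the Tool Shed format rather than code taken from this pull request; other attributes the Tool Shed may use (for example a changeset revision) are left out.

```python
import xml.etree.ElementTree as ET


def build_repository_dependencies(description, repo_pairs, extra_pairs=()):
    """Serialize suite members as a repository_dependencies.xml string.

    ``repo_pairs`` are the (owner, name) pairs defined by the .shed.yml file;
    ``extra_pairs`` are additional repositories listed under the suite.
    """
    root = ET.Element("repositories", {"description": description})
    for owner, name in list(repo_pairs) + list(extra_pairs):
        ET.SubElement(root, "repository", {"owner": owner, "name": name})
    return ET.tostring(root, encoding="unicode")


xml_text = build_repository_dependencies(
    "A suite of Cat tools.",
    [("iuc", "cs-cat1"), ("iuc", "cs-cat2")],
    extra_pairs=[("devteam", "bam_to_sam")],
)
print(xml_text)
```

Because the file is derived entirely from the .shed.yml contents, it can be regenerated on every upload instead of being kept under version control.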
Think these semantics are a little better.
Previously, custom include statements had to be plain strings - files, directories, or globs relative to the .shed.yml file. This has now been extended to allow more complex source and destination selection.

The following is taken from the added test data and demonstrates pulling in and renaming a single file from outside the .shed.yml directory and copying a whole directory into a new directory.

```
repositories:
  cs-cat1:
    description: "The tool Cat 1 from the cat tool suite."
    include:
      - cat1.xml
      - macros.xml
      - test-data
      - source: ../shared_files/CITATION
        destination: CITATION.txt
      - source: ../shared_files/extra_test_data/**
        strip_components: 3  # drop "..", "shared_files", "extra_test_data" from source
        destination: test-data
```
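
The mapping from source to destination can be sketched in a few lines of Python. This assumes strip_components behaves like tar's --strip-components (drop that many leading path components from the matched file before joining it to the destination) and that a destination given for a single file is the new file name. Both are assumptions about the semantics, and the file extra/1.bed is made up.

```python
import posixpath


def destination_for(matched_file, destination, strip_components=0, rename=False):
    """Return the path a matched source file would get inside the repository.

    ``rename=True`` models an include that names exactly one file, in which
    case ``destination`` is taken as the new file name.
    """
    if rename:
        return destination
    # Drop the leading components, then place the remainder under destination.
    kept = matched_file.split("/")[strip_components:]
    return posixpath.join(destination, *kept)


# Single file pulled in from outside the directory and renamed.
print(destination_for("../shared_files/CITATION", "CITATION.txt", rename=True))

# File matched by ../shared_files/extra_test_data/** with strip_components: 3.
print(destination_for(
    "../shared_files/extra_test_data/extra/1.bed", "test-data", strip_components=3
))
```

With three components stripped, anything below extra_test_data keeps its relative layout inside test-data.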

Member

Does this belong in shed util?

Member Author

What is shed util?

Member

Oh, I meant the shed.py file in planemo where a bunch of TS interactivity was stuck.

On Mon, 27 Apr 2015 at 08:32, John Chilton notifications@github.com wrote:

In planemo/shed.py #143 (comment):

    + repository_dependencies.repo_pairs = list(repo_pairs) + list(extra_pairs)
    + repo = {
    +     "_files": {
    +         REPO_DEPENDENCIES_CONFIG_NAME: str(repository_dependencies)
    +     },
    +     "include": [],
    +     "name": name,
    +     "description": description,
    + }
    + if long_description:
    +     repo["long_description"] = long_description
    + repos[name] = repo
    +
    +def find_repository(tsi, owner, name):

    > What is shed util?


Member Author

That is this file :). Though I would like to break it up (#135).

Member

I...uh...how could I possibly be this blind. So sorry @jmchilton

@jmchilton
Member Author

I would like to do a planemo release tonight - preferably including this pull request. Let me know if the .shed.yml additions in here rub anyone the wrong way and I will do the release without them.

@hexylena
Member

+1

@hexylena
Member

galaxy-iuc/standards#6

@jmchilton
Member Author

Awesome - thanks for the review @erasche.

jmchilton added a commit that referenced this pull request Apr 28, 2015
Manifest-driven shed repository definitions
@jmchilton jmchilton merged commit 79c5c93 into galaxyproject:master Apr 28, 2015
@jmchilton jmchilton deleted the shed_realizations branch April 28, 2015 01:01