Multi-packages container updates, extended base image #940

@mvdbeek

Hi everyone,

We’ve recently tried to containerize around 2000 high-quality Galaxy tools. All of these tools contain test cases (2800 in total) that we can run programmatically, and these tests are run before a new tool is accepted or an existing tool is upgraded. Previously these tests ran on Travis CI, using Conda to satisfy dependencies. Running the tests in containers revealed two very common classes of errors that I think we should address:

  1. We don’t respect the extended-base requirement annotated in many bioconda recipes, which leads to broken multi-package-containers (bioconda sets the destination image when building biocontainers). I’ve fixed this in galaxy-tool-util by checking whether any of the specified recipes requires the extended base container. A new version of planemo will roll this out. But I think we also need to indicate somewhere that a container has been built with the extended base. We could record whether the minimal base image or the extended base image (or another image we may need in the future, think GPU etc.) was used. @bgruening proposed prefixing the combinations with container:extended;, etc.
    One issue here is that we’ll only know that a container needs the extended base after running the check in galaxy-tool-util, so I’m not sure what could be done about the helper service (https://biocontainers.pro/#/multipackage).

  2. We don’t update multi-containers when new package builds are published to Conda. New builds are frequently published to fix missing dependencies, so multi-package-containers often remain broken even though the problem has been fixed upstream. I think it would be feasible to scan for new builds and increment the multi-container build number (possibly also deleting the earlier build should we run into space problems). I think this is the biggest time sink when switching a large codebase from Conda to containers at this point. Relatedly, I think we should record the package build numbers of a container somewhere, so that we know which build numbers we are dealing with.
    One suggestion here would be to include the Conda build numbers, so we would have container:extended;<package1>--<build2>,<package2>--<build10>.
    Again it would be great to get some suggestions for how to manage this.
    If adding new versions takes up too much space, we could prune everything but the newest version. I think getting working, up-to-date containers should be a higher priority than maintaining potentially broken containers (but I can see that this might be controversial).
    Another more limited option is to prefix the build number, so we’d have build:2;container:extended;<package1>,<package2> etc.
    That also gives us a nice handle on manually triggering a rebuild. Detailed build info could still be parsed out manually or on a CI system from /usr/local/conda-meta.
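
To make the two proposals concrete, here is a rough Python sketch of how the pieces could fit together. It is not the galaxy-tool-util implementation. The function names are hypothetical, the recipe metadata is assumed to be an already-rendered meta.yaml loaded as a dict with the flag under extra -> container -> extended-base, and omitting the prefix for the minimal base image is my assumption, since only container:extended; has been proposed so far.

```python
import json
from pathlib import Path


def needs_extended_base(recipe_metas):
    """Return True if any recipe asks for the extended base image.

    Assumes each item is a rendered meta.yaml parsed into a dict, with the
    flag stored under extra -> container -> extended-base.
    """
    return any(
        bool(((meta.get("extra") or {}).get("container") or {}).get("extended-base"))
        for meta in recipe_metas
    )


def read_build_numbers(conda_meta_dir):
    """Map package name -> build number from the conda-meta/*.json records."""
    builds = {}
    for record_path in sorted(Path(conda_meta_dir).glob("*.json")):
        record = json.loads(record_path.read_text())
        builds[record["name"]] = record["build_number"]
    return builds


def describe(packages, extended, package_builds=None, container_build=None):
    """Format a combination in the style suggested in this issue.

    package_builds: optional dict of name -> Conda build number, which
        produces the <package>--<build> variant.
    container_build: optional multi-container build number, which produces
        the build:N; prefix variant.
    Assumption: the minimal base image gets no container: prefix.
    """
    prefixes = []
    if container_build is not None:
        prefixes.append(f"build:{container_build}")
    if extended:
        prefixes.append("container:extended")
    names = [
        f"{name}--{package_builds[name]}" if package_builds else name
        for name in packages
    ]
    return ";".join(prefixes + [",".join(names)])
```

Inside a built container, read_build_numbers("/usr/local/conda-meta") would recover the detailed build info, so the shorter build:N; prefix would not lose any information that a CI system could not retrieve later.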
