Skip to content

Install SARS-CoV-2 workflow dependencies in Native runtime#115

Merged
victorlin merged 1 commit intomasterfrom
victorlin/install-ncov-dependencies-in-native
Jul 7, 2022
Merged

Install SARS-CoV-2 workflow dependencies in Native runtime#115
victorlin merged 1 commit intomasterfrom
victorlin/install-ncov-dependencies-in-native

Conversation

@victorlin
Copy link
Copy Markdown
Member

@victorlin victorlin commented Jun 23, 2022

preview

Description of proposed changes

The Nextstrain Docker image used by the Docker runtime comes with many
other software tools, so that the runtime can work with all core
pathogen workflows. In contrast, the current Native runtime installs a
"bare minimum" set of software, requiring individual workflows to
specify additional installation commands just for the Native runtime.
This commit effectively syncs the two runtimes to provide the same
tools.

In the long term, it would be best to abandon this approach of
supporting n pathogen workflows by a single setup (for both Docker and
Native runtime). While this might seem like taking a step backwards by
putting more potentially unused tools in the Native runtime, the
difference of available tools between Docker/Native currently causes
confusion for users and unnecessary instructions specific to the Native
runtime for each pathogen workflow.

Related issue(s)

Testing

N/A

TODO

Copy link
Copy Markdown
Contributor

@tsibley tsibley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fully on board with this short-term step and the long-term ideal described. 🙌 I think it's better for users than more cross-linking (like #115 adds) but would still say that merge of this should wait until we have broadly concurring feedback from more of the team.

Comment thread src/install.rst Outdated
mamba create -n nextstrain \
-c conda-forge -c bioconda \
nextstrain-cli augur auspice nextalign snakemake git \
nextstrain-cli augur auspice nextalign snakemake git epiweeks pangolin pangolearn \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

Did you look into if adding these three packages significantly increases the installation time or size on disk?

Copy link
Copy Markdown
Member Author

@victorlin victorlin Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't, but that would be good to know! Here's some benchmarks (tl;dr: no significant increase):

Installation time

Note that each time is taken after cleaning all caches so these are worst-case scenarios given my processor and network speed. Not much difference here (93.63s vs. 98.21s user+sys time).

conda clean --all --force-pkgs-dirs
time mamba create -n tmp1 \
    -c conda-forge -c bioconda \
    nextstrain-cli augur auspice nextalign snakemake git \
    --yes
# 52.33s user 41.30s system 92% cpu 1:41.63 total

conda clean --all --force-pkgs-dirs
time mamba create -n tmp2 \
    -c conda-forge -c bioconda \
    nextstrain-cli augur auspice nextalign snakemake git epiweeks pangolin pangolearn \
    --yes
# 55.48s user 42.73s system 90% cpu 1:48.14 total

Disk usage

1.5G -> 1.6G (~0.1G increase).

du -hs /opt/homebrew/Caskroom/miniconda/base/envs/tmp1
# 1.5G	/opt/homebrew/Caskroom/miniconda/base/envs/tmp1
du -hs /opt/homebrew/Caskroom/miniconda/base/envs/tmp2
# 1.6G	/opt/homebrew/Caskroom/miniconda/base/envs/tmp2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

The Nextstrain Docker image used by the Docker runtime comes with many
other software tools, so that the runtime can work with all core
pathogen workflows. In contrast, the current Native runtime installs a
"bare minimum" set of software, requiring individual workflows to
specify additional installation commands just for the Native runtime.
This commit effectively syncs the two runtimes to provide the same
tools.

In the long term, it would be best to abandon this approach of
supporting *n* pathogen workflows by a single setup (for both Docker and
Native runtime). While this might seem like taking a step backwards by
putting more potentially unused tools in the Native runtime, the
difference of available tools between Docker/Native currently causes
confusion for users and unnecessary instructions specific to the Native
runtime for each pathogen workflow.
@victorlin victorlin force-pushed the victorlin/install-ncov-dependencies-in-native branch from 55fa033 to b954874 Compare June 28, 2022 23:12
@victorlin victorlin requested a review from a team July 7, 2022 19:21
Copy link
Copy Markdown
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this solution. We've started to use all of these tools for seasonal flu, now, too, so this is going to make everyone's life easier in the future.

My only note is that eventually this list will get so long that we'll be tempted to put it into a Conda environment YAML or even its own Bioconda recipe and we will have come full circle. Which I guess is a +1 for working on buildpacks (nextstrain/docker-base#71) sooner than later as a way to better manage our dependencies.

@victorlin victorlin merged commit ed237c0 into master Jul 7, 2022
@victorlin victorlin deleted the victorlin/install-ncov-dependencies-in-native branch July 7, 2022 22:40
@tsibley
Copy link
Copy Markdown
Contributor

tsibley commented Jul 8, 2022

My only note is that eventually this list will get so long that we'll be tempted to put it into a Conda environment YAML or even its own Bioconda recipe and we will have come full circle.

IIRC, the only reason we're not using the nextstrain meta-package in Bioconda for Conda-based installation (the original intent) is because the update path is unclear/difficult given limitations of conda/mamba?

One solution to that which I think we briefly considered but didn't pursue would be pinning versions of our packages in the meta-package and automating the update of the pins + making new meta-package releases. This is very much akin to the way to the Docker runtime image is managed currently, and there's lots of benefits to it…

(A reliable meta-package would also make it much easier to implement the vision I have for nextstrain setup, which would be a counterpart to check-setup that would do the work of getting a runtime (Docker, Conda, etc) setup. This is particularly compelling combined with the standalone installation archives for Nextstrain CLI to avoid bootstrapping issues.)

Which I guess is a +1 for working on buildpacks (nextstrain/docker-base#71) sooner than later as a way to better manage our dependencies.

Buildpacks definitely help here for containerized runtimes (Docker, AWS Batch, Terra, (in-some-future) Singularity).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

No open projects

Development

Successfully merging this pull request may close these issues.

Streamline native installation steps for subproject references

3 participants