Skip to content

Snakemake v9 support#1180

Merged
joverlee521 merged 11 commits intomasterfrom
snakemake-v9
Jul 17, 2025
Merged

Snakemake v9 support#1180
joverlee521 merged 11 commits intomasterfrom
snakemake-v9

Conversation

@jameshadfield
Copy link
Copy Markdown
Member

Needs a lot of testing and some more polish, but it doesn't seem like it'll take much to support Snakemake v9. I tested using Snakemake v9.6.2 via a single modified ncov-open run: snakemake --cores 3 -p --profile nextstrain_profiles/nextstrain-open -f auspice/ncov_open_global_1m.json which now completes successfully.

Please add comments / commits to this PR if you can!

Closes #1179
May close #1114, but I only tested on Snakemake v9

printshellcmds: True
show-failed-logs: True
restart-times: 2
stats: stats.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: 4b2685c

(non-blocking)

No more stats files? That's a bummer 😞 I've been using Tom's vis tool for these files to inspect performance of augur filter calls. We can still get the same numbers from per-rule benchmark files, but it's been nice to have a summary across all rules.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really look into it (hence the "WIP" commit), I was just seeing the minimal amount of work to get the workflow running. The migration guide, somewhat comically, says "Unsupported?" so... maybe it's supported in some capacity? Please feel free to dive in here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nothing wrong with your changes. Just wanted to rant 😄

Here's some discussion on its removal: snakemake/snakemake#3523

The reports feature is a potential replacement, but I haven't used it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are moving to v9 we should look into the reports feature, but from ~1 minute of scanning the docs it doesn't seem anywhere as nice. All in all we're not going to block the v9 upgrade on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tried out the reports feature last year, it wasn't that great...

Copy link
Copy Markdown

@j23414 j23414 Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For norovirus/dengue work, I roughly generated a python script (prompting chatgpt o4) to convert the "benchmarks" directory of timings to a "stats.json" file. Seems to work so far, although probably room for improvement.

./summarize-stats.py --benchmark-dir phylogenetic/benchmarks --stats stats.json

summarize-stats.py.txt

@jameshadfield
Copy link
Copy Markdown
Member Author

jameshadfield commented Jul 9, 2025

Triggered a GISAID 21L test run using, or attempting to use, the nextstrain/base:branch-snakemake-v9 image, which has now completed successfully. I'm not sure we need to test any more of our workflows, but I know we've documented a few ways ncov can be run by external users which it'd be good for someone to test?

@joverlee521
Copy link
Copy Markdown
Contributor

Added e115838 to check the Snakemake version based on today's dev chat.

This is what the built-in error looks like

$ nextstrain build . --configfile nextstrain_profiles/nextstrain-ci/config.yaml -n
WorkflowError in file /nextstrain/build/Snakefile, line 17:
Expecting Snakemake version 8.0.0 or higher (you are currently using 7.32.3).
  File "/nextstrain/build/Snakefile", line 17, in <module>

jameshadfield and others added 7 commits July 11, 2025 12:10
The previous version of `path_or_url` was not compatible with Snakemake
v8, which implemented a storage plugin architecture [1]. This is a
complete rewrite of the approach and was originally written for zika
[2]. See [2] for more commentary as well as the docs added to
`remote_files.smk`

Google Storage URIs are not supported due to authn complexity, as
discussed in [3] and HTTP support is dropped as requested in [4].

The default of `keep_local` is now True. Since all (n=3) existing usages
of `path_or_url` specify `keep_local=True` this is consistent with
previous behaviour.

[1] <https://snakemake.readthedocs.io/en/stable/snakefiles/storage.html>
[2] <nextstrain/zika#89>
[3] <nextstrain/zika#89 (comment)>
[4] <nextstrain/zika#89 (comment)>
Using 8.0.0 as the minimum version because that is when the storage plugins 
were added. Even though we are bumping runtimes to Snakemake v9, we are not 
using any specific v9 features.
This printed the reason for each executed rule but was already
deprecated and always true in v7 [1] and has been removed entirely in
later snakemake versions. The upgrade path is clear [2]: "Deprecated:
Drop it and don't worry about anything"

Note: The same changes were applied to zika [3]

[1] <https://snakemake.readthedocs.io/en/v7.22.0/executing/cli.html#output>
[2] <https://snakemake.readthedocs.io/en/stable/getting_started/migration.html>
[3] <nextstrain/zika#88>
The `--stats` CLI argument is no longer supported and according to
<https://snakemake.readthedocs.io/en/stable/getting_started/migration.html>
there is no suitable replacement. The reports feature looks along the
same lines as stats, but doesn't seem as well suited for our purposes.
See <#1180 (comment)>
for more discussion around this.
The keyword storage now has a special meaning in Snakemake
<https://snakemake.readthedocs.io/en/stable/snakefiles/storage.html>.
Rename our existing usage of this variable to avoid conflicts.
As we now require snakemake v8 we can leverage the more convenient
`input.size_mb` functionality, which is functionally equivalent to
`input.size / 1024 / 1024` [1]

The `rule colors` wants the size of an individual input (not the sum of
all inputs) and the `_IOFile.size` function is now a coroutine. The more
widely used (by us) `InputFiles.size` function uses `async_run` to allow
coroutine calls so I think it's fine for us to do so explicitly to get
the size of an individual file. Other code paths end up at the same
point, such as `input.size_files_mb[list(input.keys()).index('metadata')]`

Closes #1114

[1] <https://github.com/snakemake/snakemake/blob/7f2606e453ba6bd83222ec3679178ae96bd142d2/src/snakemake/io/__init__.py#L1894-L1923>
[2] <https://github.com/snakemake/snakemake/blob/7f2606e453ba6bd83222ec3679178ae96bd142d2/src/snakemake/io/__init__.py#L622-L633>
With Snakemake v9 we observed the following error:

    No validator found for JSON Schema version identifier 'http://json-schema.org/draft-06/schema'
    Defaulting to validator for JSON Schema version 'https://json-schema.org/draft/2020-12/schema'
    Note that schema file may not be validated correctly.

There's nothing salient in
<https://snakemake.readthedocs.io/en/stable/project_info/history.html>
indicating why this happened but it wasn't observed in v7. I read
through the migration guides to get us up to 2020-12
<https://json-schema.org/specification/migration> and didn't find any
changed needed, which matches with the validation of our schema using
the new schema defs.
jameshadfield added a commit to nextstrain/zika that referenced this pull request Jul 11, 2025
The code here is lifted directly from <nextstrain/ncov#1180>
which itself was lifted from <#89>.
See those PRs for in-depth discussions.

The `remote_files.smk` file added here will be moved to the
github.com/nextstrain/shared/ repo and vendored across pathogen repos,
including this one. Assuming this doesn't change cf. the ncov
implementation I think it's fine to have the file copy-pasted around a
few repos before it arrives in the shared repo.
joverlee521 added a commit to nextstrain/docker-base that referenced this pull request Jul 11, 2025
No longer needed since we are removing Google Cloud support in 
ncov/remove_files

<nextstrain/ncov#1180>
joverlee521 added a commit to nextstrain/conda-base that referenced this pull request Jul 11, 2025
No longer needed since we are removing Google Cloud support in 
ncov/remove_files

<nextstrain/ncov#1180>
j23414 pushed a commit to nextstrain/zika that referenced this pull request Jul 14, 2025
The code here is lifted directly from <nextstrain/ncov#1180>
which itself was lifted from <#89>.
See those PRs for in-depth discussions.

The `remote_files.smk` file added here will be moved to the
github.com/nextstrain/shared/ repo and vendored across pathogen repos,
including this one. Assuming this doesn't change cf. the ncov
implementation I think it's fine to have the file copy-pasted around a
few repos before it arrives in the shared repo.
We are no longer using these internally [1] however we can support them
again (using different code) in the future as needed

[1] <https://bedfordlab.slack.com/archives/CSKMU6YUC/p1752630023070009>
If we wish to support this again in the future we can re-implement it
in a snakemake v9 compatible way. Given the hardcoded paths in the
config I'm confident this was only ever used by the neher lab and not
external users.
@joverlee521 joverlee521 merged commit 137ffc4 into master Jul 17, 2025
6 checks passed
@joverlee521 joverlee521 deleted the snakemake-v9 branch July 17, 2025 18:28
jameshadfield added a commit to nextstrain/shared that referenced this pull request Jul 20, 2025
I originally implemented this in <nextstrain/zika#89>
and that's where the bulk of the discussion and review happened. That
PR wasn't merged and instead it was added to ncov via
<nextstrain/ncov#1180>. The code in this commit
is copied directly from <https://github.com/nextstrain/ncov/blob/137ffc479a1bfe4e2c6ae1bbcd1b22e3afc5f8dd/workflow/snakemake_rules/remote_files.smk>

Closes #56
jameshadfield added a commit to nextstrain/shared that referenced this pull request Jul 20, 2025
I originally implemented this in <nextstrain/zika#89>
and that's where the bulk of the discussion and review happened. That
PR wasn't merged and instead it was added to ncov via
<nextstrain/ncov#1180>. The code in this commit
is copied directly from <https://github.com/nextstrain/ncov/blob/137ffc479a1bfe4e2c6ae1bbcd1b22e3afc5f8dd/workflow/snakemake_rules/remote_files.smk>

Closes #56
jameshadfield added a commit to nextstrain/shared that referenced this pull request Jul 21, 2025
I originally implemented this in <nextstrain/zika#89>
and that's where the bulk of the discussion and review happened. That
PR wasn't merged and instead it was added to ncov via
<nextstrain/ncov#1180>. The code in this commit
is copied directly from <https://github.com/nextstrain/ncov/blob/137ffc479a1bfe4e2c6ae1bbcd1b22e3afc5f8dd/workflow/snakemake_rules/remote_files.smk>
with some changes to comments in the code as applicable.

Closes #56
j23414 pushed a commit to nextstrain/zika that referenced this pull request Jul 21, 2025
The code here is lifted directly from <nextstrain/ncov#1180>
which itself was lifted from <#89>.
See those PRs for in-depth discussions.

The `remote_files.smk` file added here will be moved to the
github.com/nextstrain/shared/ repo and vendored across pathogen repos,
including this one. Assuming this doesn't change cf. the ncov
implementation I think it's fine to have the file copy-pasted around a
few repos before it arrives in the shared repo.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update workflow to be compatible with Snakemake v9 Workflow fails on Snakemake 8.x

5 participants