Allow different (multiple) inputs #106
Closed
jameshadfield wants to merge 1 commit into james/update-config-syntax
82abda3 to
c6d92ed
Compare
victorlin
reviewed
Dec 4, 2024
Member
victorlin
left a comment
Comments regarding sequence merge
Comment on lines +320 to +332
    input:
        metadata = lambda w: collect_inputs(segment=w.segment)
    output:
        metadata = "results/sequences_merged_{segment}.fasta"
Member
Naming nitpick:
Suggested change
-    input:
-        metadata = lambda w: collect_inputs(segment=w.segment)
-    output:
-        metadata = "results/sequences_merged_{segment}.fasta"
+    input:
+        sequences = lambda w: collect_inputs(segment=w.segment)
+    output:
+        sequences = "results/sequences_merged_{segment}.fasta"
jameshadfield
commented
Dec 4, 2024
additional_inputs:
  - name: secret
    metadata: secret.tsv
    sequences: secret_{segment}.fasta
By having all phylogenetic workflows start from two lists of inputs (`config.inputs`, `config.additional_inputs`) we enable a broad range of uses with a consistent interface.

1. Using local ingest files is trivial (see added docs) and doesn't need a bunch of special-cased logic that is prone to falling out of date (as it had indeed done).
2. Adding extra / private data follows a similar pattern, with an additional config list being used so that we are explicit that the new data is additional and so that we enforce an ordering, which is needed for predictable `augur merge` behaviour. The canonical data can be removed / replaced via step (1) if needed.

I considered adding additional data after the subtype-filtering step, which would avoid the need to add subtype in the metadata but requires encoding this in the config overlay. I felt the chosen way was simpler and more powerful.

Note that this workflow uses an old version of the CI workflow, <https://github.com/nextstrain/.github/blob/v0/.github/workflows/pathogen-repo-ci.yaml#L233-L240>, which copies `example_data`. We could upgrade to the latest version and use a config overlay to swap out the canonical inputs with the example data.
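To make the ordering point concrete, here is a minimal Python sketch of how the two config lists could be flattened into one ordered set of paths. It is an illustration only, not the `collect_inputs` from this PR: the signature (taking `config` and a `kind` argument), the duplicate-name check, and the `ncbi` entry with its paths are all assumptions made for the example. Only the `secret` entry comes from the config shown above.

```python
from typing import Dict, List

# Hypothetical config: the "ncbi" entry and its paths are invented for
# illustration; the "secret" entry mirrors the example in this PR.
EXAMPLE_CONFIG = {
    "inputs": [
        {
            "name": "ncbi",
            "metadata": "data/metadata.tsv",
            "sequences": "data/sequences_{segment}.fasta",
        },
    ],
    "additional_inputs": [
        {
            "name": "secret",
            "metadata": "secret.tsv",
            "sequences": "secret_{segment}.fasta",
        },
    ],
}


def collect_inputs(config: dict, kind: str, **wildcards: str) -> Dict[str, str]:
    """Return {input name: path} for every input that provides `kind`.

    Canonical inputs always come before additional inputs, and the order
    within each list is preserved, so downstream merging sees a
    predictable order. (Hypothetical helper, not the PR's implementation.)
    """
    ordered: List[dict] = list(config.get("inputs", [])) + list(
        config.get("additional_inputs", [])
    )
    seen = set()
    paths: Dict[str, str] = {}
    for entry in ordered:
        name = entry["name"]
        if name in seen:
            # Assumption: names must be unique across both lists.
            raise ValueError(f"Duplicate input name: {name!r}")
        seen.add(name)
        if kind in entry:
            # Fill in placeholders such as {segment}; paths without
            # placeholders are returned unchanged.
            paths[name] = entry[kind].format(**wildcards)
    return paths


if __name__ == "__main__":
    print(collect_inputs(EXAMPLE_CONFIG, "sequences", segment="ha"))
    print(collect_inputs(EXAMPLE_CONFIG, "metadata", segment="ha"))
```

Because Python dicts preserve insertion order, the returned mapping keeps canonical inputs ahead of additional ones, which is the property the description says the merge step depends on.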
c6d92ed to
05b4622
Compare
fb99903 to
c60554a
Compare
Member
Author
Closing in favor of #112
This was referenced Dec 16, 2024
j23414 added a commit to nextstrain/zika that referenced this pull request Feb 22, 2025
j23414 added a commit to nextstrain/zika that referenced this pull request Feb 22, 2025
j23414 added a commit to nextstrain/zika that referenced this pull request Jun 27, 2025
j23414 added a commit to nextstrain/zika that referenced this pull request Jul 14, 2025
j23414 added a commit to nextstrain/zika that referenced this pull request Jul 21, 2025
See added docs for examples.