Allow different (multiple) inputs by jameshadfield · Pull Request #106 · nextstrain/avian-flu

jameshadfield · 2024-12-02T04:17:38Z

By having all phylogenetic workflows start from two lists of inputs
(config.inputs, config.additional_inputs) we enable a broad range of
uses with a consistent interface.

1. Using local ingest files is trivial (see added docs) and doesn't need
   a bunch of special-cased logic that is prone to falling out of date
   (as it had indeed done).
2. Adding extra / private data follows a similar pattern, with an
   additional config list used so that we are explicit that the new data
   is additional, and so that we enforce an ordering, which is needed for
   predictable augur merge behaviour. The canonical data can be
   removed / replaced via step (1) if needed.
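
To make the ordering concrete, here is a minimal sketch of how the two lists might be combined. It is not the code from this PR: `collect_inputs_sketch`, `merge_arguments`, their signatures, and the example paths are hypothetical. The only assumption about augur merge is that it accepts `NAME=FILE` pairs for `--metadata`, with later files taking priority over earlier ones.

```python
from typing import Any


def collect_inputs_sketch(config: dict[str, Any], kind: str, **wildcards: str) -> list[tuple[str, str]]:
    """Return (name, path) pairs for `kind` ("metadata" or "sequences").

    Entries from config["inputs"] always come first and entries from
    config["additional_inputs"] always come last, so the order is fixed.
    """
    ordered = list(config.get("inputs", [])) + list(config.get("additional_inputs", []))

    # Names must be unique, otherwise NAME=FILE pairs would be ambiguous.
    names = [entry["name"] for entry in ordered]
    if len(names) != len(set(names)):
        raise ValueError(f"input names must be unique, got {names}")

    pairs = []
    for entry in ordered:
        if kind in entry:
            # Fill wildcards such as {segment}; paths without them are unchanged.
            pairs.append((entry["name"], entry[kind].format(**wildcards)))
    return pairs


def merge_arguments(pairs: list[tuple[str, str]]) -> list[str]:
    """Format (name, path) pairs as NAME=FILE strings (assumed augur merge form)."""
    return [f"{name}={path}" for name, path in pairs]


# Hypothetical config: the "ingest" paths are placeholders, not the repo's real ones.
config = {
    "inputs": [
        {
            "name": "ingest",
            "metadata": "ingest/results/metadata.tsv",
            "sequences": "ingest/results/sequences_{segment}.fasta",
        }
    ],
    "additional_inputs": [
        {
            "name": "secret",
            "metadata": "secret.tsv",
            "sequences": "secret_{segment}.fasta",
        }
    ],
}

metadata_args = merge_arguments(collect_inputs_sketch(config, "metadata"))
sequence_inputs = collect_inputs_sketch(config, "sequences", segment="ha")
```

Because the additional entries always come last, a private record that shares an ID with a canonical record would win under the "later takes priority" assumption, regardless of how the user ordered their config.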

I considered adding additional data after the subtype-filtering step,
which would avoid the need to add subtype in the metadata but requires
encoding this in the config overlay. I felt the chosen way was simpler
and more powerful.

Note that this workflow uses an old version of the CI workflow,
https://github.com/nextstrain/.github/blob/v0/.github/workflows/pathogen-repo-ci.yaml#L233-L240
which copies example_data. We could upgrade to the latest version
and use a config overlay to swap out the canonical inputs with the
example data.
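
As a rough illustration of the overlay idea, the sketch below swaps the canonical inputs for example data. `apply_overlay` is a hypothetical helper, the paths are placeholders, and the merge semantics (dictionaries merge recursively, lists are replaced wholesale) are an assumption about how the overlay would behave, not something this PR confirms.

```python
import copy
from typing import Any


def apply_overlay(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new config with `overlay` applied on top of `base`.

    Nested dictionaries are merged key by key; any other value,
    including a list, replaces the base value entirely.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config with a single canonical input.
base_config = {
    "inputs": [
        {
            "name": "ingest",
            "metadata": "ingest/results/metadata.tsv",
            "sequences": "ingest/results/sequences_{segment}.fasta",
        }
    ],
    "additional_inputs": [],
}

# Hypothetical CI overlay pointing at files under example_data.
ci_overlay = {
    "inputs": [
        {
            "name": "example",
            "metadata": "example_data/metadata.tsv",
            "sequences": "example_data/sequences_{segment}.fasta",
        }
    ]
}

ci_config = apply_overlay(base_config, ci_overlay)
```

Under these assumptions the overlay replaces the whole `inputs` list rather than appending to it, which is what "swap out the canonical inputs" requires.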

See added docs for examples.

victorlin

Comments regarding sequence merge

victorlin · 2024-12-04T18:55:53Z

+    input:
+        metadata = lambda w: collect_inputs(segment=w.segment)
+    output:
+        metadata = "results/sequences_merged_{segment}.fasta"


Naming nitpick:

Suggested change

-    input:
-        metadata = lambda w: collect_inputs(segment=w.segment)
-    output:
-        metadata = "results/sequences_merged_{segment}.fasta"
+    input:
+        sequences = lambda w: collect_inputs(segment=w.segment)
+    output:
+        sequences = "results/sequences_merged_{segment}.fasta"

jameshadfield · 2024-12-04T23:26:33Z

+additional_inputs:
+  - name: secret
+    metadata: secret.tsv
+    sequencs: secret_{segment}.fasta
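
The entry above spells the key `sequencs`. A small check along these lines could flag that kind of mistake before the workflow runs. This is only a sketch: `check_input_entries` is hypothetical, and the accepted keys (`name`, `metadata`, `sequences`) are inferred from the snippets in this thread, not taken from a published schema.

```python
from typing import Any

# Keys assumed to be valid for one input entry (inferred, not from a schema).
ALLOWED_KEYS = {"name", "metadata", "sequences"}


def check_input_entries(entries: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for index, entry in enumerate(entries):
        label = f"entry {index} ({entry.get('name', 'unnamed')})"
        if "name" not in entry:
            problems.append(f"{label}: missing required key 'name'")
        # Sorted so that messages are stable across runs.
        unknown = sorted(set(entry) - ALLOWED_KEYS)
        if unknown:
            problems.append(f"{label}: unknown key(s): {', '.join(unknown)}")
    return problems


additional_inputs = [
    {"name": "secret", "metadata": "secret.tsv", "sequencs": "secret_{segment}.fasta"}
]
problems = check_input_entries(additional_inputs)
```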



jameshadfield · 2024-12-16T02:38:00Z

Closing in favor of #112

This is inspired by the work in: nextstrain/avian-flu#106 https://github.com/nextstrain/avian-flu?tab=readme-ov-file#use-additional-metadata-andor-sequences

jameshadfield force-pushed the james/refactor-inputs branch from 82abda3 to c6d92ed Compare December 2, 2024 20:47

jameshadfield mentioned this pull request Dec 2, 2024

Provide a generic pattern for including additional user data alongside curated data nextstrain/pathogen-repo-guide#72

Closed

victorlin reviewed Dec 4, 2024


jameshadfield commented Dec 4, 2024


victorlin mentioned this pull request Dec 4, 2024

merge: Support sequences with cross-checking nextstrain/augur#1601

Closed

5 tasks

jameshadfield force-pushed the james/refactor-inputs branch from c6d92ed to 05b4622 Compare December 12, 2024 03:55

jameshadfield force-pushed the james/update-config-syntax branch from fb99903 to c60554a Compare December 12, 2024 03:55

jameshadfield mentioned this pull request Dec 15, 2024

Use augur merge for sequences nextstrain/zika#76

Merged

3 tasks

jameshadfield closed this Dec 16, 2024

This was referenced Dec 16, 2024

merge: Support sequences nextstrain/augur#1579

Closed

Multiple inputs / overriding inputs #112

Merged

victorlin deleted the james/refactor-inputs branch December 17, 2024 23:31

jameshadfield mentioned this pull request Dec 18, 2024

WIP add config schema & generate HTML docs #107

Closed

j23414 mentioned this pull request Feb 5, 2025

Optional enhancement: Revisit current way of merging private data (via annotations.tsv) nextstrain/WNV#65

Closed

j23414 mentioned this pull request Feb 24, 2025

Allow for multiple inputs from the config file nextstrain/zika#80

Merged

1 task

jameshadfield wants to merge 1 commit into james/update-config-syntax from james/refactor-inputs
