WIP: remove preprocessing pipelines#823

Closed
jameshadfield wants to merge 4 commits into move-filter from remove-preprocess

Conversation

@jameshadfield
Member

@jameshadfield jameshadfield commented Dec 16, 2021

WIP

  • Remove preprocessing config files
  • Change main profiles to start from sequences but call them aligned
  • Remove conditional statements using config.upload
    • config.schema.yaml
    • slack messages
    • upload rule doesn’t upload alignments / filtered (previous PR)
    • upload files
  • remove trigger_phylogenetic_rebuild
  • GitHub actions
  • Docs pass
    • dev-docs
  • Rebase onto parent PR "Move filter after subsampling" (#814) after changes made there
  • CI

huddlej and others added 4 commits December 9, 2021 15:44
Starts to reorganize the workflow such that we only need the sequence
index when subsampling with priorities and we only filter after the
subsampling, alignment, and mask steps.

Related to #810

These should be small enough, it isn't worth the extra effort/complexity
to compress.
@jameshadfield jameshadfield marked this pull request as draft December 16, 2021 05:05
jameshadfield added a commit to nextstrain/ncov-ingest that referenced this pull request Dec 16, 2021
This change is a companion to nextstrain/ncov#823
which removed the preprocessing pipelines.
@huddlej
Contributor

huddlej commented Dec 23, 2021

Closing this PR since @jameshadfield and I cherry-picked its single new commit onto the original move-filter PR (9abad2a).

@huddlej huddlej closed this Dec 23, 2021
@huddlej huddlej deleted the remove-preprocess branch December 23, 2021 18:50
tsibley added a commit that referenced this pull request Apr 6, 2023
Use aligned sequences as the aligned sequences input, rather than pass
off unaligned sequences as the aligned sequences input.

This should be inconsequential to workflow behaviour or results, but it
makes the config a bit more straightforward and less confusing.

In a quick dig thru history, it seems like ncov-ingest's
aligned.fasta.xz was not _quite_ available when we first switched our
profiles to use an "aligned" input instead of a "sequences" input.  The
original use of "aligned" with unaligned sequences was driven by run
time concerns and related to the move of the filtering step after the
subsampling step and move of the "preprocess" steps from this workflow
(ncov) to ncov-ingest.¹

¹ <#814>
  <#823>
tsibley added a commit that referenced this pull request Apr 6, 2023
Use aligned sequences as the aligned sequences input, rather than pass
off unaligned sequences as the aligned sequences input.

This should be inconsequential to workflow behaviour or results, but it
makes the config a bit more straightforward and less confusing.

In a quick dig thru history, it seems like ncov-ingest's
aligned.fasta.xz was not _quite_ available when we first switched our
profiles to use an "aligned" input instead of a "sequences" input.  The
original use of "aligned" with unaligned sequences was driven by run
time concerns and related to the move of the filtering step after the
subsampling step and move of the "preprocess" steps from this workflow
(ncov) to ncov-ingest.¹

Resolves <#1054>.

¹ <#814>
  <#823>