Why use unaligned sequences as "aligned" input instead of the actually aligned sequences? #1054

@tsibley

Why do we do this (also in our other profile configs):

# Note: unaligned sequences are provided as "aligned" sequences to avoid an initial full-DB alignment
# as we re-align everything after subsampling.
inputs:
  - name: open
    metadata: "s3://nextstrain-data/files/ncov/open/metadata.tsv.gz"
    aligned: "s3://nextstrain-data/files/ncov/open/sequences.fasta.xz"
    skip_sanitize_metadata: true

instead of something more self-explanatory like this?

-# Note: unaligned sequences are provided as "aligned" sequences to avoid an initial full-DB alignment
-# as we re-align everything after subsampling.
 inputs:
   - name: open
     metadata: "s3://nextstrain-data/files/ncov/open/metadata.tsv.gz"
-    aligned: "s3://nextstrain-data/files/ncov/open/sequences.fasta.xz"
+    aligned: "s3://nextstrain-data/files/ncov/open/aligned.fasta.xz"
     skip_sanitize_metadata: true
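
For anyone wanting to check which kind of file a given `aligned:` entry actually points at, here is a minimal sketch in Python. It relies only on the general property that records in an aligned FASTA all share one length; equal lengths are a necessary condition for an alignment, not proof of one. The function names `read_fasta` and `looks_aligned` are hypothetical helpers written for this illustration and are not part of the ncov workflow.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def looks_aligned(lines: Iterable[str]) -> bool:
    """Return True if all records have the same length.

    Hypothetical helper: equal lengths are necessary for an alignment
    but do not prove the sequences were aligned to a reference.
    """
    lengths = {len(seq) for _, seq in read_fasta(lines)}
    return len(lengths) <= 1
```

For an `.xz` file downloaded locally, the lines could be supplied with the standard library, e.g. `with lzma.open(path, "rt") as fh: looks_aligned(fh)`. Note that this reads the whole file, which may be slow for a full database dump.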
