Assembled SARS-CoV-2 viral genomes may be classified as belonging to a particular phylogenetic lineage using Pangolin. Lineage designations evolve and change as the virus does, meaning that a sequence may be assigned to different lineages at different points in time depending on the models in use at the time. For this reason, lineage assignment should be rerun regularly as Pangolin's underlying models are updated to ensure that those lineage assignments are up-to-date with the latest available information.
The input to the workflow is an array of paths to assembled SARS-CoV-2 genomes.
For details on preparing the inputs.json file, see the corresponding section in the README.
- If the workflow is being run locally, you should first download the assembly files, and use absolute paths to the files in the
inputs.jsonfile (or paths relative to the directory from which you are executing the workflow). - If the workflow is being run on AWS (e.g. via the Amazon Genomics CLI),
s3://URLs to the files in thes3://dnastack-covid-19-sra-datacan be used instead, and local download is not required. Note that if the workflow is run on AWS, cloud compute and storage costs will be incurred
Each of the example_inputs.*.json files in this directory will run the workflow using 10 samples as input.
- The
example_inputs.local.jsonfile should be used when running the workflow locally - The
example_inputs.aws.jsonfile should be used when running the workflow in AWS
To run locally, first download the files from s3:
mkdir data
for sample in ERR4082025 ERR4082026 ERR4082027 ERR4082028 ERR4082029 ERR4082030 ERR4082031 ERR4082032 ERR4082033 ERR4082034; do
aws s3 cp s3://dnastack-covid-19-sra-data/Illumina_PE/fasta/${sample}/${sample}.consensus.fasta ./data/ --no-sign-request
done
Running on AWS will not require any additional steps; data will be pulled directly from s3 as part of workflow execution.
The workflow outputs a CSV file containing a lineage assignment for each of the assembled genomes input to the workflow.
See the README for details on running the tutorial workflows, either locally or in the AWS cloud.