feat: add incremental sequence alignment #237
Conversation
|
This is great - thanks for starting this @ivan-aksamentov. Let me know if I can help to convert to Snakemake. This should allow us to get rid of the preprocessing step.
Finally, on today's call we also discussed indexing this alignment here, but that may also be best left for a subsequent PR.
Agreed, if we're doing other things in Nextclade here, we might as well gather up more information!
2374b4b to
3088af4
Compare
|
I just pushed all the necessary steps to produce *nucleotide* alignment and made a full run on AWS + daily ingest locally (for GISAID only). See the updated first message. However, I just realized that Nextalign in ncov also produces and uses:
So these folks also need to be packaged into the cache, I guess... @jameshadfield @rneher Can you tell if some of these steps related to peptides and insertions can be moved? Alternatively, we could declare caching of the alignment in ingest a mistake and instead do the caching in ncov, whichever is simpler. Or what is a good solution here? Either way, the diff in this PR is very small, basically:
Porting to Snakemake (either new ingest or ncov) should be straightforward.
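The incremental pattern described above can be sketched as a toy bash version. This is a hedged sketch only: the `--output-fasta` flag comes from the PR description, the file names and data are illustrative, and the S3 download/upload steps are elided.

```shell
# Toy sketch of the two small additions: (1) have Nextclade emit the
# alignment for new sequences, (2) append it to the cached full alignment.
tmp="$(mktemp -d)"
# Cached alignment previously downloaded from S3 (simulated here)
printf '>cached_seq\nAAAA\n' > "$tmp/nextclade.aligned.fasta"
# Step 1 in the real pipeline would be:
#   nextclade ... --output-fasta "$tmp/nextclade.aligned.new.fasta"
printf '>fresh_seq\nCCCC\n' > "$tmp/nextclade.aligned.new.fasta"  # simulated
# Step 2: append this run's alignment to the accumulator
cat "$tmp/nextclade.aligned.new.fasta" >> "$tmp/nextclade.aligned.fasta"
total="$(grep -c '^>' "$tmp/nextclade.aligned.fasta")"
echo "$total"   # 2 sequences in the updated cache
```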
3088af4 to
cb55fef
Compare
|
I added similar logic for peptide alignment to daily ingests (it's on a branch). The tricky part with peptides is the full runs. Because of batching, joining many fasta files gets very hairy in bash. I haven't figured it out yet. My current hope is that someone shows up here and says that we don't need to cache peptides, and can just calculate and append the required summarized data to metadata instead. Pretty please? :) Just realized that we might not need to cache insertions, because they are already in the metadata.
The insertions, translations etc. for the entire alignment are only needed to produce the mutation summary. This is currently part of the ncov preprocessing workflow (note that this is different from …). The mutation summary can be shifted over to ncov-ingest as we are the only consumers of it, and this is what I recommend. This allows ncov-ingest to upload the alignment & mutation summary without needing to upload insertions / peptides etc. P.S. I improved the efficiency of the mutation summary script recently, so it shouldn't be the end of the world to simply recompute this each time for the entire alignment; at the very least it shouldn't block this PR.
The indexing of the alignment (often done behind-the-scenes by …)
|
Thanks, James, @jameshadfield
There are still a few questions remaining:
|
Long term you're right, we have a lot of this info (if not exactly the same thing? I haven't had time to check!) in metadata.tsv now. The plea is mostly a personal one from me - I use …
James can give more detail here, but my naive assumption would be to try to keep the outputs the same as much as possible unless we're absolutely sure they're redundant now. We could perhaps do this by incrementally updating not just the alignment, but these files as well?
At the moment we actually align twice - once for the whole of GISAID in order to get a master file that we can do priorities etc. on. Then once you've picked your set to actually do a build on - because this may contain spiked-in sequences - we align again. Because these are often only 10,000 sequences or fewer, this is actually pretty fast. We could keep doing this. Two other options:
Another approach is to stick with our current method (aligning twice) - though somewhat redundant - for the purposes of this PR, and then figure out a better approach in the future. We'll still be cutting hours off preprocessing time.
I agree with Emma - while the mutation summary may not be needed long term, it's straightforward to keep it going in the short-to-medium term.
No extra files needed :) I think the only files from ncov-ingest which this PR should add to the upload are
The current
I don't think it does go beyond the nuc alignment [2], and I don't think ncov needs to change here [1]. There is no merging necessary beyond what we currently have for multiple inputs, which happens after each has been aligned. [1] There is still the matter of the … [2] Things are different after we subsample, where we do use the peptides etc. But that's currently handled by a subsequent Nextalign run after subsampling.
|
I added the mutation summary to both the full run and daily ingest. Checked locally on a small dataset. I had to remove the auspice dependency, because we don't have it in the environment and adding it seems to require … Now running the full run on AWS Batch to produce the full alignment, nextclade.tsv and mutation_summary.tsv. If successful, I will then wait for GISAID to update and run the daily ingest to test the full cycle. In theory, the additions to … The bulk of changes are in the full-run script, which is tricky due to batching and bash.
|
Alright, the full jobs succeeded. The job ids were: … GISAID outputs are under … After GISAID updates tomorrow I will try to run the daily ingest script with these files locally to see if the fasta and summary update incrementally correctly.
|
The mutation_summary results of the local ingest seem to be okay:

```
$ wc -l data/gisaid/nextclade.mutation_summary.new.tsv | cut -f1 -d' '
32575
$ wc -l data/gisaid/nextclade.mutation_summary.old.tsv | cut -f1 -d' '
5718627
$ wc -l data/gisaid/nextclade.mutation_summary.tsv | cut -f1 -d' '
5751201
```

There is … I was not able to run GenBank ingest with fetch locally. After a few hours it fails with … Maybe we have better luck running this on the usual infra. At this point, @jameshadfield, I can use your help in pasting that into the Snakefile somehow. Summary of changes with relation to the Snakemake port:
Porting: the fasta step needs to be incorporated similarly to how "join metadata" step is already there. The mutation summary is a new step. The downloads (in A) and uploads (in B) are the new steps. Note that downloads are only needed when there are new sequences.
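As a quick sanity check on the `wc -l` numbers reported above (assuming, and this is my reading rather than something stated in the thread, that the only expected difference is a single duplicated header line dropped during the merge):

```shell
# Line counts reported above for the mutation summary files
new=32575        # nextclade.mutation_summary.new.tsv
old=5718627      # nextclade.mutation_summary.old.tsv
combined=5751201 # nextclade.mutation_summary.tsv
# Difference between naive concatenation and the merged file
echo $(( new + old - combined ))   # 1 line, consistent with one deduped header
```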
|
|
I merged master into this branch, which resolved (2), (4) and (1.1). Remaining items:
With this merge I kept the …
|
I just realized that this merge removed the diff between the old and new ingest-gisaid and ingest-genbank scripts from the "Changed files" tab here on GitHub. They are now all green, because they no longer exist on the merged master. Here is what was in the diff before the merge: e29ab01...c501a89
e06b768 to
d16065f
Compare
d16065f to
249a2da
Compare
|
Alright, I sketched all of the changes that need to be ported in the Snakemake file, but haven't actually run it yet. There are probably mistakes there. One deviation from my bash versions is that I upload the new alignment and mutation summary right after they are ready. This is probably not how it should be done in good workflows, but it is convenient right now, because I simply don't know (yet) how to extend the "if there are new sequences" check until the end of the workflow. Alternatively we may simply check for the presence of these files at the end of the workflow and upload them if they exist. But this solution is less reliable, as it may hide mistakes (cases where files should have been generated but were not will silently succeed). TODO:
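The "upload only if there are new sequences" check could, at the bash level, look something like the following sketch (the path is illustrative and the real upload command is elided):

```shell
# Guard the upload on the new-sequences file being non-empty.
new_fasta="data/gisaid/nextclade.aligned.new.fasta"
if [ -s "$new_fasta" ]; then
  msg="uploading updated alignment"
else
  msg="no new sequences; skipping upload"
fi
echo "$msg"
```

The `-s` test is true only for an existing, non-empty file, which covers both the "file missing" and "file empty" cases in one check.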
|
|
@ivan-aksamentov I've refactored this to use snakemake checkpoints. Let's watch the actions (genbank and gisaid) to see if it works as expected. I have not reviewed this fully, but it looks generally good. Some things I noticed:

**shellcheck**

**Consistent filenames**

It would help (me) a lot were we to systematise the filenames we use. My understanding, based on the PR, is that:
However this isn't always the case, e.g., sometimes the middle one is called "new".
|
Hey thanks @jameshadfield,
Yeah, we need globbing in that case, so I think we just need to silence the warning on that line. But maybe there's some sort of a syntax for making it proper. It's bash, so you never know :)
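Silencing a single shellcheck warning where the expansion is intentionally unquoted can be done with a one-line directive. A minimal example (SC2086 is shellcheck's "double quote to prevent globbing and word splitting" check; the pattern here is illustrative):

```shell
pattern="data/gisaid/*.new.fasta"
# Unquoted on purpose so the glob expands:
# shellcheck disable=SC2086
out="$(cat $pattern 2>/dev/null || echo "no matching files")"
echo "$out"
```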
Yeah, I wanted to do this as well. The reason why it's different for now is that it's different in the full-run script and on S3, so changing things might be a bit awkward. We need to go through these one by one and see what is affected. Oh no! Looks like it failed. I will try to read and understand what's going on.
|
The "quotation mark" jobs also failed because, as I mentioned above, Nextclade accepts a comma-delimited gene list, but the mutation summary only accepts a space-delimited list. And if not, then this happens: … I made it space-delimited. Actions: … We could also modify the mutation-summary script, but this is a bit out of scope of this PR and will turn an exact copy of this script from ncov into an "almost exact" copy, which feels awkward to do. What is the best way to "commonize" this functionality across repos? (make it shared)
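The delimiter mismatch itself is a one-line conversion in bash. A toy example (the gene names are illustrative, not the exact ncov gene list):

```shell
# Nextclade takes the gene list comma-delimited; the mutation-summary
# script wants it space-delimited.
genes_csv="E,M,N,ORF1a,S"
genes_spaced="${genes_csv//,/ }"   # replace every comma with a space
echo "$genes_spaced"
```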
|
In the latest news, the runs with the … succeeded for GenBank, but the GISAID download keeps failing today (Slack thread), so an inconclusive result again.
|
Alright, GISAID has now run successfully as well. I think this is ready to be reviewed. I suggest we run this version in parallel to the master flow, and at the same time we could try to use the new results on a dev version of ncov, or perhaps on staging, to see how the new flow works end-to-end. Then, if it's good, we can make it the default. Before merge, a full run should be performed to update the caches.
nextclade.aligned.fasta -> aligned.fasta
nextclade.mutation_summary.tsv -> mutation_summary.tsv
|
Last changes before merging: Rename
i.e. no …
|
The plan on bringing this to production: (Slack thread: https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1641427855047000)
|
* move `aligned.fasta.xz` to the list of `ncov-ingest`-produced files since this is now output from Nextclade in the daily ingest pipeline¹
* remove `mutation-summary.tsv.xz` since this is no longer updated in the daily ingest pipeline²

¹ nextstrain/ncov-ingest#237
² nextstrain/ncov-ingest@05cd82f
Removed checkpoint because it only slows down the pipeline without any of the past benefits. It was initially added because Nextclade v1 used to error on an empty FASTA input and we needed the checkpoint to check if we should generate a new mutation summary¹. Now, running Nextclade v2 on a no-op file takes ~1 minute and we no longer use the mutation summary. ¹ #237 (comment)
Description of proposed changes
During daily ingest, the extracted full fasta file is filtered by whether the sequence name is already present in the nextclade.tsv file from S3, which is an accumulation of all Nextclade outputs produced during previous ingests. The sequences that are not present in nextclade.tsv are considered new and are passed to Nextclade for processing.
Nextclade aligns these sequences on every run. However, currently the alignment is ignored (not emitted).
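The filtering step described above can be sketched on toy data (two-line fasta records and hypothetical sequence names; the real pipeline operates on the full extracted fasta and the accumulated nextclade.tsv):

```shell
tmp="$(mktemp -d)"
# Full extracted fasta: seq1 was processed in a previous ingest, seq2 is new
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$tmp/all.fasta"
printf 'seqName\tclade\nseq1\t20A\n' > "$tmp/nextclade.tsv"
# Names already present in the accumulated nextclade.tsv
tail -n +2 "$tmp/nextclade.tsv" | cut -f1 > "$tmp/seen.txt"
# Keep only records whose names are not in seen.txt
new_records="$(awk 'NR==FNR {seen[">"$0]=1; next} /^>/ {keep=!seen[$0]} keep' \
  "$tmp/seen.txt" "$tmp/all.fasta")"
echo "$new_records"
```

Only `seq2` survives the filter, so only genuinely new sequences reach Nextclade.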
In this PR:
I add the `--output-fasta` flag to the Nextclade invocation, so that the nucleotide alignment is emitted for the new sequences into `data/gisaid/nextclade.aligned.new.fasta`. The old "cache" of aligned sequences is downloaded from S3 as … Then I concatenate the 2 fasta files to get the new …, which is uploaded back to S3.
That is, this file will then accumulate aligned sequences after each daily ingest.
This behavior is similar to concatenating the nextclade.tsv for new sequences onto the accumulator nextclade.tsv from S3.
For the very first run, I generated the `nextclade.aligned.fasta.xz` by running the Nextclade full run.

TODO:
- `bin/run-nextclade-full` script to also emit and accumulate the alignment. Run it once to generate the initial alignment of the entire database.
- `bin/run-nextclade-full` script to generate the mutation summary. Run it once to generate the initial mutation summary of the full database.

Related issue(s)
Testing
This is what I did to test alignment for GISAID (Upd: this was before the mutation summary was added):
- … `nextclade.aligned.fasta.xz` in a special directory on S3)
- … `nextclade.aligned.fasta.xz`) to the root of the bucket, where the modified daily ingest can find it
- … `seqkit stat` to count the number of sequences.
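The sequence counting from the last step, sketched on toy data (`seqkit stat` is what the thread mentions; `grep` is shown as a dependency-free stand-in):

```shell
tmp="$(mktemp -d)"
printf '>a\nACGT\n>b\nGGGG\n>c\nTTTT\n' > "$tmp/aligned.fasta"
# Each fasta record starts with '>'; counting headers counts sequences
n="$(grep -c '^>' "$tmp/aligned.fasta")"
echo "$n"
# with seqkit installed: seqkit stat "$tmp/aligned.fasta"
```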