
change: update 21L definitions to exclude BA.5/BA.4 from 21L #908

Closed
corneliusroemer wants to merge 5 commits into master from corneliusroemer-patch-1

@corneliusroemer
Member

Added two mutations that appear only in BA.2 and not in BA.4/5:
21L (Omicron) nuc 9866 T
21L (Omicron) nuc 23040 A

Reasoning: BA.4/5 are currently considered sister lineages and not direct descendants of BA.2. Hence it makes sense not to call them 21L for now.

If they become big, they may be given their own clade label 22A or similar.
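
A minimal sketch of how rows like the two above could be read and checked, assuming a four-column, tab-separated layout (clade, gene, site, allele). The helper names `parse_definitions` and `matches_clade` are hypothetical, the genotypes are invented for illustration, and the rule "every listed site must match" is a simplification rather than a description of how clades are actually assigned on a tree.

```python
from collections import defaultdict

# The two rows proposed in this PR, written as tab-separated text.
ROWS = "21L (Omicron)\tnuc\t9866\tT\n21L (Omicron)\tnuc\t23040\tA\n"


def parse_definitions(text):
    """Map clade -> {(gene, site): allele} from tab-separated rows."""
    definitions = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        clade, gene, site, allele = line.split("\t")
        definitions[clade][(gene, int(site))] = allele
    return dict(definitions)


def matches_clade(genotype, definition):
    """True if the genotype carries the listed allele at every listed site."""
    return all(genotype.get(key) == allele for key, allele in definition.items())


definitions = parse_definitions(ROWS)

# Hypothetical genotypes for illustration only, not real sequences.
with_both = {("nuc", 9866): "T", ("nuc", 23040): "A"}
basal_9866c = {("nuc", 9866): "C", ("nuc", 23040): "A"}

print(matches_clade(with_both, definitions["21L (Omicron)"]))
print(matches_clade(basal_9866c, definitions["21L (Omicron)"]))
```

Under this simplified rule, any sequence that is at reference at one of the listed sites falls outside the definition, which is why a polymorphic site makes a fragile clade marker.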

@corneliusroemer corneliusroemer requested review from rneher and trvrb April 8, 2022 14:45
@trvrb
Member

trvrb commented Apr 9, 2022

@corneliusroemer: This isn't working on top of the latest ncov. I've pushed the result to: https://nextstrain.org/staging/ncov/gisaid/global/21L-update

Site 9866 is polymorphic in our live 21L clade, with basal 9866C viruses, as seen here:

[Image: 9866]

And mutation 23040A should not be part of the 21L clade definition. 23040G is shared by Omicron BA.1, BA.2 and BA.3 viruses:

[Image: 23040]

Member

@trvrb trvrb left a comment

See comment in issue

@corneliusroemer
Member Author

corneliusroemer commented Apr 9, 2022

@trvrb

> Site 9866 is polymorphic in our live 21L clade as seen here:

You are totally right about this one - I learned yesterday that 9866 is reference in most BA.2 in South Africa. Just the ones that spread globally happened to have that mutation.
This was a mistake on my end.

> And mutation 23040A should not be part of the 21L clade definition. 23040G is shared by Omicron BA.1, BA.2 and BA.3 viruses

I think I disagree here. The point of this PR is not to distinguish BA.2 from BA.1/3, but to distinguish BA.2 from BA.4/5. So it doesn't matter that 23040A is in BA.1/3 as well. What's important is that it is absent from both BA.4 and BA.5.

In fact, there's no single mutation that's present in BA.2 but in none of BA.1/3/4/5 as you can see here. So we have to choose a site like this.

What I did now was add the reference state at 2 sites where BA.4/5 share a mutation. I don't think there's any better way than to say either that BA.2 has these at reference, or that BA.2 has the one that we previously had mutated.

Does it make sense? What do you prefer?

Here's the Omicron tree with BA.4/5 if you want to try to find some good mutations yourself :)

https://nextstrain.org/nextclade/sars-cov-2/2022-04-08?c=gt-nuc_12160,23018
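
The search described in this comment can be sketched as a simple comparison over a lineage-by-site table. This is only an illustration: the table below is a hypothetical toy with placeholder site names and alleles (not real SARS-CoV-2 genotypes), and `separating_sites` is a made-up helper name.

```python
# Hypothetical toy table: lineage -> {site: allele}. Site names and alleles
# are placeholders chosen for illustration, not real genotypes.
GENOTYPES = {
    "BA.1": {"site_a": "G", "site_b": "C"},
    "BA.2": {"site_a": "G", "site_b": "C"},
    "BA.3": {"site_a": "G", "site_b": "C"},
    "BA.4": {"site_a": "A", "site_b": "T"},
    "BA.5": {"site_a": "A", "site_b": "T"},
}


def separating_sites(target, against, genotypes):
    """Sites where `target` differs from every lineage listed in `against`."""
    return sorted(
        site
        for site, allele in genotypes[target].items()
        if all(genotypes[other].get(site) != allele for other in against)
    )


# Sites that split BA.2 from BA.4/5 only.
vs_ba45 = separating_sites("BA.2", ["BA.4", "BA.5"], GENOTYPES)
# Sites that split BA.2 from every other lineage at once.
vs_all = separating_sites("BA.2", ["BA.1", "BA.3", "BA.4", "BA.5"], GENOTYPES)

print(vs_ba45)
print(vs_all)
```

In this toy, sites exist that separate BA.2 from BA.4/5, but none separates BA.2 from all four other lineages, which mirrors the situation the comment describes.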

@corneliusroemer corneliusroemer requested a review from trvrb April 9, 2022 23:51
@trvrb
Member

trvrb commented Apr 11, 2022

@corneliusroemer: I'd like to do an actual test run of this with real data before merging. I'll kick off a trial build that spikes in BA.4 and BA.5 to confirm. You can access trial builds through GitHub Actions in this repo.

@corneliusroemer
Member Author

@trvrb Of course, good idea to test! Not online yet.

@trvrb
Member

trvrb commented Apr 11, 2022

Hmm... runs have completed and it's not always the same topology as you were getting with your Nextclade tree. Take a look at https://nextstrain.org/staging/ncov/gisaid/trial/test-BA4-BA5/global?c=gt-nuc_12160&label=clade:21M%20%28Omicron%29&m=div.

You can see the yellow lineages with 12160A corresponding to BA.4 and BA.5 are resolved as separate subclades of BA.2 rather than outgroups.

[Screenshot]

The "Africa" build resolved these as a single subclade of BA.2: https://nextstrain.org/staging/ncov/gisaid/trial/test-BA4-BA5/africa?c=gt-nuc_12160&m=div

[Screenshot]

The "North America" build resolved BA.4 and BA.5 as outgroup to BA.2, but the placement of clade 21M was affected and 21K was dropped.

[Screenshot]

Isn't this topology with BA.4 and BA.5 as sublineages of BA.2 basically as parsimonious as outgrouping? I'm assuming that the exact strains included and sequencing artifacts will create stochastically different topologies.

Given that users (as well as ourselves) will often have spotty data, we need a clade labeling strategy that's robust to these stochastic topology outcomes.
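
To make the "equally parsimonious" question concrete, here is a small sketch of Fitch parsimony counting on two rooted binary topologies. Everything in it is an assumption for illustration: the leaf names, the two sites, and their alleles are invented so that the calculation is easy to follow, and they do not represent the actual trial-build data.

```python
def fitch(tree, alleles):
    """Return (state_set, change_count) for a nested-tuple binary tree.

    Leaves are strings; internal nodes are 2-tuples of subtrees.
    """
    if isinstance(tree, str):  # leaf: use its observed allele
        return {alleles[tree]}, 0
    left_set, left_cost = fitch(tree[0], alleles)
    right_set, right_cost = fitch(tree[1], alleles)
    shared = left_set & right_set
    if shared:  # children agree on at least one state: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sites):
    """Sum the minimum number of changes over all sites."""
    return sum(fitch(tree, alleles)[1] for alleles in sites)


# Hypothetical leaves: one BA.1, two BA.2 samples, one BA.4/5 sample.
OUTGROUP = ("BA.1", ("BA.4/5", ("BA.2_x", "BA.2_y")))  # BA.4/5 outside BA.2
SUBCLADE = ("BA.1", ("BA.2_x", ("BA.2_y", "BA.4/5")))  # BA.4/5 nested in BA.2

# Invented alleles: one site private to BA.4/5, one site polymorphic in BA.2.
SITES = [
    {"BA.1": "G", "BA.2_x": "G", "BA.2_y": "G", "BA.4/5": "A"},
    {"BA.1": "C", "BA.2_x": "C", "BA.2_y": "T", "BA.4/5": "C"},
]

print(parsimony_score(OUTGROUP, SITES))
print(parsimony_score(SUBCLADE, SITES))
```

With these invented alleles both topologies need the same number of changes, so parsimony alone cannot choose between them. A site shared between a BA.2 sample and BA.4/5, or a sequencing artifact, would be enough to tip the result either way.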

@corneliusroemer
Member Author

corneliusroemer commented Apr 11, 2022

Builds are done.

BA.4/5 get pulled into BA.2 - this is going to be hard to fix as 4/5 are very similar to BA.2.

[Image]

For Nextclade this is no problem, so if this is important, we could use a constraint tree for IQ-TREE to make BA.4/5 appear outside in ncov.

21M is only on the 21K branch in the Africa build because some South African BA.3 seem to lack 23525T. This is very unusual. It's actually never wild type in all BA.3 GISAID sequences. Strange. We could pick 21M-defining mutations outside of the Spike region, where dropout may be more common than elsewhere.

@trvrb
Member

trvrb commented Apr 11, 2022

Are you certain that the true topology should have BA.4 and BA.5 as outgroups? Walking through the mutations, the two topologies seem pretty equally parsimonious to me, but maybe I'm missing some logic.

Also, the subclade topology makes more sense from a time tree perspective, but this can be an artifact of TreeTime not using different coalescent processes for within-host vs between-host evolution.

@corneliusroemer
Member Author

Ha, good spot with North America and 21M/K: the reason is that 27259 is not C in BA.5, in contrast to all of BA.1/2/3/4.

27259C is a defining mutation in the clade definitions of 21L/M/K from December. Back then we didn't know of BA.5.

So we should drop it.
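
Dropping a site from the definitions can be sketched as a filter over definition rows. The rows below are an illustrative subset built only from statements in this thread (27259C for 21K/L/M, plus the 9866T row proposed in this PR); they are not the full contents of any real definitions file, and `drop_site` is a hypothetical helper.

```python
# Illustrative subset of definition rows: (clade, gene, site, allele).
ROWS = [
    ("21K (Omicron)", "nuc", 27259, "C"),
    ("21L (Omicron)", "nuc", 27259, "C"),
    ("21L (Omicron)", "nuc", 9866, "T"),
    ("21M (Omicron)", "nuc", 27259, "C"),
]


def drop_site(rows, gene, site):
    """Return the rows that do not refer to the given gene and site."""
    return [row for row in rows if (row[1], row[2]) != (gene, site)]


kept = drop_site(ROWS, "nuc", 27259)
for clade, gene, site, allele in kept:
    print(f"{clade}\t{gene}\t{site}\t{allele}")
```

A clade whose only listed rows refer to the dropped site would be left with no definition in this sketch, so other defining sites would still be needed for it.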

We are sure BA.4/5 place outside of BA.2 in clean builds. The problem here is stochasticity and dirty sequences.

We can't avoid BA.4/5 sometimes appearing inside BA.2 due to tree building error.

If you think that means BA.4/5 should be 21L just to not confuse people, that would be ok with me. In that case we would officially pull back 21L for now because BA.4/5 are quite similar to BA.2.

If BA.4 and/or BA.5 get big, they can become their own clade 22A (and possibly 22B).

This may be the more stable solution. Either is fine with me.

@trvrb
Member

trvrb commented Apr 14, 2022

Superseded by #913.

@trvrb trvrb closed this Apr 14, 2022
@trvrb trvrb deleted the corneliusroemer-patch-1 branch April 15, 2022 02:29