change: update 21L definitions to exclude BA.5/BA.4 from 21L#908
corneliusroemer wants to merge 5 commits into master
Added two mutations that appear only in BA.2 and not in BA.4/5:
21L (Omicron) nuc 9866 T
21L (Omicron) nuc 23040 A
|
@corneliusroemer: This isn't working on top of the latest
Site 9866 is polymorphic in our live 21L clade as seen here:
with basal 9866C viruses.
And mutation 23040A should not be part of the 21L clade definition. 23040G is shared by Omicron BA.1, BA.2 and BA.3 viruses: |
You are totally right about this one - I learned yesterday that 9866 is reference in most BA.2 in South Africa. Just the ones that spread globally happened to have that mutation.
I think I disagree here. The point of this PR is not to distinguish BA.2 from BA.1/3, but to distinguish BA.2 from BA.4/5. So it doesn't matter that 23040A is in BA.1/3 as well. What's important is that it is in neither BA.4 nor BA.5.
In fact, there's no single mutation that's present in BA.2 but in none of BA.1/3/4/5, as you can see here. So we have to choose a site like this.
What I did now was add the reference allele for 2 mutations shared by BA.4/5. I don't think there's any better way than to either say BA.2 has these at reference, or BA.2 has the one that we previously had mutated.
Does it make sense? What do you prefer?
Here's the Omicron tree with BA.4/5 if you want to try to find some good mutations yourself :) https://nextstrain.org/nextclade/sars-cov-2/2022-04-08?c=gt-nuc_12160,23018 |
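The selection criterion in the comment above (a site where the BA.2 allele differs from the allele in both BA.4 and BA.5, regardless of what BA.1/3 carry) can be sketched roughly as follows. This is only an illustration, not code from this repo: the `discriminating_sites` helper is hypothetical, and the site numbers and alleles in `toy` are made-up placeholders rather than real SARS-CoV-2 genotypes.

```python
def discriminating_sites(genotypes, target, excluded):
    """Return (site, base) pairs where `target` differs from every lineage in `excluded`.

    genotypes: dict mapping lineage name -> dict of {site: base}.
    A site that is missing from an excluded lineage is treated as unknown
    and is not reported.
    """
    sites = []
    for site, base in sorted(genotypes[target].items()):
        if all(
            site in genotypes[other] and genotypes[other][site] != base
            for other in excluded
        ):
            sites.append((site, base))
    return sites


# Toy data: sites 100-102 and their alleles are placeholders, not real genotypes.
toy = {
    "BA.1": {100: "A", 101: "G", 102: "T"},
    "BA.2": {100: "A", 101: "G", 102: "C"},
    "BA.4": {100: "T", 101: "G", 102: "C"},
    "BA.5": {100: "T", 101: "C", 102: "C"},
}

# Only BA.4/5 need to be excluded, so a site shared with BA.1 still qualifies.
candidates = discriminating_sites(toy, "BA.2", ["BA.4", "BA.5"])

# Requiring the allele to be absent from BA.1 as well leaves no candidate here.
strict = discriminating_sites(toy, "BA.2", ["BA.1", "BA.4", "BA.5"])
```

In this toy set, relaxing the requirement to "differs from BA.4 and BA.5 only" is what makes a usable site available, which mirrors the argument made in the comment.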
|
@corneliusroemer: I'd like to do an actual test run of this with real data before merging. I'll kick off a trial build that spikes in BA.4 and BA.5 to confirm. You can access trial builds through GitHub Actions in this repo. |
|
Running now from branch
When complete, trees will be visible at: |
|
@trvrb Of course, good idea to test! Not online yet |
|
Hmm... runs have completed and it's not always the same topology as you were getting with your Nextclade tree.
Take a look at https://nextstrain.org/staging/ncov/gisaid/trial/test-BA4-BA5/global?c=gt-nuc_12160&label=clade:21M%20%28Omicron%29&m=div. You can see the yellow lineages with 12160A, corresponding to BA.4 and BA.5, are resolved as separate subclades of BA.2 rather than as outgroups.
The "Africa" build resolved these as a single subclade of BA.2: https://nextstrain.org/staging/ncov/gisaid/trial/test-BA4-BA5/africa?c=gt-nuc_12160&m=div
The "North America" build resolved BA.4 and BA.5 as outgroup to BA.2, but the placement of clade
Isn't this topology with BA.4 and BA.5 as sublineages of BA.2 basically equally parsimonious as outgrouping? I'm assuming that the exact strains included and sequencing artifacts will create stochastically different topologies. Given that users (as well as ourselves) will often have spotty data, we need a clade labeling strategy that's robust to these stochastic topology outcomes. |
|
Are you certain that the true topology should have BA.4 and BA.5 as outgroups? Walking through mutations seems to be pretty equally parsimonious to me, but maybe I'm missing some logic. Also, the subclade topology makes more sense from a time tree perspective, but this can be an artifact of TreeTime not using different coalescent processes for within-host vs between-host evolution. |
|
Ha, good spot with North America and 21M/K. The reason is that 27259 is not C in BA.5, in contrast to all of BA.1/2/3/4. 27259C is a defining mutation in the clade definitions of 21L/M/K from December. Back then we didn't know of BA.5, so we should drop it.
We are sure BA.4/5 place outside of BA.2 in clean builds. The problem here is stochasticity and dirty sequences. We can't avoid BA.4/5 sometimes appearing inside BA.2 due to tree building error.
If you think that means BA.4/5 should be 21L just to not confuse people, that would be ok with me. In that case we would officially pull back 21L for now because BA.4/5 are quite similar to BA.2.
If BA.4 and/or BA.5 get big, they can become their own clade 22A (and possibly 22B). This may be the more stable solution. Either is fine with me. |
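To illustrate why a defining mutation that BA.5 lacks can leave it unlabeled, here is a minimal sketch of matching a genotype against a list of defining mutations. It is not how the build assigns clades: the `matches_clade` helper is hypothetical, site 1000 is a placeholder for "some other shared defining site", and the `A` at 27259 in `ba5` is an assumed stand-in for "not C".

```python
def matches_clade(genotype, definition):
    """True if the genotype carries every (site, base) pair in the definition."""
    return all(genotype.get(site) == base for site, base in definition)


# Hypothetical, reduced definitions. Only 27259C comes from the discussion;
# site 1000 is a placeholder for another defining site shared by all lineages.
definition_with_27259 = [(1000, "T"), (27259, "C")]
definition_without_27259 = [(1000, "T")]

# Assumed genotypes: BA.4 carries 27259C, BA.5 does not ("A" is a placeholder).
ba4 = {1000: "T", 27259: "C"}
ba5 = {1000: "T", 27259: "A"}

ba4_old = matches_clade(ba4, definition_with_27259)
ba5_old = matches_clade(ba5, definition_with_27259)
ba5_new = matches_clade(ba5, definition_without_27259)
```

Under these assumptions BA.5 fails the definition that requires 27259C and passes once that site is dropped, which is the effect described in the comment above.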
|
Superseded by #913. |






Added two mutations that appear only in BA.2 and not in BA.4/5:
21L (Omicron) nuc 9866 T
21L (Omicron) nuc 23040 A
Reasoning: BA.4/5 are currently considered sister lineages and not direct descendants of BA.2. Hence it makes sense to call them 21L for now.
If they become big, they may be given their own clade label 22A or similar.