kv: switching from ZONE to REGION survival causes unexpected data movement #63810
Description
When switching from ZONE to REGION survival in a 3-region cluster, only a single snapshot should be necessary per range. This is because we switch from a topology that looks like:
region 1: voter (leaseholder), voter, voter
region 2: non-voter
region 3: non-voter
to a topology that looks like:
region 1: voter (leaseholder), voter
region 2: voter, voter
region 3: voter
So if both non-voters are promoted to voters, only one snapshot should be necessary: the one that moves a voter from region 1 to region 2. Furthermore, we could do something smart about how we send that snapshot to avoid the WAN traffic (#42491), but let's ignore that for now.
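To make the "one snapshot" arithmetic concrete, here is a rough sketch that counts how many replicas would have to be created from scratch to reach the target topology. It is a toy model, not the allocator's logic: the `current` and `target_voters` dictionaries and the `min_snapshots` helper are hypothetical names, and it assumes that promoting an existing non-voter needs no snapshot and that only a brand-new replica does.

```python
# Toy model of the topologies above; NOT CockroachDB's allocator.
# Assumption: promoting a non-voter in place requires no snapshot,
# while every newly created replica requires exactly one.

current = {
    "region 1": {"voters": 3, "non_voters": 0},
    "region 2": {"voters": 0, "non_voters": 1},
    "region 3": {"voters": 0, "non_voters": 1},
}

# Desired voter count per region under REGION survival.
target_voters = {"region 1": 2, "region 2": 2, "region 3": 1}


def min_snapshots(current, target_voters):
    """Count replicas that must be created because a region has too few
    existing replicas (voters plus non-voters) to satisfy its target."""
    total = 0
    for region, want in target_voters.items():
        have = current[region]["voters"] + current[region]["non_voters"]
        total += max(0, want - have)
    return total


print(min_snapshots(current, target_voters))
```

Under these assumptions only region 2 is short by one replica, which matches the single voter that has to move out of region 1.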
In one of my tests, this is not what I saw. After switching from ZONE to REGION survival, each range took the following steps:
1. add new voter in region 2
2. add new voter in region 3
3. remove non-voter in region 2
4. remove non-voter in region 3
5. move voter from region 1 to region 2
This resulted in a total of 3 range snapshots (steps 1, 2, and 5), all sent over the WAN. This is a decent amount of wasted data movement, given that we had two perfectly good non-voting replicas that we could have promoted. Do we understand why we made these decisions?
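For comparison, the observed steps and the plan I was hoping for can be tallied side by side. This is only an illustrative sketch: the operation names (`add_voter`, `promote_non_voter`, and so on) and the `count_snapshots` helper are made up for this example and are not identifiers from the codebase. It assumes that adding or moving a voter onto a node with no replica costs one snapshot, while promotions and removals cost none.

```python
# Illustrative tally only; operation names are hypothetical labels.
# Assumption: a snapshot is sent whenever a replica is created on a node
# that does not already hold one (add_voter, move_voter).
SNAPSHOT_OPS = {"add_voter", "move_voter"}

observed_steps = [
    ("add_voter", "region 2"),
    ("add_voter", "region 3"),
    ("remove_non_voter", "region 2"),
    ("remove_non_voter", "region 3"),
    ("move_voter", "region 2"),  # voter moved from region 1 to region 2
]

hoped_for_steps = [
    ("promote_non_voter", "region 2"),
    ("promote_non_voter", "region 3"),
    ("move_voter", "region 2"),  # voter moved from region 1 to region 2
]


def count_snapshots(steps):
    """Number of steps that create a new replica and so need a snapshot."""
    return sum(1 for op, _region in steps if op in SNAPSHOT_OPS)


print("observed:", count_snapshots(observed_steps))
print("hoped for:", count_snapshots(hoped_for_steps))
```

With these assumptions the observed sequence costs three snapshots and the promotion-based sequence costs one.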
r6455_manual_enqueue_logs.txt
r6455 Range _ Debug _ Cockroach Console Before.pdf
r6455 Range _ Debug _ Cockroach Console After.pdf
Here's the log from a second instance that hurts even more, because it includes a non-voter that is deleted and then quickly replaced by a voter on the same node.
Note: this is an inefficiency, but certainly nothing that we need to rush to fix for v21.1.0. Everything still worked; it was just not as optimal as I was hoping.
Jira issue: CRDB-6780