storage: inconsistent committed entries in Raft log #36353
Description
I've got a scary one.
Check out the tail of `cockroach debug raft-log` on the leader, n1:

```
0.000000000,0 /Local/RangeID/7828/u/RaftLog/logIndex:18 (0x0169f71e94757266746c000000000000001200): Term:8 Index:18 by lease #4
proposer_replica:<node_id:1 store_id:1 replica_id:3 > proposer_lease_sequence:4 max_lease_index:4 replicated_eval_result:<timestamp:<wall_time:1553825991176579823 > delta:<last_update_nanos:1553825991176579823 live_bytes:34914540 live_count:104618 key_bytes:2126461 key_count:104618 val_bytes:32788079 val_count:104618 > add_sstable:<crc32:4108212582 > > write_batch:<data:"\000\000\000\000\000\000\000\000\000\000\000\000" >
write batch:
```
Now look at it on any of the followers (say, n2):

```
0.000000000,0 /Local/RangeID/7828/u/RaftLog/logIndex:18: Term:6 Index:18 by lease #0
proposer_replica:<node_id:0 store_id:0 replica_id:0 > replicated_eval_result:<timestamp:<> delta:<> >
```
Same index, different term, very different data.
The leader has what I have reason to believe is an AddSSTable (so there's sideloaded data in there), whereas the followers have a zero value. That zero value may have been carried over from earlier data: there's also a zero value at term 6, index 12 (where the leader and followers agree).
First question: are these zero values supposed to occur?
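To make the invariant being violated here concrete, Raft's Log Matching property says that any two logs agreeing on an entry's index and term must hold the same entry, so committed entries may never differ in term across replicas. The sketch below (with an illustrative `entry` type standing in for etcd/raft's `raftpb.Entry`, which carries the same `Index`/`Term` fields) shows the kind of cross-replica check that would flag the divergence above:

```go
package main

import "fmt"

// entry is a minimal stand-in for a raft log entry; raftpb.Entry in
// etcd/raft carries the same Index and Term fields.
type entry struct {
	Index, Term uint64
}

// firstDivergence returns the lowest index at which two logs disagree
// on the term, or 0 if they agree on every shared index. Any hit at or
// below the commit index indicates log corruption.
func firstDivergence(a, b []entry) uint64 {
	byIndex := make(map[uint64]uint64, len(a))
	for _, e := range a {
		byIndex[e.Index] = e.Term
	}
	for _, e := range b {
		if t, ok := byIndex[e.Index]; ok && t != e.Term {
			return e.Index
		}
	}
	return 0
}

func main() {
	// The terms observed above: the leader has (18, term 8),
	// the follower has (18, term 6); both agree at index 12.
	leader := []entry{{Index: 12, Term: 6}, {Index: 18, Term: 8}}
	follower := []entry{{Index: 12, Term: 6}, {Index: 18, Term: 6}}
	fmt.Println(firstDivergence(leader, follower)) // 18
}
```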
The range debug page claims that index 18 is a committed entry (18 is the "Last Index", "Commit", and "Applied" index across all the replicas).
The log size reported by that page differs between the replicas - 33.7MB on the leader, 929B on the followers.
I started looking into this cluster and this range because a `workload fixtures import` got blocked, with one particular AddSSTable command being reproposed over and over again and refused by Raft each time:

```
I190329 16:55:57.578308 1024 storage/replica_raft.go:420 [n1,s1,r7828/3:/Table/59/1/{1/4612-7/50294}] proposing command 3ece3298cc9d63af: 1 AddSstable
I190329 16:55:57.601162 1024 vendor/go.etcd.io/etcd/raft/raft.go:655 [n1,s1,r7828/3:/Table/59/1/{1/4612-7/50294}] 3 appending new entries to log would exceed uncommitted entry size limit; dropping proposal
```
There should be no uncommitted entries, so the command in question should be allowed through; somehow the stats maintained by the Raft library seem to have gotten corrupted. How exactly that's related to the inconsistent log, I'm not sure, but it has to be related somehow.
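For context on that refusal: etcd/raft keeps a running total of uncommitted proposal bytes, adding on propose and subtracting when entries commit. The sketch below is a loose illustration of that accounting (the names `tracker`, `propose`, and `commit` are mine, not the library's API); it shows how, if the commit-side decrement never happens for a large entry, the counter stays inflated and every subsequent reproposal is refused:

```go
package main

import (
	"errors"
	"fmt"
)

// tracker loosely mimics etcd/raft's uncommitted-entry accounting:
// proposals add their payload size; committing entries subtracts it.
// Names here are illustrative, not the library's actual API.
type tracker struct {
	uncommitted uint64
	limit       uint64
}

var errTooBig = errors.New("appending new entries to log would exceed uncommitted entry size limit")

func (t *tracker) propose(size uint64) error {
	// Like etcd/raft, always allow a proposal when nothing is pending,
	// so a single oversized entry can still make progress.
	if t.uncommitted > 0 && t.uncommitted+size > t.limit {
		return errTooBig
	}
	t.uncommitted += size
	return nil
}

func (t *tracker) commit(size uint64) {
	if size > t.uncommitted {
		size = t.uncommitted // guard against underflow
	}
	t.uncommitted -= size
}

func main() {
	t := &tracker{limit: 1 << 20}
	_ = t.propose(800 << 10) // a large AddSSTable-sized payload
	// If the commit path never drains the counter (say, because the
	// durable log diverged from the leader's in-memory view), every
	// reproposal is refused from then on:
	fmt.Println(t.propose(800 << 10)) // error: would exceed limit
}
```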
@ajwerner had a theory that perhaps the cache we have around raft log entries might be to blame, because we don't cache entries with sideloaded data. So far no smoking gun has been found.
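To illustrate the shape of that theory: a cache that declines to store sideloaded entries means every read of such an entry falls through to the on-disk log, so a bug anywhere on that fallthrough path could serve a different entry than the one proposed. A minimal sketch, with hypothetical names (the real cache lives in CockroachDB's storage layer and stores `raftpb.Entry` values):

```go
package main

import "fmt"

// cachedEntry is illustrative; the real cache stores raftpb.Entry
// values keyed by range ID and log index.
type cachedEntry struct {
	Index      uint64
	Sideloaded bool // payload lives in a separate file, not inline
	Data       []byte
}

// entryCache caches raft entries but skips sideloaded ones, so their
// (large) payloads are always re-read from disk. Names are hypothetical.
type entryCache struct {
	m map[uint64]cachedEntry
}

func newEntryCache() *entryCache {
	return &entryCache{m: map[uint64]cachedEntry{}}
}

func (c *entryCache) add(e cachedEntry) {
	if e.Sideloaded {
		return // not cached: reads must hit the on-disk log
	}
	c.m[e.Index] = e
}

func (c *entryCache) get(index uint64) (cachedEntry, bool) {
	e, ok := c.m[index]
	return e, ok
}

func main() {
	c := newEntryCache()
	c.add(cachedEntry{Index: 17, Data: []byte("small write")})
	c.add(cachedEntry{Index: 18, Sideloaded: true}) // the AddSSTable
	_, ok := c.get(18)
	fmt.Println(ok) // false: a miss here falls back to the raft log
}
```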
Given the leader's belief that some AddSSTable was applied on that range, it seems interesting that reproposals of an AddSSTable are still ongoing there. My understanding of imports is that we expect only one AddSSTable per range - so if it succeeded, what's up with the reproposals?
r7828 Range _ Debug _ Cockroach Console.pdf
[n2-raft-log-7828.txt](https://github.com/cockroachdb/cockroach/files/3024648/n2-raft-log-7828.txt)
n1-raft-log-7828.txt
Debug zip:
https://drive.google.com/open?id=1bsJfYduXnZqJ7_u6dMBZDhSNyus_4qgw
The cluster is andrei-repro-1553814655-02-n7cpu72. The range in question is 7828.
I've extended the cluster and stopped the consistency queue so that it doesn't kill anything. The roachtest and workload invocation will time out in a few hours, but hopefully that won't stop the command from being endlessly reproposed, and the cluster should be left alive. I've also copied all the data dirs away and gotten a debug.zip.
More looking to come.