storage: Raft state corruption when receiving snapshot #39604
Description
On current master, acceptance/bank/cluster-recovery occasionally fails after (*Replica).applySnapshot's call to r.store.engine.IngestExternalFiles.
After adding some logging statements around this area, it looks like the old Raft entries are not being deleted, which causes the last index to diverge from the truncated index:
```
I190810 22:37:23.147087 398 storage/replica_raftstorage.go:931 [n2,s2,r19/2:/Table/2{3-4}] [jeffreyxiao][BEFORE INGESTION] LAST INDEX: 31; LAST TERM: 0; TRUNCATED STATE: index:31 term:7
W190810 22:37:23.147167 398 storage/engine/rocksdb.go:116 [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190810 22:37:23.152563 17 storage/engine/rocksdb.go:116 [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190810 22:37:23.153370 398 storage/engine/rocksdb.go:116 [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190810 22:37:23.156833 398 storage/engine/rocksdb.go:116 [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190810 22:37:23.162646 398 storage/replica_raftstorage.go:943 [n2,s2,r19/2:/Table/2{3-4}] [jeffreyxiao][AFTER INGESTION] LAST INDEX: 20; TRUNCATED STATE: {31 7}
```
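To make the violated invariant explicit: after a snapshot is applied, the replica's last index must equal the truncated state's index (the snapshot contains no log entries beyond the truncation point). A minimal sketch of that check, using hypothetical simplified types rather than the actual CockroachDB ones:

```go
package main

import "fmt"

// truncatedState is a hypothetical simplification of the Raft truncated
// state (index/term of the last entry covered by the snapshot).
type truncatedState struct {
	Index uint64
	Term  uint64
}

// checkSnapshotInvariant returns an error when the last index diverges
// from the truncated index — the corruption seen in the logs above, where
// ingestion left the last index at 20 while the truncated state said 31.
func checkSnapshotInvariant(lastIndex uint64, ts truncatedState) error {
	if lastIndex != ts.Index {
		return fmt.Errorf("last index %d != truncated index %d", lastIndex, ts.Index)
	}
	return nil
}

func main() {
	// Values taken from the AFTER INGESTION log line.
	err := checkSnapshotInvariant(20, truncatedState{Index: 31, Term: 7})
	fmt.Println(err) // prints "last index 20 != truncated index 31"
}
```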
The deletion of the Raft entries should be handled by the range deletion of the unreplicated range-id SST:
cockroach/pkg/storage/replica_raftstorage.go, lines 847 to 853 at b320ff5:

```go
// Clearing the unreplicated state.
unreplicatedPrefixKey := keys.MakeRangeIDUnreplicatedPrefix(r.RangeID)
unreplicatedStart := engine.MakeMVCCMetadataKey(unreplicatedPrefixKey)
unreplicatedEnd := engine.MakeMVCCMetadataKey(unreplicatedPrefixKey.PrefixEnd())
if err = unreplicatedSST.ClearRange(unreplicatedStart, unreplicatedEnd); err != nil {
	return errors.Wrapf(err, "error clearing range of unreplicated SST writer")
}
```
However, when I replace this range deletion tombstone with point deletions, I'm unable to reproduce the failure on acceptance/bank/cluster-recovery (~80 successful runs), which leads me to believe that something strange is happening with the ingestion of range deletion tombstones.
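The point-deletion experiment can be sketched against a toy in-memory store. The map stands in for the storage engine and the key strings are illustrative; the real code iterates MVCC keys in the engine and issues a Clear per key rather than a single range tombstone:

```go
package main

import (
	"fmt"
	"sort"
)

// clearRangePointwise deletes every key in [start, end) with individual
// point deletions — the workaround described above — on a toy in-memory
// "engine" (a plain map, not the real storage engine API).
func clearRangePointwise(db map[string]string, start, end string) int {
	var doomed []string
	for k := range db {
		if k >= start && k < end {
			doomed = append(doomed, k)
		}
	}
	sort.Strings(doomed) // deterministic deletion order, like an engine scan
	for _, k := range doomed {
		delete(db, k) // one point tombstone per key, no range tombstone
	}
	return len(doomed)
}

func main() {
	// Hypothetical unreplicated range-ID-local keys for r19, plus one key
	// from another range that must survive.
	db := map[string]string{
		"/r19/u/RaftLog/20": "entry20",
		"/r19/u/RaftLog/31": "entry31",
		"/r19/u/HardState":  "hs",
		"/r20/u/RaftLog/5":  "other range",
	}
	// "/r19/u0" plays the role of unreplicatedPrefixKey.PrefixEnd().
	n := clearRangePointwise(db, "/r19/u/", "/r19/u0")
	fmt.Println(n, len(db)) // prints "3 1"
}
```

The interesting part is not the deletion logic itself but that swapping the single range tombstone for per-key tombstones changes what the ingestion leaves behind, pointing the suspicion at range-tombstone handling during SST ingestion.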