kvserver: consistency check should only checkpoint relevant range #90543
Description
When the consistency checker detects a range inconsistency, it takes a storage checkpoint on all nodes with range replicas. These are hardlinks, so they're initially a cheap copy of the entire database. However, over time, as data is written and the live database replaces its files, the checkpoint keeps the old files alive and can end up consuming as much space as the main database. This can quickly run the node out of disk.
We should consider only taking a checkpoint of the SSTs that are relevant to the replica instead, to avoid running the node out of disk. The checkpoint should be openable as a Pebble database containing just those SSTs, so that the usual debug tooling can be used to investigate it. It also needs to contain the relevant manifest history.
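To make "relevant to the replica" concrete, here is a rough sketch of the selection step: keep only the SSTs whose key bounds overlap the replica's key span. The `span` and `sstMeta` types and the `relevantSSTs` function are hypothetical names for illustration, not Pebble or CockroachDB APIs. The sketch also assumes plain bytewise key ordering and a single span per replica, whereas a real replica covers several spans (range-local and user keys) and Pebble uses its own comparer.

```go
package main

import "bytes"

// span is a half-open key interval [start, end).
// Hypothetical type, standing in for a replica's key span.
type span struct {
	start, end []byte
}

// sstMeta is a hypothetical description of one SST: its file number,
// LSM level, and inclusive key bounds [smallest, largest].
type sstMeta struct {
	fileNum           uint64
	level             int
	smallest, largest []byte
}

// overlaps reports whether the SST's inclusive bounds intersect the
// half-open span s. Assumes bytewise key ordering.
func (m sstMeta) overlaps(s span) bool {
	return bytes.Compare(m.smallest, s.end) < 0 &&
		bytes.Compare(m.largest, s.start) >= 0
}

// relevantSSTs returns the SSTs that overlap s, preserving input order.
// Only these files would need to be hardlinked into the checkpoint.
func relevantSSTs(all []sstMeta, s span) []sstMeta {
	var out []sstMeta
	for _, m := range all {
		if m.overlaps(s) {
			out = append(out, m)
		}
	}
	return out
}
```

Note that this keeps whole SSTs along with their levels rather than extracting KV pairs, so the LSM structure within the span is preserved for debugging.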
Note that we specifically don't want to export the KV pairs of the replica, since we often need the LSM structure for debugging, e.g. due to Pebble compaction bugs.
Jira issue: CRDB-20829