kvserver: consistency check should only checkpoint relevant range #90543

@erikgrinaker

When the consistency checker detects a range inconsistency, it takes a storage checkpoint on all nodes with range replicas. The checkpoint consists of hardlinks, so initially it is a cheap copy of the entire database. However, as data is written and the live database replaces its files, the checkpoint keeps the old files alive, and over time it can consume as much space as the main database. This can quickly run the node out of disk.
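
The space behavior follows from ordinary hardlink semantics, which the following minimal Go sketch illustrates. It is not CockroachDB or Pebble code: the file name `000001.sst`, the `checkpoint` directory, and the helper `hardlinkDemo` are hypothetical, and only standard-library calls are used. The sketch shows that once the "live" file is deleted, its bytes remain on disk for as long as the checkpoint's link exists.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hardlinkDemo is a hypothetical helper for illustration only. It writes a
// stand-in "SST" file, hardlinks it into a checkpoint directory, deletes the
// original, and returns the contents still readable through the link.
func hardlinkDemo() (string, error) {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	// The "live" file, as the database would own it.
	live := filepath.Join(dir, "000001.sst")
	if err := os.WriteFile(live, []byte("sstable-bytes"), 0o644); err != nil {
		return "", err
	}

	// The checkpoint holds a second directory entry for the same inode,
	// so creating it copies no data.
	ckptDir := filepath.Join(dir, "checkpoint")
	if err := os.Mkdir(ckptDir, 0o755); err != nil {
		return "", err
	}
	ckpt := filepath.Join(ckptDir, "000001.sst")
	if err := os.Link(live, ckpt); err != nil {
		return "", err
	}

	// Simulate the database dropping the file, e.g. after it was rewritten.
	// The data is not freed, because the checkpoint still links to it.
	if err := os.Remove(live); err != nil {
		return "", err
	}

	data, err := os.ReadFile(ckpt)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	data, err := hardlinkDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("still readable via checkpoint link: %q\n", data)
}
```

Every file the live database later replaces is retained this way, which is why a full checkpoint can grow toward the size of the database itself.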

We should consider only taking a checkpoint of the SSTs that are relevant to the replica instead, to avoid running the node out of disk. The result should still be openable as a Pebble database containing just those SSTs, so that the usual debug tooling can be used to investigate it. It also needs to contain the relevant manifest history.
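
As a rough illustration of what "relevant to the replica" could mean, the sketch below selects files whose key bounds overlap a replica's key span. Everything here is an assumption for illustration: `sstMeta`, `span`, `overlaps`, `relevantSSTs`, and `exampleFiles` are hypothetical names, not Pebble or CockroachDB APIs, and a real replica covers several key spans rather than one. It also ignores the manifest history mentioned above.

```go
package main

import (
	"bytes"
	"fmt"
)

// sstMeta is a hypothetical stand-in for per-file metadata: a file number
// plus the smallest and largest user keys stored in the file (both inclusive).
type sstMeta struct {
	FileNum  uint64
	Smallest []byte
	Largest  []byte
}

// span is a hypothetical half-open key span [Start, End) owned by a replica.
type span struct {
	Start, End []byte
}

// overlaps reports whether the file's inclusive bounds [Smallest, Largest]
// intersect the half-open span [Start, End).
func overlaps(f sstMeta, s span) bool {
	return bytes.Compare(f.Smallest, s.End) < 0 &&
		bytes.Compare(f.Largest, s.Start) >= 0
}

// relevantSSTs returns the files that overlap the span, preserving input
// order. Only these files would be hardlinked into the reduced checkpoint.
func relevantSSTs(files []sstMeta, s span) []sstMeta {
	var out []sstMeta
	for _, f := range files {
		if overlaps(f, s) {
			out = append(out, f)
		}
	}
	return out
}

// exampleFiles returns made-up file metadata used by the demo.
func exampleFiles() []sstMeta {
	return []sstMeta{
		{FileNum: 1, Smallest: []byte("a"), Largest: []byte("c")},
		{FileNum: 2, Smallest: []byte("d"), Largest: []byte("f")},
		{FileNum: 3, Smallest: []byte("g"), Largest: []byte("k")},
		{FileNum: 4, Smallest: []byte("m"), Largest: []byte("z")},
	}
}

func main() {
	replica := span{Start: []byte("e"), End: []byte("h")}
	for _, f := range relevantSSTs(exampleFiles(), replica) {
		fmt.Printf("keep file %d [%s, %s]\n", f.FileNum, f.Smallest, f.Largest)
	}
}
```

Selecting whole files rather than extracting keys keeps the files themselves intact, which matches the requirement that the LSM structure remain available for debugging.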

Note that we specifically don't want to export the KV pairs of the replica, since we often need the LSM structure for debugging, e.g. due to Pebble compaction bugs.

Jira issue: CRDB-20829

Metadata

Labels

C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-quick-win: Likely to be a quick win for someone experienced.
N-followup: Needs followup.
O-postmortem: Originated from a Postmortem action item.
