
kv: batch evaluation should operate on a consistent storage snapshot #55461

@nvb

Neither Engine.NewReadOnly nor Engine.NewBatch actually grabs a stable storage snapshot. Instead, they wait until iterator creation to do so, and then cache the iterator across multiple uses. This leads to two issues.

First, it is unclear when the storage snapshot is captured, making it difficult to coordinate with other state changes without more aggressive serialization using latches or the readOnlyCmdMu. Ideally, users would be able to grab a read-only/batch, check a few conditions, and then decide whether evaluation should proceed using the storage snapshot. This comes up in the context of replica destruction and in the context of MVCC GC.

Second, both of these objects cache multiple types of iterators, including a prefix iterator and a non-prefix iterator. This means that neither reader actually provides a stable storage snapshot at any point. Users have to be ready for inconsistencies to arise between different iterators pulled from the same read-only/batch. This leads to bugs like #47219, where state read from one iterator may not agree with state read from another.
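A minimal sketch of how two lazily created, cached iterators can disagree. The types below (`engine`, `lazyReader`, `pinnedReader`) are hypothetical stand-ins, not the actual `Engine.NewReadOnly` or `Engine.NewBatch` implementations. Each "iterator" is modeled as a view of the engine captured on first use.

```go
package main

import "fmt"

// engine is a toy in-memory stand-in for a storage engine.
type engine struct{ data map[string]string }

// capture copies the engine's current state.
func (e *engine) capture() map[string]string {
	view := make(map[string]string, len(e.data))
	for k, v := range e.data {
		view[k] = v
	}
	return view
}

// lazyReader captures engine state separately for each iterator type, on
// first use, and then caches that view for later calls.
type lazyReader struct {
	eng        *engine
	prefixView map[string]string
	normalView map[string]string
}

func (r *lazyReader) prefixGet(k string) string {
	if r.prefixView == nil {
		r.prefixView = r.eng.capture()
	}
	return r.prefixView[k]
}

func (r *lazyReader) normalGet(k string) string {
	if r.normalView == nil {
		r.normalView = r.eng.capture()
	}
	return r.normalView[k]
}

// pinnedReader captures engine state once, at construction, and serves
// every iterator type from that single view.
type pinnedReader struct{ view map[string]string }

func newPinnedReader(e *engine) *pinnedReader {
	return &pinnedReader{view: e.capture()}
}

func (r *pinnedReader) prefixGet(k string) string { return r.view[k] }
func (r *pinnedReader) normalGet(k string) string { return r.view[k] }

func main() {
	eng := &engine{data: map[string]string{"k": "v1"}}
	lazy := &lazyReader{eng: eng}
	pinned := newPinnedReader(eng)

	first := lazy.prefixGet("k") // prefix view captured here
	eng.data["k"] = "v2"         // write lands between the two captures

	fmt.Println(first, lazy.normalGet("k"))                   // v1 v2
	fmt.Println(pinned.prefixGet("k"), pinned.normalGet("k")) // v1 v1
}
```

The lazy reader returns two different values for the same key from what is nominally one reader, while the pinned reader stays self-consistent.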

Related Slack thread: https://cockroachlabs.slack.com/archives/CAC6K3SLU/p1602087652111000.

Jira issue: CRDB-3659

Metadata

Labels

A-kv-transactions: Relating to MVCC and the transactional model.
A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv: KV Team
