kv: batch evaluation should operate on a consistent storage snapshot #55461
Description
Neither Engine.NewReadOnly nor Engine.NewBatch actually grabs a stable storage snapshot. Instead, each defers capturing one until an iterator is created, and then caches that iterator across multiple uses. This leads to two issues.
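A toy Go model of the deferred capture described above. It assumes a hypothetical in-memory `engine` whose `snapshot` method copies its map. `lazyReader` and `eagerReader` are illustrative names and are not CockroachDB's `Engine.NewReadOnly` or `Engine.NewBatch`. A write that lands between creating a reader and first using it is visible to the lazy reader, but not to a reader that captures its snapshot at construction:

```go
package main

import "fmt"

// engine is a hypothetical toy key/value store standing in for the storage
// engine. It is NOT the CockroachDB Engine interface.
type engine struct {
	data map[string]string
}

// snapshot copies the engine's current contents, modeling a storage snapshot.
func (e *engine) snapshot() map[string]string {
	snap := make(map[string]string, len(e.data))
	for k, v := range e.data {
		snap[k] = v
	}
	return snap
}

// lazyReader mimics the deferred behavior: creating the reader captures
// nothing; the snapshot is taken on first use and then cached.
type lazyReader struct {
	eng  *engine
	snap map[string]string // nil until the first read
}

func newLazyReader(e *engine) *lazyReader { return &lazyReader{eng: e} }

func (r *lazyReader) get(key string) string {
	if r.snap == nil {
		r.snap = r.eng.snapshot() // capture deferred until first use
	}
	return r.snap[key]
}

// eagerReader captures its snapshot when it is constructed.
type eagerReader struct {
	snap map[string]string
}

func newEagerReader(e *engine) *eagerReader { return &eagerReader{snap: e.snapshot()} }

func (r *eagerReader) get(key string) string { return r.snap[key] }

func main() {
	e := &engine{data: map[string]string{"k": "v1"}}
	lazy := newLazyReader(e)
	eager := newEagerReader(e)

	// A write lands after both readers were created but before either is used.
	e.data["k"] = "v2"

	fmt.Println(lazy.get("k"), eager.get("k")) // v2 v1
}
```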
First, it is unclear when the storage snapshot is captured, which makes it difficult to coordinate with other state changes without resorting to more aggressive serialization through latches or the readOnlyCmdMu. Ideally, users would be able to grab a read-only engine or batch, check a few conditions, and then decide whether evaluation should proceed against that storage snapshot. This comes up in the context of replica destruction and of MVCC GC.
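The grab-check-proceed pattern could look roughly like the following sketch. It is single-threaded and uses hypothetical `store`, `replica`, and `beginEval` names; a real implementation would need synchronization around the `destroyed` flag. Because the snapshot is pinned before the flag is checked, a destruction that begins after the check passes cannot remove data that the evaluation is about to read:

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical toy key/value map standing in for the engine.
type store struct{ data map[string]string }

// snapshot copies the current contents, modeling a stable storage snapshot.
func (s *store) snapshot() map[string]string {
	snap := make(map[string]string, len(s.data))
	for k, v := range s.data {
		snap[k] = v
	}
	return snap
}

// replica is a hypothetical stand-in with an in-memory destroyed flag.
type replica struct {
	destroyed bool
	eng       *store
}

// destroy sets the flag before deleting the replica's data.
func (r *replica) destroy() {
	r.destroyed = true
	r.eng.data = map[string]string{}
}

var errDestroyed = errors.New("replica destroyed")

// beginEval pins a snapshot, then checks the replica's state, and only then
// returns a read function bound to that snapshot.
func beginEval(r *replica) (func(key string) (string, bool), error) {
	snap := r.eng.snapshot() // 1. capture the storage snapshot first
	if r.destroyed {         // 2. check conditions against in-memory state
		return nil, errDestroyed
	}
	// 3. evaluation proceeds against the snapshot captured in step 1.
	return func(key string) (string, bool) {
		v, ok := snap[key]
		return v, ok
	}, nil
}

func main() {
	r := &replica{eng: &store{data: map[string]string{"k": "v"}}}

	read, err := beginEval(r)
	r.destroy() // destruction starts after the check passed
	v, ok := read("k")
	fmt.Println(v, ok, err) // v true <nil>

	_, err = beginEval(r)
	fmt.Println(err) // replica destroyed
}
```

Capturing first and checking second is what makes the check meaningful: if the flag was still unset at check time, any data removal happens later and is not reflected in the pinned snapshot.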
Second, both of these objects cache multiple types of iterators, including a prefix iterator and a non-prefix iterator, and each iterator captures its own view of the engine when it is first created. This means that neither reader provides a single stable storage snapshot. Users have to be prepared for inconsistencies between different iterators pulled from the same read-only engine or batch. This leads to bugs like #47219, where state read from one iterator may not agree with state read from another.
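The cross-iterator inconsistency can be reproduced with a small model. It assumes hypothetical `cachingReader` (one lazily captured view per iterator kind) and `pinnedReader` (one view captured up front) types in place of the real read-only engine and batch; `prefixGet` and `normalGet` stand in for reads through a prefix and a non-prefix iterator:

```go
package main

import "fmt"

// store is a hypothetical toy key/value map standing in for the engine.
type store struct{ data map[string]string }

// snapshot copies the current contents, modeling a storage snapshot.
func (s *store) snapshot() map[string]string {
	snap := make(map[string]string, len(s.data))
	for k, v := range s.data {
		snap[k] = v
	}
	return snap
}

// cachingReader mimics the current behavior: each iterator kind captures its
// own view the first time it is created and caches it independently.
type cachingReader struct {
	eng        *store
	prefixView map[string]string
	normalView map[string]string
}

func (r *cachingReader) prefixGet(key string) string {
	if r.prefixView == nil {
		r.prefixView = r.eng.snapshot()
	}
	return r.prefixView[key]
}

func (r *cachingReader) normalGet(key string) string {
	if r.normalView == nil {
		r.normalView = r.eng.snapshot()
	}
	return r.normalView[key]
}

// pinnedReader captures one snapshot up front and serves both kinds from it.
type pinnedReader struct{ view map[string]string }

func newPinnedReader(s *store) *pinnedReader { return &pinnedReader{view: s.snapshot()} }

func (r *pinnedReader) prefixGet(key string) string { return r.view[key] }
func (r *pinnedReader) normalGet(key string) string { return r.view[key] }

func main() {
	s := &store{data: map[string]string{"k": "v1"}}
	caching := &cachingReader{eng: s}
	pinned := newPinnedReader(s)

	a1, b1 := caching.prefixGet("k"), pinned.prefixGet("k")
	s.data["k"] = "v2" // a write between the two iterator creations
	a2, b2 := caching.normalGet("k"), pinned.normalGet("k")

	fmt.Println(a1, a2) // v1 v2: the two iterators disagree
	fmt.Println(b1, b2) // v1 v1: both read the same snapshot
}
```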
Related Slack thread: https://cockroachlabs.slack.com/archives/CAC6K3SLU/p1602087652111000.
Jira issue: CRDB-3659