kv: READ_UNCOMMITTED scan observes WriteIntentError, hits assertion

In https://github.com/cockroachdb/cockroach/pull/47120, we saw instances of the following assertion fire:
https://github.com/cockroachdb/cockroach/blob/27debe8006fc99a8e9db9bae6407796a2d6e6f23/pkg/kv/kvserver/replica_send.go#L247-L249

At this point in the `executeBatchWithConcurrencyRetries` retry loop, we expect any request that hits a `WriteIntentError` to be holding latches. The assertion was firing because a meta2 scan, which uses the `READ_UNCOMMITTED` read consistency level and therefore [does not acquire latches](https://github.com/cockroachdb/cockroach/blob/27debe8006fc99a8e9db9bae6407796a2d6e6f23/pkg/kv/kvserver/concurrency/concurrency_manager.go#L221-L223), was hitting a `WriteIntentError`. This should not be possible, as the MVCC layer is [extra careful](https://github.com/cockroachdb/cockroach/blob/27debe8006fc99a8e9db9bae6407796a2d6e6f23/pkg/storage/rocksdb.go#L2529) to never return this form of error to inconsistent reads.

Further debugging revealed that the `WriteIntentError` was actually coming from the [`ScanRequest`'s call](https://github.com/cockroachdb/cockroach/blob/27debe8006fc99a8e9db9bae6407796a2d6e6f23/pkg/kv/kvserver/batcheval/cmd_scan.go#L79) into [`CollectIntentRows `](https://github.com/cockroachdb/cockroach/blob/27debe8006fc99a8e9db9bae6407796a2d6e6f23/pkg/kv/kvserver/batcheval/intent.go#L29). `CollectIntentRows` is called to look up the provisional value associated with each intent observed by a `READ_UNCOMMITTED` scan. This is critical to avoid deadlock during range splits, where a range lookup may need to consider a meta2 intent to be the source of truth in terms of range addressing in order to resolve that intent.

It's completely unexpected that `CollectIntentRows` would be returning a `WriteIntentError`, because it uses `MVCCGetAsTxn` to lookup the provisional values of intents that we just saw during the scan. However, in the failure, we saw that the `WriteIntentError` was for a different transaction than the one responsible for the original intent. How could this be? Shouldn't we be working off of a consistent engine snapshot in `MVCCScan`?

Well, no. Read-only requests use a "read-only engine" returned from `Engine.NewReadOnly`. This engine by itself *does not* guarantee a consistent snapshot across all uses. The read-only engine contains two cached iterators, one for normal iteration and one for prefix iteration. Each iterator has an implicit snapshot of the DB, but those can be slightly different snapshots. In this case, the `MVCCScan` used a non-prefix iterator and the `MVCCGetAsTxn` used a prefix iterator. And since the `READ_UNCOMMITTED` scan wasn't holding latches, it had no guarantee of isolation within its keyspan across its two iterators.

So to summarize what happened here:
- a meta2 `READ_UNCOMMITTED` came in and did not acquire latches
- is scanned using a non-prefix iterator and hit an intent
- it passed this intent to `CollectIntentRows`
- this used a different prefix iterator and hit a different intent
- some other write must have slipped in and resolved and replaced the first intent with the second intent between the two iterator creations
- the scan returned a WriteIntentError to the concurrency manager
- the assertion blew up

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: READ_UNCOMMITTED scan observes WriteIntentError, hits assertion #47219

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	case *roachpb.WriteIntentError:
	// Drop latches, but retain lock wait-queues.
	g.AssertLatches()

kv: READ_UNCOMMITTED scan observes WriteIntentError, hits assertion #47219

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions