-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kv: READ_UNCOMMITTED scan observes WriteIntentError, hits assertion #47219
Description
In #47120, we saw instances of the following assertion fire:
cockroach/pkg/kv/kvserver/replica_send.go
Lines 247 to 249 in 27debe8
| case *roachpb.WriteIntentError: | |
| // Drop latches, but retain lock wait-queues. | |
| g.AssertLatches() |
At this point in the executeBatchWithConcurrencyRetries retry loop, we expect any request that hits a WriteIntentError to be holding latches. The assertion was firing because a meta2 scan, which uses the READ_UNCOMMITTED read consistency level and therefore does not acquire latches, was hitting a WriteIntentError. This should not be possible, as the MVCC layer is extra careful to never return this form of error to inconsistent reads.
Further debugging revealed that the WriteIntentError was actually coming from the ScanRequest's call into CollectIntentRows . CollectIntentRows is called to look up the provisional value associated with each intent observed by a READ_UNCOMMITTED scan. This is critical to avoid deadlock during range splits, where a range lookup may need to consider a meta2 intent to be the source of truth in terms of range addressing in order to resolve that intent.
It's completely unexpected that CollectIntentRows would be returning a WriteIntentError, because it uses MVCCGetAsTxn to lookup the provisional values of intents that we just saw during the scan. However, in the failure, we saw that the WriteIntentError was for a different transaction than the one responsible for the original intent. How could this be? Shouldn't we be working off of a consistent engine snapshot in MVCCScan?
Well, no. Read-only requests use a "read-only engine" returned from Engine.NewReadOnly. This engine by itself does not guarantee a consistent snapshot across all uses. The read-only engine contains two cached iterators, one for normal iteration and one for prefix iteration. Each iterator has an implicit snapshot of the DB, but those can be slightly different snapshots. In this case, the MVCCScan used a non-prefix iterator and the MVCCGetAsTxn used a prefix iterator. And since the READ_UNCOMMITTED scan wasn't holding latches, it had no guarantee of isolation within its keyspan across its two iterators.
So to summarize what happened here:
- a meta2
READ_UNCOMMITTEDcame in and did not acquire latches - is scanned using a non-prefix iterator and hit an intent
- it passed this intent to
CollectIntentRows - this used a different prefix iterator and hit a different intent
- some other write must have slipped in and resolved and replaced the first intent with the second intent between the two iterator creations
- the scan returned a WriteIntentError to the concurrency manager
- the assertion blew up