kv: txn giving up on refresh span collection causes closed ts to kick it out #44645
Description
Found by user:
- txn starts
- txn performs enough operations to exceed max_refresh_span_bytes, so refresh span collection stops
- txn lasts for more than 30s
- closed ts catches up, doesn't find refresh spans, and "kicks the txn out" (pushes it, and the client receives an error)
- the error is not the usual retry error, because it is not caused by contention, but the error message does not explain what is happening
There are three separate issues here:
- We want a larger default for max_refresh_span_bytes so that the scenario becomes less likely. This is predicated on better memory tracking in KV, a separate work item (planned for 20.1, see the work @tbg has started in "[dnm] kv: expose (and use) byte batch response size limit" #44341). I think this is orthogonal and should be kept out of scope here.
- When the scenario happens, we want the error message to be clearer about what needs to happen: either decrease the duration of the txn, or decrease its number of refresh spans (fewer reads/writes), or increase max_refresh_span_bytes, or increase the closed ts delay.
- Or could we avoid the situation entirely? Make the closed ts lag behind the long-running txn if it has disabled refresh span collection.
Jira issue: CRDB-5215