kv: pessimistic-mode, replicated read locks to enable large, long running transactions #52768
Is your feature request related to a problem? Please describe.
We've increasingly seen issues with unlimited retries of large, long-running transactions (see #51294, #44645). In some cases where there is no contention whatsoever, the solution to those problems has been to increase the kv.transaction.max_refresh_spans_bytes cluster setting. The work in 20.2 (#46275) to compress those spans should help in these zero-contention cases; however, it may introduce false dependencies.
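To make the false-dependency risk concrete, here is a minimal Go sketch of span condensing: once too many point reads are tracked, they collapse into a single covering span, which can cover keys that were never actually read. All names and the key format here are illustrative, not CockroachDB's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// condense collapses a set of point-read keys into one covering span once
// their count exceeds maxSpans, loosely mimicking refresh-span compression.
// The merged span may cover keys that were never read, so a concurrent
// write to such a key becomes a false dependency that fails the refresh.
func condense(keys []string, maxSpans int) (start, end string, condensed bool) {
	if len(keys) <= maxSpans {
		return "", "", false
	}
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	return sorted[0], sorted[len(sorted)-1], true
}

func main() {
	// Only /t/1/a, /t/1/c, and /t/1/e were read, but the condensed span
	// [/t/1/a, /t/1/e] also covers /t/1/b and /t/1/d.
	start, end, ok := condense([]string{"/t/1/a", "/t/1/e", "/t/1/c"}, 2)
	fmt.Println(ok, start, end)
}
```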
We've also noted that operations which have laid down intents over keys they read do not need to refresh those reads. However, this fact has not been actionable (and thus there is no code to subtract write keys from the refresh spans) because SQL reads have always been performed as scans rather than gets. This is changing soon as @helenmhe works on #46758, with a WIP at #52511.
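A minimal sketch of the subtraction described above, assuming reads are performed as gets over point keys: any key the transaction has already written, and so holds an intent on, can be dropped from the set that needs refreshing. The helper name and key format are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// subtractWrites removes point-read keys that the transaction has already
// written (and so holds intents on) from the set of keys that would need
// a read refresh when the transaction's timestamp is pushed.
func subtractWrites(readKeys, writtenKeys []string) []string {
	written := make(map[string]bool, len(writtenKeys))
	for _, k := range writtenKeys {
		written[k] = true
	}
	var needRefresh []string
	for _, k := range readKeys {
		if !written[k] {
			needRefresh = append(needRefresh, k)
		}
	}
	sort.Strings(needRefresh)
	return needRefresh
}

func main() {
	reads := []string{"/t/1/a", "/t/1/b", "/t/1/c"}
	writes := []string{"/t/1/b"} // intent already protects this read
	fmt.Println(subtractWrites(reads, writes))
}
```

Note that this only works for point reads; a scan's span cannot simply subtract the written keys inside it, which is why the gets work in #46758 makes this actionable.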
I suspect that the above will be rather helpful for unbounded DELETEs that read off of a secondary index, where there are no writes in the range being deleted but writes are interspersed in the primary index. In that case, the compression introduced in #46275 will prove problematic.
As we move towards an implementation of transactional schema changes, we are going to introduce operations which will, by their very nature, have their timestamps pushed. Furthermore, these transactions are likely to be extremely expensive to retry. However, they are unlikely to be latency-sensitive and thus might pair nicely with a mode that allows reads to push them but blocks contended writes.
Describe the solution you'd like
The solution I'd like to see is a transaction mode whereby all reads acquire a replicated, durable read lock over the spans they read. This read lock would eliminate the need to refresh reads when the transaction's timestamp is pushed. It might make sense to switch to this mode automatically when a transaction enters its second epoch.
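The proposed mode switch might look roughly like the following sketch: in the first epoch, reads are tracked as refresh spans; from the second epoch onward, they instead acquire replicated read locks, so a timestamp push no longer forces a refresh. All types and names here are illustrative, not CockroachDB internals.

```go
package main

import "fmt"

// txn sketches the proposed policy. Epochs are zero-indexed, so epoch >= 1
// means the transaction has already restarted at least once.
type txn struct {
	epoch        int
	refreshSpans []string // spans that must be re-validated on a push
	readLocks    []string // spans protected by replicated read locks
}

// read records a read span. In the proposed mode (second epoch onward),
// the span is protected by a replicated read lock instead of being
// tracked for refresh.
func (t *txn) read(span string) {
	if t.epoch >= 1 {
		t.readLocks = append(t.readLocks, span)
	} else {
		t.refreshSpans = append(t.refreshSpans, span)
	}
}

// needsRefreshOnPush reports whether a timestamp push forces the
// transaction to re-validate any of its reads.
func (t *txn) needsRefreshOnPush() bool {
	return len(t.refreshSpans) > 0
}

func main() {
	first := &txn{epoch: 0}
	first.read("/t/1/{a-z}")
	fmt.Println(first.needsRefreshOnPush()) // true: push requires a refresh

	second := &txn{epoch: 1}
	second.read("/t/1/{a-z}")
	fmt.Println(second.needsRefreshOnPush()) // false: the read lock protects the span
}
```

The trade-off, as described above, is that the read locks block contended writers while the transaction remains pushable by other reads.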
Describe alternatives you've considered
The primary alternative, in the context of transactional schema changes, is to accept that retries may happen in the face of contention and that clients need to deal with them.
Additional context
The concept of ranged read locks has long been blocked on the existence of a separated lock table. This seems to be possible in the 21.1 timeframe.
Jira issue: CRDB-3922