kv: intents from transactions that have been successfully PUSH_TIMESTAMP-ed is O(num intents) #103126
Describe the problem
Higher priority readers in CRDB are able to push lower priority writers above their timestamp, thus allowing readers to proceed without conflicting with the writer. However, if the writer has written intents at the lower timestamp, the reader needs to resolve them (i.e. move them to the higher timestamp) before it can proceed with its scan.
Today, this process is O(num intents) -- the reader pushes the writer every time it discovers a conflicting intent and resolves that intent before proceeding to the next one. This happens here:
```go
return w.ir.ResolveIntent(ctx, resolve, opts)
```
To Reproduce
```sql
-- session 1
CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 1000);

-- session 2
BEGIN PRIORITY HIGH; SELECT count(*) FROM keys;
-- takes ~7ms per intent
```
Proposed solution
Prior to #49218, this problem existed for finalized (committed, but more notably, aborted) transactions as well. That patch introduced the finalizedTxnCache, which is populated here:
cockroach/pkg/kv/kvserver/concurrency/lock_table_waiter.go, lines 531 to 536 in 5418acd:

```go
// If the transaction is finalized, add it to the finalizedTxnCache. This
// avoids needing to push it again if we find another one of its locks and
// allows for batching of intent resolution.
if pusheeTxn.Status.IsFinalized() {
	w.lt.TransactionIsFinalized(pusheeTxn)
}
```
This ensures that a subsequent re-scan of the lock table (post intent resolution) does not have to push the same transaction again to recognize that it is finalized. Instead, it can just collect intents from the finalized transaction and batch resolve them in one go. This means that intent resolution is O(num ranges) for finalized transactions instead of O(num intents)[*].
We should extend this concept for transactions that are known to have been pushed to a higher timestamp as well. This would allow high priority readers to collect and batch resolve intents in similar fashion.
[*] Assuming no async intent resolution and that the reader's read set includes all intents written by the writer.
Additional context
Notably, this impacts backups, which eventually issue high-priority ExportRequests. Those requests run at high priority precisely so that backups aren't starved by concurrent writers; however, if a writer lays down a large enough number of intents, the backup can be starved anyway.
cc @nvanbenschoten @adityamaru
Jira issue: CRDB-27844