
kv: release latches before evaluation for read-only requests #66485

@nvb

Description


Related to, but separate from #25579.
Related to #41720 (comment).
Original discussion in https://forum.cockroachlabs.com/t/why-do-we-keep-read-commands-in-the-command-queue/360.

Currently, read-only requests hold latches across evaluation. These are read-only latches, so they do not conflict with writes at higher timestamps than the read. However, they do still block writes at lower timestamps than the read (at least until we address #25579), and they do still block non-MVCC writes like range splits. This is a problem: the more we allow requests to block on one another, the more we open ourselves up to unexpected tail latency under rare sequences of events, like the one in #66338, where an Export (an MVCC read) blocked a split (a non-MVCC write), which in turn blocked a large set of writes (MVCC writes).

We should re-evaluate this decision and determine whether we can drop latches earlier for reads, before they spend any time reading data. Doing so would rely on #55461. The high-level idea is that we could grab an LSM snapshot and bump the timestamp cache, and then drop the read's latches (and readOnlyCmdMu) before proceeding to evaluation.
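To make the proposed ordering concrete, here is a toy model of the reordered read path. All names and types below are illustrative inventions, not CockroachDB's actual APIs: while latches are still held, the read pins a stable engine snapshot and bumps the timestamp cache; it then releases its latches and evaluates against the snapshot.

```go
package main

import "fmt"

// tsCache is a stand-in for the timestamp cache: key -> highest read
// timestamp served.
type tsCache map[string]int64

type replica struct {
	latched bool
	cache   tsCache
	data    map[string]string
}

// pinAndRelease performs the proposed ordering for a read of key at ts,
// returning a snapshot that evaluation can scan without holding latches.
func (r *replica) pinAndRelease(key string, ts int64) map[string]string {
	// 1. Pin a consistent view of the engine (stands in for an LSM
	//    snapshot).
	snap := make(map[string]string, len(r.data))
	for k, v := range r.data {
		snap[k] = v
	}
	// 2. Bump the timestamp cache so later writes below ts get pushed.
	if r.cache[key] < ts {
		r.cache[key] = ts
	}
	// 3. Drop latches before any data is actually read.
	r.latched = false
	return snap
}

func main() {
	r := &replica{latched: true, cache: tsCache{}, data: map[string]string{"a": "1"}}
	snap := r.pinAndRelease("a", 10)
	fmt.Println(r.latched, r.cache["a"], snap["a"]) // false 10 1
}
```

The point of the sketch is only the step ordering: snapshot and cache bump happen under latches, evaluation does not.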

For context, there are two reasons why we didn't do this in the past, and both have to do with wanting to wait to bump the timestamp cache until after evaluation:

  1. we don't want to bump the timestamp cache immediately, in case the read runs into intents during evaluation and is forced to retry after pushing/waiting. Bumping the timestamp cache early could starve writers who lay down an intent and then want to rewrite it in a subsequent retry. If readers always bumped the timestamp cache before finding the conflicting intent, they could prevent the writer from ever completing.
  2. we don't want to bump the timestamp cache for limited scans if we don't know the full extent of what they will (or won't) read. Doing so could cause false contention.
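To make the first concern concrete, here is a toy loop (entirely illustrative, not CockroachDB code): if each reader bumps the timestamp cache before discovering the writer's intent, every writer retry is pushed above the latest read, and the writer never commits at its chosen timestamp.

```go
package main

import "fmt"

// pushWriter models timestamp-cache enforcement: a write may not commit
// at or below the highest read timestamp already served.
func pushWriter(writerTS, cacheTS int64) int64 {
	if writerTS <= cacheTS {
		return cacheTS + 1 // pushed above the reads in the cache
	}
	return writerTS
}

func main() {
	writer, cache := int64(5), int64(0)
	for i := 0; i < 3; i++ {
		// A new reader at writer+1 bumps the cache *before* hitting
		// the intent, so the writer's retry is pushed yet again.
		cache = writer + 1
		writer = pushWriter(writer, cache)
	}
	fmt.Println(writer) // keeps climbing with every reader: starvation
}
```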

These concerns are still both valid to some degree, but things have also changed over the past few years.

Regarding the starvation of writers, this is no longer as severe a problem as it once was, because transactions can now refresh their reads when their write timestamp is bumped. Contention can still prevent a read refresh from succeeding and force a transaction to perform a full restart, but that is no longer a given like it once was. Additionally, with the lockTable, we will soon have the ability to check for conflicting locks over a span of keys without performing a full scan of the data. This opens up the opportunity to scan the lockTable upfront, while holding latches, and then only proceed to bump the timestamp cache if no conflicts are found.
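A hedged sketch of that upfront check, with invented names rather than the real lockTable API: the read probes its span for conflicting locks while still holding latches, and only bumps the timestamp cache and drops latches early if the span is clean.

```go
package main

import "fmt"

// lockTable is a stand-in for a queryable lock table: key -> whether a
// conflicting lock is held.
type lockTable map[string]bool

// canBumpEarly reports whether a read over keys may bump the timestamp
// cache and release its latches before evaluation.
func canBumpEarly(lt lockTable, keys []string) bool {
	for _, k := range keys {
		if lt[k] {
			// A conflicting lock means the read may have to push and
			// retry, so defer the cache bump as we do today.
			return false
		}
	}
	return true
}

func main() {
	lt := lockTable{"b": true}
	fmt.Println(canBumpEarly(lt, []string{"a", "c"})) // true
	fmt.Println(canBumpEarly(lt, []string{"a", "b"})) // false
}
```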

The second reason still seems legitimate. We don't want to bump the timestamp cache during a scan if we don't know the full width of the scan ahead of time. Doing so would often lead to limited scans (`SELECT * FROM t LIMIT 1`) bumping the timestamp cache across the entire range, which would trip up otherwise uncontended writes. We have recently landed changes to optimistically latch and lock for limited scans that don't know until evaluation how far they will need to scan. This feels related to me, though I don't have a clear picture of how to take advantage of optimistic evaluation to avoid holding latches across evaluation.

However, one thing that has changed regarding limited scans is that we now have a concept of a closed timestamp, which prevents writes below some fixed interval that lags present time. If a limited scan runs below this timestamp, as ExportRequests often do, there is no benefit to bumping the timestamp cache after evaluation.
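That observation reduces to a one-line predicate (simplified model, not the real API): a read at or below the closed timestamp can never be invalidated by a later write, so recording it in the timestamp cache buys nothing, even for a limited scan of unknown width.

```go
package main

import "fmt"

// mustBumpTSCache reports whether a read at readTS needs to be recorded
// in the timestamp cache, given the range's closed timestamp.
func mustBumpTSCache(readTS, closedTS int64) bool {
	// Writes at or below closedTS are already impossible, so only reads
	// above the closed timestamp need to leave a record behind.
	return readTS > closedTS
}

func main() {
	closed := int64(100)
	fmt.Println(mustBumpTSCache(90, closed))  // false: e.g. an Export below the closed timestamp
	fmt.Println(mustBumpTSCache(110, closed)) // true: a current-time read
}
```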

So with all of these changes, it seems appropriate that we reconsider the synchronization of read-only requests.

/cc. @andreimatei

Jira issue: CRDB-8063

Metadata

Labels: A-kv-transactions (Relating to MVCC and the transactional model) · C-performance (Perf of queries or internals; solution not expected to change functional behavior) · T-kv (KV Team)
