kvserver: request stage metrics #82200

@tbg

Is your feature request related to a problem? Please describe.

As a BatchRequest arrives at a Store and gets processed by a Replica, it passes through multiple stages of execution.

Currently, all of our KV latency metrics include time spent on contention, so we can never conclusively blame or exonerate KV for perceived slowness of an aggregate workload. If there is contention, the KV latency metrics are expected to be elevated, but the slowness could equally arise from, say, an overloaded Pebble instance.

In effect, one needs to check every possible source of slowness individually, but there is no easy way to do so because we don't break down how a request spends its time. For example, we don't track how long a request takes to evaluate, which is mostly a function of CPU and I/O. Slow evaluation leads to slow latching and more contention, etc., so it is really slow evaluation that one would want to know about first. But we don't track at that granularity.

Describe the solution you'd like

A breakdown of the phases is described in this comment:

// Send executes a command on this range, dispatching it to the
// read-only, read-write, or admin execution path as appropriate.
// ctx should contain the log tags from the store (and up).
//
// A rough schematic for the path requests take through a Replica
// is presented below, with a focus on where requests may spend
// most of their time (once they arrive at the Node.Batch endpoint).
//
//                             DistSender (tenant)
//                                      │
//                                      ┆ (RPC)
//                                      │
//                                      ▼
//                                 Node.Batch (host cluster)
//                                      │
//                                      ▼
//                              Admission control
//                                      │
//                                      ▼
//                                Replica.Send
//                                      │
//                               Circuit breaker
//                                      │
//                                      ▼
//                       Replica.maybeBackpressureBatch (if Range too large)
//                                      │
//                                      ▼
//                         Replica.maybeRateLimitBatch (tenant rate limits)
//                                      │
//                                      ▼
//                 Replica.maybeCommitWaitBeforeCommitTrigger (if committing with commit-trigger)
//                                      │
// read-write ◄─────────────────────────┴────────────────────────► read-only
//     │                                                               │
//     │                                                               │
//     ├─────────────► executeBatchWithConcurrencyRetries ◄────────────┤
//     │               (handles leases and txn conflicts)              │
//     │                                                               │
//     ▼                                                               │
// executeWriteBatch                                                   │
//     │                                                               │
//     ▼                                                               ▼
// evalAndPropose         (turns the BatchRequest            executeReadOnlyBatch
//     │                   into pebble WriteBatch)
//     │
//     ├──────────────────► (writes that can use async consensus do not
//     │                     wait for replication and are done here)
//     │
//     ├──────────────────► maybeAcquireProposalQuota
//     │                    (applies backpressure in case of
//     │                     lagging Raft followers)
//     │
//     │
//     ▼
// handleRaftReady        (drives the Raft loop, first appending to the log
//                         to commit the command, then signaling proposer and
//                         applying the command)
func (r *Replica) Send(

I'd like to standardize on a set of phases to measure, and ideally we always measure them (regardless of tracing) so that we can populate metrics. Some of the phases are straightforward (for example, time spent waiting for admission control, time spent latching, time spent evaluating), while others are more subtle (time spent replicating is hard to set up, since most requests return early, before they are replicated). We don't have to get everything sorted out in the first pass, but we should do the obvious phases, with everything else falling into an "unaccounted" bucket. Properly building an unaccounted bucket is kind of annoying, but I think we don't really have to, since we have the existing metric that tracks the entire duration:

n.metrics.callComplete(timeutil.Since(tStart), pErr)

so (at least in Prometheus) we can subtract all of the phases from that metric to get a good idea of whether the "time between phases" is significant.

Describe alternatives you've considered

Additional context

#71169 E2E latency
#82203 extension of this issue to also record the stage latencies on a per-request basis.

Jira issue: CRDB-16246

Labels: A-kv, C-enhancement, T-kv
