kv: log slow client requests with replica information #114431

@nvb

Description

We currently log slow/hanging requests from the client at the level of a range, but not at the level of a replica:

const slowDistSenderThreshold = time.Minute
if dur := timeutil.Since(tBegin); dur > slowDistSenderThreshold && !tBegin.IsZero() {
	{
		var s redact.StringBuilder
		slowRangeRPCWarningStr(&s, ba, dur, attempts, routingTok.Desc(), err, reply)
		log.Warningf(ctx, "slow range RPC: %v", &s)
	}
	// If the RPC wasn't successful, defer the logging of a message once the
	// RPC is not retried any more.
	if err != nil || reply.Error != nil {
		ds.metrics.SlowRPCs.Inc(1)
		defer func(tBegin time.Time, attempts int64) {
			ds.metrics.SlowRPCs.Dec(1)
			var s redact.StringBuilder
			slowRangeRPCReturnWarningStr(&s, timeutil.Since(tBegin), attempts)
			log.Warningf(ctx, "slow RPC response: %v", &s)
		}(tBegin, attempts)
	}
	tBegin = time.Time{} // prevent reentering branch for this RPC
}

As a result, it can be difficult to determine which replica a request was executing on when it got stuck. For an example, see #112373 (comment).

We should push similar logging into DistSender.sendToReplicas, surrounding the call to transport.SendNext.
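A rough sketch of what per-replica logging around a single send could look like. Everything here is a simplified stand-in rather than the actual DistSender code: `ReplicaInfo`, the reduced `Transport` interface (including `NextReplica`), `sendNextWithSlowLog`, the 10-second threshold, and the log message format are all assumptions made for illustration. The stub transport advances a fake clock instead of sleeping so the behavior can be checked quickly.

```go
package main

import (
	"fmt"
	"time"
)

// ReplicaInfo is a hypothetical, simplified stand-in for a replica descriptor.
type ReplicaInfo struct {
	NodeID, StoreID, ReplicaID int
}

func (r ReplicaInfo) String() string {
	return fmt.Sprintf("(n%d,s%d):%d", r.NodeID, r.StoreID, r.ReplicaID)
}

// Transport is a hypothetical, reduced interface. The real transport used by
// DistSender has a different and larger method set.
type Transport interface {
	NextReplica() ReplicaInfo
	SendNext() error
}

// sendNextWithSlowLog times a single SendNext call and reports the target
// replica through logf if the call took longer than threshold.
func sendNextWithSlowLog(
	t Transport,
	threshold time.Duration,
	now func() time.Time,
	logf func(format string, args ...interface{}),
) error {
	replica := t.NextReplica() // capture the target before sending
	tBegin := now()
	err := t.SendNext()
	if dur := now().Sub(tBegin); dur > threshold {
		logf("slow replica RPC: waited %s for RPC to replica %s; err: %v", dur, replica, err)
	}
	return err
}

// stubTransport simulates a replica that takes `latency` to respond by
// advancing a fake clock instead of sleeping.
type stubTransport struct {
	replica ReplicaInfo
	latency time.Duration
	clock   *time.Time
}

func (s *stubTransport) NextReplica() ReplicaInfo { return s.replica }

func (s *stubTransport) SendNext() error {
	*s.clock = s.clock.Add(s.latency)
	return nil
}

// demo sends one request to a stub replica with the given latency and
// returns the log lines that were emitted (assumed threshold: 10s).
func demo(latencySeconds int) []string {
	clock := time.Unix(0, 0)
	tr := &stubTransport{
		replica: ReplicaInfo{NodeID: 1, StoreID: 1, ReplicaID: 3},
		latency: time.Duration(latencySeconds) * time.Second,
		clock:   &clock,
	}
	var lines []string
	logf := func(format string, args ...interface{}) {
		lines = append(lines, fmt.Sprintf(format, args...))
	}
	_ = sendNextWithSlowLog(tr, 10*time.Second, func() time.Time { return clock }, logf)
	return lines
}
```

Calling `demo(15)` yields one log line naming the replica, while `demo(1)` yields none. Like the range-level snippet above, this sketch only logs once the call returns, so a request that never returns would need a separate mechanism to be reported.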

Jira issue: CRDB-33510

Epic CRDB-34227

Metadata

Labels

A-kv-observability
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-quick-win: Likely to be a quick win for someone experienced.
T-kv: KV Team
