We currently log slow/hanging requests from the client at the level of a range, but not at the level of a replica:
From `cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go`, lines 1922 to 1941 at ce3f78b:

```go
const slowDistSenderThreshold = time.Minute

if dur := timeutil.Since(tBegin); dur > slowDistSenderThreshold && !tBegin.IsZero() {
	{
		var s redact.StringBuilder
		slowRangeRPCWarningStr(&s, ba, dur, attempts, routingTok.Desc(), err, reply)
		log.Warningf(ctx, "slow range RPC: %v", &s)
	}
	// If the RPC wasn't successful, defer the logging of a message once the
	// RPC is not retried any more.
	if err != nil || reply.Error != nil {
		ds.metrics.SlowRPCs.Inc(1)
		defer func(tBegin time.Time, attempts int64) {
			ds.metrics.SlowRPCs.Dec(1)
			var s redact.StringBuilder
			slowRangeRPCReturnWarningStr(&s, timeutil.Since(tBegin), attempts)
			log.Warningf(ctx, "slow RPC response: %v", &s)
		}(tBegin, attempts)
	}
	tBegin = time.Time{} // prevent reentering branch for this RPC
}
```
As a result, it can be difficult to determine which replica a request was executing on when it got stuck; see, e.g., #112373 (comment).
We should push similar logging into `DistSender.sendToReplicas`, surrounding the call to `transport.SendNext`, as sketched below.
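As a rough illustration of the shape this could take, here is a minimal sketch that times a single replica-level send and warns if it ran long, mirroring the range-level pattern above. The helper `sendAndLogSlow`, the constant `slowReplicaRPCThreshold`, and the message format are hypothetical, not existing code; in the real change this logic would presumably live inline in `DistSender.sendToReplicas`:

```go
package kvcoord

import (
	"context"
	"time"

	"github.com/cockroachdb/cockroach/pkg/kv/kvpb"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

// slowReplicaRPCThreshold is assumed here to mirror slowDistSenderThreshold.
const slowReplicaRPCThreshold = time.Minute

// sendAndLogSlow (hypothetical) wraps a single transport.SendNext call and
// warns if it exceeded the threshold, attributing the slowness to the
// specific replica rather than to the range as a whole.
func sendAndLogSlow(
	ctx context.Context,
	transport Transport,
	ba *kvpb.BatchRequest,
	replica roachpb.ReplicaDescriptor,
) (*kvpb.BatchResponse, error) {
	tSent := timeutil.Now()
	br, err := transport.SendNext(ctx, ba)
	if dur := timeutil.Since(tSent); dur > slowReplicaRPCThreshold {
		// Unlike the range-level warning, this names the replica the request
		// was executing on, which is the information missing in #112373.
		log.Warningf(ctx, "slow RPC to replica %s: %s took %s, err: %v",
			replica, ba, dur, err)
	}
	return br, err
}
```

Note that, like the range-level code, this only fires once `SendNext` returns; surfacing RPCs that never return would additionally need something along the lines of the deferred-logging pattern above, or a timer-based approach.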
Jira issue: CRDB-33510
Epic CRDB-34227