-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kvserver: use pprof label for Replica.Send (replace sendWithRangeID) #85948
Description
Is your feature request related to a problem? Please describe.
We used to pass a faux parameter to Replica.sendWithoutRangeID but a) this broke in a recent Go update. We should just do this properly and use a profiler label, which will then show up in the ?debug=1 goroutines profile.
Describe the solution you'd like
It's a tricky because the code path is allocation sensitive (it's a hot path) and the pprof API forces us to use a context. We don't want to take the incoming context and wrap it due to the allocations, but if there is a label in there already we probably want to respect it and need to allocate. What we can do is to allocate a "background" context along with the replica for use in the fast path, i.e. we'd allocate only once per Replica, and to the expensive wrapping only when there are labels already.
To do this, we'd add a new field pprofLabelCtx to Replica and populate it in newUnloadedReplica:
pprofLabelCtx: pprof.WithLabels(context.Background(), pprof.Labels("rangeID", desc.RangeID.String()))As well as some alloc avoidance infra:
type emptyChecker bool
func (c *emptyChecker) check(k, v string) bool {
*c = true
return false // seen enough
}
var emptyCheckerPool = sync.Pool{New: func() interface{} { return new(emptyChecker) }}and then at the top of Replica.Send (after inlining sendWithoutRangeID):
{
var pprofLabelCtx context.Context
const rangeIDLabel = "range_id"
c := emptyCheckerPool.Get().(*emptyChecker)
pprof.ForLabels(ctx, c.check)
hasLabel := *c
*c = false
emptyCheckerPool.Put(c)
if !hasLabel {
// Fast path, saving allocations.
pprof.SetGoroutineLabels(pprofLabelCtx)
} else {
// There are some incoming labels so we want to add our label to
// the incoming context, which is more allocation intensive.
pprof.SetGoroutineLabels(pprof.WithLabels(ctx, pprof.Labels(rangeIDLabel, r.RangeID.String())))
}
defer pprof.SetGoroutineLabels(ctx)
}Describe alternatives you've considered
Nothing? Or see if sendWithoutRangeID can be resurrected properly.
Additional context
When a request hangs on a replica, it's important to be able to see on which range. In theory, tracing infra can help here too, but I think it's still spotty.
Jira issue: CRDB-18497