Skip to content

kvserver: use pprof label for Replica.Send (replace sendWithRangeID) #85948

@tbg

Description

@tbg

Is your feature request related to a problem? Please describe.

We used to pass a faux parameter to Replica.sendWithoutRangeID but a) this broke in a recent Go update. We should just do this properly and use a profiler label, which will then show up in the ?debug=1 goroutines profile.

Describe the solution you'd like

It's a tricky because the code path is allocation sensitive (it's a hot path) and the pprof API forces us to use a context. We don't want to take the incoming context and wrap it due to the allocations, but if there is a label in there already we probably want to respect it and need to allocate. What we can do is to allocate a "background" context along with the replica for use in the fast path, i.e. we'd allocate only once per Replica, and to the expensive wrapping only when there are labels already.

To do this, we'd add a new field pprofLabelCtx to Replica and populate it in newUnloadedReplica:

  pprofLabelCtx: pprof.WithLabels(context.Background(), pprof.Labels("rangeID", desc.RangeID.String()))

As well as some alloc avoidance infra:

type emptyChecker bool

func (c *emptyChecker) check(k, v string) bool {
	*c = true
	return false // seen enough
}

var emptyCheckerPool = sync.Pool{New: func() interface{} { return new(emptyChecker) }}

and then at the top of Replica.Send (after inlining sendWithoutRangeID):

	{
		var pprofLabelCtx context.Context
		const rangeIDLabel = "range_id"
		c := emptyCheckerPool.Get().(*emptyChecker)
		pprof.ForLabels(ctx, c.check)
		hasLabel := *c
		*c = false
		emptyCheckerPool.Put(c)
		if !hasLabel {
			// Fast path, saving allocations.
			pprof.SetGoroutineLabels(pprofLabelCtx)
		} else {
			// There are some incoming labels so we want to add our label to
			// the incoming context, which is more allocation intensive.
			pprof.SetGoroutineLabels(pprof.WithLabels(ctx, pprof.Labels(rangeIDLabel, r.RangeID.String())))
		}
		defer pprof.SetGoroutineLabels(ctx)
	}

Describe alternatives you've considered

Nothing? Or see if sendWithoutRangeID can be resurrected properly.

Additional context

When a request hangs on a replica, it's important to be able to see on which range. In theory, tracing infra can help here too, but I think it's still spotty.

Jira issue: CRDB-18497

Metadata

Metadata

Assignees

Labels

A-kv-observabilityA-kv-replicationRelating to Raft, consensus, and coordination.C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions