
tracing: memory budgeting #59188

@tbg

Description


Is your feature request related to a problem? Please describe.
Verbose trace spans collect possibly large amounts of data in memory, which can present a stability risk. With always-on tracing, the same holds for the structured payloads, though to a lesser degree, since far less data is collected.

Describe the solution you'd like
Heuristically count the number of bytes held in a Span's own recording (i.e. not traversing into the children). New data enters via these methods:

// LogStructured adds a Structured payload to the Span. It will be added to the
// recording even if the Span is not verbose; however it will be discarded if
// the underlying Span has been optimized out (i.e. is a noop span).
//
// The caller must not mutate the item once LogStructured has been called.
func (s *Span) LogStructured(item Structured) {
	if s.isNoop() {
		return
	}
	s.crdb.logStructured(item)
}
// Record provides a way to record free-form text into verbose spans.
//
// TODO(irfansharif): We don't currently have redactability with trace
// recordings (both here, and using LogStructured above). We'll want to do this
// soon.
func (s *Span) Record(msg string) {
	if !s.hasVerboseSink() {
		return
	}
	if s.ot.shadowSpan != nil {
		s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, msg))
	}
	if s.netTr != nil {
		s.netTr.LazyPrintf("%s", msg)
	}
	s.crdb.record(msg)
}
// Recordf is like Record, but accepts a format specifier.
func (s *Span) Recordf(format string, args ...interface{}) {
	if !s.hasVerboseSink() {
		return
	}
	str := fmt.Sprintf(format, args...)
	if s.ot.shadowSpan != nil {
		s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, str))
	}
	if s.netTr != nil {
		s.netTr.LazyPrintf(format, args...)
	}
	s.crdb.record(str)
}

Once a set max size (per span) is exceeded, or the root account runs out of quota, drop old messages each time a new message is added. Also set a flag on the span that indicates that the recording is incomplete, perhaps by retaining the timestamp of the most recently dropped event. We should drop verbose messages first, then structured payloads.

Note that the opentracing and net/trace spans use a cardinality based limit:

netTr.SetMaxEvents(maxLogsPerSpan)

return lightStepManager{}, lightstep.NewTracer(lightstep.Options{
	AccessToken:      token,
	MaxLogsPerSpan:   maxLogsPerSpan,
	MaxBufferedSpans: 10000,
	UseGRPC:          true,
})

I think for net/trace we may want to remove the limit and instead, upon first exceeding the memory budget for any given span, print a message to net/trace explaining that no more logs will be relayed there.

Describe alternatives you've considered

We could also try to attach a memory budget and use all of the infrastructure we already have in SQL, by attaching

// BoundAccount and BytesMonitor together form the mechanism by which
// allocations are tracked and constrained (e.g. memory allocations by the
// server on behalf of db clients). The primary motivation is to avoid common
// cases of memory or disk blow-ups due to user error or unoptimized queries; a
// secondary motivation in the longer term is to offer more detailed metrics to
// users to track and explain memory/disk usage.

to the Tracer

// Tracer is our own custom implementation of opentracing.Tracer. It supports:
//
// - forwarding events to x/net/trace instances
//
// - recording traces. Recording is started automatically for spans that have
// the verboseTracingBaggageKey baggage and can be started explicitly as well. Recorded
// events can be retrieved at any time.
//
// - lightstep traces. This is implemented by maintaining a "shadow" lightstep
// Span inside each of our spans.
//
// Even when tracing is disabled, we still use this Tracer (with x/net/trace and
// lightstep disabled) because of its recording capability (verbose tracing needs
// to work in all cases).
//
// Tracer is currently stateless so we could have a single instance; however,
// this won't be the case if the cluster settings move away from using global
// state.
type Tracer struct {

and use that to guide the decisions. This might actually be my favorite, assuming that the mon package is in good standing (I'm not familiar with it). Some notes:

  • this will likely require some refactoring of the mon package, since the tracing package cannot depend on util/log due to import cycles; that refactoring should be fine.
  • we need to clear the recording in Span.Finish(), and we need to prevent new recording data from entering a finished span (otherwise it would allocate from the monitor but never release back).
  • this is more brittle in the sense that a failure to return bytes to the monitor correctly will cause problems. However, we can add assertions that are active through large swaths of our test suite, which should give us good confidence that this is not happening.
  • we need to determine where to take the bytes from. We could hang it off the sql memory monitor, or use a separate, fixed-size monitor.

Additional context

We will want some kind of basic safeguard in place for 21.1, even if it just extends the maxLogsPerSpan mechanism to also apply to payloads.

Jira issue: CRDB-3323

Labels: A-tracing (Relating to tracing in CockroachDB), C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are an exception), X-stale (no issue activity)
