tracing: memory budgeting #59188
Description
Is your feature request related to a problem? Please describe.
Verbose trace spans collect possibly large amounts of data in memory, which can present a stability risk. With always-on tracing, the same holds for the structured payloads (though with a small factor, since a lot less data is collected).
Describe the solution you'd like
Heuristically count the number of bytes held in a Span's own recording (i.e. not traversing into the children). New data enters via these methods:
cockroach/pkg/util/tracing/span.go
Lines 324 to 366 in a01689b
```go
// LogStructured adds a Structured payload to the Span. It will be added to the
// recording even if the Span is not verbose; however it will be discarded if
// the underlying Span has been optimized out (i.e. is a noop span).
//
// The caller must not mutate the item once LogStructured has been called.
func (s *Span) LogStructured(item Structured) {
	if s.isNoop() {
		return
	}
	s.crdb.logStructured(item)
}

// Record provides a way to record free-form text into verbose spans.
//
// TODO(irfansharif): We don't currently have redactability with trace
// recordings (both here, and using LogStructured above). We'll want to do this
// soon.
func (s *Span) Record(msg string) {
	if !s.hasVerboseSink() {
		return
	}
	if s.ot.shadowSpan != nil {
		s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, msg))
	}
	if s.netTr != nil {
		s.netTr.LazyPrintf("%s", msg)
	}
	s.crdb.record(msg)
}

// Recordf is like Record, but accepts a format specifier.
func (s *Span) Recordf(format string, args ...interface{}) {
	if !s.hasVerboseSink() {
		return
	}
	str := fmt.Sprintf(format, args...)
	if s.ot.shadowSpan != nil {
		s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, str))
	}
	if s.netTr != nil {
		s.netTr.LazyPrintf(format, args)
	}
	s.crdb.record(str)
}
```
Once a set maximum size (per span) is exceeded, or the root account runs out of quota, drop old messages each time a new message is added. Also set a flag on the span indicating that the recording is incomplete, perhaps by retaining the most recent timestamp for which events were dropped. We should drop verbose messages first, then structured payloads.
Note that the opentracing and net/trace spans use a cardinality-based limit:
cockroach/pkg/util/tracing/tracer.go
Line 343 in 513b01b
```go
netTr.SetMaxEvents(maxLogsPerSpan)
```
cockroach/pkg/util/tracing/shadow.go
Lines 107 to 112 in e9e1f2b
```go
return lightStepManager{}, lightstep.NewTracer(lightstep.Options{
	AccessToken:      token,
	MaxLogsPerSpan:   maxLogsPerSpan,
	MaxBufferedSpans: 10000,
	UseGRPC:          true,
})
```
I think for net/trace we may want to remove the limit and instead, upon first exceeding the memory budget for any given span, print a message to net/trace explaining that no more logs will be relayed there.
Describe alternatives you've considered
We could also try to attach a memory budget and use all of the infrastructure we already have in SQL, by attaching
cockroach/pkg/util/mon/bytes_usage.go
Lines 30 to 35 in 94a429b
```go
// BoundAccount and BytesMonitor together form the mechanism by which
// allocations are tracked and constrained (e.g. memory allocations by the
// server on behalf of db clients). The primary motivation is to avoid common
// cases of memory or disk blow-ups due to user error or unoptimized queries; a
// secondary motivation in the longer term is to offer more detailed metrics to
// users to track and explain memory/disk usage.
```
to the Tracer
cockroach/pkg/util/tracing/tracer.go
Lines 114 to 132 in 513b01b
```go
// Tracer is our own custom implementation of opentracing.Tracer. It supports:
//
// - forwarding events to x/net/trace instances
//
// - recording traces. Recording is started automatically for spans that have
//   the verboseTracingBaggageKey baggage and can be started explicitly as
//   well. Recorded events can be retrieved at any time.
//
// - lightstep traces. This is implemented by maintaining a "shadow" lightstep
//   Span inside each of our spans.
//
// Even when tracing is disabled, we still use this Tracer (with x/net/trace and
// lightstep disabled) because of its recording capability (verbose tracing needs
// to work in all cases).
//
// Tracer is currently stateless so we could have a single instance; however,
// this won't be the case if the cluster settings move away from using global
// state.
type Tracer struct {
```
and use that to guide the decisions. This might actually be my favorite, assuming that the `mon` package is in good standing (I'm not familiar with it). Some notes:

- this will likely require refactoring of the `mon` package, since the tracing package cannot depend on `util/log` due to import cycles, which is fine.
- we need to clear the recording in `Span.Finish()` and we need to prevent new recording data from entering a `Finish`'ed span (otherwise it will allocate from the monitor but never release the bytes back).
- this is more brittle in the sense that a failure to return bytes to the monitor correctly will cause problems. However, we can add assertions that are active through large swathes of our test suite, which should give us good confidence that this is not happening.
- we need to determine where to take the bytes from. We could hang it off the SQL memory monitor, or use a separate, fixed-size monitor.
Additional context
We will want some kind of basic safeguard in place for 21.1, even if it just extends the `maxLogsPerSpan` mechanism to also apply to structured payloads.
Jira issue: CRDB-3323