Automatic ttrpc and bridge span creation#1379
Automatic ttrpc and bridge span creation#1379helsaawy wants to merge 2 commits intomicrosoft:mainfrom
Conversation
4a510c9 to
01b6418
Compare
01b6418 to
668fd6e
Compare
74fd8aa to
aadc6de
Compare
|
In general I like it! The one comment that I have is I never loved the default name for gRPC entry spans and in our case here we are going to do |
Do you mean
Yeah ... we are planning to clean up the bridge protocol code (eventually). But it is cludgy to say the least. Unfortunately, the spans on the host will have different names than spans on the Linux guest, which resolve hcsshim/internal/guest/prot/protocol.go Lines 150 to 206 in 25b6734 |
aadc6de to
98bdbbc
Compare
98bdbbc to
593c1e7
Compare
|
Rebased to resolve merge conflicts |
|
|
||
| func (c *Container) waitBackground() { | ||
| ctx, span := oc.StartSpan(context.Background(), "gcs::Container::waitBackground") | ||
| _, span := oc.StartSpan(context.Background(), "gcs::Container::waitBackground") |
There was a problem hiding this comment.
why are we keeping this span but not the others on a Container?
There was a problem hiding this comment.
all the other Container operations eventually end up with a gcs and then bridge call, so they have spans being created for them.
This is the only one that doesn't. Plus this is a long running call that is launched asynchronously, so we dont have any other way of tracking it, which i felt made it worth of a dedicated span
| var err error | ||
| _, span := oc.StartSpan(context.Background(), "gcs::Process::waitBackground") |
There was a problem hiding this comment.
same thing here. Why are we starting a span here?
internal/guest/bridge/bridge_v2.go
Outdated
| ctx := r.Context | ||
| log.G(ctx).Trace("opengcs::bridge::modifySettingsV2") |
There was a problem hiding this comment.
Same question here. Should this and dumpStacksV2 and deleteContainerStateV2 use logEntry?
There was a problem hiding this comment.
dumpStacksV2 doesnt apply to a particular container, right?
but yes, deleteContainerStateV2 should
internal/log/hook.go
Outdated
|
|
||
| // Duration format converts a [time.Duration] fields to an appropriate encoding. | ||
| // | ||
| // Default is [DurationFormatSeconds]. |
There was a problem hiding this comment.
As in, using milliseconds is a better representation? I figured since some of the OCI requests are on the order of seconds, seconds was appropriate, but I can swap out milliseconds if that makes more sense
| if s.isSandbox { | ||
| span.AddAttributes(trace.StringAttribute("pod-id", s.tid)) | ||
| } | ||
| s.logEntry(ctx).Trace("State") |
There was a problem hiding this comment.
so we're no longer logging request information?
There was a problem hiding this comment.
We do, we now add the entire request to the auto-generated span, instead of manually adding bits of the request to the individual spans we create here
hcsshim/pkg/octtrpc/interceptor.go
Lines 122 to 124 in 593c1e7
99417e1 to
081c6d9
Compare
081c6d9 to
66588a1
Compare
|
rebased |
66588a1 to
22fee56
Compare
| defer func() { | ||
| if err == nil { | ||
| span.AddAttributes(trace.StringAttribute("response", log.Format(ctx, resp))) | ||
| } | ||
| setSpanStatus(span, err) | ||
| }() | ||
| if len(o.attrs) > 0 { | ||
| span.AddAttributes(o.attrs...) | ||
| } | ||
|
|
||
| return method(ctx, unmarshal) | ||
| return method(ctx, func(req interface{}) (err error) { | ||
| if err = unmarshal(req); err == nil { | ||
| span.AddAttributes(trace.StringAttribute("request", log.Format(ctx, req))) | ||
| } | ||
| return err | ||
| }) |
There was a problem hiding this comment.
I'm concerned about always logging the request/response:
- These could be quite large (compared to a normal span) in many cases, which would bloat logging data or lead to worse performance.
- These could contain sensitive information that should not be logged.
There was a problem hiding this comment.
I'll validate/update the scrubbing for the requests that contain spans, but we already log most components of the pieces of the requests/responses in the span, but we manually add the fields:
hcsshim/cmd/containerd-shim-runhcs-v1/service.go
Lines 126 to 150 in fa3b77d
The only exception is Exec process, which carries the OCI spec, and that we log latter on regardless.
Also, we log all requests/responses over the bridge (and log them again on the Linux GCS side):
hcsshim/internal/gcs/bridge.go
Lines 399 to 402 in fa3b77d
hcsshim/internal/guest/bridge/bridge.go
Line 322 in fa3b77d
hcsshim/internal/gcs/bridge.go
Lines 298 to 301 in fa3b77d
hcsshim/internal/guest/bridge/bridge.go
Line 386 in fa3b77d
Performance wise, the formatting uses google.golang.org/protobuf/encoding/protojson, which should be quite performant:
hcsshim/internal/log/format.go
Lines 71 to 79 in fa3b77d
22fee56 to
88090d0
Compare
ttrpc interceptor automatically formats (as JSON) and adds server requests and responses as span attributes. Host and guest bridge automatically creates spans for RPCs. Replaced redundant `cmd/containerd-shim-runhcs-v1/service.go.(Service)` `internal/gcs`, and `interna\guest\bridge` spans with `.Trace` log entries, since they include all the same (or less) information as those automatically generated by the ttrp interceptor and `(*bridge).RPC` Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Allow enabling adding the request/response to the ttrpc span as attributes, since there are other customers of our code. Allow scrubbing the `proto.Message` request and response messages in the ttrpc server interceptor by specifying an arbitrary function to update a clone of the payloads. Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
88090d0 to
d455f6d
Compare
(Host) Bridge automatically creates spans for RPCs.
Update ttrpc interceptor to add request and response structs to span attributes.
Update gcs bridge to create spans for outgoing calls automatically.
Replaced redundant
cmd/containerd-shim-runhcs-v1.(*Service)and
internal/gcsspans with.Tracelog entries, since they includeall the same (or less) information as those automatically generated by
the ttrp interceptor and
(*bridge).RPCRelies on PRs:
rpcProcstring conversion #1391This PR gets rid of the
cmd/containerd-shim-runhcs-v1.(*Service)spans (eg,"State","Create")and instead only exports the TTRPC spans (eg,
"containerd.task.v2.Task.State","containerd.task.v2.Task.Create")