Skip to content
This repository was archived by the owner on Sep 17, 2025. It is now read-only.

Fix gRPC client interceptor for channels reused across traces#539

Merged
c24t merged 4 commits intocensus-instrumentation:masterfrom
nikhaldi:fix/grpc-tracing
Mar 11, 2019
Merged

Fix gRPC client interceptor for channels reused across traces#539
c24t merged 4 commits intocensus-instrumentation:masterfrom
nikhaldi:fix/grpc-tracing

Conversation

@nikhaldi
Copy link
Copy Markdown
Contributor

@nikhaldi nikhaldi commented Mar 5, 2019

In a typical web server gRPC clients/channels are reused across multiple
requests and hence across multiple traces. Previously the
OpenCensusClientInterceptor was instantiated for each channel with the
current tracer from the execution context. This would then lead to all
rpcs going through that channel to have the same tracer, essentially
grouping all rpcs under whatever happened to be the current trace when the
channel was created.

Instead instantiate OpenCensusClientInterceptor without a tracer by
default. The current tracer will be retrieved from the execution context at
the start of every rpc span.

In addition OpenCensusClientInterceptor was manipulating thread-local state
via the execution context. This seems unnecessary and misguided. The current
span state is already managed by the spans/context tracers. Setting the
tracer explicitly risks further subtle bugs.

Also removes unused method OpenCensusClientInterceptor._end_span_between_context.

Fixes #182

@nikhaldi nikhaldi requested review from a team, c24t, reyang and songy23 as code owners March 5, 2019 20:26
@nikhaldi
Copy link
Copy Markdown
Contributor Author

nikhaldi commented Mar 5, 2019

I can reproduce the underlying problem reliably with a standard Flask integration using the FlaskMiddleware. This change does fix the Flask server for me.

I don't fully understand what the intent was with instantiating the client interceptor this way (I looked at the git history, still unclear). It's clearly broken though for common server use cases. I may be missing some context and maybe there are other use cases where this made sense? Would love any hints on that.

In a typical web server gRPC clients/channels are reused across multiple
requests and hence across multiple traces. Previously the
`OpenCensusClientInterceptor` was instantiated for each channel with the
current tracer from the execution context. This would then lead to all
rpcs going through that channel to have the same tracer, essentially
grouping all rpcs under whatever happened to be the current trace when the
channel was created.

Instead instantiate `OpenCensusClientInterceptor` without a tracer by
default. The current tracer will be retrieved from the execution context at
the start of every rpc span.

In addition `OpenCensusClientInterceptor` was manipulating thread-local state
via the execution context. This seems unnecessary and misguided. The current
span state is already managed by the spans/context tracers. Setting the
tracer explicitly risks further subtle bugs.

Also removes unused method `OpenCensusClientInterceptor._end_span_between_context`.

Fixes census-instrumentation#182
@c24t
Copy link
Copy Markdown
Member

c24t commented Mar 6, 2019

I don't fully understand what the intent was with instantiating the client interceptor this way (I looked at the git history, still unclear).

I don't understand it either. If the goal was to be able to use OpenCensusClientInterceptor with a tracer other than the context's current tracer then we almost certainly wouldn't want to set that as the context's tracer.

Removing execution_context here seems right to me, but @liyanhui1228 or @wkiser may be able to shed some light on a use case we're missing.

Copy link
Copy Markdown
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an unambiguously good change to me, let's see if @liyanhui1228 or @wkiser have any objections.

@c24t c24t merged commit 724bd0b into census-instrumentation:master Mar 11, 2019
@odeke-em
Copy link
Copy Markdown
Member

Thanks for working on this @nikhaldi and @c24t for the review!

It would be nice to add a regression test too to lock-in this behavior and ensure that this bug never creeps back in.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants