otel: Segregate client and server RPCInfo used for metrics and traces#9081
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request resolves telemetry metadata collisions in relay scenarios by separating client and server RPC information within the context using distinct keys and helper functions. It also includes end-to-end tests to ensure that metrics and traces are correctly isolated when a single context is shared between server and client roles. The review feedback suggests further refining getOrCreateCallInfo to check for method changes during context reuse and identifies several redundant calls to context-setting functions that can be removed to optimize performance and reduce unnecessary allocations.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #9081 +/- ##
==========================================
+ Coverage 83.06% 83.18% +0.11%
==========================================
Files 413 413
Lines 33135 33484 +349
==========================================
+ Hits 27523 27852 +329
- Misses 4199 4218 +19
- Partials 1413 1414 +1
🚀 New features to boost your workflow:
|
arjan-bal
left a comment
There was a problem hiding this comment.
I noticed you went with the first approach using separate client and server keys. Just wanted to quickly circle back to the second option: updating the logic from getOrCreateRPCAttemptInfo to createRPCAttemptInfo.
Did you run into any specific issues or blockers when looking into that second approach? I just want to make sure we've fully explored or ruled out that path since it avoids relying on the context chain restrictions.
|
Please update the release notes to describe user visible behaviour, omitting the internal implementation. For e.g: |
The general principle here is that we need segregated rpcInfo on opposite sides of an RPC (Server vs. Client), but we must allow sibling stats handlers (like metrics and tracing) on the same side to share that state. Furthermore, in cases of relay scenarios, we need both states to coexist in the context without risking one corrupting or shadowing the other when we have these RPCs running in parallel. Using segregated context keys intuitively solves both requirements. Also in the discussion with leads, the inclination was towards this approach, as it aligns much better with our existing patterns for handling metadata. |
| return nil, fmt.Errorf("failed to create relay client: %v", err) | ||
| } | ||
| defer relayCC.Close() | ||
| err = relayCC.Invoke(ctx, "/grpc.testing.TestService/UnregisteredCall", in, &testpb.SimpleResponse{}) |
There was a problem hiding this comment.
Is there a specific reason for using the low-level Invoke method instead of creating a test service client to call UnaryCall or EmptyCall? Typically, Invoke is reserved for generated code.
There was a problem hiding this comment.
The intent was just to have a different method name for the outbound call to easily detect attribute leakage. I can switch the outbound call to use the generated client to call EmptyCall (or another available method) instead of using Invoke with an unregistered name. That would keep the names distinct while using the generated client.
There was a problem hiding this comment.
I would suggest using EmptyCall and UnaryCall. Since both are unary methods, the implementation size would remain the same.
There are other tests using Invoke in this file, most probably because they're testing the case where the server doesn't have a handler for method being invoked by the client.
|
In the release notes, you can omit internal details and focus only on the user visible impact, i.e. "prevent outbound client calls from overwriting telemetry metadata in relay scenarios". |
arjan-bal
left a comment
There was a problem hiding this comment.
LGTM. We can cherry-pick this into the 1.81.x branch.
Original PR: #9081 RELEASE NOTES: * otel: Segregate client and server RPCInfo used for metrics and traces.
This PR contains the following updates: | Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) | |---|---|---|---| | [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `v1.81.0` → `v1.81.1` |  |  | --- ### Release Notes <details> <summary>grpc/grpc-go (google.golang.org/grpc)</summary> ### [`v1.81.1`](https://github.com/grpc/grpc-go/releases/tag/v1.81.1): Release 1.81.1 [Compare Source](grpc/grpc-go@v1.81.0...v1.81.1) ### Security - xds/rbac: Fix a potential authorization bypass caused by incorrectly falling through URI/DNS SANs to Subject Distinguished Name (DN) when matching the authenticated principal name. With this fix, only the first non-empty identity source will be used, as per [gRFC A41](https://github.com/grpc/proposal/blob/master/A41-xds-rbac.md). ([#​9111](grpc/grpc-go#9111)) - Special Thanks: [@​al4an444](https://github.com/al4an444) ### Bug Fixes - otel: Segregate client and server RPC information used for metrics and traces, to avoid one overwriting the other. ([#​9081](grpc/grpc-go#9081)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xMDEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjEwMS4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJ0eXBlL3BhdGNoIl19--> Reviewed-on: https://git.erwanleboucher.dev/eleboucher/talos-mcp/pulls/2
Fixes #9053
Client and Server metrics and traces handling code during RPC was using the same RPCInfo. This lead to overwriting of context when an application was acting as server and then initiating other grpc call as a client. This PR segregates the rpcInfoKey context key into clientRPCInfoKey and serverRPCInfoKey so that it is not reused/overwritten between server and client code.
RELEASE NOTES: