stats/otel: Add subchannel metrics (A94)#8738
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #8738 +/- ##
==========================================
+ Coverage 83.22% 83.37% +0.15%
==========================================
Files 419 418 -1
Lines 32454 32425 -29
==========================================
+ Hits 27009 27034 +25
+ Misses 4057 4010 -47
+ Partials 1388 1381 -7
A94 states the following: How are we currently handling this in the pickfirst metrics?

A94 states the following: Can you please ensure that we have an issue filed to track the removal of the old metrics, and that it captures the correct release in which they need to be removed?
Here is how pickfirst handles this:

Issue filed: #8752.
Handled as per the PR.
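The thread concerns migrating from the pick_first metrics to the new subchannel metrics. As a rough illustration, the sketch below maps old metric names to their assumed replacements. Only `grpc.lb.pick_first.disconnections` and `grpc.subchannel.connection_attempts_succeeded` appear in this PR; the remaining names, and the hypothetical `replacementFor` helper, are assumptions based on gRFC A78/A94 naming and should be checked against those documents.

```go
package main

import "fmt"

// pickFirstToSubchannel is an assumed mapping from the older pick_first
// metric names to the subchannel metrics intended to replace them.
// Verify against gRFC A78 and A94 before relying on it.
var pickFirstToSubchannel = map[string]string{
	"grpc.lb.pick_first.disconnections":                "grpc.subchannel.disconnections",
	"grpc.lb.pick_first.connection_attempts_succeeded": "grpc.subchannel.connection_attempts_succeeded",
	"grpc.lb.pick_first.connection_attempts_failed":    "grpc.subchannel.connection_attempts_failed",
}

// replacementFor returns the subchannel metric that supersedes an old
// pick_first metric, and false if the name is not a known pick_first metric.
func replacementFor(old string) (string, bool) {
	n, ok := pickFirstToSubchannel[old]
	return n, ok
}

func main() {
	if n, ok := replacementFor("grpc.lb.pick_first.disconnections"); ok {
		fmt.Println(n)
	}
}
```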
t.Errorf("Unexpected data for metric %v, got: %v, want: %v", "grpc.lb.pick_first.disconnections", got, 0)
}

//Checking for subchannel metrics as well
Nit: Space between the // and the start of the comment, and please terminate comment sentences with a period.
Here and elsewhere. There are still a bunch of places with comments that are not terminated with periods. See: go/go-style/decisions#comment-sentences
// Wait for the SUCCESS metric to ensure recording logic has processed.
waitForMetric(ctx, t, tmr, "grpc.subchannel.connection_attempts_succeeded")

// Verify Success: Exactly 1 (The Winner).
if got, _ := tmr.Metric("grpc.subchannel.connection_attempts_succeeded"); got != 1 {
	t.Errorf("Unexpected data for metric %v, got: %v, want: 1", "grpc.subchannel.connection_attempts_succeeded", got)
}
How does this actually ensure that we check the value of the metric only after the first connection attempt is completely processed? We do call `holds[0].Resume()`, but does that guarantee that the subchannel code sees the connection succeed and then drops it because the LB policy has already deleted the subchannel?
We wait for the metric to be emitted. The connection-attempt-succeeded metric is emitted only when a connection is successfully established. If the attempt is cancelled, it is not counted as successful; if the connection is established and then torn down, it is recorded as a disconnection. In both scenarios, `connection_attempts_succeeded` is always 1.
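The counting rules in the reply above can be modeled in a few lines. This is a toy sketch with hypothetical names (`attemptOutcome`, `counters`, `record`), not the grpc-go implementation; it encodes the reply's reading of the two scenarios to show why the succeeded count stays at 1 whether the late attempt is cancelled or connects and is then closed.

```go
package main

import "fmt"

// attemptOutcome is a hypothetical label for how a connection attempt ends
// in the test scenario discussed above.
type attemptOutcome int

const (
	winnerConnected         attemptOutcome = iota // the attempt whose connection is kept
	lateCancelled                                 // a later attempt cancelled before it completes
	lateConnectedThenClosed                       // a later attempt that connects, then is closed
)

// counters is a toy stand-in for the subchannel counters; it is not the
// grpc-go implementation.
type counters struct {
	succeeded      int
	disconnections int
}

// record applies the counting rules as described in the review reply.
func (c *counters) record(o attemptOutcome) {
	switch o {
	case winnerConnected:
		c.succeeded++
	case lateCancelled:
		// A cancelled attempt is never counted as a success.
	case lateConnectedThenClosed:
		// Per the reply, this shows up as a disconnection, not a second success.
		c.disconnections++
	}
}

func main() {
	for _, late := range []attemptOutcome{lateCancelled, lateConnectedThenClosed} {
		c := &counters{}
		c.record(winnerConnected)
		c.record(late)
		fmt.Println(c.succeeded, c.disconnections)
	}
}
```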
easwars left a comment
Mostly looks good. Just some minor nits.
Can you also please update the PR description to note that `disconnection_reason` will be plumbed in a follow-up PR? Thanks.
_, ok := tmr.Metric(metricName)
if ok {
Nit: The assignment and the conditional can be moved into a single line:
if _, ok := tmr.Metric(metricName); ok {
...
}
Done.
…jo) (#12794)

This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `v1.75.0` → `v1.79.3` |  |  |

---

### gRPC-Go has an authorization bypass via missing leading slash in :path
[CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) / [GHSA-p77j-4mvh-x3m3](GHSA-p77j-4mvh-x3m3) / [GO-2026-4762](https://pkg.go.dev/vuln/GO-2026-4762)

<details>
<summary>More information</summary>

#### Details
##### Impact
_What kind of vulnerability is it? Who is impacted?_

It is an **Authorization Bypass** resulting from **Improper Input Validation** of the HTTP/2 `:path` pseudo-header.

The gRPC-Go server was too lenient in its routing logic, accepting requests where the `:path` omitted the mandatory leading slash (e.g., `Service/Method` instead of `/Service/Method`). While the server successfully routed these requests to the correct handler, authorization interceptors (including the official `grpc/authz` package) evaluated the raw, non-canonical path string. Consequently, "deny" rules defined using canonical paths (starting with `/`) failed to match the incoming request, allowing it to bypass the policy if a fallback "allow" rule was present.

**Who is impacted?**
This affects gRPC-Go servers that meet both of the following criteria:
1. They use path-based authorization interceptors, such as the official RBAC implementation in `google.golang.org/grpc/authz` or custom interceptors relying on `info.FullMethod` or `grpc.Method(ctx)`.
2. Their security policy contains specific "deny" rules for canonical paths but allows other requests by default (a fallback "allow" rule).

The vulnerability is exploitable by an attacker who can send raw HTTP/2 frames with malformed `:path` headers directly to the gRPC server.

##### Patches
_Has the problem been patched? What versions should users upgrade to?_

Yes, the issue has been patched. The fix ensures that any request with a `:path` that does not start with a leading slash is immediately rejected with a `codes.Unimplemented` error, preventing it from reaching authorization interceptors or handlers with a non-canonical path string.

Users should upgrade to the following versions (or newer):
* **v1.79.3**
* The latest **master** branch.

It is recommended that all users employing path-based authorization (especially `grpc/authz`) upgrade as soon as the patch is available in a tagged release.

##### Workarounds
_Is there a way for users to fix or remediate the vulnerability without upgrading?_

While upgrading is the most secure and recommended path, users can mitigate the vulnerability using one of the following methods:

##### 1. Use a Validating Interceptor (Recommended Mitigation)
Add an "outermost" interceptor to your server that validates the path before any other authorization logic runs:

```go
func pathValidationInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
	if info.FullMethod == "" || info.FullMethod[0] != '/' {
		return nil, status.Errorf(codes.Unimplemented, "malformed method name")
	}
	return handler(ctx, req)
}

// Ensure this is the FIRST interceptor in your chain
s := grpc.NewServer(
	grpc.ChainUnaryInterceptor(pathValidationInterceptor, authzInterceptor),
)
```

##### 2. Infrastructure-Level Normalization
If your gRPC server is behind a reverse proxy or load balancer (such as Envoy, NGINX, or an L7 Cloud Load Balancer), ensure it is configured to enforce strict HTTP/2 compliance for pseudo-headers and reject or normalize requests where the `:path` header does not start with a leading slash.

##### 3. Policy Hardening
Switch to a "default deny" posture in your authorization policies (explicitly listing all allowed paths and denying everything else) to reduce the risk of bypasses via malformed inputs.
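To make the bypass concrete, the sketch below models a policy with one canonical deny rule and a default allow. `naiveAllow` and `validatedAllow` are hypothetical helpers and `/pkg.Admin/Delete` is an invented method name; a plain boolean stands in for the `codes.Unimplemented` rejection that the actual fix returns.

```go
package main

import (
	"fmt"
	"strings"
)

// denied is a hypothetical policy: deny one canonical method, allow the rest.
var denied = map[string]bool{"/pkg.Admin/Delete": true}

// naiveAllow evaluates the raw :path string, as a vulnerable interceptor would.
func naiveAllow(path string) bool {
	return !denied[path]
}

// validatedAllow rejects non-canonical paths before evaluating the policy,
// mirroring the leading-slash check in the interceptor workaround above.
func validatedAllow(path string) bool {
	if !strings.HasPrefix(path, "/") {
		return false
	}
	return !denied[path]
}

func main() {
	fmt.Println(naiveAllow("pkg.Admin/Delete"))      // true: deny rule bypassed
	fmt.Println(validatedAllow("pkg.Admin/Delete"))  // false: rejected as malformed
	fmt.Println(validatedAllow("/pkg.Admin/Delete")) // false: deny rule applies
}
```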
#### Severity
- CVSS Score: 9.1 / 10 (Critical)
- Vector String: `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N`

#### References
- [https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3](https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3)
- [https://nvd.nist.gov/vuln/detail/CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186)
- [https://github.com/grpc/grpc-go](https://github.com/grpc/grpc-go)

This data is provided by [OSV](https://osv.dev/vulnerability/GHSA-p77j-4mvh-x3m3) and the [GitHub Advisory Database](https://github.com/github/advisory-database) ([CC-BY 4.0](https://github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc
[CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) / [GHSA-p77j-4mvh-x3m3](GHSA-p77j-4mvh-x3m3) / [GO-2026-4762](https://pkg.go.dev/vuln/GO-2026-4762)

<details>
<summary>More information</summary>

#### Details
Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc

#### Severity
Unknown

#### References
- [https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3](https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3)

This data is provided by [OSV](https://osv.dev/vulnerability/GO-2026-4762) and the [Go Vulnerability Database](https://github.com/golang/vulndb) ([CC-BY 4.0](https://github.com/golang/vulndb#license)).
</details>

---

### Release Notes

<details>
<summary>grpc/grpc-go (google.golang.org/grpc)</summary>

### [`v1.79.3`](https://github.com/grpc/grpc-go/releases/tag/v1.79.3): Release 1.79.3

[Compare Source](grpc/grpc-go@v1.79.2...v1.79.3)

### Security

- server: fix an authorization bypass where malformed :path headers (missing the leading slash) could bypass path-based restricted "deny" rules in interceptors like `grpc/authz`. Any request with a non-canonical path is now immediately rejected with an `Unimplemented` error. ([#8981](grpc/grpc-go#8981))

### [`v1.79.2`](https://github.com/grpc/grpc-go/releases/tag/v1.79.2): Release 1.79.2

[Compare Source](grpc/grpc-go@v1.79.1...v1.79.2)

### Bug Fixes

- stats: Prevent redundant error logging in health/ORCA producers by skipping stats/tracing processing when no stats handler is configured. ([#8874](grpc/grpc-go#8874))

### [`v1.79.1`](https://github.com/grpc/grpc-go/releases/tag/v1.79.1): Release 1.79.1

[Compare Source](grpc/grpc-go@v1.79.0...v1.79.1)

### Bug Fixes

- grpc: Remove the `-dev` suffix from the User-Agent header. ([#8902](grpc/grpc-go#8902))

### [`v1.79.0`](https://github.com/grpc/grpc-go/releases/tag/v1.79.0): Release 1.79.0

[Compare Source](grpc/grpc-go@v1.78.0...v1.79.0)

### API Changes

- mem: Add experimental API `SetDefaultBufferPool` to change the default buffer pool. ([#8806](grpc/grpc-go#8806))
  - Special Thanks: [@vanja-p](https://github.com/vanja-p)
- experimental/stats: Update `MetricsRecorder` to require embedding the new `UnimplementedMetricsRecorder` (a no-op struct) in all implementations for forward compatibility. ([#8780](grpc/grpc-go#8780))

### Behavior Changes

- balancer/weightedtarget: Remove handling of `Addresses` and only handle `Endpoints` in resolver updates. ([#8841](grpc/grpc-go#8841))

### New Features

- experimental/stats: Add support for asynchronous gauge metrics through the new `AsyncMetricReporter` and `RegisterAsyncReporter` APIs. ([#8780](grpc/grpc-go#8780))
- pickfirst: Add support for weighted random shuffling of endpoints, as described in [gRFC A113](grpc/proposal#535).
  - This is enabled by default, and can be turned off using the environment variable `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING`. ([#8864](grpc/grpc-go#8864))
- xds: Implement `:authority` rewriting, as specified in [gRFC A81](https://github.com/grpc/proposal/blob/master/A81-xds-authority-rewriting.md). ([#8779](grpc/grpc-go#8779))
- balancer/randomsubsetting: Implement the `random_subsetting` LB policy, as specified in [gRFC A68](https://github.com/grpc/proposal/blob/master/A68-random-subsetting.md). ([#8650](grpc/grpc-go#8650))
  - Special Thanks: [@marek-szews](https://github.com/marek-szews)

### Bug Fixes

- credentials/tls: Fix a bug where the port was not stripped from the authority override before validation. ([#8726](grpc/grpc-go#8726))
  - Special Thanks: [@Atul1710](https://github.com/Atul1710)
- xds/priority: Fix a bug causing delayed failover to lower-priority clusters when a higher-priority cluster is stuck in `CONNECTING` state. ([#8813](grpc/grpc-go#8813))
- health: Fix a bug where health checks failed for clients using legacy compression options (`WithDecompressor` or `RPCDecompressor`). ([#8765](grpc/grpc-go#8765))
  - Special Thanks: [@sanki92](https://github.com/sanki92)
- transport: Fix an issue where the HTTP/2 server could skip header size checks when terminating a stream early. ([#8769](grpc/grpc-go#8769))
  - Special Thanks: [@joybestourous](https://github.com/joybestourous)
- server: Propagate status detail headers, if available, when terminating a stream during request header processing. ([#8754](grpc/grpc-go#8754))
  - Special Thanks: [@joybestourous](https://github.com/joybestourous)

### Performance Improvements

- credentials/alts: Optimize read buffer alignment to reduce copies. ([#8791](grpc/grpc-go#8791))
- mem: Optimize pooling and creation of `buffer` objects. ([#8784](grpc/grpc-go#8784))
- transport: Reduce slice re-allocations by reserving slice capacity. ([#8797](grpc/grpc-go#8797))

### [`v1.78.0`](https://github.com/grpc/grpc-go/releases/tag/v1.78.0): Release 1.78.0

[Compare Source](grpc/grpc-go@v1.77.0...v1.78.0)

### Behavior Changes

- client: Align URL validation with Go 1.26+ to now reject target URLs with unbracketed colons in the hostname. ([#8716](grpc/grpc-go#8716))
  - Special Thanks: [@neild](https://github.com/neild)
- transport/client: Return status code `Unknown` on malformed grpc-status. ([#8735](grpc/grpc-go#8735))
- xds/resolver:
  - Drop previous route resources and report an error when no matching virtual host is found.
  - Only log LDS/RDS configuration errors following a successful update and retain the last valid resource to prevent transient failures. ([#8711](grpc/grpc-go#8711))

### New Features

- stats/otel: Add backend service label to weighted round robin metrics as part of A89. ([#8737](grpc/grpc-go#8737))
- stats/otel: Add subchannel metrics (without the disconnection reason) to eventually replace the pickfirst metrics. ([#8738](grpc/grpc-go#8738))
- client: Wait for all pending goroutines to complete when closing a graceful switch balancer. ([#8746](grpc/grpc-go#8746))
  - Special Thanks: [@twz123](https://github.com/twz123)
- client: Add `experimental.AcceptCompressors` so callers can restrict the `grpc-accept-encoding` header advertised for a call. ([#8718](grpc/grpc-go#8718))
  - Special Thanks: [@iblancasa](https://github.com/iblancasa)

### Bug Fixes

- xds: Fix a bug in `StringMatcher` where regexes would match incorrectly when ignore\_case is set to true. ([#8723](grpc/grpc-go#8723))
- client:
  - Change connectivity state to CONNECTING when creating the name resolver (as part of exiting IDLE).
  - Change connectivity state to TRANSIENT\_FAILURE if name resolver creation fails (as part of exiting IDLE).
  - Change connectivity state to IDLE after idle timeout expires even when current state is TRANSIENT\_FAILURE.
  - Fix a bug that resulted in `OnFinish` call option not being invoked for RPCs where stream creation failed. ([#8710](grpc/grpc-go#8710))
- xdsclient: Fix a race in the xdsClient that could lead to resource-not-found errors. ([#8627](grpc/grpc-go#8627))

### Performance Improvements

- mem: Round up to nearest 4KiB for pool allocations larger than 1MiB. ([#8705](grpc/grpc-go#8705))
  - Special Thanks: [@cjc25](https://github.com/cjc25)

### [`v1.77.0`](https://github.com/grpc/grpc-go/releases/tag/v1.77.0): Release 1.77.0

[Compare Source](grpc/grpc-go@v1.76.0...v1.77.0)

### API Changes

- mem: Replace the `Reader` interface with a struct for better performance and maintainability. ([#8669](grpc/grpc-go#8669))

### Behavior Changes

- balancer/pickfirst: Remove support for the old `pick_first` LB policy via the environment variable `GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST=false`. The new `pick_first` has been the default since `v1.71.0`. ([#8672](grpc/grpc-go#8672))

### Bug Fixes

- xdsclient: Fix a race condition in the ADS stream implementation that could result in `resource-not-found` errors, causing the gRPC client channel to move to `TransientFailure`. ([#8605](grpc/grpc-go#8605))
- client: Ignore HTTP status header for gRPC streams. ([#8548](grpc/grpc-go#8548))
- client: Set a read deadline when closing a transport to prevent it from blocking indefinitely on a broken connection. ([#8534](grpc/grpc-go#8534))
  - Special Thanks: [@jgold2-stripe](https://github.com/jgold2-stripe)
- client: Fix a bug where default port 443 was not automatically added to addresses without a specified port when sent to a proxy.
  - Setting environment variable `GRPC_EXPERIMENTAL_ENABLE_DEFAULT_PORT_FOR_PROXY_TARGET=false` disables this change; please file a bug if any problems are encountered as we will remove this option soon. ([#8613](grpc/grpc-go#8613))
- balancer/pickfirst: Fix a bug where duplicate addresses were not being ignored as intended. ([#8611](grpc/grpc-go#8611))
- server: Fix a bug that caused overcounting of channelz metrics for successful and failed streams. ([#8573](grpc/grpc-go#8573))
  - Special Thanks: [@hugehoo](https://github.com/hugehoo)
- balancer/pickfirst: When configured, shuffle addresses in resolver updates that lack endpoints. Since gRPC automatically adds endpoints to resolver updates, this bug only affects custom LB policies that delegate to `pick_first` but don't set endpoints. ([#8610](grpc/grpc-go#8610))
- mem: Clear large buffers before re-using. ([#8670](grpc/grpc-go#8670))

### Performance Improvements

- transport: Reduce heap allocations to reduce time spent in garbage collection. ([#8624](grpc/grpc-go#8624), [#8630](grpc/grpc-go#8630), [#8639](grpc/grpc-go#8639), [#8668](grpc/grpc-go#8668))
- transport: Avoid copies when reading and writing Data frames. ([#8657](grpc/grpc-go#8657), [#8667](grpc/grpc-go#8667))
- mem: Avoid clearing newly allocated buffers. ([#8670](grpc/grpc-go#8670))

### New Features

- outlierdetection: Add metrics specified in [gRFC A91](https://github.com/grpc/proposal/blob/master/A91-outlier-detection-metrics.md). ([#8644](grpc/grpc-go#8644))
  - Special Thanks: [@davinci26](https://github.com/davinci26), [@PardhuKonakanchi](https://github.com/PardhuKonakanchi)
- stats/opentelemetry: Add support for optional label `grpc.lb.backend_service` in per-call metrics ([#8637](grpc/grpc-go#8637))
- xds: Add support for JWT Call Credentials as specified in [gRFC A97](https://github.com/grpc/proposal/blob/master/A97-xds-jwt-call-creds.md). Set environment variable `GRPC_EXPERIMENTAL_XDS_BOOTSTRAP_CALL_CREDS=true` to enable this feature. ([#8536](grpc/grpc-go#8536))
  - Special Thanks: [@dimpavloff](https://github.com/dimpavloff)
- experimental/stats: Add support for up/down counters. ([#8581](grpc/grpc-go#8581))

### [`v1.76.0`](https://github.com/grpc/grpc-go/releases/tag/v1.76.0): Release 1.76.0

[Compare Source](grpc/grpc-go@v1.75.1...v1.76.0)

### Dependencies

- Minimum supported Go version is now 1.24 ([#8509](grpc/grpc-go#8509))
  - Special Thanks: [@kevinGC](https://github.com/kevinGC)

### Bug Fixes

- client: Return status `INTERNAL` when a server sends zero response messages for a unary or client-streaming RPC. ([#8523](grpc/grpc-go#8523))
- client: Fail RPCs with status `INTERNAL` instead of `UNKNOWN` upon receiving http headers with status 1xx and `END_STREAM` flag set. ([#8518](grpc/grpc-go#8518))
  - Special Thanks: [@vinothkumarr227](https://github.com/vinothkumarr227)
- pick\_first: Fix race condition that could cause pick\_first to get stuck in `IDLE` state on backend address change. ([#8615](grpc/grpc-go#8615))

### New Features

- credentials: Add `credentials/jwt` package providing file-based JWT PerRPCCredentials (A97). ([#8431](grpc/grpc-go#8431))
  - Special Thanks: [@dimpavloff](https://github.com/dimpavloff)

### Performance Improvements

- client: Improve HTTP/2 header size estimate to reduce re-allocations. ([#8547](grpc/grpc-go#8547))
- encoding/proto: Avoid redundant message size calculation when marshaling. ([#8569](grpc/grpc-go#8569))
  - Special Thanks: [@rs-unity](https://github.com/rs-unity)

### [`v1.75.1`](https://github.com/grpc/grpc-go/releases/tag/v1.75.1): Release 1.75.1

[Compare Source](grpc/grpc-go@v1.75.0...v1.75.1)

### Bug Fixes

- transport: Fix a data race while copying headers for stats handlers in the std lib http2 server transport. ([#8519](grpc/grpc-go#8519))
- xdsclient:
  - Fix a data race caused while reporting load to LRS. ([#8483](grpc/grpc-go#8483))
  - Fix regression preventing empty node IDs when creating an LRS client. ([#8483](grpc/grpc-go#8483))
- server: Fix a regression preventing streams from being cancelled or timed out when blocked on flow control. ([#8528](grpc/grpc-go#8528))

</details>

---

### Configuration

📅 **Schedule**: (UTC)
- Branch creation
  - ""
- Automerge
  - Between 12:00 AM and 03:59 AM (`* 0-3 * * *`)

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.
---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate).

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/12794
Reviewed-by: Mathieu Fenniak <mfenniak@noreply.codeberg.org>
Addresses: https://github.com/grpc/proposal/blob/master/A94-subchannel-otel-metrics.md
This PR adds subchannel metrics, with the applicable labels, as specified in the gRFC proposal.
`disconnection_reason` will be added in a follow-up PR.

RELEASE NOTES: