client: Change connectivity state to CONNECTING when creating the name resolver#8710
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8710 +/- ##
==========================================
+ Coverage 83.28% 83.34% +0.05%
==========================================
Files 416 418 +2
Lines 32267 32385 +118
==========================================
+ Hits 26874 26990 +116
- Misses 4019 4024 +5
+ Partials 1374 1371 -3
🚀 New features to boost your workflow:
|
|
|
||
| func (s stringerVal) String() string { return s.s } | ||
|
|
||
| const errResolverBuildercheme = "test-resolver-build-failure" |
| // https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md | ||
| // defines CONNECTING as follows: | ||
| // - The channel is trying to establish a connection and is waiting to | ||
| // make progress on one of the steps involved in name resolution, TCP | ||
| // connection establishment or TLS handshake. This may be used as the | ||
| // initial state for channels upon creation. | ||
| // | ||
| // We are starting the name resolver here as part of exiting IDLE, so | ||
| // transitioning to CONNECTING is the right thing to do. |
There was a problem hiding this comment.
IMO comments should be short and to the point.
Short comments make the code take up less space, which makes it easier to read and understand. Long comments make long functions extremely long and not fit on the page.
Honestly, I think a comment for this action isn't even necessary. But if you think we need one, this could be:
// Set state to CONNECTING before building the name resolver
// so the channel does not remain in IDLE.| if state := cc.GetState(); state != connectivity.Idle { | ||
| t.Fatalf("Expected initial state to be IDLE, got %v", state) | ||
| } |
There was a problem hiding this comment.
The AwaitState above already tested this IIUC
| // Ensure that the client is in IDLE before connecting. | ||
| ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout) | ||
| defer cancel() | ||
| testutils.AwaitState(ctx, t, cc, connectivity.Idle) |
There was a problem hiding this comment.
This doesn't need an Await right? It should just check the current state, and never wait for changes, as we know it starts idle.
There was a problem hiding this comment.
That's true. Moved the check for the current state to here, and got rid of the Await.
| // Tests the case where the resolver reports an error to the channel before | ||
| // reporting an update. Verifies that the channel eventually moves to | ||
| // TransientFailure and a subsequent RPC returns the error reported by the | ||
| // TransientFailure and a subsequent RPCs returns the error reported by the |
| Authority: ccr.cc.authority, | ||
| MetricsRecorder: ccr.cc.metricsRecorderList, | ||
| } | ||
|
|
There was a problem hiding this comment.
Please revert this diff & file.
|
@dfawley |
| dopts := []grpc.DialOption{ | ||
| grpc.WithTransportCredentials(insecure.NewCredentials()), | ||
| grpc.WithResolvers(&testResolverBuilder{logger: t, manualR: mr}), | ||
| grpc.WithIdleTimeout(time.Second), |
There was a problem hiding this comment.
Can we avoid assuming that a 1s interval is sufficient for 2 attempts to fail?
One solution might be to split this into two tests:
- Test that setting a sufficiently large idle timeout keeps the channel in TF and doesn't trigger the resolver builder again.
- Set a defaultTestShortIdleTimeout and ensure the channel exits TF, calling the resolver builder again.
There was a problem hiding this comment.
We had a way to enter IDLE mode from tests. But looks like that goes through the regular idleness logic. So, I added another one to forcefully put the channel in IDLE (for testing purposes), and changed the test to forcefully enter IDLE after it enters TF.
There was a problem hiding this comment.
@easwars the way the test can flake is if there's a gap of more than 1s b/w the two RPC attempts. The sequence of events is:
- RPC 1 is made, results in
testResolverBuilder.Buildbeing called and failing. RPC fails with code UNAVAILABLE. - 1s passes, the channel enters IDLE mode.
- RPC 2 is made, results in
testResolverBuilder.Buildbeing called. Since the builder only returns an error for the first call, theBuildcall succeeds this time. The RPC doesn't fail with the expected error.
There was a problem hiding this comment.
Yes, that is a valid sequence of events. But there is nothing else happening in this test at all when that for loop making the two RPCs is running. And the RPCs don't even get far enough to create an LB policy, create subchannels, create transports, none of that good stuff is happening. While I agree it could possible take a second after the first RPC runs before the second one gets to run, I feel the probability is miniscule.
There was a problem hiding this comment.
I agree 1s is sufficient, but it introduces a fixed latency—the test will always take the full second even if it could finish in milliseconds. It's not a blocker for this PR, but something to keep in mind.
dfawley
left a comment
There was a problem hiding this comment.
This looks great now, I think this is what we want. I think we should take about naming a bit (which is just textual changes), and I'm concerned about the ForceIdle thing. Otherwise LGTM!
| // MarkAsExitedIdleMode instructs the Manager to update its internal state to | ||
| // indicate that the channel has exited IDLE mode. This is only used by the gRPC | ||
| // client when it exits IDLE mode manually from Dial. | ||
| func (m *Manager) MarkAsExitedIdleMode() { |
There was a problem hiding this comment.
This probably needs warnings on it. Generally you can't ever use it in a situation where you can possibly race with anything else calling ExitIdleMode or OnCallBegin, and you must know you are really idle.
Bikeshedding:
UnsafeSetNotIdle()? UnsafeSetActive()? InitializeAsActive() (to convey this is only to be used as part of creation)?
Aside but very related: Do we already say "active" is the opposite of "idle" anywhere? Or are the only states "in idle mode" and "not in idle mode". The latter feels a little awkward. Maybe we can rename things to use "active" vs "idle":
ExitIdleMode() -> Activate()
EnterIdleModeForTesting() -> GoIdle() // Do we really need to ForTesting this? It seems safe to use.
Further bikeshedding that's unrelated: I never found the "Enforcer" name to be intuitive either. I've always thought of the manager as the thing doing the enforcing. The object it's controlling is just enacting the idle manager's commands. How about Actor or Agent or Worker or Channel instead?
There was a problem hiding this comment.
This probably needs warnings on it
Improved the docstring to make it sound more scary.
Bikeshedding
Went with UnsafeSetNotIdle
Do we already say "active" is the opposite of "idle" anywhere?
Yeah, we have never used the term "active" to mean the opposite of "idle" anywhere so far.
Maybe we can rename things to use "active" vs "idle"
We probably could do that. Can we do it in a follow up though?
EnterIdleModeForTesting() -> GoIdle() // Do we really need to ForTesting this? It seems safe to use.
Yes, we need this to test the race between the channel going idle after idle_timeout firing and an RPC trying to keep it from not going idle.
I never found the "Enforcer" name to be intuitive either.
I liked Channel since it conveyed the meaning directly, but eventually decided to go with ClientConn to keep it consistent with other places where an interface's functionality is provided by the client channel.
|
|
||
| // ForceEnterIdleModeForTesting bypasses the usual checks that happen before | ||
| // entering idle mode, and forcefully enter idle mode for testing purposes. | ||
| func (m *Manager) ForceEnterIdleModeForTesting() { |
There was a problem hiding this comment.
Why do we need this? It feels pretty dangerous to even have it. What are we doing that requires us to enter idle mode even with pending calls?
There was a problem hiding this comment.
I added this so that we can force entering IDLE from tests after making RPCs (and completing those RPCs), but not having to set a small idle_timeout and be vulnerable to timing flakes.
Essentially instead of setting an idle_timeout of say 1s, making an RPC, and waiting for a second (or a little more) for the channel to enter idle, the test could make an RPC, and once the RPC is complete, it can force the channel to enter idle becuase it knows there are no pending calls.
The existing EnterIdleModeForTesting which I've renamed to TryEnterIdleModeForTesting works within the parameters of idleness and moves the channel to idle mode only if there has been no activity on the channel. We could have named it as FireIdleTimeoutForTesting because that is essentially what is happening as part of this method and is required for testing races between when the channel is trying to enter idle (as part of the idle timeout firing) and exit idle (as part an RPC).
There was a problem hiding this comment.
Coming back to this thread though: #8710 (comment)
Can we avoid assuming that a 1s interval is sufficient for 2 attempts to fail?
Actually the code wasn't assuming that a 1s interval was sufficient for 2 attempts to fail. The 1s interval just determined when the idle timeout would fire and everytime it fired, it would check if the channel had no activity since the last time it fired. I'm going to go back to the previous approach for the following reasons:
1sshould be plenty for these two attempts to fail because nothing happens in these two attempts, except the attempt to build the resolver which would fail, which would result in the RPC failing- Even if the two attempts took more than
1s, it wouldn't affect the correctness of the test as the idle timeout would fire and the idleness manager would see that there was activity on the channel and therefore would do nothing. And the next time it fires (or the next time), it would eventually find no activity on the channel and would move it to idle
This would allow me to get rid of this ForceEnterIdleModeForTesting.
There was a problem hiding this comment.
I don't understand. Why would we want to simulate the idle timeout firing? Anything operating at that low of a level should be internal to the idle package. Everything else should have a safe way to say "go idle now -- unless there are pending RPCs by the time this call gets in and the idle lock is taken".
There was a problem hiding this comment.
Are you concerned more about the name or the functionality of that API itself? This is how it was before the change also. After reading your last comment, I agree that the new name FireIdleTimeoutForTesting seems very low level and can be changed back to EnterIdleModeForTesting. The problem though is that it is possible that it doesn't enter idle mode, because it uses the same logic that non-test code would use to decide if it should try entering idle mode or not. The problem with using separate logic in tests to enter idle mode is that we might not be testing the right set of things for example when we want to test the race between idle timeout firing (and trying to put the channel in idle) and an RPC happening at the same time (that is trying to keep the channel from entering idle).
There was a problem hiding this comment.
Added a comment in the initial thread about how the 1s interval can cause the test to flake: #8710 (comment)
There was a problem hiding this comment.
we might not be testing the right set of things for example when we want to test the race between idle timeout firing (and trying to put the channel in idle) and an RPC happening at the same time (that is trying to keep the channel from entering idle).
We shouldn't need these kinds of tests in the channel, though, should we? From the channel's perspective, it needs to properly deal with the race between RPC start and a call to EnterIdle() from the idle manager. It doesn't know or care about the mechanics of the timer firing. Testing the idle manager might require such unit-style tests, in which case "simulating the timer firing" can be a non-exported function. Maybe I'm missing something though.
There was a problem hiding this comment.
We shouldn't need these kinds of tests in the channel, though, should we?
Looks like we have a unit style test for exactly the same thing and it does exactly what you are saying: one goroutine trying to fire the idle timeout and another one trying to call OnCallBeing to start an RPC
So, I ended up removing the test that triggers the same from the channel.
arjan-bal
left a comment
There was a problem hiding this comment.
LGTM, there's still a discussion about removing ForceEnterIdleModeForTesting. I'm fine with either outcome.
| m.tryEnterIdleMode() | ||
| // FireIdleTimeoutForTesting forcefully triggers the idle timeout to fire. | ||
| func (m *Manager) FireIdleTimeoutForTesting() { | ||
| m.handleIdleTimeout() |
There was a problem hiding this comment.
From your other comment:
Are you concerned more about the name or the functionality of that API itself? This is how it was before the change also.
But...this is a behavior change, right?
There was a problem hiding this comment.
I've brought back EnterIdleModeForTesting now, but the implementation takes inspiration from how to exit idle from Dial. So the implementation now calls the unexported enterIdleMode on the clientconn and calls a method on the idleness manager to let it know that we have entered idle (so go update your internal state).
This also means that I don't have to use the 1s idle_timeout from one of those tests, and also not directly cause the idle timeout to fire from the test, but instead use a higher level API to ask the clientconn to enter idle mode.
|
@dfawley |
| // | ||
| // N.B. This method is intended only for testing purposes. The caller must | ||
| // ensure that there are no ongoing RPCs | ||
| func (m *Manager) UnsafeSetIdleForTesting() error { |
There was a problem hiding this comment.
Why do you prefer this over the old version? This is dangerous while the old one was safe. And the old one will work in all the same scenarios that the new one will work in (when you know for sure the channel is not able to perform RPCs), but not vice-versa.
There was a problem hiding this comment.
The reason I prefer this one over the previous version is because the idleness manager not only tracks the ongoing RPCs, but also when the last one completed. The latter means that when tests have performed an RPC before wanting to push the channel into idle mode, they need to wait for idle_timeout to fire. Or set the idle_timeout to a really low value and be susceptible (sometimes with very low probabilities) to timing flakes. All previous usages of the old removed EnterIdleModeForTesting were from tests that never made an RPC. But now we have tests that do.
| if channelz.IsOn() { | ||
| cc.incrCallsFailed() | ||
| } | ||
| // Invoke all the registered OnFinish call options explicitly. A | ||
| // non-nil error means that the stream wasn't created, and | ||
| // therefore these will be NOT be invoked as part of `cs.finish()`. | ||
| for _, o := range opts { | ||
| if o, ok := o.(OnFinishCallOption); ok { | ||
| o.OnFinish(err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Let's see if we can find a better way to unify these things?
Maybe we can have a shared endOfRPC function that is called by finish and from here?
| if checkActivity { | ||
| if atomic.LoadInt32(&m.activeSinceLastTimerCheck) == 1 { |
There was a problem hiding this comment.
These can be merged to remove an indent.
| var emptyMethodConfig = serviceconfig.MethodConfig{} | ||
|
|
||
| // endOfClientStream performs cleanup actions required for both successful and | ||
| // failed streams. This includes incrementing channelz stats and invoke all |
There was a problem hiding this comment.
| // failed streams. This includes incrementing channelz stats and invoke all | |
| // failed streams. This includes incrementing channelz stats and invoking all |
## Summary #6011 wants to bump `google.golang.org/grpc` from v1.77.0 to v1.78.0. [1] That is causing TestLoadSessionHandle in github.com/pomerium/pomerium/proxy to time out after 10 minutes. The tests create new proxies and call ServeHTTP on test requests. That execution gets all the way to [GetDataBrokerRecord in querier.go](https://github.com/pomerium/pomerium/blob/ba0fcffe81f27f33dd5aa858dba3a1276208ddcd/pkg/storage/querier.go#L70-L95), where it stalls out on `res, err := q.Query(ctx, req, grpc.WaitForReady(true))`. What I think is happening is `grpc.WaitForReady(true)` makes `q.Query` wait for the GRPC client to connect to the server. [grpc-go v1.78.0](https://github.com/grpc/grpc-go/releases/tag/v1.78.0) fixes a [bug](grpc/grpc-go#8710) where the client would stay in the IDLE state while trying to connect to the GRPC server. What happens now is the client moves to the TRANSIENT_FAILURE state. The WaitForReady option sees TRANSIENT_FAILURE and waits until that failure fixes itself. By setting a timeout in the context, we pass that context to `q.Query` eventually, the timeout fires, and GetDataBrokerRecord returns, which is what the test expected all along. [1]: I bisected the go.mod file in that branch to figure out which dependency was causing the test to time out. ## Related issues #6011 grpc/grpc-go#8710 ## User Explanation This fixes a bug that was causing a test to fail. The test was relying on a bug in the grpc-go client, which was fixed. ## Checklist - [x] reference any related issues - [x] updated unit tests - [x] add appropriate label (`enhancement`, `bug`, `breaking`, `dependencies`, `ci`) - [x] ready for review
…jo) (#12794) This PR contains the following updates: | Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) | |---|---|---|---| | [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `v1.75.0` → `v1.79.3` |  |  | --- ### gRPC-Go has an authorization bypass via missing leading slash in :path [CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) / [GHSA-p77j-4mvh-x3m3](GHSA-p77j-4mvh-x3m3) / [GO-2026-4762](https://pkg.go.dev/vuln/GO-2026-4762) <details> <summary>More information</summary> #### Details ##### Impact _What kind of vulnerability is it? Who is impacted?_ It is an **Authorization Bypass** resulting from **Improper Input Validation** of the HTTP/2 `:path` pseudo-header. The gRPC-Go server was too lenient in its routing logic, accepting requests where the `:path` omitted the mandatory leading slash (e.g., `Service/Method` instead of `/Service/Method`). While the server successfully routed these requests to the correct handler, authorization interceptors (including the official `grpc/authz` package) evaluated the raw, non-canonical path string. Consequently, "deny" rules defined using canonical paths (starting with `/`) failed to match the incoming request, allowing it to bypass the policy if a fallback "allow" rule was present. **Who is impacted?** This affects gRPC-Go servers that meet both of the following criteria: 1. They use path-based authorization interceptors, such as the official RBAC implementation in `google.golang.org/grpc/authz` or custom interceptors relying on `info.FullMethod` or `grpc.Method(ctx)`. 2. Their security policy contains specific "deny" rules for canonical paths but allows other requests by default (a fallback "allow" rule). The vulnerability is exploitable by an attacker who can send raw HTTP/2 frames with malformed `:path` headers directly to the gRPC server. ##### Patches _Has the problem been patched? What versions should users upgrade to?_ Yes, the issue has been patched. The fix ensures that any request with a `:path` that does not start with a leading slash is immediately rejected with a `codes.Unimplemented` error, preventing it from reaching authorization interceptors or handlers with a non-canonical path string. Users should upgrade to the following versions (or newer): * **v1.79.3** * The latest **master** branch. It is recommended that all users employing path-based authorization (especially `grpc/authz`) upgrade as soon as the patch is available in a tagged release. ##### Workarounds _Is there a way for users to fix or remediate the vulnerability without upgrading?_ While upgrading is the most secure and recommended path, users can mitigate the vulnerability using one of the following methods: ##### 1. Use a Validating Interceptor (Recommended Mitigation) Add an "outermost" interceptor to your server that validates the path before any other authorization logic runs: ```go func pathValidationInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) { if info.FullMethod == "" || info.FullMethod[0] != '/' { return nil, status.Errorf(codes.Unimplemented, "malformed method name") } return handler(ctx, req) } // Ensure this is the FIRST interceptor in your chain s := grpc.NewServer( grpc.ChainUnaryInterceptor(pathValidationInterceptor, authzInterceptor), ) ``` ##### 2. Infrastructure-Level Normalization If your gRPC server is behind a reverse proxy or load balancer (such as Envoy, NGINX, or an L7 Cloud Load Balancer), ensure it is configured to enforce strict HTTP/2 compliance for pseudo-headers and reject or normalize requests where the `:path` header does not start with a leading slash. ##### 3. Policy Hardening Switch to a "default deny" posture in your authorization policies (explicitly listing all allowed paths and denying everything else) to reduce the risk of bypasses via malformed inputs. #### Severity - CVSS Score: 9.1 / 10 (Critical) - Vector String: `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N` #### References - [https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3](https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3) - [https://nvd.nist.gov/vuln/detail/CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) - [https://github.com/grpc/grpc-go](https://github.com/grpc/grpc-go) This data is provided by [OSV](https://osv.dev/vulnerability/GHSA-p77j-4mvh-x3m3) and the [GitHub Advisory Database](https://github.com/github/advisory-database) ([CC-BY 4.0](https://github.com/github/advisory-database/blob/main/LICENSE.md)). </details> --- ### Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc [CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) / [GHSA-p77j-4mvh-x3m3](GHSA-p77j-4mvh-x3m3) / [GO-2026-4762](https://pkg.go.dev/vuln/GO-2026-4762) <details> <summary>More information</summary> #### Details Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc #### Severity Unknown #### References - [https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3](https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3) This data is provided by [OSV](https://osv.dev/vulnerability/GO-2026-4762) and the [Go Vulnerability Database](https://github.com/golang/vulndb) ([CC-BY 4.0](https://github.com/golang/vulndb#license)). </details> --- ### Release Notes <details> <summary>grpc/grpc-go (google.golang.org/grpc)</summary> ### [`v1.79.3`](https://github.com/grpc/grpc-go/releases/tag/v1.79.3): Release 1.79.3 [Compare Source](grpc/grpc-go@v1.79.2...v1.79.3) ### Security - server: fix an authorization bypass where malformed :path headers (missing the leading slash) could bypass path-based restricted "deny" rules in interceptors like `grpc/authz`. Any request with a non-canonical path is now immediately rejected with an `Unimplemented` error. ([#​8981](grpc/grpc-go#8981)) ### [`v1.79.2`](https://github.com/grpc/grpc-go/releases/tag/v1.79.2): Release 1.79.2 [Compare Source](grpc/grpc-go@v1.79.1...v1.79.2) ### Bug Fixes - stats: Prevent redundant error logging in health/ORCA producers by skipping stats/tracing processing when no stats handler is configured. ([#​8874](grpc/grpc-go#8874)) ### [`v1.79.1`](https://github.com/grpc/grpc-go/releases/tag/v1.79.1): Release 1.79.1 [Compare Source](grpc/grpc-go@v1.79.0...v1.79.1) ### Bug Fixes - grpc: Remove the `-dev` suffix from the User-Agent header. ([#​8902](grpc/grpc-go#8902)) ### [`v1.79.0`](https://github.com/grpc/grpc-go/releases/tag/v1.79.0): Release 1.79.0 [Compare Source](grpc/grpc-go@v1.78.0...v1.79.0) ### API Changes - mem: Add experimental API `SetDefaultBufferPool` to change the default buffer pool. ([#​8806](grpc/grpc-go#8806)) - Special Thanks: [@​vanja-p](https://github.com/vanja-p) - experimental/stats: Update `MetricsRecorder` to require embedding the new `UnimplementedMetricsRecorder` (a no-op struct) in all implementations for forward compatibility. ([#​8780](grpc/grpc-go#8780)) ### Behavior Changes - balancer/weightedtarget: Remove handling of `Addresses` and only handle `Endpoints` in resolver updates. ([#​8841](grpc/grpc-go#8841)) ### New Features - experimental/stats: Add support for asynchronous gauge metrics through the new `AsyncMetricReporter` and `RegisterAsyncReporter` APIs. ([#​8780](grpc/grpc-go#8780)) - pickfirst: Add support for weighted random shuffling of endpoints, as described in [gRFC A113](grpc/proposal#535). - This is enabled by default, and can be turned off using the environment variable `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING`. ([#​8864](grpc/grpc-go#8864)) - xds: Implement `:authority` rewriting, as specified in [gRFC A81](https://github.com/grpc/proposal/blob/master/A81-xds-authority-rewriting.md). ([#​8779](grpc/grpc-go#8779)) - balancer/randomsubsetting: Implement the `random_subsetting` LB policy, as specified in [gRFC A68](https://github.com/grpc/proposal/blob/master/A68-random-subsetting.md). ([#​8650](grpc/grpc-go#8650)) - Special Thanks: [@​marek-szews](https://github.com/marek-szews) ### Bug Fixes - credentials/tls: Fix a bug where the port was not stripped from the authority override before validation. ([#​8726](grpc/grpc-go#8726)) - Special Thanks: [@​Atul1710](https://github.com/Atul1710) - xds/priority: Fix a bug causing delayed failover to lower-priority clusters when a higher-priority cluster is stuck in `CONNECTING` state. ([#​8813](grpc/grpc-go#8813)) - health: Fix a bug where health checks failed for clients using legacy compression options (`WithDecompressor` or `RPCDecompressor`). ([#​8765](grpc/grpc-go#8765)) - Special Thanks: [@​sanki92](https://github.com/sanki92) - transport: Fix an issue where the HTTP/2 server could skip header size checks when terminating a stream early. ([#​8769](grpc/grpc-go#8769)) - Special Thanks: [@​joybestourous](https://github.com/joybestourous) - server: Propagate status detail headers, if available, when terminating a stream during request header processing. ([#​8754](grpc/grpc-go#8754)) - Special Thanks: [@​joybestourous](https://github.com/joybestourous) ### Performance Improvements - credentials/alts: Optimize read buffer alignment to reduce copies. ([#​8791](grpc/grpc-go#8791)) - mem: Optimize pooling and creation of `buffer` objects. ([#​8784](grpc/grpc-go#8784)) - transport: Reduce slice re-allocations by reserving slice capacity. ([#​8797](grpc/grpc-go#8797)) ### [`v1.78.0`](https://github.com/grpc/grpc-go/releases/tag/v1.78.0): Release 1.78.0 [Compare Source](grpc/grpc-go@v1.77.0...v1.78.0) ### Behavior Changes - client: Align URL validation with Go 1.26+ to now reject target URLs with unbracketed colons in the hostname. ([#​8716](grpc/grpc-go#8716)) - Special Thanks: [@​neild](https://github.com/neild) - transport/client : Return status code `Unknown` on malformed grpc-status. ([#​8735](grpc/grpc-go#8735)) - - xds/resolver: - Drop previous route resources and report an error when no matching virtual host is found. - Only log LDS/RDS configuration errors following a successful update and retain the last valid resource to prevent transient failures. ([#​8711](grpc/grpc-go#8711)) ### New Features - stats/otel: Add backend service label to weighted round robin metrics as part of A89. ([#​8737](grpc/grpc-go#8737)) - stats/otel: Add subchannel metrics (without the disconnection reason) to eventually replace the pickfirst metrics. ([#​8738](grpc/grpc-go#8738)) - client: Wait for all pending goroutines to complete when closing a graceful switch balancer. ([#​8746](grpc/grpc-go#8746)) - Special Thanks: [@​twz123](https://github.com/twz123) - client: Add `experimental.AcceptCompressors` so callers can restrict the `grpc-accept-encoding` header advertised for a call. ([#​8718](grpc/grpc-go#8718)) - Special Thanks: [@​iblancasa](https://github.com/iblancasa) ### Bug Fixes - xds: Fix a bug in `StringMatcher` where regexes would match incorrectly when ignore\_case is set to true. ([#​8723](grpc/grpc-go#8723)) - client: - Change connectivity state to CONNECTING when creating the name resolver (as part of exiting IDLE). - Change connectivity state to TRANSIENT\_FAILURE if name resolver creation fails (as part of exiting IDLE). - Change connectivity state to IDLE after idle timeout expires even when current state is TRANSIENT\_FAILURE. - Fix a bug that resulted in `OnFinish` call option not being invoked for RPCs where stream creation failed. ([#​8710](grpc/grpc-go#8710)) - xdsclient: Fix a race in the xdsClient that could lead to resource-not-found errors. ([#​8627](grpc/grpc-go#8627)) ### Performance Improvements - mem: Round up to nearest 4KiB for pool allocations larger than 1MiB. ([#​8705](grpc/grpc-go#8705)) - Special Thanks: [@​cjc25](https://github.com/cjc25) ### [`v1.77.0`](https://github.com/grpc/grpc-go/releases/tag/v1.77.0): Release 1.77.0 [Compare Source](grpc/grpc-go@v1.76.0...v1.77.0) ### API Changes - mem: Replace the `Reader` interface with a struct for better performance and maintainability. ([#​8669](grpc/grpc-go#8669)) ### Behavior Changes - balancer/pickfirst: Remove support for the old `pick_first` LB policy via the environment variable `GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST=false`. The new `pick_first` has been the default since `v1.71.0`. ([#​8672](grpc/grpc-go#8672)) ### Bug Fixes - xdsclient: Fix a race condition in the ADS stream implementation that could result in `resource-not-found` errors, causing the gRPC client channel to move to `TransientFailure`. ([#​8605](grpc/grpc-go#8605)) - client: Ignore HTTP status header for gRPC streams. ([#​8548](grpc/grpc-go#8548)) - client: Set a read deadline when closing a transport to prevent it from blocking indefinitely on a broken connection. ([#​8534](grpc/grpc-go#8534)) - Special Thanks: [@​jgold2-stripe](https://github.com/jgold2-stripe) - client: Fix a bug where default port 443 was not automatically added to addresses without a specified port when sent to a proxy. - Setting environment variable `GRPC_EXPERIMENTAL_ENABLE_DEFAULT_PORT_FOR_PROXY_TARGET=false` disables this change; please file a bug if any problems are encountered as we will remove this option soon. ([#​8613](grpc/grpc-go#8613)) - balancer/pickfirst: Fix a bug where duplicate addresses were not being ignored as intended. ([#​8611](grpc/grpc-go#8611)) - server: Fix a bug that caused overcounting of channelz metrics for successful and failed streams. ([#​8573](grpc/grpc-go#8573)) - Special Thanks: [@​hugehoo](https://github.com/hugehoo) - balancer/pickfirst: When configured, shuffle addresses in resolver updates that lack endpoints. Since gRPC automatically adds endpoints to resolver updates, this bug only affects custom LB policies that delegate to `pick_first` but don't set endpoints. ([#​8610](grpc/grpc-go#8610)) - mem: Clear large buffers before re-using. ([#​8670](grpc/grpc-go#8670)) ### Performance Improvements - transport: Reduce heap allocations to reduce time spent in garbage collection. ([#​8624](grpc/grpc-go#8624), [#​8630](grpc/grpc-go#8630), [#​8639](grpc/grpc-go#8639), [#​8668](grpc/grpc-go#8668)) - transport: Avoid copies when reading and writing Data frames. ([#​8657](grpc/grpc-go#8657), [#​8667](grpc/grpc-go#8667)) - mem: Avoid clearing newly allocated buffers. ([#​8670](grpc/grpc-go#8670)) ### New Features - outlierdetection: Add metrics specified in [gRFC A91](https://github.com/grpc/proposal/blob/master/A91-outlier-detection-metrics.md). ([#​8644](grpc/grpc-go#8644)) - Special Thanks: [@​davinci26](https://github.com/davinci26), [@​PardhuKonakanchi](https://github.com/PardhuKonakanchi) - stats/opentelemetry: Add support for optional label `grpc.lb.backend_service` in per-call metrics ([#​8637](grpc/grpc-go#8637)) - xds: Add support for JWT Call Credentials as specified in [gRFC A97](https://github.com/grpc/proposal/blob/master/A97-xds-jwt-call-creds.md). Set environment variable `GRPC_EXPERIMENTAL_XDS_BOOTSTRAP_CALL_CREDS=true` to enable this feature. ([#​8536](grpc/grpc-go#8536)) - Special Thanks: [@​dimpavloff](https://github.com/dimpavloff) - experimental/stats: Add support for up/down counters. ([#​8581](grpc/grpc-go#8581)) ### [`v1.76.0`](https://github.com/grpc/grpc-go/releases/tag/v1.76.0): Release 1.76.0 [Compare Source](grpc/grpc-go@v1.75.1...v1.76.0) ### Dependencies - Minimum supported Go version is now 1.24 ([#​8509](grpc/grpc-go#8509)) - Special Thanks: [@​kevinGC](https://github.com/kevinGC) ### Bug Fixes - client: Return status `INTERNAL` when a server sends zero response messages for a unary or client-streaming RPC. ([#​8523](grpc/grpc-go#8523)) - client: Fail RPCs with status `INTERNAL` instead of `UNKNOWN` upon receiving http headers with status 1xx and `END_STREAM` flag set. ([#​8518](grpc/grpc-go#8518)) - Special Thanks: [@​vinothkumarr227](https://github.com/vinothkumarr227) - pick\_first: Fix race condition that could cause pick\_first to get stuck in `IDLE` state on backend address change. ([#​8615](grpc/grpc-go#8615)) ### New Features - credentials: Add `credentials/jwt` package providing file-based JWT PerRPCCredentials (A97). ([#​8431](grpc/grpc-go#8431)) - Special Thanks: [@​dimpavloff](https://github.com/dimpavloff) ### Performance Improvements - client: Improve HTTP/2 header size estimate to reduce re-allocations. ([#​8547](grpc/grpc-go#8547)) - encoding/proto: Avoid redundant message size calculation when marshaling. ([#​8569](grpc/grpc-go#8569)) - Special Thanks: [@​rs-unity](https://github.com/rs-unity) ### [`v1.75.1`](https://github.com/grpc/grpc-go/releases/tag/v1.75.1): Release 1.75.1 [Compare Source](grpc/grpc-go@v1.75.0...v1.75.1) ### Bug Fixes - transport: Fix a data race while copying headers for stats handlers in the std lib http2 server transport. ([#​8519](grpc/grpc-go#8519)) - xdsclient: - Fix a data race caused while reporting load to LRS. ([#​8483](grpc/grpc-go#8483)) - Fix regression preventing empty node IDs when creating an LRS client. ([#​8483](grpc/grpc-go#8483)) - server: Fix a regression preventing streams from being cancelled or timed out when blocked on flow control. ([#​8528](grpc/grpc-go#8528)) </details> --- ### Configuration 📅 **Schedule**: (UTC) - Branch creation - "" - Automerge - Between 12:00 AM and 03:59 AM (`* 0-3 * * *`) 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xOTUuMSIsInVwZGF0ZWRJblZlciI6IjQzLjE5NS4xIiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6WyJkZXBlbmRlbmN5LXVwZ3JhZGUiLCJ0ZXN0L25vdC1uZWVkZWQiXX0=--> Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/12794 Reviewed-by: Mathieu Fenniak <mfenniak@noreply.codeberg.org>
Fixes #7686
Current Behavior
Connectbeing called or because of an RPC), if name resolver creation fails, we stay in IDLE.New Behavior
Connectbeing called or because of an RPC), we have already moved to CONNECTING (because of the previous bullet point). If name resolver creation fails, we will move to TRANSIENT_FAILURE and start the idle timer and move back to IDLE when the timer firesImplementation details:
Builduses a new unsafe API on the idleness manager to mark the channel as exited IDLE.ExitIdleMode(which now does not return an error) and updates internal state to reflect that it is no longer in IDLE.OnFinishcall options are now invoked even if stream creation fails during an RPC. This fulfills the guarantee for these options and ensures the idleness Manager’sactiveCallsCountremains accurate.RELEASE NOTES:
OnFinishcall option not being invoked for RPCs where stream creation failed.