mem: Optimize buffer object re-use #8784
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #8784      +/-   ##
==========================================
+ Coverage   83.22%   83.43%   +0.21%
==========================================
  Files         418      417       -1
  Lines       32385    32952     +567
==========================================
+ Hits        26952    27494     +542
- Misses       4050     4064      +14
- Partials     1383     1394      +11
```
45c3231 to e1a28ac
7619ed9 to c33b2ae
c33b2ae to 3331987
```go
// initialized enables sanity checks without the overhead of atomic
// operations. This field is not safe for concurrent access and is used in a
// best-effort manner for assertion purposes only. It does not play a role
// in the concurrent logic of reference counting.
```
A couple of things here:

- The `Buffer` interface documentation states that a buffer is not safe for concurrent access. Given that, do we need this to be mentioned here?
- Do you have an idea of how much overhead the atomic operation of checking whether the ref count is zero causes? The reason I'm asking is that this new field (and the checks associated with it) are sprinkled across multiple methods, and I'm wondering whether the code complexity (and the maintenance costs) are worth it.
I'm also a little confused about this line from the docstring:

```go
// Note that a Buffer is not safe for concurrent access and instead each
// goroutine should use its own reference to the data, which can be acquired via
// a call to Ref().
```
A call to Ref simply increments the reference count. It does not return a new reference to the existing buffer that can be used concurrently. Do we ever use buffers concurrently?
Also, why did we earlier have a pointer to an atomic and not store the atomic by value?
> The `Buffer` interface documentation states that a buffer is not safe for concurrent access. Given that, do we need to explicitly mention this here?

> A call to Ref simply increments the reference count. It does not return a new reference to the existing buffer that can be used concurrently. Do we ever use buffers concurrently?

In the initial design, `buf.Ref()` likely returned a new object intended to be transferred to a separate goroutine:

```go
ref := buf.Ref()
go func() {
	// use ref here
}()
buf.Free()
```

However, in the merged implementation, `Ref` does not return a new object. So, the usage pattern becomes:

```go
buf.Ref()
go func() {
	// use buf here
}()
buf.Free()
```

Technically, this implies `buf` is being accessed concurrently. However, the specific pattern that is unsafe is attempting to reference `buf` in a new goroutine without incrementing the count first:

```go
go func() {
	// Unsafe: Race condition with buf.Free() below
	ref := buf.Ref()
}()
buf.Free()
```

Source: #8209 (comment)

Yes, we do follow the safe pattern above by pushing data frame buffers into an unbounded channel to be consumed by another goroutine.
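The safe handoff pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the grpc-go implementation: `refBuf`, `newRefBuf`, and `handOff` are hypothetical names, and the only point being shown is that the extra reference is taken before the consumer goroutine starts.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// refBuf is a hypothetical reference-counted buffer used only for
// illustration; it is not the grpc-go mem.Buffer type.
type refBuf struct {
	refs  atomic.Int32
	data  []byte
	freed atomic.Bool // set once the last reference is released
}

func newRefBuf(data []byte) *refBuf {
	b := &refBuf{data: data}
	b.refs.Store(1)
	return b
}

// Ref adds a reference. The caller must already hold one.
func (b *refBuf) Ref() { b.refs.Add(1) }

// Free drops a reference; the last Free releases the data.
func (b *refBuf) Free() {
	if b.refs.Add(-1) == 0 {
		b.data = nil
		b.freed.Store(true)
	}
}

// handOff takes the consumer's reference before starting the goroutine, so
// the producer's Free can never be the one that drops the count to zero
// while the consumer is still reading.
func handOff(b *refBuf) (consumerSaw int, freedAfter bool) {
	var wg sync.WaitGroup
	b.Ref() // taken by the producer, before the handoff
	wg.Add(1)
	go func() {
		defer wg.Done()
		consumerSaw = len(b.data) // safe: this goroutine owns a reference
		b.Free()
	}()
	b.Free() // producer releases its own reference
	wg.Wait()
	return consumerSaw, b.freed.Load()
}

func main() {
	n, freed := handOff(newRefBuf(make([]byte, 8)))
	fmt.Println(n, freed) // prints "8 true"
}
```

Moving the `b.Ref()` call inside the goroutine would reproduce the unsafe variant: the producer's `Free` could then release the data before the consumer had taken its reference.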
> Do you have an idea of how much overhead the atomic operation of checking if the ref count is zero causes? The reason I'm asking is because this new field (and the checks associated with it) are sprinkled across multiple methods and I'm wondering if the code complexity (and the maintenance costs) are worth it?

Earlier there was a check `if b.refs == nil`, which is not possible with a non-pointer field. Using `initialized` provides the test coverage.

There are some methods, such as `Ref` and `Free`, which perform atomic operations anyway, so we can check the return value for validation. However, for methods like `ReadOnlyData` that don't perform atomic operations, the overhead is significant. According to Gemini, an atomic operation is roughly 10x-15x slower than a similar non-atomic operation under low contention, and the difference becomes orders of magnitude larger under high contention.
> Also, why did we earlier have a pointer to an atomic and not store the atomic by value?

Previously, the new buffer created by `SplitUnsafe` pointed to the same `atomic.Int32` as the original buffer, which required the field to be a pointer. Now, the new object maintains its own ref count and stores a pointer to the original buffer instead. Therefore, the reference count (`atomic.Int32`) no longer needs to be a pointer.
Thank you for the information. That helps.

I would still like to see whether there is actually any significant performance improvement from having the `initialized` field. The `if b.refs == nil` check could also be replaced with an `if b.refs.Load() == 0` check if there is no significant performance impact.
I ran the following microbenchmark to measure the impact of introducing an atomic load operation in `ReadOnlyData`:

```go
package mem_test

import (
	"testing"

	"google.golang.org/grpc/mem"
)

// BenchmarkSplit measures the cost of ReadOnlyData on a pooled 32 KB buffer.
func BenchmarkSplit(b *testing.B) {
	pool := mem.DefaultBufferPool()
	size := 1 << 15 // 32 KB
	slice := pool.Get(size)
	buf := mem.NewBuffer(slice, pool)
	b.Run("read-only-data", func(b *testing.B) {
		for b.Loop() {
			_ = buf.ReadOnlyData()
		}
	})
	buf.Free()
}
```

Here are the results:

```sh
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
                        │ non-atomic.txt │            atomic.txt            │            master.txt            │
                        │     sec/op     │   sec/op     vs base             │   sec/op     vs base             │
Split/read-only-data-48      2.005n ± 1%   2.020n ± 2%  ~ (p=0.137 n=10)      2.124n ± 1%  +5.94% (p=0.000 n=10)

                        │ non-atomic.txt │            atomic.txt            │            master.txt            │
                        │      B/op      │    B/op      vs base             │    B/op      vs base             │
Split/read-only-data-48       0.000 ± 0%    0.000 ± 0%  ~ (p=1.000 n=10) ¹     0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                        │ non-atomic.txt │            atomic.txt            │            master.txt            │
                        │   allocs/op    │  allocs/op   vs base             │  allocs/op   vs base             │
Split/read-only-data-48       0.000 ± 0%    0.000 ± 0%  ~ (p=1.000 n=10) ¹     0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

The atomic version is ~0.7% slower than the non-atomic version (2.020n vs. 2.005n), a difference that benchstat does not flag as statistically significant (p=0.137). Both are faster than the master branch. I've updated the code to use the atomic for the sanity checks too.
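For readers who want to reproduce the comparison without the grpc-go tree, here is a rough standalone sketch of the two sanity-check styles under discussion. The `checkedBuf` type and its `readPlain` and `readAtomic` methods are hypothetical, and the numbers it prints depend on the machine; it only illustrates the shape of the measurement.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// checkedBuf is a hypothetical buffer with two ways of asserting that it has
// not been freed; it is not the grpc-go buffer type.
type checkedBuf struct {
	refs        atomic.Int32
	initialized bool
	data        []byte
}

// readPlain guards the read with a plain (non-atomic) field check.
func (b *checkedBuf) readPlain() []byte {
	if !b.initialized {
		panic("buffer used after free")
	}
	return b.data
}

// readAtomic guards the read with an atomic load of the reference count.
func (b *checkedBuf) readAtomic() []byte {
	if b.refs.Load() == 0 {
		panic("buffer used after free")
	}
	return b.data
}

// sink keeps the compiler from discarding the benchmarked calls.
var sink []byte

func main() {
	b := &checkedBuf{initialized: true, data: make([]byte, 1<<15)}
	b.refs.Store(1)

	plain := testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			sink = b.readPlain()
		}
	})
	atomicRes := testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			sink = b.readAtomic()
		}
	})
	fmt.Println("plain: ", plain)
	fmt.Println("atomic:", atomicRes)
}
```

An uncontended atomic load is usually far cheaper than an atomic read-modify-write such as `Add`, which may be why the measured gap above is so small; treat that as a plausible explanation rather than a measured conclusion.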
I realized that we can use the `rootBuf` pointer to check whether the buffer has been initialized in the read methods, avoiding the atomic operation, since the `rootBuf` field is set during initialization and unset before sending the buffer back into the pool. This brings the performance to the same level as the non-atomic version.

```sh
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
                        │ non-atomic.txt │           rootbuf.txt            │
                        │     sec/op     │   sec/op     vs base             │
Split/read-only-data-48      2.005n ± 1%   2.014n ± 0%  ~ (p=0.305 n=10)

                        │ non-atomic.txt │           rootbuf.txt            │
                        │      B/op      │    B/op      vs base             │
Split/read-only-data-48       0.000 ± 0%    0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                        │ non-atomic.txt │           rootbuf.txt            │
                        │   allocs/op    │  allocs/op   vs base             │
Split/read-only-data-48       0.000 ± 0%    0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```
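The idea of using a pointer field as the liveness sentinel can be sketched as follows. This is an assumption-laden illustration: `pooledBuf`, `acquire`, `release`, and `readFails` are hypothetical names, and the real buffer also carries a reference count and a pool, which are omitted here.

```go
package main

import "fmt"

// pooledBuf is a hypothetical buffer whose rootBuf field doubles as an
// "initialized" marker; it is not the grpc-go buffer type.
type pooledBuf struct {
	data    []byte
	rootBuf *pooledBuf // non-nil only while the buffer is live
}

// acquire initializes a buffer; a root buffer points at itself.
func acquire(data []byte) *pooledBuf {
	b := &pooledBuf{data: data}
	b.rootBuf = b
	return b
}

// release clears the fields, as would happen just before the object goes
// back into a pool.
func (b *pooledBuf) release() {
	b.data = nil
	b.rootBuf = nil
}

// readOnlyData uses a plain nil check instead of an atomic load.
func (b *pooledBuf) readOnlyData() []byte {
	if b.rootBuf == nil {
		panic("cannot read freed buffer")
	}
	return b.data
}

// readFails reports whether reading b panics.
func readFails(b *pooledBuf) (failed bool) {
	defer func() { failed = recover() != nil }()
	_ = b.readOnlyData()
	return false
}

func main() {
	b := acquire(make([]byte, 16))
	before := readFails(b)
	b.release()
	after := readFails(b)
	fmt.Println(before, after) // prints "false true"
}
```

Like the `initialized` flag it replaces, a check of this kind is best-effort: it catches use-after-free on a single goroutine but is not a synchronization mechanism.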
easwars left a comment

The only thing I need convincing about is the use of the `initialized` field, as opposed to directly checking the reference count by doing an atomic read of the value. Otherwise LGTM.
[Splitting a `buffer`](https://github.com/grpc/grpc-go/blob/40466769682557e7179b8c74ba3820cc78d49b4b/mem/buffers.go#L172-L187) results in fetching a new `buffer` object from a `sync.Pool`. The `buffer` object is returned back to the pool only once [the shared ref count falls to 0](https://github.com/grpc/grpc-go/blob/40466769682557e7179b8c74ba3820cc78d49b4b/mem/buffers.go#L152-L155). As a result, only one of the `buffer` objects is returned back to the pool for re-use. The "leaked" buffer objects may cause noticeable allocations when buffers are split more frequently. I noticed this when [attempting to remove a buffer copy](https://github.com/grpc/grpc-go/compare/master...arjan-bal:zero-copy-buf-reader?expand=1) by replacing the `bufio.Reader`.

## Solution

This PR introduces a root-owner model for the underlying `*[]byte` within `buffer` objects. The root object manages the slice's lifecycle, returning it to the pool only when its reference count reaches zero.

When a `buffer` is split, the new `buffer` is treated as a child, incrementing the ref counts for both itself and the root. Once a child’s ref count hits zero, it returns itself to the pool and decrements the root’s count.

Additionally, this PR removes the `sync.Pool` used for `*atomic.Int32` by embedding `atomic.Int32` as a value field within the `buffer` struct. By eliminating the second pool and the associated pointer indirection, we reduce allocation overhead and improve cache locality during buffer lifecycle events.
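The root-owner model described above can be sketched roughly as follows. This is a single-goroutine illustration under stated assumptions, not the code in this PR: the `pools` counter stub, the `buf` type, and the `newBuf`, `split`, and `free` helpers are all hypothetical, and real pooling is replaced by simple counters so the behavior is easy to check.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pools is a hypothetical stand-in for the byte-slice pool and the buffer
// object pool; it only counts what is handed back.
type pools struct {
	slicePuts  int // byte slices returned
	objectPuts int // buffer objects returned
}

// buf is an illustrative buffer, not the grpc-go type.
type buf struct {
	refs     atomic.Int32 // stored by value, so no separate allocation
	data     []byte
	origData *[]byte // set on the root only
	rootBuf  *buf    // the root points at itself
	p        *pools
}

func newBuf(data *[]byte, p *pools) *buf {
	b := &buf{data: *data, origData: data, p: p}
	b.rootBuf = b
	b.refs.Store(1)
	return b
}

// split shrinks b to data[:n] and returns a child viewing data[n:]. The child
// starts with its own count of 1 and takes one reference on the root.
func split(b *buf, n int) (left, right *buf) {
	root := b.rootBuf
	root.refs.Add(1)
	right = &buf{data: b.data[n:], rootBuf: root, p: b.p}
	right.refs.Store(1)
	b.data = b.data[:n]
	return b, right
}

// free drops one reference. A child that reaches zero releases its reference
// on the root; the root returns the slice once its own count reaches zero.
// Either way, the buffer object itself is handed back for re-use.
func (b *buf) free() {
	if b.refs.Add(-1) != 0 {
		return
	}
	if root := b.rootBuf; root == b {
		b.p.slicePuts++
	} else {
		root.free()
	}
	b.data, b.origData, b.rootBuf = nil, nil, nil
	b.p.objectPuts++
}

func main() {
	p := &pools{}
	data := make([]byte, 1<<15)
	left, right := split(newBuf(&data, p), 1<<14)
	left.free()
	right.free()
	fmt.Println(p.slicePuts, p.objectPuts) // prints "1 2"
}
```

In this sketch both buffer objects are handed back while the slice is returned exactly once, which is the property the shared-counter design lacked.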
## Benchmarks

A micro-benchmark showing the buffer object leak:

```go
func BenchmarkSplit(b *testing.B) {
	pool := mem.DefaultBufferPool()
	b.Run("split", func(b *testing.B) {
		for b.Loop() {
			size := 1 << 15 // 32 KB
			slice := pool.Get(size)
			buf := mem.NewBuffer(slice, pool)
			left, right := mem.SplitUnsafe(buf, size/2)
			left.Free()
			right.Free()
		}
	})
	b.Run("no-split", func(b *testing.B) {
		for b.Loop() {
			size := 1 << 15 // 32 KB
			slice := pool.Get(size)
			buf := mem.NewBuffer(slice, pool)
			buf.Free()
		}
	})
}
```

Result on master vs this PR.

```sh
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
                  │   old.txt   │              new.txt               │
                  │   sec/op    │   sec/op     vs base               │
Split/split-48      418.2n ± 0%   263.9n ± 1%  -36.89% (p=0.000 n=10)
Split/no-split-48   221.1n ± 1%   208.5n ± 0%   -5.70% (p=0.000 n=10)
geomean             304.1n        234.6n       -22.86%

                  │  old.txt   │               new.txt                │
                  │    B/op    │    B/op     vs base                  │
Split/split-48      64.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
Split/no-split-48   0.000 ± 0%   0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                      ²                ?                      ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                  │  old.txt   │               new.txt                │
                  │ allocs/op  │  allocs/op  vs base                  │
Split/split-48      1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
Split/no-split-48   0.000 ± 0%   0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                      ²                ?                      ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
```

The effect on local gRPC benchmarks is negligible since the `SplitUnsafe` function isn't called very frequently.
```sh
$ go run benchmark/benchresult/main.go unary-before unary-after
unary-networkMode_Local-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_120-reqSize_16000B-respSize_16000B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_simple-sharedWriteBuffer_false
               Title          Before           After  Percentage
            TotalOps         2985694         3024364       1.30%
             SendOps               0               0        NaN%
             RecvOps               0               0        NaN%
            Bytes/op        74784.94        74784.99       0.00%
           Allocs/op          133.67          133.89       0.00%
             ReqT/op   6369480533.33   6451976533.33       1.30%
            RespT/op   6369480533.33   6451976533.33       1.30%
            50th-Lat      2.410033ms       2.40116ms      -0.37%
            90th-Lat      3.145118ms      3.081771ms      -2.01%
            99th-Lat      3.563055ms      3.629663ms       1.87%
             Avg-Lat      2.410529ms      2.379513ms      -1.29%
           GoVersion        go1.24.8        go1.24.8
         GrpcVersion      1.78.0-dev      1.78.0-dev
```

RELEASE NOTES:
* mem: Improve pooling of `buffer` objects on using `SplitUnsafe`.
Referenced by a downstream dependency update (forgejo #12794, `google.golang.org/grpc` `v1.75.0` → `v1.79.3`), whose quoted v1.79.0 release notes list this change under Performance Improvements: mem: Optimize pooling and creation of `buffer` objects. ([#8784](grpc/grpc-go#8784))