transport: Remove buffer copies while writing HTTP/2 Data frames #8667
Codecov Report
Coverage diff, master vs. #8667:

              master    #8667      +/-
Coverage      83.43%   82.88%   -0.56%
Files            415      415
Lines          32195    32261      +66
Hits           26863    26739     -124
Misses          3980     4029      +49
Partials        1352     1493     +141
@dfawley: Moving this back to your plate in case you'd like a second look.
    }
    if dSize > 0 {
        var err error
        l.writeBuf, err = reader.Peek(dSize, l.writeBuf)
It seems like this buffer can only grow and never shrinks.
1. What happens if a slice holds a pointer to a huge amount of data? I believe it isn't possible to free it, but am not certain. E.g.
       l.writeBuf = [][]byte{nil, nil, nil, nil, nil, nil, make([]byte, 10<<30)} // ~10GB
       l.writeBuf = l.writeBuf[:0]
2. What happens if cap(l.writeBuf) grows to a large value and then we never need it to be that large ever again?
I think we need to have some way to scale this buffer back down.
For point 1, I've updated the code to clear the buffer after calling Write. This releases references to all the slices and allows them to be GCed.
With respect to point 2, I've now set a limit of 64 on the buffer's length. If a buffer is longer than that, it's immediately freed after use instead of being cached.
Background on the 64-element limit: The BufferSlice from the proto codec is 1 element. With a potential gRPC header, the length is almost always 2. While custom codecs might produce larger slices, 64 is a generous limit that covers common cases without caching excessive memory.
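The clear-then-cap policy described in this reply can be sketched roughly as below. This is only an illustration: the names `maxCachedSlices` and `recycle` are hypothetical and the actual change in the transport may be structured differently. It assumes Go 1.21+ for the `clear` builtin.

```go
package main

import "fmt"

// maxCachedSlices mirrors the 64-element limit mentioned above.
// The name is hypothetical, chosen for this sketch only.
const maxCachedSlices = 64

// recycle is a hypothetical helper. It zeroes every element of buf so the
// payload slices it referenced can be garbage collected, then returns either
// the emptied slice (to be reused for the next write) or nil when the slice
// has grown past the limit and should not be cached.
func recycle(buf [][]byte) [][]byte {
	clear(buf) // sets buf[0:len(buf)] to nil
	if len(buf) > maxCachedSlices {
		return nil
	}
	return buf[:0]
}

func main() {
	// Typical case: a gRPC header plus one payload slice.
	small := [][]byte{[]byte("hdr"), []byte("payload")}
	small = recycle(small)
	fmt.Println(len(small), cap(small) > 0)

	// Unusual case: a codec that produced many small slices.
	big := make([][]byte, 100)
	big = recycle(big)
	fmt.Println(big == nil)
}
```

Clearing before truncating matters: `buf[:0]` alone keeps the old elements alive in the backing array, which is the concern raised in point 1.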
This change also mitigates a worst-case memory scenario. Since Peek() filters empty slices, a 16KB http2 Data frame (the max size) could theoretically be split into 16K (16,384) distinct 1-byte slices. In that case, the memory overhead for the slice headers alone would be 24 bytes * 16 * 1024 = 393,216 bytes (approx. 393KB). With the 64-element limit, the max held memory is 64 * 24 bytes = 1,536 bytes (approx. 1.5KB). Also note that the framer already has a data buffer that grows up to 16KB, and after this change, that buffer should no longer be used for Data frames.
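For anyone who wants to double-check the numbers, here is a small sketch of the arithmetic. The helper name `sliceHeaderOverhead` is made up for this example, and the 24-byte header size holds on 64-bit platforms (pointer, length, capacity at 8 bytes each).

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceHeaderOverhead returns the number of bytes occupied by n []byte slice
// headers, excluding the data they point to. Hypothetical helper for this
// example only.
func sliceHeaderOverhead(n int) int {
	return n * int(unsafe.Sizeof([]byte(nil)))
}

func main() {
	const maxFrameSize = 16 * 1024 // worst case: one slice per payload byte
	const cachedLimit = 64         // the limit discussed above

	fmt.Println("worst case bytes:", sliceHeaderOverhead(maxFrameSize))
	fmt.Println("capped bytes:", sliceHeaderOverhead(cachedLimit))
}
```

On a 64-bit machine this corresponds to the 393,216-byte and 1,536-byte figures quoted in the comment.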
    if len(d) == 0 {
        continue
    }
Would this be a bug if it were zero? I would have expected it to be.
If it is, then we should delete it. Write should already handle a zero-length buffer as a no-op anyway.
Removed. There should not be any empty buffers in the list, since Peek() filters them out. This check was an artifact from the time I spent root-causing unexpected behavior in the local benchmarks with large payloads.
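To illustrate why the zero-length check is redundant, here is a rough sketch of a Peek-style function that collects up to n bytes as sub-slices without copying and skips empty slices along the way. `peekSlices` is a hypothetical stand-in written for this comment; it is not the real `Peek` implementation, and its error behavior is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// peekSlices is a hypothetical stand-in for the Peek call discussed above.
// It appends to res (reusing its backing array) sub-slices of bufs that
// together cover the first n bytes. No payload bytes are copied, and empty
// slices are never added to the result.
func peekSlices(bufs [][]byte, n int, res [][]byte) ([][]byte, error) {
	res = res[:0]
	for _, b := range bufs {
		if n == 0 {
			break
		}
		if len(b) == 0 {
			continue // filtered here, so callers never see empty slices
		}
		if len(b) > n {
			b = b[:n] // take only the part that is needed
		}
		res = append(res, b)
		n -= len(b)
	}
	if n > 0 {
		return res, errors.New("peekSlices: not enough data")
	}
	return res, nil
}

func main() {
	bufs := [][]byte{[]byte("ab"), nil, []byte{}, []byte("cdef")}
	out, err := peekSlices(bufs, 4, nil)
	fmt.Println(len(out), err)
	for _, b := range out {
		fmt.Printf("%q\n", b)
	}
}
```

With a producer like this, every element the write loop sees has a non-zero length, so a `len(d) == 0` guard in the loop never fires.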
    // This must never happen since the reader must have at least dSize
    // bytes.
    clear(l.writeBuf)
    l.writeBuf = nil
If this is impossible then:
- logger.Error seems like a good idea, unless the caller already does that with what we return.
- We probably don't need to bother with the clear/nil (and surely don't want to do both?).
Added an error log and removed the buffer resetting.
…c#8667)

This PR removes 2 buffer copies while writing data frames to the underlying net.Conn: one [within gRPC](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/controlbuf.go#L1009-L1022) and the other [in the framer](https://cs.opensource.google/go/x/net/+/master:http2/frame.go;l=743;drc=6e243da531559f8c99439dabc7647dec07191f9b). Care is taken to avoid any extra heap allocations which can affect performance for smaller payloads.

A [CL](https://go-review.git.corp.google.com/c/net/+/711620) is out for review which allows using the framer to write frame headers. This PR duplicates the header writing code as a temporary workaround. This PR will be merged only after the CL is merged.

## Results

### Small payloads

Performance for small payloads increases slightly due to the reduction of a `deferred` statement.

```
$ go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
    -compression=off -maxConcurrentCalls=120 -trace=off \
    -reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"

$ go run benchmark/benchresult/main.go unary-before unary-after
      Title         Before          After   Percentage
   TotalOps        7600878        7653522        0.69%
    SendOps              0              0         NaN%
    RecvOps              0              0         NaN%
   Bytes/op       10007.07       10000.89       -0.07%
  Allocs/op         146.93         146.91        0.00%
    ReqT/op   101345040.00   102046960.00        0.69%
   RespT/op   101345040.00   102046960.00        0.69%
   50th-Lat      833.724µs      830.041µs       -0.44%
   90th-Lat     1.281969ms     1.275336ms       -0.52%
   99th-Lat     2.403961ms     2.360606ms       -1.80%
    Avg-Lat      946.123µs      939.734µs       -0.68%
  GoVersion       go1.24.8       go1.24.8
GrpcVersion     1.77.0-dev     1.77.0-dev
```

### Large payloads

Local benchmarks show a ~5-10% regression with 1 MB payloads on my dev machine. The profiles show increased time spent in the copy operation [inside the buffered writer](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/http_util.go#L334). Counterintuitively, copying the grpc header and message data into a larger buffer increased the performance by 4% (compared to master).

To validate this behaviour (extra copy increasing performance) I ran [the k8s benchmark for 1MB payloads](https://github.com/grpc/grpc/blob/65c9be86830b0e423dd970c066c69a06a9240298/tools/run_tests/performance/scenario_config.py#L291-L305) and 100 concurrent streams which showed ~5% increase in QPS without the copies across multiple runs. Adding a copy reduced the performance.

Load test config file: [loadtest.yaml](https://github.com/user-attachments/files/23055312/loadtest.yaml)

```
# 30 core client and server
Before
QPS: 498.284 (16.6095/server core)
Latencies (50/90/95/99/99.9%-ile): 233256/275972/281250/291803/298533 us
Server system time: 93.0164
Server user time: 142.533
Client system time: 97.2688
Client user time: 144.542

After
QPS: 526.776 (17.5592/server core)
Latencies (50/90/95/99/99.9%-ile): 211010/263189/270969/280656/288828 us
Server system time: 96.5959
Server user time: 147.668
Client system time: 101.973
Client user time: 150.234

# 8 core client and server
Before
QPS: 291.049 (36.3811/server core)
Latencies (50/90/95/99/99.9%-ile): 294552/685822/903554/1.48399e+06/1.50757e+06 us
Server system time: 49.0355
Server user time: 87.1783
Client system time: 60.1945
Client user time: 103.633

After
QPS: 334.119 (41.7649/server core)
Latencies (50/90/95/99/99.9%-ile): 279395/518849/706327/1.09273e+06/1.11629e+06 us
Server system time: 69.3136
Server user time: 102.549
Client system time: 80.9804
Client user time: 107.103
```

RELEASE NOTES:
* transport: Avoid two buffer copies when writing Data frames.
This PR removes the `grpchttp2` package since we no longer want to implement a custom http2 framer in grpc. We originally planned a custom HTTP/2 framer to get rid of internal copies. However, we have updated gRPC to control data frame I/O ([#8667](#8667), [#8657](#8657)) and will have changes that remove the bufio.Reader copy. This removes the need for a custom framer. RELEASE NOTES: None
…jo) (#12794) This PR contains the following updates: | Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) | |---|---|---|---| | [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `v1.75.0` → `v1.79.3` |  |  | --- ### gRPC-Go has an authorization bypass via missing leading slash in :path [CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) / [GHSA-p77j-4mvh-x3m3](GHSA-p77j-4mvh-x3m3) / [GO-2026-4762](https://pkg.go.dev/vuln/GO-2026-4762) <details> <summary>More information</summary> #### Details ##### Impact _What kind of vulnerability is it? Who is impacted?_ It is an **Authorization Bypass** resulting from **Improper Input Validation** of the HTTP/2 `:path` pseudo-header. The gRPC-Go server was too lenient in its routing logic, accepting requests where the `:path` omitted the mandatory leading slash (e.g., `Service/Method` instead of `/Service/Method`). While the server successfully routed these requests to the correct handler, authorization interceptors (including the official `grpc/authz` package) evaluated the raw, non-canonical path string. Consequently, "deny" rules defined using canonical paths (starting with `/`) failed to match the incoming request, allowing it to bypass the policy if a fallback "allow" rule was present. **Who is impacted?** This affects gRPC-Go servers that meet both of the following criteria: 1. They use path-based authorization interceptors, such as the official RBAC implementation in `google.golang.org/grpc/authz` or custom interceptors relying on `info.FullMethod` or `grpc.Method(ctx)`. 2. Their security policy contains specific "deny" rules for canonical paths but allows other requests by default (a fallback "allow" rule). The vulnerability is exploitable by an attacker who can send raw HTTP/2 frames with malformed `:path` headers directly to the gRPC server. ##### Patches _Has the problem been patched? 
What versions should users upgrade to?_ Yes, the issue has been patched. The fix ensures that any request with a `:path` that does not start with a leading slash is immediately rejected with a `codes.Unimplemented` error, preventing it from reaching authorization interceptors or handlers with a non-canonical path string. Users should upgrade to the following versions (or newer): * **v1.79.3** * The latest **master** branch. It is recommended that all users employing path-based authorization (especially `grpc/authz`) upgrade as soon as the patch is available in a tagged release. ##### Workarounds _Is there a way for users to fix or remediate the vulnerability without upgrading?_ While upgrading is the most secure and recommended path, users can mitigate the vulnerability using one of the following methods: ##### 1. Use a Validating Interceptor (Recommended Mitigation) Add an "outermost" interceptor to your server that validates the path before any other authorization logic runs: ```go func pathValidationInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) { if info.FullMethod == "" || info.FullMethod[0] != '/' { return nil, status.Errorf(codes.Unimplemented, "malformed method name") } return handler(ctx, req) } // Ensure this is the FIRST interceptor in your chain s := grpc.NewServer( grpc.ChainUnaryInterceptor(pathValidationInterceptor, authzInterceptor), ) ``` ##### 2. Infrastructure-Level Normalization If your gRPC server is behind a reverse proxy or load balancer (such as Envoy, NGINX, or an L7 Cloud Load Balancer), ensure it is configured to enforce strict HTTP/2 compliance for pseudo-headers and reject or normalize requests where the `:path` header does not start with a leading slash. ##### 3. Policy Hardening Switch to a "default deny" posture in your authorization policies (explicitly listing all allowed paths and denying everything else) to reduce the risk of bypasses via malformed inputs. 
#### Severity - CVSS Score: 9.1 / 10 (Critical) - Vector String: `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N` #### References - [https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3](https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3) - [https://nvd.nist.gov/vuln/detail/CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) - [https://github.com/grpc/grpc-go](https://github.com/grpc/grpc-go) This data is provided by [OSV](https://osv.dev/vulnerability/GHSA-p77j-4mvh-x3m3) and the [GitHub Advisory Database](https://github.com/github/advisory-database) ([CC-BY 4.0](https://github.com/github/advisory-database/blob/main/LICENSE.md)). </details> --- ### Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc [CVE-2026-33186](https://nvd.nist.gov/vuln/detail/CVE-2026-33186) / [GHSA-p77j-4mvh-x3m3](GHSA-p77j-4mvh-x3m3) / [GO-2026-4762](https://pkg.go.dev/vuln/GO-2026-4762) <details> <summary>More information</summary> #### Details Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc #### Severity Unknown #### References - [https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3](https://github.com/grpc/grpc-go/security/advisories/GHSA-p77j-4mvh-x3m3) This data is provided by [OSV](https://osv.dev/vulnerability/GO-2026-4762) and the [Go Vulnerability Database](https://github.com/golang/vulndb) ([CC-BY 4.0](https://github.com/golang/vulndb#license)). </details> --- ### Release Notes <details> <summary>grpc/grpc-go (google.golang.org/grpc)</summary> ### [`v1.79.3`](https://github.com/grpc/grpc-go/releases/tag/v1.79.3): Release 1.79.3 [Compare Source](grpc/grpc-go@v1.79.2...v1.79.3) ### Security - server: fix an authorization bypass where malformed :path headers (missing the leading slash) could bypass path-based restricted "deny" rules in interceptors like `grpc/authz`. 
Any request with a non-canonical path is now immediately rejected with an `Unimplemented` error. ([#​8981](grpc/grpc-go#8981)) ### [`v1.79.2`](https://github.com/grpc/grpc-go/releases/tag/v1.79.2): Release 1.79.2 [Compare Source](grpc/grpc-go@v1.79.1...v1.79.2) ### Bug Fixes - stats: Prevent redundant error logging in health/ORCA producers by skipping stats/tracing processing when no stats handler is configured. ([#​8874](grpc/grpc-go#8874)) ### [`v1.79.1`](https://github.com/grpc/grpc-go/releases/tag/v1.79.1): Release 1.79.1 [Compare Source](grpc/grpc-go@v1.79.0...v1.79.1) ### Bug Fixes - grpc: Remove the `-dev` suffix from the User-Agent header. ([#​8902](grpc/grpc-go#8902)) ### [`v1.79.0`](https://github.com/grpc/grpc-go/releases/tag/v1.79.0): Release 1.79.0 [Compare Source](grpc/grpc-go@v1.78.0...v1.79.0) ### API Changes - mem: Add experimental API `SetDefaultBufferPool` to change the default buffer pool. ([#​8806](grpc/grpc-go#8806)) - Special Thanks: [@​vanja-p](https://github.com/vanja-p) - experimental/stats: Update `MetricsRecorder` to require embedding the new `UnimplementedMetricsRecorder` (a no-op struct) in all implementations for forward compatibility. ([#​8780](grpc/grpc-go#8780)) ### Behavior Changes - balancer/weightedtarget: Remove handling of `Addresses` and only handle `Endpoints` in resolver updates. ([#​8841](grpc/grpc-go#8841)) ### New Features - experimental/stats: Add support for asynchronous gauge metrics through the new `AsyncMetricReporter` and `RegisterAsyncReporter` APIs. ([#​8780](grpc/grpc-go#8780)) - pickfirst: Add support for weighted random shuffling of endpoints, as described in [gRFC A113](grpc/proposal#535). - This is enabled by default, and can be turned off using the environment variable `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING`. ([#​8864](grpc/grpc-go#8864)) - xds: Implement `:authority` rewriting, as specified in [gRFC A81](https://github.com/grpc/proposal/blob/master/A81-xds-authority-rewriting.md). 
([#​8779](grpc/grpc-go#8779)) - balancer/randomsubsetting: Implement the `random_subsetting` LB policy, as specified in [gRFC A68](https://github.com/grpc/proposal/blob/master/A68-random-subsetting.md). ([#​8650](grpc/grpc-go#8650)) - Special Thanks: [@​marek-szews](https://github.com/marek-szews) ### Bug Fixes - credentials/tls: Fix a bug where the port was not stripped from the authority override before validation. ([#​8726](grpc/grpc-go#8726)) - Special Thanks: [@​Atul1710](https://github.com/Atul1710) - xds/priority: Fix a bug causing delayed failover to lower-priority clusters when a higher-priority cluster is stuck in `CONNECTING` state. ([#​8813](grpc/grpc-go#8813)) - health: Fix a bug where health checks failed for clients using legacy compression options (`WithDecompressor` or `RPCDecompressor`). ([#​8765](grpc/grpc-go#8765)) - Special Thanks: [@​sanki92](https://github.com/sanki92) - transport: Fix an issue where the HTTP/2 server could skip header size checks when terminating a stream early. ([#​8769](grpc/grpc-go#8769)) - Special Thanks: [@​joybestourous](https://github.com/joybestourous) - server: Propagate status detail headers, if available, when terminating a stream during request header processing. ([#​8754](grpc/grpc-go#8754)) - Special Thanks: [@​joybestourous](https://github.com/joybestourous) ### Performance Improvements - credentials/alts: Optimize read buffer alignment to reduce copies. ([#​8791](grpc/grpc-go#8791)) - mem: Optimize pooling and creation of `buffer` objects. ([#​8784](grpc/grpc-go#8784)) - transport: Reduce slice re-allocations by reserving slice capacity. ([#​8797](grpc/grpc-go#8797)) ### [`v1.78.0`](https://github.com/grpc/grpc-go/releases/tag/v1.78.0): Release 1.78.0 [Compare Source](grpc/grpc-go@v1.77.0...v1.78.0) ### Behavior Changes - client: Align URL validation with Go 1.26+ to now reject target URLs with unbracketed colons in the hostname. 
([#​8716](grpc/grpc-go#8716)) - Special Thanks: [@​neild](https://github.com/neild) - transport/client : Return status code `Unknown` on malformed grpc-status. ([#​8735](grpc/grpc-go#8735)) - - xds/resolver: - Drop previous route resources and report an error when no matching virtual host is found. - Only log LDS/RDS configuration errors following a successful update and retain the last valid resource to prevent transient failures. ([#​8711](grpc/grpc-go#8711)) ### New Features - stats/otel: Add backend service label to weighted round robin metrics as part of A89. ([#​8737](grpc/grpc-go#8737)) - stats/otel: Add subchannel metrics (without the disconnection reason) to eventually replace the pickfirst metrics. ([#​8738](grpc/grpc-go#8738)) - client: Wait for all pending goroutines to complete when closing a graceful switch balancer. ([#​8746](grpc/grpc-go#8746)) - Special Thanks: [@​twz123](https://github.com/twz123) - client: Add `experimental.AcceptCompressors` so callers can restrict the `grpc-accept-encoding` header advertised for a call. ([#​8718](grpc/grpc-go#8718)) - Special Thanks: [@​iblancasa](https://github.com/iblancasa) ### Bug Fixes - xds: Fix a bug in `StringMatcher` where regexes would match incorrectly when ignore\_case is set to true. ([#​8723](grpc/grpc-go#8723)) - client: - Change connectivity state to CONNECTING when creating the name resolver (as part of exiting IDLE). - Change connectivity state to TRANSIENT\_FAILURE if name resolver creation fails (as part of exiting IDLE). - Change connectivity state to IDLE after idle timeout expires even when current state is TRANSIENT\_FAILURE. - Fix a bug that resulted in `OnFinish` call option not being invoked for RPCs where stream creation failed. ([#​8710](grpc/grpc-go#8710)) - xdsclient: Fix a race in the xdsClient that could lead to resource-not-found errors. ([#​8627](grpc/grpc-go#8627)) ### Performance Improvements - mem: Round up to nearest 4KiB for pool allocations larger than 1MiB. 
([#​8705](grpc/grpc-go#8705)) - Special Thanks: [@​cjc25](https://github.com/cjc25) ### [`v1.77.0`](https://github.com/grpc/grpc-go/releases/tag/v1.77.0): Release 1.77.0 [Compare Source](grpc/grpc-go@v1.76.0...v1.77.0) ### API Changes - mem: Replace the `Reader` interface with a struct for better performance and maintainability. ([#​8669](grpc/grpc-go#8669)) ### Behavior Changes - balancer/pickfirst: Remove support for the old `pick_first` LB policy via the environment variable `GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST=false`. The new `pick_first` has been the default since `v1.71.0`. ([#​8672](grpc/grpc-go#8672)) ### Bug Fixes - xdsclient: Fix a race condition in the ADS stream implementation that could result in `resource-not-found` errors, causing the gRPC client channel to move to `TransientFailure`. ([#​8605](grpc/grpc-go#8605)) - client: Ignore HTTP status header for gRPC streams. ([#​8548](grpc/grpc-go#8548)) - client: Set a read deadline when closing a transport to prevent it from blocking indefinitely on a broken connection. ([#​8534](grpc/grpc-go#8534)) - Special Thanks: [@​jgold2-stripe](https://github.com/jgold2-stripe) - client: Fix a bug where default port 443 was not automatically added to addresses without a specified port when sent to a proxy. - Setting environment variable `GRPC_EXPERIMENTAL_ENABLE_DEFAULT_PORT_FOR_PROXY_TARGET=false` disables this change; please file a bug if any problems are encountered as we will remove this option soon. ([#​8613](grpc/grpc-go#8613)) - balancer/pickfirst: Fix a bug where duplicate addresses were not being ignored as intended. ([#​8611](grpc/grpc-go#8611)) - server: Fix a bug that caused overcounting of channelz metrics for successful and failed streams. ([#​8573](grpc/grpc-go#8573)) - Special Thanks: [@​hugehoo](https://github.com/hugehoo) - balancer/pickfirst: When configured, shuffle addresses in resolver updates that lack endpoints. 
Since gRPC automatically adds endpoints to resolver updates, this bug only affects custom LB policies that delegate to `pick_first` but don't set endpoints. ([#​8610](grpc/grpc-go#8610)) - mem: Clear large buffers before re-using. ([#​8670](grpc/grpc-go#8670)) ### Performance Improvements - transport: Reduce heap allocations to reduce time spent in garbage collection. ([#​8624](grpc/grpc-go#8624), [#​8630](grpc/grpc-go#8630), [#​8639](grpc/grpc-go#8639), [#​8668](grpc/grpc-go#8668)) - transport: Avoid copies when reading and writing Data frames. ([#​8657](grpc/grpc-go#8657), [#​8667](grpc/grpc-go#8667)) - mem: Avoid clearing newly allocated buffers. ([#​8670](grpc/grpc-go#8670)) ### New Features - outlierdetection: Add metrics specified in [gRFC A91](https://github.com/grpc/proposal/blob/master/A91-outlier-detection-metrics.md). ([#​8644](grpc/grpc-go#8644)) - Special Thanks: [@​davinci26](https://github.com/davinci26), [@​PardhuKonakanchi](https://github.com/PardhuKonakanchi) - stats/opentelemetry: Add support for optional label `grpc.lb.backend_service` in per-call metrics ([#​8637](grpc/grpc-go#8637)) - xds: Add support for JWT Call Credentials as specified in [gRFC A97](https://github.com/grpc/proposal/blob/master/A97-xds-jwt-call-creds.md). Set environment variable `GRPC_EXPERIMENTAL_XDS_BOOTSTRAP_CALL_CREDS=true` to enable this feature. ([#​8536](grpc/grpc-go#8536)) - Special Thanks: [@​dimpavloff](https://github.com/dimpavloff) - experimental/stats: Add support for up/down counters. ([#​8581](grpc/grpc-go#8581)) ### [`v1.76.0`](https://github.com/grpc/grpc-go/releases/tag/v1.76.0): Release 1.76.0 [Compare Source](grpc/grpc-go@v1.75.1...v1.76.0) ### Dependencies - Minimum supported Go version is now 1.24 ([#​8509](grpc/grpc-go#8509)) - Special Thanks: [@​kevinGC](https://github.com/kevinGC) ### Bug Fixes - client: Return status `INTERNAL` when a server sends zero response messages for a unary or client-streaming RPC. 
([#​8523](grpc/grpc-go#8523)) - client: Fail RPCs with status `INTERNAL` instead of `UNKNOWN` upon receiving http headers with status 1xx and `END_STREAM` flag set. ([#​8518](grpc/grpc-go#8518)) - Special Thanks: [@​vinothkumarr227](https://github.com/vinothkumarr227) - pick\_first: Fix race condition that could cause pick\_first to get stuck in `IDLE` state on backend address change. ([#​8615](grpc/grpc-go#8615)) ### New Features - credentials: Add `credentials/jwt` package providing file-based JWT PerRPCCredentials (A97). ([#​8431](grpc/grpc-go#8431)) - Special Thanks: [@​dimpavloff](https://github.com/dimpavloff) ### Performance Improvements - client: Improve HTTP/2 header size estimate to reduce re-allocations. ([#​8547](grpc/grpc-go#8547)) - encoding/proto: Avoid redundant message size calculation when marshaling. ([#​8569](grpc/grpc-go#8569)) - Special Thanks: [@​rs-unity](https://github.com/rs-unity) ### [`v1.75.1`](https://github.com/grpc/grpc-go/releases/tag/v1.75.1): Release 1.75.1 [Compare Source](grpc/grpc-go@v1.75.0...v1.75.1) ### Bug Fixes - transport: Fix a data race while copying headers for stats handlers in the std lib http2 server transport. ([#​8519](grpc/grpc-go#8519)) - xdsclient: - Fix a data race caused while reporting load to LRS. ([#​8483](grpc/grpc-go#8483)) - Fix regression preventing empty node IDs when creating an LRS client. ([#​8483](grpc/grpc-go#8483)) - server: Fix a regression preventing streams from being cancelled or timed out when blocked on flow control. ([#​8528](grpc/grpc-go#8528)) </details> --- ### Configuration 📅 **Schedule**: (UTC) - Branch creation - "" - Automerge - Between 12:00 AM and 03:59 AM (`* 0-3 * * *`) 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. 
--- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xOTUuMSIsInVwZGF0ZWRJblZlciI6IjQzLjE5NS4xIiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6WyJkZXBlbmRlbmN5LXVwZ3JhZGUiLCJ0ZXN0L25vdC1uZWVkZWQiXX0=--> Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/12794 Reviewed-by: Mathieu Fenniak <mfenniak@noreply.codeberg.org>
This PR removes 2 buffer copies while writing data frames to the underlying net.Conn: one within gRPC and the other in the framer. Care is taken to avoid any extra heap allocations which can affect performance for smaller payloads.
A CL is out for review which allows using the framer to write frame headers. This PR duplicates the header writing code as a temporary workaround. This PR will be merged only after the CL is merged.
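As a rough illustration of the approach (not the code in this PR), the sketch below writes the 9-byte HTTP/2 frame header followed by each payload slice straight to the writer, instead of first copying the payload into one contiguous frame buffer. The function names `appendDataFrameHeader`, `writeDataFrame` and `encodeDataFrame` are hypothetical; the header layout follows RFC 9113, section 4.1. The sketch does not enforce a maximum frame size.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const (
	frameTypeData = 0x0 // DATA frame type
	flagEndStream = 0x1 // END_STREAM flag
)

// appendDataFrameHeader appends the 9-byte HTTP/2 header for a DATA frame:
// 24-bit payload length, 8-bit type, 8-bit flags, 31-bit stream ID.
func appendDataFrameHeader(dst []byte, streamID uint32, endStream bool, length int) []byte {
	var flags byte
	if endStream {
		flags |= flagEndStream
	}
	return append(dst,
		byte(length>>16), byte(length>>8), byte(length),
		frameTypeData,
		flags,
		byte(streamID>>24)&0x7f, byte(streamID>>16), byte(streamID>>8), byte(streamID),
	)
}

// writeDataFrame writes the header and then every chunk directly to w.
// hdr is a reusable scratch buffer so that no allocation is needed per frame
// once it has capacity for the header. The updated scratch buffer is returned.
func writeDataFrame(w io.Writer, hdr []byte, streamID uint32, endStream bool, chunks [][]byte) ([]byte, error) {
	size := 0
	for _, c := range chunks {
		size += len(c)
	}
	hdr = appendDataFrameHeader(hdr[:0], streamID, endStream, size)
	if _, err := w.Write(hdr); err != nil {
		return hdr, err
	}
	for _, c := range chunks {
		if _, err := w.Write(c); err != nil {
			return hdr, err
		}
	}
	return hdr, nil
}

// encodeDataFrame is a convenience for the example: it returns the wire bytes.
func encodeDataFrame(streamID uint32, endStream bool, chunks [][]byte) []byte {
	var out bytes.Buffer
	_, _ = writeDataFrame(&out, nil, streamID, endStream, chunks)
	return out.Bytes()
}

func main() {
	frame := encodeDataFrame(1, true, [][]byte{[]byte("he"), []byte("llo")})
	fmt.Printf("% x\n", frame)
}
```

In the real transport the writer is a buffered writer in front of the net.Conn, so small writes are still coalesced there.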
Results
Small payloads
Performance for small payloads increases slightly due to the reduction of a `deferred` statement.
Large payloads
Local benchmarks show a ~5-10% regression with 1 MB payloads on my dev machine. The profiles show increased time spent in the copy operation inside the buffered writer. Counterintuitively, copying the grpc header and message data into a larger buffer increased the performance by 4% (compared to master).
To validate this behaviour (extra copy increasing performance) I ran the k8s benchmark for 1MB payloads and 100 concurrent streams which showed ~5% increase in QPS without the copies across multiple runs. Adding a copy reduced the performance.
Load test config file: loadtest.yaml
RELEASE NOTES:
* transport: Avoid two buffer copies when writing Data frames.