Cherrypick #8657 and #8667 to v1.77.x by arjan-bal · Pull Request #8690 · grpc/grpc-go

arjan-bal · 2025-11-03T10:30:07Z

Original PRs: #8657, #8667

RELEASE NOTES:

transport: Avoid copies when reading and writing Data frames.

This change incorporates changes from golang/go#73560 to split reading HTTP/2 frame headers and payloads. If the frame is not a Data frame, it's read through the standard library framer as before. For Data frames, the payload is read directly into a buffer from the buffer pool to avoid copying it from the framer's buffer.

## Testing

For 1 MB payloads, this results in a ~4% improvement in throughput.

```sh
# test command
go run benchmark/benchmain/main.go -benchtime=60s -workloads=streaming \
  -compression=off -maxConcurrentCalls=120 -trace=off \
  -reqSizeBytes=1000000 -respSizeBytes=1000000 -networkMode=Local -resultFile="${RUN_NAME}"

# comparison
go run benchmark/benchresult/main.go streaming-before streaming-after
Title        Before          After           Percentage
TotalOps     87536           91120           4.09%
SendOps      0               0               NaN%
RecvOps      0               0               NaN%
Bytes/op     4074102.92      4070489.30      -0.09%
Allocs/op    83.60           76.55           -8.37%
ReqT/op      11671466666.67  12149333333.33  4.09%
RespT/op     11671466666.67  12149333333.33  4.09%
50th-Lat     78.209875ms     75.159943ms     -3.90%
90th-Lat     117.764228ms    107.8697ms      -8.40%
99th-Lat     146.935704ms    139.069685ms    -5.35%
Avg-Lat      82.310691ms     79.073282ms     -3.93%
GoVersion    go1.24.7        go1.24.7
GrpcVersion  1.77.0-dev      1.77.0-dev
```

For smaller payloads, the difference is minor.

```sh
go run benchmark/benchmain/main.go -benchtime=60s -workloads=streaming \
  -compression=off -maxConcurrentCalls=120 -trace=off \
  -reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"

go run benchmark/benchresult/main.go streaming-before streaming-after
Title        Before        After         Percentage
TotalOps     21490752      21477822      -0.06%
SendOps      0             0             NaN%
RecvOps      0             0             NaN%
Bytes/op     1902.92       1902.94       0.00%
Allocs/op    29.21         29.21         0.00%
ReqT/op      286543360.00  286370960.00  -0.06%
RespT/op     286543360.00  286370960.00  -0.06%
50th-Lat     352.505µs     352.247µs     -0.07%
90th-Lat     433.446µs     434.907µs     0.34%
99th-Lat     536.445µs     539.759µs     0.62%
Avg-Lat      333.403µs     333.457µs     0.02%
GoVersion    go1.24.7      go1.24.7
GrpcVersion  1.77.0-dev    1.77.0-dev
```

RELEASE NOTES:
* transport: Avoid a buffer copy when reading data.
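The read path described above (read the 9-byte frame header first, then read a Data frame's payload straight into a pooled buffer) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `frameHeader`, `readFrameHeader`, `readDataPayload`, `demoRead`, and the `sync.Pool`-based `bufPool` are hypothetical names, and padding, flow control, and frame size limits are ignored.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"io"
	"sync"
)

const (
	frameHeaderLen = 9   // HTTP/2 frame headers are always 9 bytes
	frameTypeData  = 0x0 // DATA frame type
)

// frameHeader holds the decoded fields of an HTTP/2 frame header.
type frameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8
	Flags    uint8
	StreamID uint32 // 31 bits; the reserved high bit is masked off
}

// bufPool is a hypothetical stand-in for a buffer pool.
var bufPool = sync.Pool{New: func() any { b := make([]byte, 0, 16*1024); return &b }}

// readFrameHeader reads and decodes only the 9-byte header.
func readFrameHeader(r io.Reader) (frameHeader, error) {
	var hdr [frameHeaderLen]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return frameHeader{}, err
	}
	return frameHeader{
		Length:   uint32(hdr[0])<<16 | uint32(hdr[1])<<8 | uint32(hdr[2]),
		Type:     hdr[3],
		Flags:    hdr[4],
		StreamID: binary.BigEndian.Uint32(hdr[5:]) & 0x7fffffff,
	}, nil
}

// readDataPayload reads the payload directly into a pooled buffer, so there
// is no intermediate framer-owned buffer to copy from.
func readDataPayload(r io.Reader, h frameHeader) (*[]byte, error) {
	bp := bufPool.Get().(*[]byte)
	n := int(h.Length)
	if cap(*bp) < n {
		*bp = make([]byte, n)
	}
	*bp = (*bp)[:n]
	if _, err := io.ReadFull(r, *bp); err != nil {
		bufPool.Put(bp)
		return nil, err
	}
	return bp, nil
}

// demoRead decodes one hand-built DATA frame (stream 3, END_STREAM, "hello").
func demoRead() (frameHeader, []byte, error) {
	wire := []byte{0, 0, 5, frameTypeData, 0x1, 0, 0, 0, 3, 'h', 'e', 'l', 'l', 'o'}
	r := bytes.NewReader(wire)
	h, err := readFrameHeader(r)
	if err != nil {
		return h, nil, err
	}
	if h.Type != frameTypeData {
		// A real transport would hand non-Data frames to the regular framer.
		return h, nil, nil
	}
	bp, err := readDataPayload(r, h)
	if err != nil {
		return h, nil, err
	}
	defer bufPool.Put(bp)
	// Copy only so the demo can return the bytes after releasing the buffer.
	return h, append([]byte(nil), *bp...), nil
}
```

To try it, call `demoRead()` from a `main` function and print the returned header and payload. In a real transport the pooled buffer would be handed to the stream and released once the application has consumed the message.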

…c#8667)

This PR removes 2 buffer copies while writing data frames to the underlying net.Conn: one [within gRPC](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/controlbuf.go#L1009-L1022) and the other [in the framer](https://cs.opensource.google/go/x/net/+/master:http2/frame.go;l=743;drc=6e243da531559f8c99439dabc7647dec07191f9b). Care is taken to avoid any extra heap allocations, which can affect performance for smaller payloads.

A [CL](https://go-review.git.corp.google.com/c/net/+/711620) is out for review which allows using the framer to write frame headers. This PR duplicates the header writing code as a temporary workaround. This PR will be merged only after the CL is merged.

## Results

### Small payloads

Performance for small payloads increases slightly due to the removal of a `defer` statement.

```
$ go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
  -compression=off -maxConcurrentCalls=120 -trace=off \
  -reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"

$ go run benchmark/benchresult/main.go unary-before unary-after
Title        Before        After         Percentage
TotalOps     7600878       7653522       0.69%
SendOps      0             0             NaN%
RecvOps      0             0             NaN%
Bytes/op     10007.07      10000.89      -0.07%
Allocs/op    146.93        146.91        0.00%
ReqT/op      101345040.00  102046960.00  0.69%
RespT/op     101345040.00  102046960.00  0.69%
50th-Lat     833.724µs     830.041µs     -0.44%
90th-Lat     1.281969ms    1.275336ms    -0.52%
99th-Lat     2.403961ms    2.360606ms    -1.80%
Avg-Lat      946.123µs     939.734µs     -0.68%
GoVersion    go1.24.8      go1.24.8
GrpcVersion  1.77.0-dev    1.77.0-dev
```

### Large payloads

Local benchmarks show a ~5-10% regression with 1 MB payloads on my dev machine. The profiles show increased time spent in the copy operation [inside the buffered writer](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/http_util.go#L334). Counterintuitively, copying the grpc header and message data into a larger buffer increased the performance by 4% (compared to master).

To validate this behaviour (extra copy increasing performance), I ran [the k8s benchmark for 1MB payloads](https://github.com/grpc/grpc/blob/65c9be86830b0e423dd970c066c69a06a9240298/tools/run_tests/performance/scenario_config.py#L291-L305) with 100 concurrent streams, which showed a ~5% increase in QPS without the copies across multiple runs. Adding a copy reduced the performance.

Load test config file: [loadtest.yaml](https://github.com/user-attachments/files/23055312/loadtest.yaml)

```
# 30 core client and server
Before
QPS: 498.284 (16.6095/server core)
Latencies (50/90/95/99/99.9%-ile): 233256/275972/281250/291803/298533 us
Server system time: 93.0164
Server user time: 142.533
Client system time: 97.2688
Client user time: 144.542

After
QPS: 526.776 (17.5592/server core)
Latencies (50/90/95/99/99.9%-ile): 211010/263189/270969/280656/288828 us
Server system time: 96.5959
Server user time: 147.668
Client system time: 101.973
Client user time: 150.234

# 8 core client and server
Before
QPS: 291.049 (36.3811/server core)
Latencies (50/90/95/99/99.9%-ile): 294552/685822/903554/1.48399e+06/1.50757e+06 us
Server system time: 49.0355
Server user time: 87.1783
Client system time: 60.1945
Client user time: 103.633

After
QPS: 334.119 (41.7649/server core)
Latencies (50/90/95/99/99.9%-ile): 279395/518849/706327/1.09273e+06/1.11629e+06 us
Server system time: 69.3136
Server user time: 102.549
Client system time: 80.9804
Client user time: 107.103
```

RELEASE NOTES:
* transport: Avoid two buffer copies when writing Data frames.
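The write-side idea (emit the frame header, then hand each payload slice to the writer as-is instead of first assembling header and data in an intermediate buffer) can be sketched as below. This is an illustration under stated assumptions rather than the PR's implementation: `writeDataFrame`, `demoWrite`, and `errFrameTooLarge` are hypothetical names, a `bytes.Buffer` stands in for the buffered writer over `net.Conn`, and the sketch makes no attempt to reproduce the allocation-avoidance work mentioned above.

```go
package main

import (
	"bytes"
	"errors"
	"io"
)

const frameHeaderLen = 9 // HTTP/2 frame headers are always 9 bytes

var errFrameTooLarge = errors.New("payload does not fit in a 24-bit frame length")

// writeDataFrame writes one DATA frame header followed by each chunk.
// The chunks are passed to w unmodified; they are never concatenated into
// a temporary buffer first.
func writeDataFrame(w io.Writer, streamID uint32, endStream bool, chunks ...[]byte) error {
	total := 0
	for _, c := range chunks {
		total += len(c)
	}
	if total >= 1<<24 {
		return errFrameTooLarge
	}
	var flags byte
	if endStream {
		flags |= 0x1 // END_STREAM
	}
	hdr := [frameHeaderLen]byte{
		byte(total >> 16), byte(total >> 8), byte(total), // 24-bit length
		0x0,   // type: DATA
		flags, // flags
		byte(streamID>>24) & 0x7f, byte(streamID >> 16), byte(streamID >> 8), byte(streamID),
	}
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	for _, c := range chunks {
		if _, err := w.Write(c); err != nil {
			return err
		}
	}
	return nil
}

// demoWrite frames a 5-byte gRPC message prefix plus a 2-byte message body
// on stream 1 and returns the bytes that reached the writer.
func demoWrite() ([]byte, error) {
	var conn bytes.Buffer                // stand-in for the connection writer
	grpcHdr := []byte{0, 0, 0, 0, 2}     // uncompressed flag + 4-byte message length
	msg := []byte("hi")                  // message body
	if err := writeDataFrame(&conn, 1, true, grpcHdr, msg); err != nil {
		return nil, err
	}
	return conn.Bytes(), nil
}
```

Calling `demoWrite()` yields 16 bytes: the 9-byte frame header (length 7, type DATA, END_STREAM set, stream 1) followed by the gRPC prefix and the message. Whether skipping the intermediate copy is a net win depends on the writer underneath, which is what the large-payload measurements above explore.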

codecov · 2025-11-03T10:33:45Z

Codecov Report

❌ Patch coverage is 89.10256% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.23%. Comparing base (f959da6) to head (13bb904).
⚠️ Report is 1 commits behind head on v1.77.x.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/transport/controlbuf.go | 63.15% | 3 Missing and 4 partials ⚠️ |
| internal/transport/http_util.go | 93.02% | 3 Missing and 3 partials ⚠️ |
| mem/buffer_slice.go | 93.33% | 1 Missing and 1 partial ⚠️ |
| internal/transport/http2_client.go | 90.90% | 1 Missing ⚠️ |
| internal/transport/http2_server.go | 90.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff             @@
##           v1.77.x    #8690      +/-   ##
===========================================
+ Coverage    82.21%   83.23%   +1.01%     
===========================================
  Files          417      417              
  Lines        32198    32296      +98     
===========================================
+ Hits         26472    26880     +408     
- Misses        4021     4037      +16     
+ Partials      1705     1379     -326

| Files with missing lines | Coverage Δ | |
|---|---|---|
| mem/buffer_pool.go | `100.00% <ø> (ø)` | |
| internal/transport/http2_client.go | `92.71% <90.90%> (+15.78%)` | ⬆️ |
| internal/transport/http2_server.go | `91.30% <90.00%> (ø)` | |
| mem/buffer_slice.go | `96.45% <93.33%> (-0.85%)` | ⬇️ |
| internal/transport/http_util.go | `94.53% <93.02%> (-0.68%)` | ⬇️ |
| internal/transport/controlbuf.go | `89.50% <63.15%> (-0.75%)` | ⬇️ |

... and 23 files with indirect coverage changes


arjan-bal added 2 commits November 3, 2025 15:55

arjan-bal added this to the 1.77 Release milestone Nov 3, 2025

arjan-bal added Type: Performance Performance improvements (CPU, network, memory, etc) Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. labels Nov 3, 2025

arjan-bal requested review from easwars and eshitachandwani November 3, 2025 10:41

arjan-bal assigned easwars and eshitachandwani Nov 3, 2025

eshitachandwani approved these changes Nov 3, 2025


eshitachandwani assigned arjan-bal and unassigned easwars and eshitachandwani Nov 3, 2025

arjan-bal merged commit 4288cfc into grpc:v1.77.x Nov 3, 2025
17 checks passed

arjan-bal deleted the cherrypick-copyless-framer branch November 3, 2025 10:49

github-actions Bot locked as resolved and limited conversation to collaborators May 3, 2026
