
rpc: compression with libdeflate #20665

Merged
lupin012 merged 9 commits into main from lupin012/compression_with_libdeflate on Apr 29, 2026


@lupin012 lupin012 commented Apr 19, 2026

Contributor

close #17112

This PR replaces the existing single-path compress/gzip middleware in node/rpcstack.go with a dual-path implementation that distinguishes between streaming and non-streaming responses.

a) Non-streaming responses (standard JSON-RPC calls such as eth_getBlockByNumber):

  • are now compressed in one shot using go-libdeflate, a lightweight CGo wrapper around ebiggers/libdeflate. Benchmarks show ~1.75x speedup vs stdlib gzip on ~30 KB JSON payloads (4.6 ms vs 8.0 ms, 122 MB/s vs 70 MB/s); under high concurrency the advantage is larger due to lower CPU usage per request.
  • Since the compressed size is known before writing, Content-Length is now set to the exact compressed size, avoiding unnecessary Transfer-Encoding: chunked.
  • Responses smaller than 1 KB are sent uncompressed: at that size the CPU overhead of initialising the compressor outweighs the benefit, and for very small payloads the compressed output can exceed the input size due to gzip framing overhead.

b) Streaming responses (e.g. debug_traceTransaction, trace_filter) are detected via http.Flusher: when the RPC handler calls Flush() before writing, the middleware switches to stdlib compress/gzip in streaming mode, compressing trace data incrementally without buffering the full response.

Changes:

  • node/rpcstack.go: new gzipResponseWriter with buffer/stream dual mode; four sync.Pool instances for a zero-allocation hot path
  • rpc/http.go: injects an http.Flusher into the request context via httpFlusherContextKey
  • rpc/handler.go: streaming methods call Flush before writing, to activate streaming mode
  • go.mod: adds github.com/erigontech/go-libdeflate v0.1.0

🚀 Performance Benchmarks: Gzip Optimization


1. Isolated Compression Benchmarks (libdeflate vs stdlib)

| Benchmark (gzip in isolation) | Latency | Throughput | Mem alloc |
|---|---|---|---|
| BenchmarkStdlibGzip | 8.0 ms/req | 70 MB/s | 66 KB |
| BenchmarkLibdeflateGzip | 4.6 ms/req (~1.75x faster) | 122 MB/s | 63 KB |

Note: We observed a ~1.75x speedup on a single thread. Under high concurrency, the advantage is even greater due to reduced CPU overhead.
2. eth_getBlockByNumber with txs (Old SW vs Main SW)

Old SW - Results with instability and errors:


  • [2. 1] qps: 2000 -> [R=97.74% max=9.103s error=503 Service Unavailable]
  • [3. 3] qps: 2300 -> [R=98.50% max=5.841s error=503 Service Unavailable]
  • [3. 4] qps: 2300 -> [R=99.80% max=5.634s error=503 Service Unavailable]
  • [3. 5] qps: 2300 -> [R=98.25% max=5.877s error=503 Service Unavailable]

Main SW - Stable results with 100% success rate:

  • [2. 1] qps: 2000 -> [R=100.00% max=141.22ms]
  • [2. 5] qps: 2000 -> [R=100.00% max=157.7ms]
  • [3. 1] qps: 2300 -> [R=100.00% max=217.23ms]
  • [3. 5] qps: 2300 -> [R=100.00% max=224.174ms]
3. trace_block
| Metric | Old SW | Main SW | Speedup |
|---|---|---|---|
| Throughput | 375.9 req/s | 455.0 req/s | +21% |
| p50 latency | 114.4 ms | 96.9 ms | 1.18x |
| p90 latency | 232.0 ms | 181.2 ms | 1.28x |
| Mean latency | 132.4 ms | 109.5 ms | 1.21x |
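The Speedup column mixes two conventions: latency rows are old/new ratios, while the throughput row is a relative gain. A small sketch (helper names are hypothetical) makes the arithmetic explicit:

```go
package main

import "fmt"

// latencySpeedup returns old/new: values above 1 mean the new build is faster.
func latencySpeedup(oldMs, newMs float64) float64 { return oldMs / newMs }

// throughputGain returns the relative increase in requests per second.
func throughputGain(oldRps, newRps float64) float64 { return newRps/oldRps - 1 }

func main() {
	fmt.Printf("throughput %+.0f%%\n", 100*throughputGain(375.9, 455.0))
	fmt.Printf("p50  %.2fx\n", latencySpeedup(114.4, 96.9))
	fmt.Printf("p90  %.2fx\n", latencySpeedup(232.0, 181.2))
	fmt.Printf("mean %.2fx\n", latencySpeedup(132.4, 109.5))
}
```

Running it reproduces the +21%, 1.18x, 1.28x and 1.21x figures in the table.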
4. All RPC tests executed over HTTP, compressed HTTP and WebSocket

./run_all.sh -T http,http_comp,websocket
Run tests in parallel on localhost:8545/localhost:8551
Result directory: /home/simon/silkworm/tests/rpc-tests3/integration/results

Time: 2026-04-19 08:58:05.257624
Total round_trip time: 3:40:01.359044
Total marshalling time: 0:00:00.063308
Total unmarshalling time: 0:01:40.509521
No of json Diffs: 0
Test time-elapsed: 0:08:37.467460
Available tests: 1436
Available tested api: 112
Number of loop: 1
Number of executed tests: 4152
Number of NOT executed tests: 156
Number of success tests: 4152
Number of failed tests: 0

@lupin012 lupin012 changed the title from "rpc: compression with libdeflate" to "[WIP] rpc: compression with libdeflate" Apr 19, 2026
@AskAlexSharov AskAlexSharov requested a review from Copilot April 22, 2026 03:16

Copilot AI left a comment

Contributor


Pull request overview

This PR updates the HTTP RPC compression middleware to use a dual-path gzip strategy: fast one-shot gzip for non-streaming JSON-RPC responses via go-libdeflate, while preserving incremental gzip streaming for streamable RPC methods by switching to stdlib compress/gzip when flushing is detected.

Changes:

  • Injects an http.Flusher hook into the per-request context to allow streamable RPC methods to trigger “streaming mode”.
  • Updates streamable RPC method execution to activate streaming compression before emitting any response bytes.
  • Replaces the previous single-path gzip middleware with a buffered (libdeflate) vs streaming (stdlib gzip) implementation using multiple sync.Pools.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| rpc/http.go | Adds a request context key to carry a flush hook derived from the HTTP response writer. |
| rpc/handler.go | Streamable RPC methods call the context-provided flush hook before writing the JSON-RPC envelope. |
| node/rpcstack.go | Implements a buffered/streaming gzip middleware, introducing libdeflate one-shot compression and a streaming fallback. |
| go.mod | Adds the github.com/erigontech/go-libdeflate dependency. |
| go.sum | Records checksums for the new dependency. |


Comment thread node/rpcstack.go
Comment thread node/rpcstack.go Outdated
Comment thread node/rpcstack.go
Comment thread node/rpcstack.go Outdated
Comment thread rpc/handler.go

AskAlexSharov commented Apr 22, 2026

Collaborator

@lupin012 I see the potential issue. Looking at the current implementation in node/rpcstack.go:511:

  var gzPool = sync.Pool{
      New: func() any {
          w := gzip.NewWriter(io.Discard) // DefaultCompression = level 6
          return w
      },
  }

The pool creates writers with gzip.NewWriter, which uses gzip.DefaultCompression (level 6), optimized for compression ratio rather than speed. For an RPC server, where latency matters more than bandwidth, gzip.BestSpeed (level 1) would be far more appropriate.

What if we reduce the default compression level of the stdlib writer?

@lupin012

Contributor Author

@AskAlexSharov OK, I have changed the compression level of the stdlib (streaming) path from 6 to 1 (BestSpeed), trading compression ratio for speed.
libdeflate compression level: unchanged at 6
libdeflate uses its own 1–12 scale and is already significantly faster than stdlib at any equivalent level, thanks to its use of SIMD instructions (SSE2/AVX2) that process multiple bytes in parallel with a single CPU instruction. At level 6, libdeflate provides a good balance between speed and ratio, and the latency difference between libdeflate levels 1 and 6 is small compared with its lead over stdlib. Lowering it further would reduce the compression ratio with no meaningful latency benefit.

@lupin012 lupin012 force-pushed the lupin012/compression_with_libdeflate branch from 6732a52 to 3b08fc4 April 25, 2026 07:20
@lupin012 lupin012 changed the title from "[WIP] rpc: compression with libdeflate" to "rpc: compression with libdeflate" Apr 25, 2026
@lupin012 lupin012 marked this pull request as ready for review April 25, 2026 20:47
Comment thread node/rpcstack.go Outdated
Comment thread node/rpcstack.go Outdated
Comment thread node/rpcstack.go Outdated
Comment thread node/rpcstack.go Outdated
Comment thread node/rpcstack.go Outdated
Comment thread node/rpcstack.go Outdated
@lupin012 lupin012 added this pull request to the merge queue Apr 29, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 29, 2026
lupin012 and others added 9 commits April 29, 2026 23:17
…treaming)

- gzipResponseWriter now buffers non-streaming responses and compresses
  them in one shot with libdeflate for maximum throughput
- adds Flush() method to switch to stdlib gzip streaming mode for
  methods that produce large/trace responses incrementally
- pools buf, dst slice, compressor and gzip.Writer to avoid per-request
  allocations
- passes http.Flusher via context (httpFlusherContextKey) so runMethod
  can activate streaming compression before writing begins

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove local replace directive now that the module is published.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Return dst slice to pool only after w.Write() completes, not before.
The previous order caused gzip corruption under high QPS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Flush() now flushes gzw + underlying http.Flusher on every call (not
  just on first activation), so streaming RPC methods deliver output
  incrementally instead of buffering until Close.
- gzCompressorPool.New no longer panics on libdeflate init failure:
  logs once via sync.Once and returns nil; handler falls back to stdlib gzip.
- On libdeflate compress error, fall back to stdlib gzip instead of
  returning http.Error (which would overwrite the JSON-RPC payload).
- httpFlusherContextKey is now injected only by the gzip middleware via
  WithGzipStreamingHook, not from any generic http.Flusher, preventing
  premature HTTP header commit (e.g. 200 before 503) when gzip is off.
- gzBufPool and gzDstPool only retain buffers <= gzPoolBufCap (1 MiB)
  to bound steady-state RSS after large responses.
- stdlib gzip pool uses BestSpeed (level 1) to prioritise latency.
- Extract writeStdlibGzip helper to eliminate duplicated fallback logic.
- Add unit tests covering: non-streaming, streaming, status propagation,
  large body pool-cap path, Flush activation, and pool threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
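The bounded-pool rule above (retain buffers only up to 1 MiB) together with the `getBuf()` helper from a later commit can be sketched as follows. `gzBufPool`, `gzPoolBufCap` and `getBuf` are named after the commit messages, but their exact types are assumptions, and `putBuf` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// gzPoolBufCap is the 1 MiB retention limit quoted in the commit message.
const gzPoolBufCap = 1 << 20

var gzBufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// getBuf returns an empty pooled buffer (Get and Reset in one place, so a
// caller cannot forget the Reset).
func getBuf() *bytes.Buffer {
	b := gzBufPool.Get().(*bytes.Buffer)
	b.Reset()
	return b
}

// putBuf returns b to the pool unless it grew past gzPoolBufCap, in which
// case it is dropped so the GC can reclaim the large backing array.
func putBuf(b *bytes.Buffer) bool {
	if b.Cap() > gzPoolBufCap {
		return false
	}
	gzBufPool.Put(b)
	return true
}

func main() {
	small := getBuf()
	small.WriteString(`{"result":"0x1"}`)
	fmt.Println(putBuf(small)) // true

	large := getBuf()
	large.Grow(2 << 20) // e.g. after a multi-megabyte trace response
	fmt.Println(putBuf(large)) // false
}
```

Without the cap, one very large response would leave a multi-megabyte buffer in the pool indefinitely, inflating steady-state RSS.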
Added gzip optimizations for non-streaming RPC responses: skip
compression for small payloads (< 1 KB), where the CPU overhead of
setting up the compressor outweighs the benefit and — for very small
responses — the compressed output can end up larger than the input due
to gzip framing overhead; also set Content-Length from the known
compressed size when using libdeflate, avoiding unnecessary
Transfer-Encoding: chunked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use libdeflate for one-shot (non-streaming) compression; fall back to
  stdlib gzip for streaming responses
- Add libdeflateDisabled atomic.Bool to short-circuit pool.Get after
  first init failure (sync.Pool does not cache a nil returned by New, so
  without the flag every Get would call NewCompressor again)
- Extract sendGzipResponse and compressLibdeflate sub-functions so all
  pool returns use defer
- Add getBuf() helper to encapsulate Get+Reset and prevent missed resets
- Add gzDstGrow with append-style 2x capacity growth to amortize reallocs
- Store []byte directly in gzDstPool (was *[]byte)
- Use two defers (LIFO) in sendGzipResponse streaming path so Close
  runs before Put

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
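A sketch of the init-failure guard and the 2x growth helper described in this commit. Because the go-libdeflate API is not shown in the PR, a stand-in `compressor` type is used; `libdeflateDisabled` and `gzDstGrow` are named after the commit message, but their signatures, and the `compressor`, `newCompressor`, `failInit` and `newCalls` helpers, are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// compressor stands in for a libdeflate handle; newCompressor is a
// hypothetical constructor whose failure is simulated via failInit.
type compressor struct{}

var failInit atomic.Bool

func newCompressor() (*compressor, error) {
	if failInit.Load() {
		return nil, errors.New("libdeflate: init failed")
	}
	return &compressor{}, nil
}

var (
	libdeflateDisabled atomic.Bool  // set once New has failed
	newCalls           atomic.Int64 // counts constructor attempts (demo only)
)

var gzCompressorPool = sync.Pool{
	New: func() any {
		newCalls.Add(1)
		c, err := newCompressor()
		if err != nil {
			libdeflateDisabled.Store(true)
			return nil
		}
		return c
	},
}

// getCompressor returns nil when libdeflate is unavailable, signalling the
// caller to fall back to stdlib gzip. The flag check matters because
// sync.Pool never caches a nil result: without it, every Get after a
// failure would call New (and the constructor) again.
func getCompressor() *compressor {
	if libdeflateDisabled.Load() {
		return nil
	}
	c, _ := gzCompressorPool.Get().(*compressor)
	return c
}

// gzDstGrow (signature assumed) returns a slice of length n, reusing dst when
// its capacity suffices and otherwise allocating at least twice the old
// capacity. Old contents are not preserved; callers overwrite the slice.
func gzDstGrow(dst []byte, n int) []byte {
	if cap(dst) >= n {
		return dst[:n]
	}
	newCap := 2 * cap(dst)
	if newCap < n {
		newCap = n
	}
	return make([]byte, n, newCap)
}

func main() {
	fmt.Println(getCompressor() != nil) // true: constructor succeeded

	failInit.Store(true)
	for i := 0; i < 3; i++ {
		_ = getCompressor() // first call fails inside New, later calls short-circuit
	}
	fmt.Println(libdeflateDisabled.Load(), newCalls.Load()) // true 2

	fmt.Println(cap(gzDstGrow(make([]byte, 0, 100), 150))) // 200
}
```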
@lupin012 lupin012 force-pushed the lupin012/compression_with_libdeflate branch from eb664e6 to 992ac2b April 29, 2026 21:18
@lupin012 lupin012 enabled auto-merge April 29, 2026 21:18
@lupin012 lupin012 added this pull request to the merge queue Apr 29, 2026
Merged via the queue into main with commit 5b0ba8d Apr 29, 2026
38 checks passed
@lupin012 lupin012 deleted the lupin012/compression_with_libdeflate branch April 29, 2026 22:30

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rpc: evaluate go-libdeflate for non-streaming endpoints

3 participants