ZSTD Support for Streaming Compression by sarthakaggarwal97 · Pull Request #3798 · valkey-io/valkey

sarthakaggarwal97 · 2026-05-20T22:46:11Z

Built on top #3531.

Zstd is great for compression ratio and would be useful for both replication compression and rdb compression. The save time worse as compared to LZ4 because it tries to compress much harder. The load time is faster because since it compressed harder, it has much lesser data to load from the disk.

Zstd is specially useful for replication compression across regions where network costs are quite significant as compared to compute. We want to save as much data as possible while transferring data over the regions.

Add streaming LZ4-backed RDB compression with rio decorators, stream envelope handling, integration changes, and the follow-up fixes and config cleanup needed on top of unstable. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

- Remove dead code: rdbIsValidMagic() and unused #include <string.h> in rdb.h - Remove redundant first RIO_FLAG_SKIP_RDB_CHECKSUM set in rdbSaveInternal - Remove unrelated changes: config_parse_depth, USE_FAST_FLOAT, write-make-settings - Validate full 8-byte VKCS envelope in aof.c rdbFileUsesStreamingCompression - Add SAFETY comment for rdbRioHasCorruptCompressedInput cast invariant - Rename all snake_case identifiers to camelCase per Valkey conventions: types (compression_algo_t -> compressionAlgo, stream_compressor_t -> streamCompressor, compress_rio_t -> compressRio, etc.), functions (stream_writer_create -> streamWriterCreate, compress_rio_finish -> compressRioFinish, write_vkcs_envelope -> writeVkcsEnvelope, etc.), and static variables (compression_lz4_codec_impl -> compressionLz4CodecImpl) - Drop _t suffix from all types to match Valkey convention - Fix typo: streamWriterIsErrord -> streamWriterIsErrored - Replace silent dummy buffer allocation with assert(needed > 0) in streamWriterEnsureOutBuf Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

* Address streaming RDB compression review * Skip RDB CRC for streaming compression * Remove brittle 32-bit compression unit test --------- Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com> Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

- rdbSaveInternal: the old comment claimed per-string LZF was disabled 'when algo != LZF', but the actual gate is RIO_FLAG_STREAMING_COMPRESSION on the wrapper rio. Standalone rios (DUMP, AOF rewrite, diskless) keep using LZF regardless of algo. - rdbInputStreamPrepare: flag the synchronous probe IO so a future non-blocking caller (replication) doesn't accidentally block the loop. - rdbRioHasCorruptCompressedInput: the cast is sound today only because one producer sets RIO_FLAG_STREAMING_DECOMPRESSION. Replace the 'SAFETY' assertion with a note telling the next person to add a type discriminator before adding a second producer. - rdbSaveRawString: minor wording cleanup on the per-string LZF gate. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Match the surrounding Valkey style: drop comments that restate the code, drop the Ownership/Threading/Returns boilerplate from headers, collapse repeated 'capacity-shortage is retriable' notes into one explanation per function. Behavior is unchanged. Net -285 lines; integration tests (21/21) and build are clean. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

coderabbitai · 2026-05-20T22:46:17Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 34811f4e-03a7-4d41-9810-9d8bd469c521

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-05-21T00:34:36Z

Codecov Report

❌ Patch coverage is 92.90414% with 151 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.18%. Comparing base (bee30d4) to head (23c1bbe).
⚠️ Report is 24 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/compression_stream.c	89.97%	37 Missing ⚠️
src/compression_rio.c	73.87%	29 Missing ⚠️
src/valkey-check-rdb.c	58.73%	26 Missing ⚠️
src/rio.c	67.18%	21 Missing ⚠️
src/rdb.c	85.56%	14 Missing ⚠️
src/compression.c	86.20%	8 Missing ⚠️
src/aof.c	70.00%	6 Missing ⚠️
src/compression_zstd.c	93.33%	5 Missing ⚠️
src/unit/test_compression.cpp	99.74%	3 Missing ⚠️
src/compression_lz4.c	97.40%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3798      +/-   ##
============================================
+ Coverage     76.82%   77.18%   +0.35%     
============================================
  Files           162      168       +6     
  Lines         81021    83083    +2062     
============================================
+ Hits          62248    64128    +1880     
- Misses        18773    18955     +182

Files with missing lines	Coverage Δ
src/config.c	`78.93% <ø> (ø)`
src/rdb.h	`100.00% <ø> (ø)`
src/rio.h	`100.00% <ø> (ø)`
src/server.h	`100.00% <ø> (ø)`
src/compression_lz4.c	`97.40% <97.40%> (ø)`
src/unit/test_compression.cpp	`99.74% <99.74%> (ø)`
src/compression_zstd.c	`93.33% <93.33%> (ø)`
src/aof.c	`80.17% <70.00%> (-0.14%)`	⬇️
src/compression.c	`86.20% <86.20%> (ø)`
src/rdb.c	`77.22% <85.56%> (+0.14%)`	⬆️
... and 4 more

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Adds replication wire compression on top of valkey-io#3531 with lz4 as the first supported codec for the incremental replication stream. The replication stream from primary to replica is wrapped in a VKCS envelope (using STREAM_KIND_REPL) and compressed as a single long-lived frame at the per-replica buffer layer. Default behavior is unchanged with 'replcompression no'; existing replicas without the new capability stay uncompressed. Negotiation is per-replica via the existing PSYNC handshake; a new REPLICA_CAPA_COMPRESSION capability lets each side opt in independently. Compression runs inline on the IO thread that owns the replica's write job; no dedicated compression thread, no IPC, no reordering. Optional sticky thread affinity (lazy ownership + event-driven rebalance) keeps the long-lived LZ4 frame state on a single IO thread for cache locality. Configs: replcompression bool, default no repl-compression-thread-affinity bool, default yes Internal constants: REPLICA_CAPA_COMPRESSION (1 << 4) REPL_COMPRESSION_ALGO ALGO_LZ4 REPL_COMPRESSION_LEVEL 0 (LZ4 fast mode) REPL_COMPRESSION_BATCH_LIMIT 1 MB raw input per dispatch REPL_STREAM_DECODER_OUTPUT_MAX 256 MB INFO replication per-replica fields: compression=lz4, compressed_bytes, uncompressed_bytes, compression_ratio, compression_errors, compression_cpu_usec, debug_compression_pending_drains, debug_thread_switches INFO replication server-level (replica side): repl_decompression_errors, repl_decompression_cpu_usec, repl_decompressed_bytes_total, repl_apply_cpu_usec, repl_apply_batches CI adds a test-replication-compression job that runs the replication-tagged integration tests with replcompression=yes to exercise compression across the broader replication test surface. Tests: 18 streamReader push-mode unit tests + 3 replCompression unit tests + 27 integration tests. Performance (BlockMesh tweets, 3M keys x ~315 byte JSON values, 1,073 MB uncompressed per replica, 30 clients, pipeline 50, 2 cross-region replicas): LZ4 level 0 (default): 0.48 ratio, 52% bandwidth saved, 2.5s compression CPU per replica, <1% throughput overhead vs uncompressed baseline. Affinity ON vs OFF: throughput unchanged (118.6K vs 118.1K keys/s) but thread switches drop from ~800K to ~30 per replica. ZSTD support follows in valkey-io#3798. Related to valkey-io#3531. Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>

--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Drop the clamp detail (an internal streamReaderCreate concern) and keep only the caller-facing contract that buffer_size must be nonzero. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

The loaderr path exits the process, so resetting reading_config_file there is dead code. It was a leftover from the reverted config_parse_depth change and is not present upstream. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Collapse the save-notice if/else into one serverLog that always names the active algorithm (none/lzf/lz4), instead of branching the message. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Replace em dashes in the compression comments with ASCII punctuation and add a short note above streamWriterCreate describing the writer/reader push/pull model and the finish-before-free requirement. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

- Simplify the streaming APIs: caller-owned structs with Init/Free and descriptive self-parameter names; codec used only for the algorithm vtable. - Store the compression algorithm id directly in the VKCS envelope and drop the separate codec enum plus its translation functions. - Make the rio decorator opaque: track connection-backed streams with a flag instead of a transport-type enum. - Route LZ4 contexts through zmalloc and assert on caller errors; keep runtime errors for corrupt or external input. - Document the feed functions' streaming contract, mark the retriable capacity-shortage returns, and drop a redundant probe re-zero. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

valkey-check-rdb reported offsets via rioTell() on every read, and rdbLoadProgressCallback computed the transport offset via rioTell() on every read even when no progress event was due. Read processed_bytes directly for the offset, and only compute the transport position when a progress event is actually reported. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Replace the separate rdb-compression-algo config with a single rdbcompression no|yes|lz4-stream. 'yes' keeps the legacy per-string LZF path; 'lz4-stream' selects whole-file streaming LZ4 RDB saves. Update valkey.conf and the RDB compression, check-rdb, and replication AOF sync tests to use the new config value. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

sarthakaggarwal97 added 14 commits May 12, 2026 12:42

compression: add streaming rio-based RDB compression

5a92c67

Add streaming LZ4-backed RDB compression with rio decorators, stream envelope handling, integration changes, and the follow-up fixes and config cleanup needed on top of unstable. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

compression: naming cleanup for streaming compression

22e7164

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

fix streaming compression review issues

0f432b0

--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Address compression review feedback

b494e19

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Clarify streaming RDB compatibility errors

d7f2e65

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

deps: mark lz4 clean target phony

7ceb358

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Merge branch 'unstable' into streaming-compression-rio-pr

49718a5

compression: address streaming review feedback

8a1585a

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

compression: address streaming review nits

717bb74

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

compression: trim redundant codec checks

68b9d75

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

github-actions Bot assigned sarthakaggarwal97 May 20, 2026

sarthakaggarwal97 force-pushed the streaming-compression-rio-pr-zstd-v1.5.7 branch 3 times, most recently from 6054859 to 215df50 Compare May 21, 2026 00:08

roshkhatri mentioned this pull request May 27, 2026

replication: add incremental stream compression transport roshkhatri/valkey#15

Closed

roshkhatri mentioned this pull request May 28, 2026

Streaming Compression support for Replication #3853

Open

sarthakaggarwal97 added 6 commits June 1, 2026 08:16

Merge branch 'unstable' into streaming-compression-rio-pr

5f4d6ea

minor cast improvement

ce31888

--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

addressing review comments

65ed842

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

compression: trim buffer_size comment to the caller contract

b446a66

Drop the clamp detail (an internal streamReaderCreate concern) and keep only the caller-facing contract that buffer_size must be nonzero. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

rdb: log the active compression algorithm on a single line

3b39abc

Collapse the save-notice if/else into one serverLog that always names the active algorithm (none/lzf/lz4), instead of branching the message. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

sarthakaggarwal97 added 10 commits June 3, 2026 01:12

Merge branch 'unstable' into streaming-compression-rio-pr

32908ac

Merge branch 'unstable' into streaming-compression-rio-pr

c26b6ea

making VKCS based on offset

bf5d490

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Merge branch 'unstable' into streaming-compression-rio-pr

9ad8792

Rename streaming compression envelope to VCS

0f15410

--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Merge branch 'unstable' into streaming-compression-rio-pr

9068b22

sarthakaggarwal97 mentioned this pull request Jun 10, 2026

Streaming compression for replication (review) sarthakaggarwal97/valkey#277

Open

sarthakaggarwal97 added 6 commits June 10, 2026 10:18

Merge branch 'unstable' into streaming-compression-rio-pr

a5f6222

Merge branch 'unstable' into streaming-compression-rio-pr

709525a

Merge branch 'unstable' into streaming-compression-rio-pr

078d3d1

rdb: skip streaming compression for replication saves

52d4a47

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Merge branch 'unstable' into streaming-compression-rio-pr

109a659

Add zstd streaming compression support

5e484e7

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

sarthakaggarwal97 force-pushed the streaming-compression-rio-pr-zstd-v1.5.7 branch from 215df50 to 5e484e7 Compare June 23, 2026 15:28

sarthakaggarwal97 requested a review from hpatro June 23, 2026 15:29

Make zstd streaming compression optional

23c1bbe

sarthakaggarwal97 mentioned this pull request Jun 24, 2026

Streaming Compression support for RDB #3531

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ZSTD Support for Streaming Compression #3798

ZSTD Support for Streaming Compression #3798
sarthakaggarwal97 wants to merge 37 commits into
valkey-io:unstablefrom
sarthakaggarwal97:streaming-compression-rio-pr-zstd-v1.5.7

sarthakaggarwal97 commented May 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Review skipped

Uh oh!

codecov Bot commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sarthakaggarwal97 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sarthakaggarwal97 commented May 20, 2026 •

edited

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading

codecov Bot commented May 21, 2026 •

edited

Loading