ZSTD Support for Streaming Compression #3798
Conversation
Add streaming LZ4-backed RDB compression with rio decorators, stream envelope handling, integration changes, and the follow-up fixes and config cleanup needed on top of unstable. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
- Remove dead code: rdbIsValidMagic() and unused #include <string.h> in rdb.h - Remove redundant first RIO_FLAG_SKIP_RDB_CHECKSUM set in rdbSaveInternal - Remove unrelated changes: config_parse_depth, USE_FAST_FLOAT, write-make-settings - Validate full 8-byte VKCS envelope in aof.c rdbFileUsesStreamingCompression - Add SAFETY comment for rdbRioHasCorruptCompressedInput cast invariant - Rename all snake_case identifiers to camelCase per Valkey conventions: types (compression_algo_t -> compressionAlgo, stream_compressor_t -> streamCompressor, compress_rio_t -> compressRio, etc.), functions (stream_writer_create -> streamWriterCreate, compress_rio_finish -> compressRioFinish, write_vkcs_envelope -> writeVkcsEnvelope, etc.), and static variables (compression_lz4_codec_impl -> compressionLz4CodecImpl) - Drop _t suffix from all types to match Valkey convention - Fix typo: streamWriterIsErrord -> streamWriterIsErrored - Replace silent dummy buffer allocation with assert(needed > 0) in streamWriterEnsureOutBuf Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
* Address streaming RDB compression review * Skip RDB CRC for streaming compression * Remove brittle 32-bit compression unit test --------- Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com> Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
- rdbSaveInternal: the old comment claimed per-string LZF was disabled 'when algo != LZF', but the actual gate is RIO_FLAG_STREAMING_COMPRESSION on the wrapper rio. Standalone rios (DUMP, AOF rewrite, diskless) keep using LZF regardless of algo. - rdbInputStreamPrepare: flag the synchronous probe IO so a future non-blocking caller (replication) doesn't accidentally block the loop. - rdbRioHasCorruptCompressedInput: the cast is sound today only because one producer sets RIO_FLAG_STREAMING_DECOMPRESSION. Replace the 'SAFETY' assertion with a note telling the next person to add a type discriminator before adding a second producer. - rdbSaveRawString: minor wording cleanup on the per-string LZF gate. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Match the surrounding Valkey style: drop comments that restate the code, drop the Ownership/Threading/Returns boilerplate from headers, collapse repeated 'capacity-shortage is retriable' notes into one explanation per function. Behavior is unchanged. Net -285 lines; integration tests (21/21) and build are clean. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
6054859 to
215df50
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #3798 +/- ##
============================================
+ Coverage 76.82% 77.18% +0.35%
============================================
Files 162 168 +6
Lines 81021 83083 +2062
============================================
+ Hits 62248 64128 +1880
- Misses 18773 18955 +182
🚀 New features to boost your workflow:
|
Adds replication wire compression on top of valkey-io#3531 with lz4 as the first supported codec for the incremental replication stream. The replication stream from primary to replica is wrapped in a VKCS envelope (using STREAM_KIND_REPL) and compressed as a single long-lived frame at the per-replica buffer layer. Default behavior is unchanged with 'replcompression no'; existing replicas without the new capability stay uncompressed. Negotiation is per-replica via the existing PSYNC handshake; a new REPLICA_CAPA_COMPRESSION capability lets each side opt in independently. Compression runs inline on the IO thread that owns the replica's write job; no dedicated compression thread, no IPC, no reordering. Optional sticky thread affinity (lazy ownership + event-driven rebalance) keeps the long-lived LZ4 frame state on a single IO thread for cache locality. Configs: replcompression bool, default no repl-compression-thread-affinity bool, default yes Internal constants: REPLICA_CAPA_COMPRESSION (1 << 4) REPL_COMPRESSION_ALGO ALGO_LZ4 REPL_COMPRESSION_LEVEL 0 (LZ4 fast mode) REPL_COMPRESSION_BATCH_LIMIT 1 MB raw input per dispatch REPL_STREAM_DECODER_OUTPUT_MAX 256 MB INFO replication per-replica fields: compression=lz4, compressed_bytes, uncompressed_bytes, compression_ratio, compression_errors, compression_cpu_usec, debug_compression_pending_drains, debug_thread_switches INFO replication server-level (replica side): repl_decompression_errors, repl_decompression_cpu_usec, repl_decompressed_bytes_total, repl_apply_cpu_usec, repl_apply_batches CI adds a test-replication-compression job that runs the replication-tagged integration tests with replcompression=yes to exercise compression across the broader replication test surface. Tests: 18 streamReader push-mode unit tests + 3 replCompression unit tests + 27 integration tests. Performance (BlockMesh tweets, 3M keys x ~315 byte JSON values, 1,073 MB uncompressed per replica, 30 clients, pipeline 50, 2 cross-region replicas): LZ4 level 0 (default): 0.48 ratio, 52% bandwidth saved, 2.5s compression CPU per replica, <1% throughput overhead vs uncompressed baseline. Affinity ON vs OFF: throughput unchanged (118.6K vs 118.1K keys/s) but thread switches drop from ~800K to ~30 per replica. ZSTD support follows in valkey-io#3798. Related to valkey-io#3531. Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Adds replication wire compression on top of valkey-io#3531 with lz4 as the first supported codec for the incremental replication stream. The replication stream from primary to replica is wrapped in a VKCS envelope (using STREAM_KIND_REPL) and compressed as a single long-lived frame at the per-replica buffer layer. Default behavior is unchanged with 'replcompression no'; existing replicas without the new capability stay uncompressed. Negotiation is per-replica via the existing PSYNC handshake; a new REPLICA_CAPA_COMPRESSION capability lets each side opt in independently. Compression runs inline on the IO thread that owns the replica's write job; no dedicated compression thread, no IPC, no reordering. Optional sticky thread affinity (lazy ownership + event-driven rebalance) keeps the long-lived LZ4 frame state on a single IO thread for cache locality. Configs: replcompression bool, default no repl-compression-thread-affinity bool, default yes Internal constants: REPLICA_CAPA_COMPRESSION (1 << 4) REPL_COMPRESSION_ALGO ALGO_LZ4 REPL_COMPRESSION_LEVEL 0 (LZ4 fast mode) REPL_COMPRESSION_BATCH_LIMIT 1 MB raw input per dispatch REPL_STREAM_DECODER_OUTPUT_MAX 256 MB INFO replication per-replica fields: compression=lz4, compressed_bytes, uncompressed_bytes, compression_ratio, compression_errors, compression_cpu_usec, debug_compression_pending_drains, debug_thread_switches INFO replication server-level (replica side): repl_decompression_errors, repl_decompression_cpu_usec, repl_decompressed_bytes_total, repl_apply_cpu_usec, repl_apply_batches CI adds a test-replication-compression job that runs the replication-tagged integration tests with replcompression=yes to exercise compression across the broader replication test surface. Tests: 18 streamReader push-mode unit tests + 3 replCompression unit tests + 27 integration tests. Performance (BlockMesh tweets, 3M keys x ~315 byte JSON values, 1,073 MB uncompressed per replica, 30 clients, pipeline 50, 2 cross-region replicas): LZ4 level 0 (default): 0.48 ratio, 52% bandwidth saved, 2.5s compression CPU per replica, <1% throughput overhead vs uncompressed baseline. Affinity ON vs OFF: throughput unchanged (118.6K vs 118.1K keys/s) but thread switches drop from ~800K to ~30 per replica. ZSTD support follows in valkey-io#3798. Related to valkey-io#3531. Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Drop the clamp detail (an internal streamReaderCreate concern) and keep only the caller-facing contract that buffer_size must be nonzero. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
The loaderr path exits the process, so resetting reading_config_file there is dead code. It was a leftover from the reverted config_parse_depth change and is not present upstream. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Collapse the save-notice if/else into one serverLog that always names the active algorithm (none/lzf/lz4), instead of branching the message. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Replace em dashes in the compression comments with ASCII punctuation and add a short note above streamWriterCreate describing the writer/reader push/pull model and the finish-before-free requirement. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
- Simplify the streaming APIs: caller-owned structs with Init/Free and descriptive self-parameter names; codec used only for the algorithm vtable. - Store the compression algorithm id directly in the VKCS envelope and drop the separate codec enum plus its translation functions. - Make the rio decorator opaque: track connection-backed streams with a flag instead of a transport-type enum. - Route LZ4 contexts through zmalloc and assert on caller errors; keep runtime errors for corrupt or external input. - Document the feed functions' streaming contract, mark the retriable capacity-shortage returns, and drop a redundant probe re-zero. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
valkey-check-rdb reported offsets via rioTell() on every read, and rdbLoadProgressCallback computed the transport offset via rioTell() on every read even when no progress event was due. Read processed_bytes directly for the offset, and only compute the transport position when a progress event is actually reported. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Replace the separate rdb-compression-algo config with a single rdbcompression no|yes|lz4-stream. 'yes' keeps the legacy per-string LZF path; 'lz4-stream' selects whole-file streaming LZ4 RDB saves. Update valkey.conf and the RDB compression, check-rdb, and replication AOF sync tests to use the new config value. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
--------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
215df50 to
5e484e7
Compare
Built on top #3531.
Zstd is great for compression ratio and would be useful for both replication compression and rdb compression. The save time worse as compared to LZ4 because it tries to compress much harder. The load time is faster because since it compressed harder, it has much lesser data to load from the disk.
Zstd is specially useful for replication compression across regions where network costs are quite significant as compared to compute. We want to save as much data as possible while transferring data over the regions.