Add streaming compression for replication full sync #20

Open
roshkhatri wants to merge 1 commit into replication-streaming-compression-pr from repl-streaming-comression-fullsync-pr

@roshkhatri

Owner

Adds streaming compression of the full-sync RDB payload for replicas that negotiate the compression capability, covering disk-based sync, diskless sync, and dual-channel sync. Builds on the streaming RDB codec (VCS/LZ4) and reuses the existing repl-compression config (no | lz4-stream); default off. Stacked on replication-streaming-compression-pr (the incremental-stream compression work).

Negotiation and the cohort rule

A replica advertises REPLCONF capa compression (REPLICA_CAPA_COMPRESSION) when repl-compression is enabled; the primary compresses only when it also has repl-compression enabled. Because a single full sync (a shared disk file, or one connset/pipe) serves a whole cohort of replicas, the decision is the AND of the capability over every replica in the attaching cohort: if any replica is not capable, the payload is sent plaintext so all of them can load it. This mirrors the existing skip_rdb_checksum cohort-AND.
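A minimal Python model of the cohort rule above, not the C implementation. The bit value of `REPLICA_CAPA_COMPRESSION`, the `Replica` class, and the function name are assumptions made for illustration; only the capability name and the config values come from this description.

```python
from dataclasses import dataclass

# Hypothetical bit value; the real REPLICA_CAPA_COMPRESSION constant may differ.
REPLICA_CAPA_COMPRESSION = 1 << 2


@dataclass
class Replica:
    name: str
    capa: int  # bitmask of capabilities advertised via REPLCONF capa


def cohort_uses_compression(primary_repl_compression: str, cohort: list[Replica]) -> bool:
    """True only if the primary has compression on and every replica is capable."""
    if primary_repl_compression == "no":
        return False
    if not cohort:
        return False  # assumption: an empty cohort has nothing to compress
    # One non-capable replica forces plaintext for the whole cohort.
    return all(r.capa & REPLICA_CAPA_COMPRESSION for r in cohort)
```

With one capable and one legacy replica attached to the same full sync, the function returns `False`, so both receive a plaintext payload they can load.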

Disk-based

The cohort AND is computed in startBgsaveForReplication (the generic rdbSaveBackground is not replication-aware) and passed via RDBFLAGS_REPL_COMPRESSED_SYNC; rdbSaveInternal applies the repl-compression gate, decoupled from the rdbcompression persistence setting.
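The split between deciding and gating could be modeled roughly as follows. This is a Python sketch under assumptions: the flag bit values and both function names are hypothetical, and only the flag and config names are taken from this description.

```python
# Hypothetical bit values; only the flag names come from the PR description.
RDBFLAGS_REPLICATION = 1 << 1
RDBFLAGS_REPL_COMPRESSED_SYNC = 1 << 6


def flags_for_disk_sync(cohort_all_capable: bool) -> int:
    """Caller side: record the cohort decision in the save flags."""
    flags = RDBFLAGS_REPLICATION
    if cohort_all_capable:
        flags |= RDBFLAGS_REPL_COMPRESSED_SYNC
    return flags


def should_stream_compress(flags: int, repl_compression: str, rdbcompression: bool) -> bool:
    """Save side: both the flag and the repl-compression config must allow it.

    rdbcompression is accepted only to show that it plays no part in the decision.
    """
    del rdbcompression  # intentionally unused
    return bool(flags & RDBFLAGS_REPL_COMPRESSED_SYNC) and repl_compression != "no"
```

Keeping the persistence setting out of the gate means an ordinary BGSAVE is unaffected by the replication compression config, and vice versa.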

Diskless

The cohort AND is computed inside rdbSaveToReplicasSockets alongside skip_rdb_checksum. Only the RDB body is wrapped as a VCS frame; the $EOF framing stays plaintext and the frame's own checksum replaces the RDB CRC64. The replica decompresses inline when it loads directly from the socket (repl-diskless-load enabled). When it receives the stream to disk first (the default path, also used by dual-channel), the existing rdbLoad auto-detect handles decompression. The dual-channel handshake now advertises the compression capability.
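To illustrate the layering, here is a small Python sketch in which only the body is compressed while the `$EOF:<40-byte mark>` framing stays readable. zlib stands in for the VCS/LZ4 frame (it is not the codec used by this PR), and the helper names are hypothetical.

```python
import os
import zlib

EOF_MARK_SIZE = 40  # length of the random delimiter used by $EOF framing


def new_eof_mark() -> bytes:
    """Generate a 40-character random hex delimiter."""
    return os.urandom(EOF_MARK_SIZE // 2).hex().encode()


def frame_diskless_payload(rdb_body: bytes, compress: bool, mark: bytes) -> bytes:
    """Wrap the body in plaintext $EOF framing, compressing only the body."""
    assert len(mark) == EOF_MARK_SIZE
    # zlib is a stand-in for the VCS/LZ4 frame; it also carries its own checksum.
    body = zlib.compress(rdb_body) if compress else rdb_body
    return b"$EOF:" + mark + b"\r\n" + body + mark


def read_diskless_payload(wire: bytes, compressed: bool) -> bytes:
    """Strip the plaintext framing, then decompress the body if needed."""
    prefix = b"$EOF:"
    assert wire.startswith(prefix)
    mark = wire[len(prefix):len(prefix) + EOF_MARK_SIZE]
    start = len(prefix) + EOF_MARK_SIZE + 2  # skip the CRLF after the mark
    assert wire[start - 2:start] == b"\r\n"
    assert wire.endswith(mark)
    body = wire[start:len(wire) - EOF_MARK_SIZE]
    return zlib.decompress(body) if compressed else body
```

Because the delimiter is never compressed, a receiver can still locate the end of the transfer by scanning for the mark, whether or not it understands the body encoding.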

streamReaderValidateFrameEnd is added so the replica can validate a closed compressed frame without consuming the caller-owned trailing EOF mark.
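The idea of validating a closed frame while leaving the caller's trailing bytes alone can be sketched in Python. The frame layout here (4-byte length, payload, 4-byte CRC32) is a toy invented for illustration and is not the VCS format; the function names are likewise hypothetical.

```python
import struct
import zlib


def build_toy_frame(payload: bytes) -> bytes:
    """Hypothetical frame: 4-byte length, payload, 4-byte CRC32 of the payload."""
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", zlib.crc32(payload))


def validate_frame_end(buf: bytes) -> int:
    """Check that buf starts with one complete, intact frame.

    Returns the frame's length in bytes so the caller knows where its own
    trailing data (for example the EOF mark) begins. Nothing past the frame
    is inspected or consumed.
    """
    if len(buf) < 4:
        raise ValueError("frame header incomplete")
    (n,) = struct.unpack_from(">I", buf, 0)
    end = 4 + n + 4
    if len(buf) < end:
        raise ValueError("frame not closed")
    payload = buf[4:4 + n]
    (crc,) = struct.unpack_from(">I", buf, 4 + n)
    if crc != zlib.crc32(payload):
        raise ValueError("frame checksum mismatch")
    return end
```

The caller slices `buf[end:]` itself and compares it against the EOF mark, so ownership of the trailing bytes never moves into the frame reader.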

Testing

Tests cover disk grouping (all/mixed/none capable), diskless and dual-channel sync, the default disk-receive load, checksum on/off, the handoff from compressed full sync to the compressed incremental stream, REPLICAOF NO ONE teardown and resync, and the AOF-base BGREWRITEAOF fallback (a compression-capable replica's compressed disk-sync RDB cannot be reused as the AOF base). valkey.conf documents the expanded repl-compression coverage.

Follow-up

A src/unit gtest for the streamReaderValidateFrameEnd split is tracked as a follow-up (currently covered by the integration tests).

@roshkhatri roshkhatri force-pushed the repl-streaming-comression-fullsync-pr branch from 25ab585 to acabef9 Compare June 24, 2026 23:49
@sarthakaggarwal97


tests are failing :P

@roshkhatri roshkhatri force-pushed the repl-streaming-comression-fullsync-pr branch 5 times, most recently from 4e269f4 to efc525d Compare June 30, 2026 22:53
Compress the full-sync RDB payload for replicas that negotiate the
compression capability, covering disk-based, diskless, and dual-channel
sync. Reuses the streaming RDB codec (VCS/LZ4) and the existing
repl-compression config (no | lz4-stream); default off.

The payload is compressed only when every replica in the attaching cohort
advertised REPLICA_CAPA_COMPRESSION (cohort-AND); a mixed group falls back
to plaintext for all. Disk: decided in startBgsaveForReplication via
RDBFLAGS_REPL_COMPRESSED_SYNC and gated by repl-compression in
rdbSaveInternal. Diskless: decided in rdbSaveToReplicasSockets; the $EOF
framing stays plaintext and the frame checksum replaces the RDB CRC64. The
replica decompresses inline for socket loads (both $EOF-mark and size-framed)
and uses the existing rdbLoad auto-detect for disk-receive; the dual-channel
handshake advertises the capability. Adds streamReaderValidateFrameEnd and
STREAM_READER_ERROR_TRUNCATED so a link dropped mid-frame is a recoverable
resync rather than codec corruption.
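
The truncated-versus-corrupt distinction can be illustrated with a short Python sketch. zlib stands in for the VCS/LZ4 codec, and the enum and function names are hypothetical; only STREAM_READER_ERROR_TRUNCATED is a name from this change.

```python
import enum
import zlib


class StreamStatus(enum.Enum):
    OK = "ok"
    TRUNCATED = "truncated"  # input ended before the stream closed: retry the sync
    CORRUPT = "corrupt"      # stream contents failed validation: codec error


def classify_stream(data: bytes) -> StreamStatus:
    """Classify a received stream as complete, cut short, or damaged."""
    d = zlib.decompressobj()
    try:
        d.decompress(data)
    except zlib.error:
        return StreamStatus.CORRUPT
    # eof is set only once the end-of-stream marker and checksum were read.
    return StreamStatus.OK if d.eof else StreamStatus.TRUNCATED
```

A receiver that sees the truncated status can drop the partial payload and request a new sync, while the corrupt status points at a real data or codec problem.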

Tests: disk grouping (all/mixed/none/off), diskless, dual-channel,
disk-receive load, checksum on/off, full->incremental handoff,
teardown+resync, mid-transfer truncation recovery, and an in-frame
corruption unit test.

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
@roshkhatri roshkhatri force-pushed the repl-streaming-comression-fullsync-pr branch from efc525d to ef02d03 Compare June 30, 2026 23:09