Add streaming compression for replication full sync #20
Open
roshkhatri wants to merge 1 commit into
25ab585 to
acabef9
Compare
tests are failing :P
4e269f4 to
efc525d
Compare
Compress the full-sync RDB payload for replicas that negotiate the compression capability, covering disk-based, diskless, and dual-channel sync. Reuses the streaming RDB codec (VCS/LZ4) and the existing repl-compression config (no | lz4-stream); default off.

The payload is compressed only when every replica in the attaching cohort advertised REPLICA_CAPA_COMPRESSION (cohort-AND); a mixed group falls back to plaintext for all.

Disk: decided in startBgsaveForReplication via RDBFLAGS_REPL_COMPRESSED_SYNC and gated by repl-compression in rdbSaveInternal.

Diskless: decided in rdbSaveToReplicasSockets; the $EOF framing stays plaintext and the frame checksum replaces the RDB CRC64. The replica decompresses inline socket loads (both $EOF-mark and size-framed) and via the existing rdbLoad auto-detect for disk-receive; the dual-channel handshake advertises the capability.

Adds streamReaderValidateFrameEnd and STREAM_READER_ERROR_TRUNCATED so a link dropped mid-frame is a recoverable resync rather than codec corruption.

Tests: disk grouping (all/mixed/none/off), diskless, dual-channel, disk-receive load, checksum on/off, full->incremental handoff, teardown+resync, mid-transfer truncation recovery, and an in-frame corruption unit test.

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
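To illustrate the truncated-versus-corrupt distinction the commit message draws, here is a minimal sketch in C. It assumes a hypothetical length-prefixed frame whose header declares the body size; the real VCS frame layout is not described here, and `frameEndSketch` and `classifyFrameEnd` are illustrative names, not part of the PR.

```c
#include <stddef.h>

/* Hypothetical outcome of inspecting a frame once the link stops delivering
 * bytes. FRAME_END_TRUNCATED stands in for STREAM_READER_ERROR_TRUNCATED. */
typedef enum {
    FRAME_END_OK,
    FRAME_END_TRUNCATED, /* link dropped mid-frame: resync and retry */
    FRAME_END_CORRUPT    /* bytes arrived but do not form a valid frame */
} frameEndSketch;

/* declared_len: body size promised by the (assumed) frame header.
 * received_len: body bytes that actually arrived.
 * checksum_ok:  whether the frame checksum matched; only consulted when the
 *               frame is complete. */
static frameEndSketch classifyFrameEnd(size_t declared_len, size_t received_len, int checksum_ok) {
    if (received_len < declared_len) return FRAME_END_TRUNCATED;
    if (received_len > declared_len || !checksum_ok) return FRAME_END_CORRUPT;
    return FRAME_END_OK;
}
```

Under these assumptions, a short read is never reported as corruption, so a caller could map the truncated case to an ordinary resync and reserve the corrupt case for a hard error.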
efc525d to
ef02d03
Compare
Adds streaming compression of the full-sync RDB payload for replicas that negotiate the compression capability, covering disk-based sync, diskless sync, and dual-channel sync. Builds on the streaming RDB codec (VCS/LZ4) and reuses the existing `repl-compression` config (`no` | `lz4-stream`); default off. Stacked on `replication-streaming-compression-pr` (the incremental-stream compression work).

Negotiation and the cohort rule
A replica advertises `REPLCONF capa compression` (`REPLICA_CAPA_COMPRESSION`) when `repl-compression` is enabled; the primary compresses only when it also has `repl-compression` enabled. Because a single full sync (a shared disk file, or one connset/pipe) serves a whole cohort of replicas, the decision is the AND of the capability over every replica in the attaching cohort: if any replica is not capable, the payload is sent plaintext so all of them can load it. This mirrors the existing `skip_rdb_checksum` cohort-AND.

Disk-based
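The cohort rule, as it applies on the disk-based path, can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `replicaSketch`, the two `_SKETCH` bit values, and both function names are assumptions, and treating an empty cohort as "do not compress" is a choice made only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for REPLICA_CAPA_COMPRESSION and RDBFLAGS_REPL_COMPRESSED_SYNC;
 * the real numeric values are not given in this PR description. */
#define CAPA_COMPRESSION_SKETCH (1 << 0)
#define FLAG_COMPRESSED_SYNC_SKETCH (1 << 0)

/* Minimal model of a replica waiting to attach to the next full sync. */
typedef struct {
    int capa; /* capabilities advertised via REPLCONF capa */
} replicaSketch;

/* Cohort AND: request a compressed sync only if every waiting replica
 * advertised the compression capability. */
static int cohortSaveFlags(const replicaSketch *cohort, size_t n) {
    if (n == 0) return 0; /* assumption: nothing to serve, nothing to compress */
    for (size_t i = 0; i < n; i++) {
        if (!(cohort[i].capa & CAPA_COMPRESSION_SKETCH)) return 0;
    }
    return FLAG_COMPRESSED_SYNC_SKETCH;
}

/* Save-side gate: the flag alone is not enough, the primary's own
 * repl-compression setting must be on. The rdbcompression persistence
 * setting is deliberately not consulted. */
static bool shouldCompressSave(int rdbflags, bool repl_compression_enabled) {
    return (rdbflags & FLAG_COMPRESSED_SYNC_SKETCH) && repl_compression_enabled;
}
```

Splitting the decision in two mirrors the description: the replication layer knows who is in the cohort, while the save routine only sees a flag and its own config.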
The cohort AND is computed in `startBgsaveForReplication` (the generic `rdbSaveBackground` is not replication-aware) and passed via `RDBFLAGS_REPL_COMPRESSED_SYNC`; `rdbSaveInternal` applies the `repl-compression` gate, decoupled from the `rdbcompression` persistence setting.

Diskless
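In diskless sync the payload is delimited by a plaintext `$EOF` mark, and this PR keeps that mark outside the compressed frame. As a rough sketch of what "caller-owned trailing EOF mark" means for a receiver, the helper below separates the body from the trailing mark without interpreting the body. The 40-byte mark length reflects the existing diskless protocol as I understand it; `EOF_MARK_SIZE_SKETCH` and `splitBodyFromEofMark` are hypothetical names, and the real replica code processes the stream incrementally rather than on a complete buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EOF_MARK_SIZE_SKETCH 40 /* assumed length of the $EOF delimiter */

/* If buf ends with the EOF mark, store the length of the body that precedes
 * it in *body_len and return true. Otherwise return false, meaning the mark
 * has not fully arrived yet. The body (here, the compressed frame) is never
 * inspected, so the codec can validate its own frame end separately. */
static bool splitBodyFromEofMark(const unsigned char *buf, size_t len,
                                 const unsigned char *mark, size_t *body_len) {
    if (len < EOF_MARK_SIZE_SKETCH) return false;
    if (memcmp(buf + len - EOF_MARK_SIZE_SKETCH, mark, EOF_MARK_SIZE_SKETCH) != 0) return false;
    *body_len = len - EOF_MARK_SIZE_SKETCH;
    return true;
}
```

Keeping the mark out of the frame means the framing logic stays identical for compressed and plaintext payloads; only the bytes before the mark change meaning.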
The cohort AND is computed inside `rdbSaveToReplicasSockets` alongside `skip_rdb_checksum`. Only the RDB body is wrapped as a VCS frame; the `$EOF` framing stays plaintext and the frame's own checksum replaces the RDB CRC64. The replica decompresses inline when it loads from the socket (`repl-diskless-load` enabled), or via the existing `rdbLoad` auto-detect when it receives the stream to disk first (the default, and dual-channel). The dual-channel handshake now advertises the compression capability. `streamReaderValidateFrameEnd` is added so the replica can validate a closed compressed frame without consuming the caller-owned trailing EOF mark.

Testing
Disk grouping (all/mixed/none capable), diskless and dual-channel sync, default disk-receive load, checksum on/off, the compressed-full-sync to compressed-incremental handoff, `REPLICAOF NO ONE` teardown and resync, and the AOF-base `BGREWRITEAOF` fallback (a compression-capable replica's compressed disk sync RDB cannot be reused as the AOF base). `valkey.conf` documents the expanded `repl-compression` coverage.

Follow-up
A `src/unit` gtest for the `streamReaderValidateFrameEnd` split is tracked as a follow-up (currently covered by the integration tests).