[design] R2.5.7 — transient decompression model + Appendix E#21
Merged
Merged
Conversation
Adopts the transient-view approach for the read-path hook (S2.8) after deep analysis of three options (per-site / decompress-in-place / transient view). Design doc changes: - §2.5: rewrote R2.5.2 to note that the single decoder helper is called from inside lookupKey* when LOOKUP_READ_BYTES is set; added R2.5.7 describing the full transient model (decompress on lookup, pin via incrRefCount, save compressed buffer in side-map, flip encoding to RAW for the iteration; restore via pointer swap at beforeSleep using kvstore-slot pointer-equality for staleness detection). - §2.5.6 updated: cost is now "1 decompress per event-loop iteration that touches the key" rather than "1 decompress per read" — amortizes naturally for repeat reads. - §3.2 read-path seam: rewritten to match (lookupKey* with flag centralizes; 3 out-of-process paths bypass). - §4.2 db.c row: rewritten to describe the new flag plumbing. - New Appendix E: full design exploration including the codebase- sweep findings (62 sdslen(objectGetVal) sites, 9 dbUnshareStringValue callers, addReplyBulk catches only ~30% of byte access), the trade-off matrix across the three approaches, why approach 3 was chosen, and limitations / future considerations. Preserved for future contributors who may revisit the choice. Plan.md changes: - S2.8 description rewritten with the transient-model details, files touched per the codebase sweep, and explicit out-of-process paths that need separate wiring (AOF rewrite child, RDB-replication save). feedReplicationBufferWithObject is now noted as NOT needing wiring (operates on argv, not kvstore). Why option 3 over the alternatives: - Approach 1 (per-site) has 62-site leakage surface; future code can silently corrupt by missing a decompress call. - Approach 2 (decompress-in-place forever) violates R2.5.6 — read-hot keys lose memory savings; MEMORY USAGE / OBJECT ENCODING change after first read (operator surprise); sweep treadmill on read-hot patterns. - Approach 3 (transient view) preserves R2.5.6, is leak-proof for lookupKey paths, restoration is a free pointer swap (no re-compression cost), bounded memory inflation (≤ uncompressed baseline — feature can't make memory worse than the no-compression case), and reuses three v1 invariants (R2.4.4 COW, R2.5.2 single helper, PR #19 pointer equality). No code changes in this commit — design and plan only. S2.8 implementation lands in a follow-up PR per this design.
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) (#23) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
The S2.8 forward-reference in plan.md previously named the out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'. This PR closes that out: PR #24 IS that work. Update the entry to reference PR #24 and capture the as-implemented scope (which paths, which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding, which already covered by PR #23's transient-view hook). S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23, and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the @GilboaAWS track: S1.2 (training) — now safe to merge because S2.8 PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer panic when they encounter compressed values.
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
* [S2.8] Out-of-process explicit decompression (3/3) PR 3 of 3 in the S2.8 split. Wires explicit objectGetUncompressedView calls into the two paths that bypass lookupKey() and thus do not benefit from PR #23's transient-view hook: - rioWriteBulkObject (aof.c) — AOF rewrite child iterates kvstore via kvstoreIteratorNext; AOF on-disk format is uncompressed RESP per R2.6.5. - rdbSaveStringObject (rdb.c) — same iteration pattern; per R2.6.5 (AOF preamble within RDB) and R2.6.8 (full-sync replication RDB), uncompressed bytes are mandatory. Both today panic on OBJ_ENCODING_COMPRESSED (rioWriteBulkObject: serverPanic("Unknown string encoding"); rdbSaveStringObject: serverAssertWithInfo on sdsEncodedObject). The crash is latent only because no dictionary training has merged yet (S1.2 pending) — once training lands, BGSAVE/BGREWRITEAOF on a compressed key crashes the server. PR 3 is a prerequisite for S1.2 keeping unstable continuously safe. R2.6.1 (compressed disk RDB encoding via RDB_ENC_COMPRESSED + AUX dict entries) is intentionally deferred to S3.1/S3.2 (@GilboaAWS). Until that lands, the conservative choice for ALL RDB targets is uncompressed, matching pre-feature behavior. Disk RDBs pay the decompression cost once at save time and re-acquire compression at load time via the post-S1.2 sweeper. Dict lifetime safety in the forked child: registry is a fork-time snapshot; DDicts are immutable; child uses its own copy of the registry pointers. No interaction with the parent's QSBR generation counters. Replication feed (feedReplicationBufferWithObject) does NOT need wiring — already verified to operate on argv arguments and synthetic SELECT robjs only, never on kvstore values. DUMP/RESTORE/MIGRATE (R2.6.7) go through lookupKey() which now decompresses transparently per S2.8 PR 2; nothing to change. Tests deferred to the post-S1.2 transparency Tcl harness — that's where real compressed values exist and BGSAVE/BGREWRITEAOF round-trips can be exercised end-to-end. Verified locally: make -j2 -C src SERVER_CFLAGS=-Werror clean both with BUILD_ZSTD=yes (default) and BUILD_ZSTD=no; compression Tcl tests 10/10 pass. * docs(planning): mark S2.8 fully complete via PR #24 (3/3) The S2.8 forward-reference in plan.md previously named the out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'. This PR closes that out: PR #24 IS that work. Update the entry to reference PR #24 and capture the as-implemented scope (which paths, which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding, which already covered by PR #23's transient-view hook). S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23, and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the @GilboaAWS track: S1.2 (training) — now safe to merge because S2.8 PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer panic when they encounter compressed values.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Design + plan update for the read-path hook (S2.8) following deep analysis of three approaches.
Decision: adopt the transient view model —
lookupKey*with a newLOOKUP_READ_BYTESflag transparently decompresses, pins the robj viaincrRefCount, stores the original compressed buffer in a per-server side-map, and flips encoding to RAW for the duration of the event-loop iteration. At the nextbeforeSleepboundary,compressionRestoreTouchedKeys()restores the compressed view via pointer swap (free of CPU cost) for unmutated keys, or discards orphaned entries for mutated/overwritten/expired ones (detected via the same kvstore-slot pointer-equality check used by the write-path drain).No code changes in this CR — design doc and plan only. S2.8 implementation lands in a follow-up CR per this design.
What changed
design/detailed-design.mdobjectGetUncompressedView) is called from insidelookupKey*whenLOOKUP_READ_BYTESis set; out-of-process paths still call it directly.implementation/plan.mdWhy option 3 over the alternatives
Approach 3 reuses three existing v1 invariants:
dbUnshareStringValue) — the pin makes mutations COW automatically; mutation detection is free at restoration time.No new fundamental mechanism introduced.
Verified