Skip to content

[design] R2.5.7 — transient decompression model + Appendix E#21

Merged
ikolomi merged 1 commit into
unstablefrom
ikolomi/design-r2-5-7-transient-decompression
Jun 8, 2026
Merged

[design] R2.5.7 — transient decompression model + Appendix E#21
ikolomi merged 1 commit into
unstablefrom
ikolomi/design-r2-5-7-transient-decompression

Conversation

@ikolomi

@ikolomi ikolomi commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Summary

Design + plan update for the read-path hook (S2.8) following deep analysis of three approaches.

Decision: adopt the transient view model — lookupKey* with a new LOOKUP_READ_BYTES flag transparently decompresses, pins the robj via incrRefCount, stores the original compressed buffer in a per-server side-map, and flips encoding to RAW for the duration of the event-loop iteration. At the next beforeSleep boundary, compressionRestoreTouchedKeys() restores the compressed view via pointer swap (free of CPU cost) for unmutated keys, or discards orphaned entries for mutated/overwritten/expired ones (detected via the same kvstore-slot pointer-equality check used by the write-path drain).

No code changes in this CR — design doc and plan only. S2.8 implementation lands in a follow-up CR per this design.

What changed

design/detailed-design.md

  • R2.5.2 rewritten: notes that the single decoder helper (objectGetUncompressedView) is called from inside lookupKey* when LOOKUP_READ_BYTES is set; out-of-process paths still call it directly.
  • R2.5.6 updated: per-read CPU cost is now per-iteration (amortized), not per-read.
  • R2.5.7 new: full description of the transient model — side-map mechanics, pointer-equality stale check, ABA safety, memory bound (≤ uncompressed baseline), out-of-process exceptions.
  • §3.2 read-path seam rewritten.
  • §4.2 db.c row rewritten.
  • Appendix E new: design exploration with codebase-sweep findings (62 byte-access sites, 9 mutation paths, addReplyBulk covers only ~30%), trade-off matrix across the three approaches, justification for approach 3, and limitations / future-work notes. Preserved for future contributors.

implementation/plan.md

  • S2.8 description rewritten with the transient-model details and files touched per the codebase sweep.

Why option 3 over the alternatives

Approach Leak risk R2.5.6 preserved Read-hot memory savings Implementation
1 — per-site decompress High (62 audit sites) Yes Yes ~150–300 lines scattered
2 — decompress-in-place forever None No (violated) Lost ~50 lines, but operator-visible surprises (MEMORY USAGE, OBJECT ENCODING change after read; sweep treadmill)
3 — transient view (chosen) None for lookupKey paths Yes Yes ~250 lines in 3 files

Approach 3 reuses three existing v1 invariants:

  • R2.4.4 (refcount > 1 → COW via dbUnshareStringValue) — the pin makes mutations COW automatically; mutation detection is free at restoration time.
  • R2.5.2 single decoder helper — unchanged, just relocated to inside lookupKey.
  • PR [S2.7] Compression write-path hook #19's pointer-equality stale check — already a working primitive.

No new fundamental mechanism introduced.

Verified

  • Doc renders cleanly.
  • TOC includes new Appendix E (item 12).
  • All R2.5.x cross-references intact.
  • No code touched.

Adopts the transient-view approach for the read-path hook (S2.8) after
deep analysis of three options (per-site / decompress-in-place /
transient view).

Design doc changes:

- §2.5: rewrote R2.5.2 to note that the single decoder helper is
  called from inside lookupKey* when LOOKUP_READ_BYTES is set; added
  R2.5.7 describing the full transient model (decompress on lookup,
  pin via incrRefCount, save compressed buffer in side-map, flip
  encoding to RAW for the iteration; restore via pointer swap at
  beforeSleep using kvstore-slot pointer-equality for staleness
  detection).

- §2.5.6 updated: cost is now "1 decompress per event-loop iteration
  that touches the key" rather than "1 decompress per read" —
  amortizes naturally for repeat reads.

- §3.2 read-path seam: rewritten to match (lookupKey* with flag
  centralizes; 3 out-of-process paths bypass).

- §4.2 db.c row: rewritten to describe the new flag plumbing.

- New Appendix E: full design exploration including the codebase-
  sweep findings (62 sdslen(objectGetVal) sites, 9 dbUnshareStringValue
  callers, addReplyBulk catches only ~30% of byte access), the
  trade-off matrix across the three approaches, why approach 3 was
  chosen, and limitations / future considerations. Preserved for
  future contributors who may revisit the choice.

Plan.md changes:

- S2.8 description rewritten with the transient-model details, files
  touched per the codebase sweep, and explicit out-of-process paths
  that need separate wiring (AOF rewrite child, RDB-replication save).
  feedReplicationBufferWithObject is now noted as NOT needing wiring
  (operates on argv, not kvstore).

Why option 3 over the alternatives:

- Approach 1 (per-site) has 62-site leakage surface; future code can
  silently corrupt by missing a decompress call.
- Approach 2 (decompress-in-place forever) violates R2.5.6 — read-hot
  keys lose memory savings; MEMORY USAGE / OBJECT ENCODING change
  after first read (operator surprise); sweep treadmill on read-hot
  patterns.
- Approach 3 (transient view) preserves R2.5.6, is leak-proof for
  lookupKey paths, restoration is a free pointer swap (no
  re-compression cost), bounded memory inflation (≤ uncompressed
  baseline — feature can't make memory worse than the no-compression
  case), and reuses three v1 invariants (R2.4.4 COW, R2.5.2 single
  helper, PR #19 pointer equality).

No code changes in this commit — design and plan only. S2.8
implementation lands in a follow-up PR per this design.
@ikolomi ikolomi requested a review from GilboaAWS June 8, 2026 12:11
@ikolomi ikolomi merged commit 92ce383 into unstable Jun 8, 2026
4 checks passed
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3) (#23)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
The S2.8 forward-reference in plan.md previously named the
out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'.
This PR closes that out: PR #24 IS that work. Update the entry to
reference PR #24 and capture the as-implemented scope (which paths,
which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding,
which already covered by PR #23's transient-view hook).

S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23,
and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the
@GilboaAWS track: S1.2 (training) — now safe to merge because S2.8
PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer
panic when they encounter compressed values.
ikolomi added a commit that referenced this pull request Jun 9, 2026
* [S2.8] Out-of-process explicit decompression (3/3)

PR 3 of 3 in the S2.8 split. Wires explicit objectGetUncompressedView
calls into the two paths that bypass lookupKey() and thus do not
benefit from PR #23's transient-view hook:

  - rioWriteBulkObject (aof.c) — AOF rewrite child iterates kvstore
    via kvstoreIteratorNext; AOF on-disk format is uncompressed RESP
    per R2.6.5.
  - rdbSaveStringObject (rdb.c) — same iteration pattern; per R2.6.5
    (AOF preamble within RDB) and R2.6.8 (full-sync replication RDB),
    uncompressed bytes are mandatory.

Both today panic on OBJ_ENCODING_COMPRESSED (rioWriteBulkObject:
serverPanic("Unknown string encoding"); rdbSaveStringObject:
serverAssertWithInfo on sdsEncodedObject). The crash is latent only
because no dictionary training has merged yet (S1.2 pending) — once
training lands, BGSAVE/BGREWRITEAOF on a compressed key crashes the
server. PR 3 is a prerequisite for S1.2 keeping unstable continuously
safe.

R2.6.1 (compressed disk RDB encoding via RDB_ENC_COMPRESSED + AUX
dict entries) is intentionally deferred to S3.1/S3.2 (@GilboaAWS).
Until that lands, the conservative choice for ALL RDB targets is
uncompressed, matching pre-feature behavior. Disk RDBs pay the
decompression cost once at save time and re-acquire compression at
load time via the post-S1.2 sweeper.

Dict lifetime safety in the forked child: registry is a fork-time
snapshot; DDicts are immutable; child uses its own copy of the
registry pointers. No interaction with the parent's QSBR generation
counters.

Replication feed (feedReplicationBufferWithObject) does NOT need
wiring — already verified to operate on argv arguments and synthetic
SELECT robjs only, never on kvstore values.

DUMP/RESTORE/MIGRATE (R2.6.7) go through lookupKey() which now
decompresses transparently per S2.8 PR 2; nothing to change.

Tests deferred to the post-S1.2 transparency Tcl harness — that's
where real compressed values exist and BGSAVE/BGREWRITEAOF
round-trips can be exercised end-to-end.

Verified locally: make -j2 -C src SERVER_CFLAGS=-Werror clean both
with BUILD_ZSTD=yes (default) and BUILD_ZSTD=no; compression Tcl
tests 10/10 pass.

* docs(planning): mark S2.8 fully complete via PR #24 (3/3)

The S2.8 forward-reference in plan.md previously named the
out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'.
This PR closes that out: PR #24 IS that work. Update the entry to
reference PR #24 and capture the as-implemented scope (which paths,
which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding,
which already covered by PR #23's transient-view hook).

S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23,
and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the
@GilboaAWS track: S1.2 (training) — now safe to merge because S2.8
PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer
panic when they encounter compressed values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant