Skip to content

[S2.7] Compression write-path hook#19

Merged
ikolomi merged 4 commits into
unstablefrom
ikolomi/s2-write-path-hook
Jun 7, 2026
Merged

[S2.7] Compression write-path hook#19
ikolomi merged 4 commits into
unstablefrom
ikolomi/s2-write-path-hook

Conversation

@ikolomi

@ikolomi ikolomi commented Jun 4, 2026

Copy link
Copy Markdown
Owner

Summary

Wires compressionEnqueueCandidate into dbAddInternal and dbSetValue, and replaces the TODO(S2.7) placeholder in the drain handler with a real install path. With this change, writes to eligible STRING values get queued for background compression and the result is installed back into the kvstore as an OBJ_ENCODING_COMPRESSED robj.

The decoder (S2.6) is shipped but not yet wired into read paths (S2.8). As long as compression-enabled = no (default), behavior is unchanged. Once an operator turns the switch on, written values get compressed; reads return the compressed bytes until S2.8 lands. Existing transparency tests verify no regression in the default-off configuration.

Architecture

Concern Choice Why
Producer guards (1) eligibility predicate (R2.2), (2) active-dict check (R2.1.5), (3) incrRefCount(value) to pin Eligibility short-circuits when feature is disabled. Active-dict check saves allocator round-trip during the "compression-enabled yes but training not done yet" window. The pin protects bytes from in-place mutation (R2.4.4) AND reserves the robj address for the drain handler's pointer-equality ABA-safe stale check.
Stale detection At drain: kvstoreHashtableFindRef(db->keys, didx, key) == &job->value? Pointer equality. Holding incrRefCount(value) makes pointer equality decisive — see PR #18 review thread. Mutations via dbUnshareStringValue, overwrites via dbSetValue, and expires all produce different pointers. No version counter needed.
API change compressionWorkersEnqueue(robj *value, int dbid) (replaces (sds, int, uint64_t, sds)) Worker reads objectGetVal(value) once at enqueue (captured into job->src) and never touches the robj after. R2.11.4 intact. The version field is gone — pointer equality + pin is sufficient.
kvstore replace primitive dbReplaceValuedbSetValue(..., overwrite=0, ...) Does NOT call signalModifiedKey, moduleNotifyKeyUnlink, or signalDeletedKeyAsReady. Storage-only change per R2.9.2 — no WATCH dirty_cas, no client-side-caching invalidations, no keyspace notifications.
Test migration New testOnlyCompressionWorkersEnqueueRaw(sds src, int dbid) with job->value = NULL Tests extract jobs via testOnlyCompressionWorkersDrainOutbox before the production drain runs, so the install path's serverAssert(value!=NULL) never fires for test jobs.

Producer side (db.c)

Two seams added — both at the point where the value is now installed in the kvstore:

// end of dbAddInternal, after kvstoreHashtableAdd:
*valref = val;
compressionEnqueueCandidate(key, val, db->id);

// end of dbSetValue, after the value-replacement block:
*valref = new;
compressionEnqueueCandidate(key, new, db->id);

setKey dispatches to dbAdd / dbAddInternal / dbSetValue — covered transitively. RDB-load enqueue is deliberately skippedTODO(S2.10) notes that the sweep tick will rediscover RDB-loaded values without hammering the inbox during load.

Drain install path

static int compressionInstall(compressionJob *job) {
    serverDb *db = &server.db[job->dbid];
    sds key_sds = (sds)objectGetKey(job->value);
    void **slot = kvstoreHashtableFindRef(db->keys, didx, key_sds);

    if (slot == NULL || *slot != job->value) {
        zfree(job->dst);  /* stale: overwrite/expire/COW happened */
        return 0;
    }

    robj *compressed = createCompressedObject(OBJ_STRING, job->dst, job->dst_len);
    robj key_obj;
    initStaticStringObject(key_obj, key_sds);
    dbReplaceValue(db, &key_obj, &compressed);
    compressionRegistryIncRef(job->dict_id);
    return 1;
}

Pin released on every drain completion path: success, stale-discard, net-savings reject, ZSTD error, no-active-dict. Test-mode jobs (job->value == NULL from testOnlyCompressionWorkersEnqueueRaw) skip both install and decRef.

ABA safety — why pointer equality is decisive

Discussed at length in the PR #18 review (preserved here for record):

  • incrRefCount(value) prevents the robj from being freed.
  • Because the robj isn't freed, its memory region is still reserved by the allocator — jemalloc cannot return that address from a future zmalloc.
  • During the entire window between enqueue and drain, the address is held alive by our pin.
  • Any kvstore replace/delete/expire decrements the kvstore reference (2→1), but the robj lives.
  • A subsequent dbAdd / dbSetValue allocates a fresh robj at a different address.
  • A dbUnshareStringValue-driven COW sees refcount > 1 and creates a brand-new robj — also different address.

The drain-time *slot == job->value comparison is therefore a true equality check: same pointer ⇒ same value, by construction.

What this PR does NOT do

Concern Where it lands
Read-path hook (objectGetUncompressedView into getCommand etc.) S2.8 — without it, GETs of compressed values return garbage. Transparency tests pass today because feature defaults to off.
INFO counters (compression_compressed_objects, compression_compressions_per_sec, compression_live_ratio_10m EMA) S4.1TODO(S4.1): markers in compressionInstall and compressionEnqueueCandidate.
Sweep / cron integration S2.10 — only freshly-written values get queued in this PR. RDB-loaded and pre-existing keys stay uncompressed until the sweeper or COMPRESSION SWEEP.
Bounded inbox + back-pressure counters S2.11 — current mutexQueue is unbounded. Production safety net to follow.
Install-path gtest coverage Deferred to Tcl integration tests. The install path needs a fully-initialized server.db / kvstore that the unit-test environment doesn't construct. End-to-end coverage will come from the transparency harness once S2.8 lands.

Test migration

15 existing compressionWorkersEnqueue(key, dbid, version, val) call sites in test_compression_workers.cpp migrated to testOnlyCompressionWorkersEnqueueRaw(val, dbid). The new test-only enqueue sets job->value = NULL so the production install path's serverAssert(value != NULL) never fires for test jobs (tests already extract via testOnlyCompressionWorkersDrainOutbox before the production drain runs).

Verified locally

  • make -j2 -C src → clean (BUILD_ZSTD=yes default).
  • make -j2 -C src BUILD_ZSTD=no → clean.
  • ./runtest --single unit/type/compression → 10/10 pass.

Not verified locally (CI gates)

  • gtest unit tests (libgtest-dev not installed in this dev environment).
  • clang-format-18.

Diff stat

.../implementation/plan.md                |   4 +-
src/compression.c                         |  35 +++-
src/compression.h                         |  27 ++-
src/compression_workers.c                 | 185 +++++++++++++++------
src/compression_workers.h                 |  56 +++----
src/db.c                                  |  14 ++
src/unit/test_compression_workers.cpp     |  31 ++--
7 files changed, 244 insertions(+), 108 deletions(-)

Wires compressionEnqueueCandidate into dbAddInternal and dbSetValue,
and replaces the TODO(S2.7) placeholder in the drain handler with a
real install path. With this change, writes to eligible STRING values
get queued for background compression and the result is installed back
into the kvstore as an OBJ_ENCODING_COMPRESSED robj.

The decoder (S2.6) is shipped but not yet wired into read paths (S2.8),
so as long as compression-enabled stays no (default), behavior is
unchanged. Once an operator turns the switch on, written values get
compressed, but reads return the compressed bytes until S2.8 lands.
Existing transparency tests verify no regression in the default-off
configuration.

Producer side (compression.c, db.c)

  Two seams in db.c — end of dbAddInternal and end of dbSetValue —
  call compressionEnqueueCandidate(key, value, db->id). The candidate
  function applies four guards:
    1. Master switch (compression_enabled, via compressionIsEligible).
    2. R2.2 eligibility (type/encoding/size/hot-key — also via predicate).
    3. R2.1.5 active-dict check — saves an allocator round-trip when
       compression-enabled=yes but training hasn't completed.
    4. incrRefCount(value) — pins the bytes for the worker AND
       reserves the robj address for the drain handler's pointer-
       equality stale check (ABA-safe per R2.4.4 + the lifetime
       discussion in PR #18).

  If the worker pool refuses (not started; future S2.11 inbox full),
  the pin is released immediately. RDB-load enqueue is deliberately
  skipped — TODO(S2.10): the sweep tick will rediscover RDB-loaded
  values without hammering the inbox during load.

API change: compressionWorkersEnqueue

  Old: compressionWorkersEnqueue(sds key, int dbid, uint64_t version, sds src)
  New: compressionWorkersEnqueue(robj *value, int dbid)

  The new form requires a pinned robj; the worker reads
  objectGetVal(value) once at enqueue (captured into job->src) and
  never touches the robj afterwards (R2.11.4 intact). The drain
  handler uses job->value for the kvstore lookup and the pointer-
  equality stale check.

  The version field is gone — pointer equality, made ABA-safe by the
  pin, is sufficient. R2.4.4 explains why: holding incrRefCount(value)
  prevents the allocator from reusing the address while the job is
  in flight.

Drain install (compression_workers.c)

  New compressionInstall() helper:
    1. void **slot = kvstoreHashtableFindRef(db->keys, didx, key_sds);
    2. If slot == NULL OR *slot != job->value: stale (overwrite, expire,
       or COW). Discard.
    3. Else: createCompressedObject(OBJ_STRING, job->dst, job->dst_len);
       dbReplaceValue installs.
    4. compressionRegistryIncRef(job->dict_id) on success.

  dbReplaceValue routes through dbSetValue(..., overwrite=0, ...),
  which does NOT call signalModifiedKey, moduleNotifyKeyUnlink, or
  signalDeletedKeyAsReady. Background compression is a storage-only
  change per R2.9.2 — no WATCH dirty_cas, no client-side-caching
  invalidations, no keyspace notifications.

  Pin released on every drain completion path (success, stale-discard,
  net-savings reject, ZSTD error, no-active-dict). Test-mode jobs
  (job->value == NULL) skip both install and decRef.

Test migration

  The 15 existing test-fixture call sites passed raw sds + dummy
  version. Migrated to a new testOnlyCompressionWorkersEnqueueRaw(src,
  dbid) that sets job->value = NULL. Tests extract jobs via
  testOnlyCompressionWorkersDrainOutbox before the production drain
  runs, so production-only paths (install, decRef) are never reached
  by the value=NULL sentinel.

  No new gtest cases for the install path itself — that requires a
  fully-initialized server.db / kvstore that the unit-test environment
  doesn't construct. End-to-end coverage will come from the Tcl
  transparency harness once S2.8 wires the read path.

TODO(S4.1) markers added at:
  - compressionInstall: compression_compressions_per_sec, EMA fold,
    compression_compressed_objects.
  - compressionEnqueueCandidate: compression_candidates_dropped_total
    when S2.11 lands (today the pool-not-started rejection is a
    config state, not back-pressure).

Verified locally:
  - make -j2 -C src              → clean (BUILD_ZSTD=yes default).
  - make -j2 -C src BUILD_ZSTD=no → clean.
  - ./runtest --single unit/type/compression → 10/10 pass.

  gtest unit tests not runnable locally; CI validates.

Diff stat:

  .../implementation/plan.md                |   4 +-
  src/compression.c                         |  35 +++-
  src/compression.h                         |  27 ++-
  src/compression_workers.c                 | 185 +++++++++++++++------
  src/compression_workers.h                 |  56 +++----
  src/db.c                                  |  14 ++
  src/unit/test_compression_workers.cpp     |  31 ++--
  7 files changed, 244 insertions(+), 108 deletions(-)
@ikolomi ikolomi requested a review from GilboaAWS June 4, 2026 13:00
Comment thread src/compression_workers.h
Comment thread src/compression_workers.c Outdated
ikolomi added 3 commits June 7, 2026 11:43
Two reviewer threads addressed:

Thread #1 (T-3369017721) — production code carrying test concerns

  The drain handler had a `if (job->value == NULL)` branch that only
  existed to handle test-only jobs from
  testOnlyCompressionWorkersEnqueueRaw. Reviewer correctly pointed out
  that production code shouldn't carry test-only branches.

  Fix: replaced with serverAssert(job->value != NULL) at the top of
  the per-job loop. Production drain assumes every job has a real
  pinned robj; tests must extract their value=NULL jobs via
  testOnlyCompressionWorkersDrainOutbox before this drain runs.

  Side effect: removed the conditional `if (job->value != NULL)`
  guards around decrRefCount and the install branch — the top-of-loop
  assert means every code path can assume value is non-NULL.

Thread #2 (T-3356207626) — design doc out of sync with implementation

  Design §4.6 still described the original version-counter approach
  for staleness detection (`uint64_t version` field on compressionJob,
  "if version counter moved, discard"). The implementation has used
  pointer equality + the incrRefCount-pin since S2.4 PR #13.

  Fix: updated §4.6 to:
    - compressionJob struct: drop `version`, drop `robj *key`, add
      `robj *value` (pinned via incrRefCount), and `sds src` and
      `int dbid` separately, matching the actual struct.
    - Concurrency notes: replaced the "version counter moved" bullet
      with the pointer-equality + ABA-safety reasoning, naming the
      incrRefCount-reserves-the-address invariant as the protection
      mechanism (same property explained in PR #18 review).

Verified locally:
  - make -j2 -C src              → clean
  - ./runtest --single unit/type/compression  → 10/10 pass
build-32bit (and the 30+ downstream cells, all CI cells use -Werror):

  compression_workers.c:531:20: error: initialization of 'serverDb *'
    from incompatible pointer type 'serverDb **'
    [-Werror=incompatible-pointer-types]

`server.db` is `serverDb **` (array of pointers, one per DB). So
`server.db[i]` is already `serverDb *` — the address-of operator was
redundant and produced `serverDb **`.

Fix: drop the `&`. Matches the pattern used everywhere else in the
codebase (db.c, server.c, etc.).

Local make didn't catch this — the default SERVER_CFLAGS doesn't
include -Werror. CI does. Built locally with `make SERVER_CFLAGS=-Werror`
to confirm clean.
5 gtest cases failed on build-32bit (and would on every test cell)
with the new production-drain serverAssert(job->value != NULL):

  ASSERTION FAILED: compression_workers.c:591 'job->value != NULL'

  in: SingleJobRoundTrip, BurstOf256JobsOneWorker,
      BurstOf1024JobsFourWorkers, ResizeAcrossEnqueuedJobs,
      NetSavingsGuardRejectsIncompressible

Root cause: the previous commit's reviewer-driven hardening (PR #19
review thread #1) made the production drain assert that every job
has a non-NULL pinned robj. The premise was "tests use the testOnly
drain to extract jobs before the production drain runs". That premise
was wrong — many tests ALSO call compressionWorkersDrainOutbox
directly to consume-and-dispose test-mode jobs (the drainUntil helper
is the most-used path).

Fix: add testOnlyCompressionWorkersDrainAndDispose(budget) — pulls
jobs via the existing testOnlyCompressionWorkersDrainOutbox, frees
them via testOnlyCompressionWorkersFreeJob, returns count. Migrate
the test fixture's drainUntil helper and all 8 direct
compressionWorkersDrainOutbox call sites in the test file to the
new helper.

Production drain stays clean — no test concerns. Reviewer thread #1
intent preserved.

Verified locally:
  - make -j2 -C src SERVER_CFLAGS=-Werror   → clean
  - ./runtest --single unit/type/compression → 10/10 pass
Comment thread src/compression_workers.c
@ikolomi ikolomi merged commit 35e99d5 into unstable Jun 7, 2026
79 checks passed
ikolomi added a commit that referenced this pull request Jun 8, 2026
Adopts the transient-view approach for the read-path hook (S2.8) after
deep analysis of three options (per-site / decompress-in-place /
transient view).

Design doc changes:

- §2.5: rewrote R2.5.2 to note that the single decoder helper is
  called from inside lookupKey* when LOOKUP_READ_BYTES is set; added
  R2.5.7 describing the full transient model (decompress on lookup,
  pin via incrRefCount, save compressed buffer in side-map, flip
  encoding to RAW for the iteration; restore via pointer swap at
  beforeSleep using kvstore-slot pointer-equality for staleness
  detection).

- §2.5.6 updated: cost is now "1 decompress per event-loop iteration
  that touches the key" rather than "1 decompress per read" —
  amortizes naturally for repeat reads.

- §3.2 read-path seam: rewritten to match (lookupKey* with flag
  centralizes; 3 out-of-process paths bypass).

- §4.2 db.c row: rewritten to describe the new flag plumbing.

- New Appendix E: full design exploration including the codebase-
  sweep findings (62 sdslen(objectGetVal) sites, 9 dbUnshareStringValue
  callers, addReplyBulk catches only ~30% of byte access), the
  trade-off matrix across the three approaches, why approach 3 was
  chosen, and limitations / future considerations. Preserved for
  future contributors who may revisit the choice.

Plan.md changes:

- S2.8 description rewritten with the transient-model details, files
  touched per the codebase sweep, and explicit out-of-process paths
  that need separate wiring (AOF rewrite child, RDB-replication save).
  feedReplicationBufferWithObject is now noted as NOT needing wiring
  (operates on argv, not kvstore).

Why option 3 over the alternatives:

- Approach 1 (per-site) has 62-site leakage surface; future code can
  silently corrupt by missing a decompress call.
- Approach 2 (decompress-in-place forever) violates R2.5.6 — read-hot
  keys lose memory savings; MEMORY USAGE / OBJECT ENCODING change
  after first read (operator surprise); sweep treadmill on read-hot
  patterns.
- Approach 3 (transient view) preserves R2.5.6, is leak-proof for
  lookupKey paths, restoration is a free pointer swap (no
  re-compression cost), bounded memory inflation (≤ uncompressed
  baseline — feature can't make memory worse than the no-compression
  case), and reuses three v1 invariants (R2.4.4 COW, R2.5.2 single
  helper, PR #19 pointer equality).

No code changes in this commit — design and plan only. S2.8
implementation lands in a follow-up PR per this design.
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 8, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi added a commit that referenced this pull request Jun 9, 2026
…ess (writes) (2/3) (#23)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant