[S2.7] Compression write-path hook#19
Merged
Merged
Conversation
Wires compressionEnqueueCandidate into dbAddInternal and dbSetValue,
and replaces the TODO(S2.7) placeholder in the drain handler with a
real install path. With this change, writes to eligible STRING values
get queued for background compression and the result is installed back
into the kvstore as an OBJ_ENCODING_COMPRESSED robj.
The decoder (S2.6) is shipped but not yet wired into read paths (S2.8),
so as long as compression-enabled stays no (default), behavior is
unchanged. Once an operator turns the switch on, written values get
compressed, but reads return the compressed bytes until S2.8 lands.
Existing transparency tests verify no regression in the default-off
configuration.
Producer side (compression.c, db.c)
Two seams in db.c — end of dbAddInternal and end of dbSetValue —
call compressionEnqueueCandidate(key, value, db->id). The candidate
function applies four guards:
1. Master switch (compression_enabled, via compressionIsEligible).
2. R2.2 eligibility (type/encoding/size/hot-key — also via predicate).
3. R2.1.5 active-dict check — saves an allocator round-trip when
compression-enabled=yes but training hasn't completed.
4. incrRefCount(value) — pins the bytes for the worker AND
reserves the robj address for the drain handler's pointer-
equality stale check (ABA-safe per R2.4.4 + the lifetime
discussion in PR #18).
If the worker pool refuses (not started; future S2.11 inbox full),
the pin is released immediately. RDB-load enqueue is deliberately
skipped — TODO(S2.10): the sweep tick will rediscover RDB-loaded
values without hammering the inbox during load.
API change: compressionWorkersEnqueue
Old: compressionWorkersEnqueue(sds key, int dbid, uint64_t version, sds src)
New: compressionWorkersEnqueue(robj *value, int dbid)
The new form requires a pinned robj; the worker reads
objectGetVal(value) once at enqueue (captured into job->src) and
never touches the robj afterwards (R2.11.4 intact). The drain
handler uses job->value for the kvstore lookup and the pointer-
equality stale check.
The version field is gone — pointer equality, made ABA-safe by the
pin, is sufficient. R2.4.4 explains why: holding incrRefCount(value)
prevents the allocator from reusing the address while the job is
in flight.
Drain install (compression_workers.c)
New compressionInstall() helper:
1. void **slot = kvstoreHashtableFindRef(db->keys, didx, key_sds);
2. If slot == NULL OR *slot != job->value: stale (overwrite, expire,
or COW). Discard.
3. Else: createCompressedObject(OBJ_STRING, job->dst, job->dst_len);
dbReplaceValue installs.
4. compressionRegistryIncRef(job->dict_id) on success.
dbReplaceValue routes through dbSetValue(..., overwrite=0, ...),
which does NOT call signalModifiedKey, moduleNotifyKeyUnlink, or
signalDeletedKeyAsReady. Background compression is a storage-only
change per R2.9.2 — no WATCH dirty_cas, no client-side-caching
invalidations, no keyspace notifications.
Pin released on every drain completion path (success, stale-discard,
net-savings reject, ZSTD error, no-active-dict). Test-mode jobs
(job->value == NULL) skip both install and decRef.
Test migration
The 15 existing test-fixture call sites passed raw sds + dummy
version. Migrated to a new testOnlyCompressionWorkersEnqueueRaw(src,
dbid) that sets job->value = NULL. Tests extract jobs via
testOnlyCompressionWorkersDrainOutbox before the production drain
runs, so production-only paths (install, decRef) are never reached
by the value=NULL sentinel.
No new gtest cases for the install path itself — that requires a
fully-initialized server.db / kvstore that the unit-test environment
doesn't construct. End-to-end coverage will come from the Tcl
transparency harness once S2.8 wires the read path.
TODO(S4.1) markers added at:
- compressionInstall: compression_compressions_per_sec, EMA fold,
compression_compressed_objects.
- compressionEnqueueCandidate: compression_candidates_dropped_total
when S2.11 lands (today the pool-not-started rejection is a
config state, not back-pressure).
Verified locally:
- make -j2 -C src → clean (BUILD_ZSTD=yes default).
- make -j2 -C src BUILD_ZSTD=no → clean.
- ./runtest --single unit/type/compression → 10/10 pass.
gtest unit tests not runnable locally; CI validates.
Diff stat:
.../implementation/plan.md | 4 +-
src/compression.c | 35 +++-
src/compression.h | 27 ++-
src/compression_workers.c | 185 +++++++++++++++------
src/compression_workers.h | 56 +++----
src/db.c | 14 ++
src/unit/test_compression_workers.cpp | 31 ++--
7 files changed, 244 insertions(+), 108 deletions(-)
ikolomi
commented
Jun 4, 2026
ikolomi
commented
Jun 7, 2026
Two reviewer threads addressed: Thread #1 (T-3369017721) — production code carrying test concerns The drain handler had a `if (job->value == NULL)` branch that only existed to handle test-only jobs from testOnlyCompressionWorkersEnqueueRaw. Reviewer correctly pointed out that production code shouldn't carry test-only branches. Fix: replaced with serverAssert(job->value != NULL) at the top of the per-job loop. Production drain assumes every job has a real pinned robj; tests must extract their value=NULL jobs via testOnlyCompressionWorkersDrainOutbox before this drain runs. Side effect: removed the conditional `if (job->value != NULL)` guards around decrRefCount and the install branch — the top-of-loop assert means every code path can assume value is non-NULL. Thread #2 (T-3356207626) — design doc out of sync with implementation Design §4.6 still described the original version-counter approach for staleness detection (`uint64_t version` field on compressionJob, "if version counter moved, discard"). The implementation has used pointer equality + the incrRefCount-pin since S2.4 PR #13. Fix: updated §4.6 to: - compressionJob struct: drop `version`, drop `robj *key`, add `robj *value` (pinned via incrRefCount), and `sds src` and `int dbid` separately, matching the actual struct. - Concurrency notes: replaced the "version counter moved" bullet with the pointer-equality + ABA-safety reasoning, naming the incrRefCount-reserves-the-address invariant as the protection mechanism (same property explained in PR #18 review). Verified locally: - make -j2 -C src → clean - ./runtest --single unit/type/compression → 10/10 pass
build-32bit (and the 30+ downstream cells, all CI cells use -Werror):
compression_workers.c:531:20: error: initialization of 'serverDb *'
from incompatible pointer type 'serverDb **'
[-Werror=incompatible-pointer-types]
`server.db` is `serverDb **` (array of pointers, one per DB). So
`server.db[i]` is already `serverDb *` — the address-of operator was
redundant and produced `serverDb **`.
Fix: drop the `&`. Matches the pattern used everywhere else in the
codebase (db.c, server.c, etc.).
Local make didn't catch this — the default SERVER_CFLAGS doesn't
include -Werror. CI does. Built locally with `make SERVER_CFLAGS=-Werror`
to confirm clean.
5 gtest cases failed on build-32bit (and would on every test cell)
with the new production-drain serverAssert(job->value != NULL):
ASSERTION FAILED: compression_workers.c:591 'job->value != NULL'
in: SingleJobRoundTrip, BurstOf256JobsOneWorker,
BurstOf1024JobsFourWorkers, ResizeAcrossEnqueuedJobs,
NetSavingsGuardRejectsIncompressible
Root cause: the previous commit's reviewer-driven hardening (PR #19
review thread #1) made the production drain assert that every job
has a non-NULL pinned robj. The premise was "tests use the testOnly
drain to extract jobs before the production drain runs". That premise
was wrong — many tests ALSO call compressionWorkersDrainOutbox
directly to consume-and-dispose test-mode jobs (the drainUntil helper
is the most-used path).
Fix: add testOnlyCompressionWorkersDrainAndDispose(budget) — pulls
jobs via the existing testOnlyCompressionWorkersDrainOutbox, frees
them via testOnlyCompressionWorkersFreeJob, returns count. Migrate
the test fixture's drainUntil helper and all 8 direct
compressionWorkersDrainOutbox call sites in the test file to the
new helper.
Production drain stays clean — no test concerns. Reviewer thread #1
intent preserved.
Verified locally:
- make -j2 -C src SERVER_CFLAGS=-Werror → clean
- ./runtest --single unit/type/compression → 10/10 pass
ikolomi
commented
Jun 7, 2026
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
Adopts the transient-view approach for the read-path hook (S2.8) after deep analysis of three options (per-site / decompress-in-place / transient view). Design doc changes: - §2.5: rewrote R2.5.2 to note that the single decoder helper is called from inside lookupKey* when LOOKUP_READ_BYTES is set; added R2.5.7 describing the full transient model (decompress on lookup, pin via incrRefCount, save compressed buffer in side-map, flip encoding to RAW for the iteration; restore via pointer swap at beforeSleep using kvstore-slot pointer-equality for staleness detection). - §2.5.6 updated: cost is now "1 decompress per event-loop iteration that touches the key" rather than "1 decompress per read" — amortizes naturally for repeat reads. - §3.2 read-path seam: rewritten to match (lookupKey* with flag centralizes; 3 out-of-process paths bypass). - §4.2 db.c row: rewritten to describe the new flag plumbing. - New Appendix E: full design exploration including the codebase- sweep findings (62 sdslen(objectGetVal) sites, 9 dbUnshareStringValue callers, addReplyBulk catches only ~30% of byte access), the trade-off matrix across the three approaches, why approach 3 was chosen, and limitations / future considerations. Preserved for future contributors who may revisit the choice. Plan.md changes: - S2.8 description rewritten with the transient-model details, files touched per the codebase sweep, and explicit out-of-process paths that need separate wiring (AOF rewrite child, RDB-replication save). feedReplicationBufferWithObject is now noted as NOT needing wiring (operates on argv, not kvstore). Why option 3 over the alternatives: - Approach 1 (per-site) has 62-site leakage surface; future code can silently corrupt by missing a decompress call. - Approach 2 (decompress-in-place forever) violates R2.5.6 — read-hot keys lose memory savings; MEMORY USAGE / OBJECT ENCODING change after first read (operator surprise); sweep treadmill on read-hot patterns. - Approach 3 (transient view) preserves R2.5.6, is leak-proof for lookupKey paths, restoration is a free pointer swap (no re-compression cost), bounded memory inflation (≤ uncompressed baseline — feature can't make memory worse than the no-compression case), and reuses three v1 invariants (R2.4.4 COW, R2.5.2 single helper, PR #19 pointer equality). No code changes in this commit — design and plan only. S2.8 implementation lands in a follow-up PR per this design.
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 8, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
…ess (writes) (2/3) (#23) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wires
compressionEnqueueCandidateintodbAddInternalanddbSetValue, and replaces theTODO(S2.7)placeholder in the drain handler with a real install path. With this change, writes to eligible STRING values get queued for background compression and the result is installed back into the kvstore as anOBJ_ENCODING_COMPRESSEDrobj.The decoder (S2.6) is shipped but not yet wired into read paths (S2.8). As long as
compression-enabled = no(default), behavior is unchanged. Once an operator turns the switch on, written values get compressed; reads return the compressed bytes until S2.8 lands. Existing transparency tests verify no regression in the default-off configuration.Architecture
incrRefCount(value)to pinkvstoreHashtableFindRef(db->keys, didx, key) == &job->value? Pointer equality.incrRefCount(value)makes pointer equality decisive — see PR #18 review thread. Mutations viadbUnshareStringValue, overwrites viadbSetValue, and expires all produce different pointers. No version counter needed.compressionWorkersEnqueue(robj *value, int dbid)(replaces(sds, int, uint64_t, sds))objectGetVal(value)once at enqueue (captured intojob->src) and never touches the robj after. R2.11.4 intact. The version field is gone — pointer equality + pin is sufficient.dbReplaceValue→dbSetValue(..., overwrite=0, ...)signalModifiedKey,moduleNotifyKeyUnlink, orsignalDeletedKeyAsReady. Storage-only change per R2.9.2 — noWATCHdirty_cas, no client-side-caching invalidations, no keyspace notifications.testOnlyCompressionWorkersEnqueueRaw(sds src, int dbid)withjob->value = NULLtestOnlyCompressionWorkersDrainOutboxbefore the production drain runs, so the install path's serverAssert(value!=NULL) never fires for test jobs.Producer side (db.c)
Two seams added — both at the point where the value is now installed in the kvstore:
setKeydispatches todbAdd/dbAddInternal/dbSetValue— covered transitively. RDB-load enqueue is deliberately skipped —TODO(S2.10)notes that the sweep tick will rediscover RDB-loaded values without hammering the inbox during load.Drain install path
Pin released on every drain completion path: success, stale-discard, net-savings reject, ZSTD error, no-active-dict. Test-mode jobs (
job->value == NULLfromtestOnlyCompressionWorkersEnqueueRaw) skip both install and decRef.ABA safety — why pointer equality is decisive
Discussed at length in the PR #18 review (preserved here for record):
incrRefCount(value)prevents the robj from being freed.zmalloc.dbAdd/dbSetValueallocates a fresh robj at a different address.dbUnshareStringValue-driven COW seesrefcount > 1and creates a brand-new robj — also different address.The drain-time
*slot == job->valuecomparison is therefore a true equality check: same pointer ⇒ same value, by construction.What this PR does NOT do
objectGetUncompressedViewintogetCommandetc.)compression_compressed_objects,compression_compressions_per_sec,compression_live_ratio_10mEMA)TODO(S4.1):markers incompressionInstallandcompressionEnqueueCandidate.COMPRESSION SWEEP.mutexQueueis unbounded. Production safety net to follow.server.db/kvstorethat the unit-test environment doesn't construct. End-to-end coverage will come from the transparency harness once S2.8 lands.Test migration
15 existing
compressionWorkersEnqueue(key, dbid, version, val)call sites intest_compression_workers.cppmigrated totestOnlyCompressionWorkersEnqueueRaw(val, dbid). The new test-only enqueue setsjob->value = NULLso the production install path'sserverAssert(value != NULL)never fires for test jobs (tests already extract viatestOnlyCompressionWorkersDrainOutboxbefore the production drain runs).Verified locally
make -j2 -C src→ clean (BUILD_ZSTD=yesdefault).make -j2 -C src BUILD_ZSTD=no→ clean../runtest --single unit/type/compression→ 10/10 pass.Not verified locally (CI gates)
libgtest-devnot installed in this dev environment).clang-format-18.Diff stat