Skip to content

[S2.8] Read-path activate: transient view (reads) + permanent decompress (writes) (2/3)#23

Merged
ikolomi merged 1 commit into
unstablefrom
ikolomi/s2-read-path-activate
Jun 9, 2026
Merged

[S2.8] Read-path activate: transient view (reads) + permanent decompress (writes) (2/3)#23
ikolomi merged 1 commit into
unstablefrom
ikolomi/s2-read-path-activate

Conversation

@ikolomi

@ikolomi ikolomi commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Summary

PR 2 of 3 in the S2.8 split. Wires the transient-view model (R2.5.7 + Appendix E + Appendix E.7) into lookupKey() for the read path, AND adds a write-path optimization with re-compression auto-scheduling via signalModifiedKey.

The write-path optimization came out of the review discussion: there's no point holding the compressed form across a write since the kvstore slot always ends up pointing to a different robj (COW orphans the original) — the side-map entry is guaranteed to discard at beforeSleep. So for write-path lookups we permanently decompress instead of registering a transient view; for read-path lookups we use transient view as originally designed. The signalModifiedKey hook then auto-enqueues the post-mutation value for re-compression — the canonical "logical value changed" signal in Valkey, called by every byte-mutating command.

Architecture

Path Strategy Why
Read (no LOOKUP_WRITE) Transient view: decompress into temp sds, register in side-map, pin via incrRefCount, flip encoding to RAW. Restored at beforeSleep via free pointer-swap (or compressionPermanentlyDecompress when the savings-based cap is exceeded). Preserves compression across reads; only the temp sds + compressed buffer co-exist briefly within an event-loop iteration.
Write (LOOKUP_WRITE set, no LOOKUP_NO_BYTES) Permanent decompress: free compressed buffer, compressionRegistryDecRef the dict, install fresh sds as val_ptr, flip to RAW. No side-map entry, no pin. Value re-compresses after mutation via signalModifiedKey hook. The COW path always orphans the compressed robj at beforeSleep anyway. Doing the equivalent work upfront avoids side-map registration + pin-induced COW + kvstore re-fetch.
Metadata-only (LOOKUP_NO_BYTES) Skip both; preserve the truthful "compressed" encoding for introspection. TYPE/EXISTS/OBJECT ENCODING/TTL/etc. don't need bytes.

Memory cap (R2.5.7)

The naive transient view would let peak memory during an event-loop iteration grow as (1 + ratio) × no-compression baseline — a pathological MGET / Lua script touching many compressed values would OOM. To bound this, materialize is capped against the running compression-savings counter (transient_view_uncompressed_bytes ≤ compression_savings_bytes); cap exhaustion falls back to compressionPermanentlyDecompress and counts in compression_transient_view_capped_total. The cap is a derived signal — no operator-tunable knob — and its semantic ("transient memory ≤ what compression saved") makes the no-compression baseline an upper bound by construction.

File-level changes

File Change
src/server.h Renamed LOOKUP_READ_BYTESLOOKUP_NO_BYTES, inverted semantic (opt-out).
src/compression.{c,h} Side-map (hashtable.c primitive); compressionMaterializeTransientView; compressionPermanentlyDecompress; transientViewActive; compressionEnqueueModified; real compressionBeforeSleep body; savings counter + cap; discardTransientEntry helper shared by beforeSleep/shutdown/test.
src/db.c lookupKey() splits decode by LOOKUP_WRITE. signalModifiedKey calls compressionEnqueueModified. Removed PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue (replaced by the signalModifiedKey hook which catches both dbReplaceValue and in-place mutations). typeCommand/existsCommand/setKey opt out via LOOKUP_NO_BYTES.
src/object.c objectCommandLookup (OBJECT/MEMORY USAGE) opts out.
src/expire.c setExpireGenericCommand, ttlGenericCommand, persistCommand, touchCommand opt out.
src/debug.c DEBUG POPULATE existence check opts out.
src/networking.c isCopyAvoidPreferred returns 0 when transientViewActive(obj) (Appendix E.7 force-copy fix).
src/compression_header.c Savings counter accounting in createCompressedObject / freeCompressedObject.
src/unit/test_compression_transient_view.cpp (new) 11 gtest cases including memory-cap fallback and strict cap-enforcement tests.
design/detailed-design.md R2.5.7 + Appendix E.3/E.4 updated to describe the savings-based cap; LOOKUP_READ_BYTESLOOKUP_NO_BYTES swept.

Key design points (came out of review)

  • Why pointer-equality stale check, not a version field: holding incrRefCount(value) for the immutable-snapshot guarantee (R2.4.4) doubles as the ABA-safety mechanism — the allocator can't reuse the address while we hold a refcount, so a pointer match at beforeSleep is decisive. No new robj state required (PR [S2.7] Compression write-path hook #19 thread).
  • Why don't compute buf_len in the decoder: latent bug in PR [S2.6] Compression decoder path #18objectGetUncompressedView called sdslen() on a raw zmalloc'd buffer (val_ptr for OBJ_ENCODING_COMPRESSED is NOT an sds per createCompressedObject's contract). ASan caught the heap-buffer-overflow on test-sanitizer-address. The fix derives buffer length from the header (we wrote it ourselves at install time; createCompressedObject validates buffer_len == HEADER + compressed_len).
  • Why signalModifiedKey (not dbAddInternal/dbSetValue): the canonical "logical value at this key changed" signal in Valkey. Audit confirmed every byte-mutating command path fires it; the 3 paths that don't are exactly the ones we want to skip (RDB load → sweep handles per R2.6.2; worker drain dbReplaceValue per R2.9.2 — must not signal; module SETKEY_NO_SIGNAL opt-out). PR [S2.7] Compression write-path hook #19's wiring missed in-place mutations (APPEND/SETRANGE/BITOP); the hook catches them at end-of-mutation.
  • R2.9.2 invariant intact: direction is signalModifiedKey → compression-enqueue, not the reverse.
  • HASHTABLE_ITER_SAFE iteration: the side-map drain (beforeSleep, shutdown, test-helper) processes each entry in-place rather than collecting into a heap-allocated entries[] array (review thread [design] Clarify worker concurrency: CDict* passed in job, registry is main-thread only #6). Pauses incremental rehashing → safe to free entry structs inline; final hashtableEmpty(map, NULL) clears the bucket pointers.

What this PR does NOT do (PR 3 of 3 follows)

  • AOF rewrite child + RDB save for replication full-sync (R2.6.8) bypass lookupKey* and need explicit objectGetUncompressedView calls.
  • Disk RDB (R2.6.1) writes compressed bytes directly — no decompression needed there either.

Test coverage

src/unit/test_compression_transient_view.cpp — 11 gtest cases:

Test Asserts
EmptyMapNotActive initial transientViewActive returns 0
MaterializeRegistersAndFlips full state machine after read-path materialize
MaterializeNoOpOnNonCompressed defensive guard
MaterializeFailsOnCorruptBuffer decoder error → -1, robj unchanged
DrainEmptiesMapAndFreesEverything discard-all helper cleans up; ASan validates no leaks
MultipleObjsTrackedIndependently multiple registered objs each report active
PermanentlyDecompressFlipsAndFrees write-path: encoding RAW, refcount=1, no side-map entry, dict frame-ref decremented
PermanentlyDecompressNoOpOnNonCompressed defensive guard
PermanentlyDecompressFailsOnCorrupt decoder error → -1, robj unchanged
MemoryCapFallsBackToPermanent cap behavior when savings budget is tight
MemoryCapStrictlyEnforced first-materialize cap-trigger (single object's savings < its uncompressed_len by construction)

End-to-end coverage of beforeSleep's restore-vs-discard branches requires a live kvstore (not feasible in gtest); that lands as a Tcl integration test once S1.2 (training) enables real compression.

Verified locally

  • make -j2 -C src SERVER_CFLAGS=-Werror — clean (BUILD_ZSTD=yes default).
  • ./runtest --single unit/type/compression → 10/10 pass.
  • All 45 CI checks green at HEAD f43f89531; the latest amend (review-iteration cleanup) is force-pushed and re-running.

Diff stat

.../design/detailed-design.md                      |  24 +-
src/compression.c                                  | 583 +++++++++++++++++++--
src/compression.h                                  | 119 +++++
src/compression_header.c                           |  20 +
src/db.c                                           |  89 +-
src/debug.c                                        |   5 +-
src/expire.c                                       |  16 +-
src/networking.c                                   |  12 +
src/object.c                                       |   9 +-
src/server.h                                       |  26 +-
src/unit/test_compression_transient_view.cpp       | 561 ++++++++++++++++++
11 files changed, 1371 insertions(+), 93 deletions(-)

Comment thread src/compression.c Outdated
@ikolomi ikolomi force-pushed the ikolomi/s2-read-path-activate branch from f0e4852 to 7e1edbe Compare June 8, 2026 20:29
@ikolomi ikolomi changed the title [S2.8] Read-path activate: transient-view + force-copy fix (2/3) [S2.8] Read-path activate: transient view (reads) + permanent decompress (writes) (2/3) Jun 8, 2026
@ikolomi ikolomi force-pushed the ikolomi/s2-read-path-activate branch 8 times, most recently from 083dd87 to f43f895 Compare June 8, 2026 23:50
Comment thread src/compression.c Outdated
Comment thread src/compression.c Outdated
Comment thread src/compression.c Outdated
Comment thread src/compression.c Outdated
Comment thread src/compression.c
@ikolomi ikolomi force-pushed the ikolomi/s2-read-path-activate branch 2 times, most recently from 3a82172 to aa642a7 Compare June 9, 2026 08:34
…ess (writes) (2/3)

PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model
from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a
write-path permanent-decompress optimization with re-compression
auto-scheduling via signalModifiedKey().

== Read-path: transient view (R2.5.7 + Appendix E) ==

For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed
into a temp sds, the robj is registered in a per-server side-map and
pinned (incrRefCount); encoding flips to RAW for the duration of the
event-loop iteration. At the next compressionBeforeSleep() boundary,
the kvstore slot is re-fetched: pointer match → pointer-swap restore
(zero recompression cost), pointer mismatch → discard (mutation /
overwrite / expire / COW orphan). Memory bound is uncompressed-baseline
(at most the size the dataset would use without compression).

== Write-path: permanent decompress + signalModifiedKey enqueue ==

For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed
values are PERMANENTLY decompressed in place: free the compressed
buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip
encoding to RAW. No side-map entry, no pin. Refcount stays 1.

Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation
in place. Cheaper than transient view + COW, and avoids the wasted
beforeSleep kvstore re-fetch for a write that always discards.

The post-mutation value is re-enqueued for compression by hooking
signalModifiedKey(). signalModifiedKey is the canonical "logical value
at this key changed" signal — called by every byte-mutating command.
Audit confirmed: every relevant write path fires it; the 3 paths that
don't (RDB load per R2.6.2, worker drain per R2.9.2, module
SETKEY_NO_SIGNAL) are exactly the ones we want to skip.

R2.9.2 invariant intact: the direction is signalModifiedKey →
compression-enqueue, NOT the reverse. Compression infrastructure does
not call signalModifiedKey.

This replaces PR #19's compressionEnqueueCandidate calls in
dbAddInternal/dbSetValue, which fired too eagerly (worker-drain
re-enqueue → wasted predicate; empty initial-state values from
hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place
mutations like APPEND on a permanent-decompressed value (which never
go through dbReplaceValue).

== Implementation ==

src/server.h:
- Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to
  opt-out. Default behavior is "read bytes (decompress on demand)";
  metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST,
  TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check)
  pass LOOKUP_NO_BYTES to opt out.

src/compression.{c,h}:
- Side-map: hashtable.c primitive keyed by robj pointer (default
  pointer-bits hash + pointer-equality compare); lazy allocation;
  early-out when empty.
- compressionMaterializeTransientView(): read-path decompress +
  register + pin + flip.
- compressionPermanentlyDecompress(): write-path decompress + free
  buffer + decRef dict + flip; no side-map, no pin.
- transientViewActive(): O(1) presence check for the deferred-capture
  fix (Appendix E.7).
- compressionBeforeSleep(): iterate side-map, restore via
  pointer-equality stale check, free per entry.
- compressionEnqueueModified(): looked up by signalModifiedKey;
  no-op when feature disabled / key deleted / value ineligible.
- compressionShutdown(): drain side-map (treat as discard).
- testOnlyCompressionDrainTransientViewAsDiscard(),
  testOnlyCompressionTransientViewSize(): test-only helpers.

src/db.c:
- lookupKey() splits decode strategy by LOOKUP_WRITE: write context
  → permanent decompress; read context → transient view.
- signalModifiedKey() now calls compressionEnqueueModified() as the
  third side effect (alongside touchWatchedKey + trackingInvalidateKey).
- Removed compressionEnqueueCandidate from dbAddInternal and
  dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook.
- typeCommand, existsCommand, setKey existence check pass
  LOOKUP_NO_BYTES.

src/object.c, src/expire.c, src/debug.c:
- objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE
  pass LOOKUP_NO_BYTES.

src/networking.c:
- isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true
  (deferred-capture fix per Appendix E.7).

src/unit/test_compression_transient_view.cpp (new):
- 9 gtest cases covering side-map lifecycle, materialize state machine,
  permanent decompress side-effects, decoder-error recovery.

== Verified locally ==

- make -j2 -C src SERVER_CFLAGS=-Werror clean.
- ./runtest --single unit/type/compression 10/10 pass.
- Manual smoke: server with --compression-enabled yes
  --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/
  STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown.

== Not verified locally (CI gates) ==

- gtest unit tests (libgtest-dev not installed).
- clang-format-18.

== Diff stat ==

src/compression.c                            | 438 +++++++++++++++++++--
src/compression.h                            |  93 +++++
src/db.c                                     |  87 +++-
src/debug.c                                  |   5 +-
src/expire.c                                 |  16 +-
src/networking.c                             |  12 +
src/object.c                                 |   9 +-
src/server.h                                 |  26 +-
src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++
9 files changed, 1037 insertions(+), 46 deletions(-)
@ikolomi ikolomi force-pushed the ikolomi/s2-read-path-activate branch from aa642a7 to 0cc3e6c Compare June 9, 2026 08:41
@ikolomi ikolomi requested a review from GilboaAWS June 9, 2026 09:27
@ikolomi ikolomi merged commit 1e7886c into unstable Jun 9, 2026
93 of 94 checks passed
ikolomi added a commit that referenced this pull request Jun 9, 2026
The S2.8 forward-reference in plan.md previously named the
out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'.
This PR closes that out: PR #24 IS that work. Update the entry to
reference PR #24 and capture the as-implemented scope (which paths,
which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding,
which already covered by PR #23's transient-view hook).

S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23,
and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the
@GilboaAWS track: S1.2 (training) — now safe to merge because S2.8
PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer
panic when they encounter compressed values.
ikolomi added a commit that referenced this pull request Jun 9, 2026
* [S2.8] Out-of-process explicit decompression (3/3)

PR 3 of 3 in the S2.8 split. Wires explicit objectGetUncompressedView
calls into the two paths that bypass lookupKey() and thus do not
benefit from PR #23's transient-view hook:

  - rioWriteBulkObject (aof.c) — AOF rewrite child iterates kvstore
    via kvstoreIteratorNext; AOF on-disk format is uncompressed RESP
    per R2.6.5.
  - rdbSaveStringObject (rdb.c) — same iteration pattern; per R2.6.5
    (AOF preamble within RDB) and R2.6.8 (full-sync replication RDB),
    uncompressed bytes are mandatory.

Both today panic on OBJ_ENCODING_COMPRESSED (rioWriteBulkObject:
serverPanic("Unknown string encoding"); rdbSaveStringObject:
serverAssertWithInfo on sdsEncodedObject). The crash is latent only
because no dictionary training has merged yet (S1.2 pending) — once
training lands, BGSAVE/BGREWRITEAOF on a compressed key crashes the
server. PR 3 is a prerequisite for S1.2 keeping unstable continuously
safe.

R2.6.1 (compressed disk RDB encoding via RDB_ENC_COMPRESSED + AUX
dict entries) is intentionally deferred to S3.1/S3.2 (@GilboaAWS).
Until that lands, the conservative choice for ALL RDB targets is
uncompressed, matching pre-feature behavior. Disk RDBs pay the
decompression cost once at save time and re-acquire compression at
load time via the post-S1.2 sweeper.

Dict lifetime safety in the forked child: registry is a fork-time
snapshot; DDicts are immutable; child uses its own copy of the
registry pointers. No interaction with the parent's QSBR generation
counters.

Replication feed (feedReplicationBufferWithObject) does NOT need
wiring — already verified to operate on argv arguments and synthetic
SELECT robjs only, never on kvstore values.

DUMP/RESTORE/MIGRATE (R2.6.7) go through lookupKey() which now
decompresses transparently per S2.8 PR 2; nothing to change.

Tests deferred to the post-S1.2 transparency Tcl harness — that's
where real compressed values exist and BGSAVE/BGREWRITEAOF
round-trips can be exercised end-to-end.

Verified locally: make -j2 -C src SERVER_CFLAGS=-Werror clean both
with BUILD_ZSTD=yes (default) and BUILD_ZSTD=no; compression Tcl
tests 10/10 pass.

* docs(planning): mark S2.8 fully complete via PR #24 (3/3)

The S2.8 forward-reference in plan.md previously named the
out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'.
This PR closes that out: PR #24 IS that work. Update the entry to
reference PR #24 and capture the as-implemented scope (which paths,
which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding,
which already covered by PR #23's transient-view hook).

S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23,
and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the
@GilboaAWS track: S1.2 (training) — now safe to merge because S2.8
PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer
panic when they encounter compressed values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant