[S2.8] Read-path activate: transient view (reads) + permanent decompress (writes) (2/3)#23
Merged
Merged
Conversation
ikolomi
commented
Jun 8, 2026
f0e4852 to
7e1edbe
Compare
083dd87 to
f43f895
Compare
ikolomi
commented
Jun 9, 2026
ikolomi
commented
Jun 9, 2026
ikolomi
commented
Jun 9, 2026
ikolomi
commented
Jun 9, 2026
ikolomi
commented
Jun 9, 2026
3a82172 to
aa642a7
Compare
…ess (writes) (2/3) PR 2 of 3 in the S2.8 split. Wires the read-path transient-view model from PR #21 (R2.5.7) + Appendix E.7 into lookupKey(), AND adds a write-path permanent-decompress optimization with re-compression auto-scheduling via signalModifiedKey(). == Read-path: transient view (R2.5.7 + Appendix E) == For lookupKeyRead* (no LOOKUP_WRITE), compressed values are decompressed into a temp sds, the robj is registered in a per-server side-map and pinned (incrRefCount); encoding flips to RAW for the duration of the event-loop iteration. At the next compressionBeforeSleep() boundary, the kvstore slot is re-fetched: pointer match → pointer-swap restore (zero recompression cost), pointer mismatch → discard (mutation / overwrite / expire / COW orphan). Memory bound is uncompressed-baseline (at most the size the dataset would use without compression). == Write-path: permanent decompress + signalModifiedKey enqueue == For lookupKeyWrite* (LOOKUP_WRITE set, no LOOKUP_NO_BYTES), compressed values are PERMANENTLY decompressed in place: free the compressed buffer, decRef the dict frame-ref, install fresh sds as val_ptr, flip encoding to RAW. No side-map entry, no pin. Refcount stays 1. Subsequent dbUnshareStringValue sees refcount==1 RAW → no COW → mutation in place. Cheaper than transient view + COW, and avoids the wasted beforeSleep kvstore re-fetch for a write that always discards. The post-mutation value is re-enqueued for compression by hooking signalModifiedKey(). signalModifiedKey is the canonical "logical value at this key changed" signal — called by every byte-mutating command. Audit confirmed: every relevant write path fires it; the 3 paths that don't (RDB load per R2.6.2, worker drain per R2.9.2, module SETKEY_NO_SIGNAL) are exactly the ones we want to skip. R2.9.2 invariant intact: the direction is signalModifiedKey → compression-enqueue, NOT the reverse. Compression infrastructure does not call signalModifiedKey. This replaces PR #19's compressionEnqueueCandidate calls in dbAddInternal/dbSetValue, which fired too eagerly (worker-drain re-enqueue → wasted predicate; empty initial-state values from hashTypeLookupWriteOrCreate → wasted predicate) AND missed in-place mutations like APPEND on a permanent-decompressed value (which never go through dbReplaceValue). == Implementation == src/server.h: - Renamed LOOKUP_READ_BYTES → LOOKUP_NO_BYTES, semantic flipped to opt-out. Default behavior is "read bytes (decompress on demand)"; metadata-only lookups (TYPE, EXISTS, OBJECT ENCODING, TTL, PERSIST, TOUCH, DEBUG POPULATE, OBJECT/MEMORY USAGE, setKey existence check) pass LOOKUP_NO_BYTES to opt out. src/compression.{c,h}: - Side-map: hashtable.c primitive keyed by robj pointer (default pointer-bits hash + pointer-equality compare); lazy allocation; early-out when empty. - compressionMaterializeTransientView(): read-path decompress + register + pin + flip. - compressionPermanentlyDecompress(): write-path decompress + free buffer + decRef dict + flip; no side-map, no pin. - transientViewActive(): O(1) presence check for the deferred-capture fix (Appendix E.7). - compressionBeforeSleep(): iterate side-map, restore via pointer-equality stale check, free per entry. - compressionEnqueueModified(): looked up by signalModifiedKey; no-op when feature disabled / key deleted / value ineligible. - compressionShutdown(): drain side-map (treat as discard). - testOnlyCompressionDrainTransientViewAsDiscard(), testOnlyCompressionTransientViewSize(): test-only helpers. src/db.c: - lookupKey() splits decode strategy by LOOKUP_WRITE: write context → permanent decompress; read context → transient view. - signalModifiedKey() now calls compressionEnqueueModified() as the third side effect (alongside touchWatchedKey + trackingInvalidateKey). - Removed compressionEnqueueCandidate from dbAddInternal and dbSetValue (PR #19's wiring); replaced by the signalModifiedKey hook. - typeCommand, existsCommand, setKey existence check pass LOOKUP_NO_BYTES. src/object.c, src/expire.c, src/debug.c: - objectCommandLookup, expire/ttl/persist/touch commands, DEBUG POPULATE pass LOOKUP_NO_BYTES. src/networking.c: - isCopyAvoidPreferred returns 0 when transientViewActive(obj) is true (deferred-capture fix per Appendix E.7). src/unit/test_compression_transient_view.cpp (new): - 9 gtest cases covering side-map lifecycle, materialize state machine, permanent decompress side-effects, decoder-error recovery. == Verified locally == - make -j2 -C src SERVER_CFLAGS=-Werror clean. - ./runtest --single unit/type/compression 10/10 pass. - Manual smoke: server with --compression-enabled yes --compression-threads 2; PING/SET/GET/EXISTS/TYPE/OBJECT ENCODING/ STRLEN/APPEND/TTL/TOUCH/DEL all work; clean shutdown. == Not verified locally (CI gates) == - gtest unit tests (libgtest-dev not installed). - clang-format-18. == Diff stat == src/compression.c | 438 +++++++++++++++++++-- src/compression.h | 93 +++++ src/db.c | 87 +++- src/debug.c | 5 +- src/expire.c | 16 +- src/networking.c | 12 + src/object.c | 9 +- src/server.h | 26 +- src/unit/test_compression_transient_view.cpp | 397 ++++++++++++++++++ 9 files changed, 1037 insertions(+), 46 deletions(-)
aa642a7 to
0cc3e6c
Compare
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
The S2.8 forward-reference in plan.md previously named the out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'. This PR closes that out: PR #24 IS that work. Update the entry to reference PR #24 and capture the as-implemented scope (which paths, which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding, which already covered by PR #23's transient-view hook). S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23, and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the @GilboaAWS track: S1.2 (training) — now safe to merge because S2.8 PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer panic when they encounter compressed values.
ikolomi
added a commit
that referenced
this pull request
Jun 9, 2026
* [S2.8] Out-of-process explicit decompression (3/3) PR 3 of 3 in the S2.8 split. Wires explicit objectGetUncompressedView calls into the two paths that bypass lookupKey() and thus do not benefit from PR #23's transient-view hook: - rioWriteBulkObject (aof.c) — AOF rewrite child iterates kvstore via kvstoreIteratorNext; AOF on-disk format is uncompressed RESP per R2.6.5. - rdbSaveStringObject (rdb.c) — same iteration pattern; per R2.6.5 (AOF preamble within RDB) and R2.6.8 (full-sync replication RDB), uncompressed bytes are mandatory. Both today panic on OBJ_ENCODING_COMPRESSED (rioWriteBulkObject: serverPanic("Unknown string encoding"); rdbSaveStringObject: serverAssertWithInfo on sdsEncodedObject). The crash is latent only because no dictionary training has merged yet (S1.2 pending) — once training lands, BGSAVE/BGREWRITEAOF on a compressed key crashes the server. PR 3 is a prerequisite for S1.2 keeping unstable continuously safe. R2.6.1 (compressed disk RDB encoding via RDB_ENC_COMPRESSED + AUX dict entries) is intentionally deferred to S3.1/S3.2 (@GilboaAWS). Until that lands, the conservative choice for ALL RDB targets is uncompressed, matching pre-feature behavior. Disk RDBs pay the decompression cost once at save time and re-acquire compression at load time via the post-S1.2 sweeper. Dict lifetime safety in the forked child: registry is a fork-time snapshot; DDicts are immutable; child uses its own copy of the registry pointers. No interaction with the parent's QSBR generation counters. Replication feed (feedReplicationBufferWithObject) does NOT need wiring — already verified to operate on argv arguments and synthetic SELECT robjs only, never on kvstore values. DUMP/RESTORE/MIGRATE (R2.6.7) go through lookupKey() which now decompresses transparently per S2.8 PR 2; nothing to change. Tests deferred to the post-S1.2 transparency Tcl harness — that's where real compressed values exist and BGSAVE/BGREWRITEAOF round-trips can be exercised end-to-end. Verified locally: make -j2 -C src SERVER_CFLAGS=-Werror clean both with BUILD_ZSTD=yes (default) and BUILD_ZSTD=no; compression Tcl tests 10/10 pass. * docs(planning): mark S2.8 fully complete via PR #24 (3/3) The S2.8 forward-reference in plan.md previously named the out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'. This PR closes that out: PR #24 IS that work. Update the entry to reference PR #24 and capture the as-implemented scope (which paths, which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding, which already covered by PR #23's transient-view hook). S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23, and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the @GilboaAWS track: S1.2 (training) — now safe to merge because S2.8 PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer panic when they encounter compressed values.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR 2 of 3 in the S2.8 split. Wires the transient-view model (R2.5.7 + Appendix E + Appendix E.7) into
lookupKey()for the read path, AND adds a write-path optimization with re-compression auto-scheduling viasignalModifiedKey.The write-path optimization came out of the review discussion: there's no point holding the compressed form across a write since the kvstore slot always ends up pointing to a different robj (COW orphans the original) — the side-map entry is guaranteed to discard at
beforeSleep. So for write-path lookups we permanently decompress instead of registering a transient view; for read-path lookups we use transient view as originally designed. ThesignalModifiedKeyhook then auto-enqueues the post-mutation value for re-compression — the canonical "logical value changed" signal in Valkey, called by every byte-mutating command.Architecture
LOOKUP_WRITE)incrRefCount, flip encoding to RAW. Restored atbeforeSleepvia free pointer-swap (orcompressionPermanentlyDecompresswhen the savings-based cap is exceeded).LOOKUP_WRITEset, noLOOKUP_NO_BYTES)compressionRegistryDecRefthe dict, install fresh sds asval_ptr, flip to RAW. No side-map entry, no pin. Value re-compresses after mutation viasignalModifiedKeyhook.beforeSleepanyway. Doing the equivalent work upfront avoids side-map registration + pin-induced COW + kvstore re-fetch.LOOKUP_NO_BYTES)Memory cap (R2.5.7)
The naive transient view would let peak memory during an event-loop iteration grow as
(1 + ratio) × no-compression baseline— a pathological MGET / Lua script touching many compressed values would OOM. To bound this, materialize is capped against the running compression-savings counter (transient_view_uncompressed_bytes ≤ compression_savings_bytes); cap exhaustion falls back tocompressionPermanentlyDecompressand counts incompression_transient_view_capped_total. The cap is a derived signal — no operator-tunable knob — and its semantic ("transient memory ≤ what compression saved") makes the no-compression baseline an upper bound by construction.File-level changes
src/server.hLOOKUP_READ_BYTES→LOOKUP_NO_BYTES, inverted semantic (opt-out).src/compression.{c,h}compressionMaterializeTransientView;compressionPermanentlyDecompress;transientViewActive;compressionEnqueueModified; realcompressionBeforeSleepbody; savings counter + cap;discardTransientEntryhelper shared by beforeSleep/shutdown/test.src/db.clookupKey()splits decode byLOOKUP_WRITE.signalModifiedKeycallscompressionEnqueueModified. Removed PR #19'scompressionEnqueueCandidatecalls indbAddInternal/dbSetValue(replaced by thesignalModifiedKeyhook which catches bothdbReplaceValueand in-place mutations).typeCommand/existsCommand/setKeyopt out viaLOOKUP_NO_BYTES.src/object.cobjectCommandLookup(OBJECT/MEMORY USAGE) opts out.src/expire.csetExpireGenericCommand,ttlGenericCommand,persistCommand,touchCommandopt out.src/debug.cDEBUG POPULATEexistence check opts out.src/networking.cisCopyAvoidPreferredreturns 0 whentransientViewActive(obj)(Appendix E.7 force-copy fix).src/compression_header.ccreateCompressedObject/freeCompressedObject.src/unit/test_compression_transient_view.cpp(new)design/detailed-design.mdLOOKUP_READ_BYTES→LOOKUP_NO_BYTESswept.Key design points (came out of review)
incrRefCount(value)for the immutable-snapshot guarantee (R2.4.4) doubles as the ABA-safety mechanism — the allocator can't reuse the address while we hold a refcount, so a pointer match atbeforeSleepis decisive. No new robj state required (PR [S2.7] Compression write-path hook #19 thread).buf_lenin the decoder: latent bug in PR [S2.6] Compression decoder path #18 —objectGetUncompressedViewcalledsdslen()on a raw zmalloc'd buffer (val_ptr forOBJ_ENCODING_COMPRESSEDis NOT an sds percreateCompressedObject's contract). ASan caught the heap-buffer-overflow on test-sanitizer-address. The fix derives buffer length from the header (we wrote it ourselves at install time;createCompressedObjectvalidatesbuffer_len == HEADER + compressed_len).signalModifiedKey(notdbAddInternal/dbSetValue): the canonical "logical value at this key changed" signal in Valkey. Audit confirmed every byte-mutating command path fires it; the 3 paths that don't are exactly the ones we want to skip (RDB load → sweep handles per R2.6.2; worker draindbReplaceValueper R2.9.2 — must not signal; moduleSETKEY_NO_SIGNALopt-out). PR [S2.7] Compression write-path hook #19's wiring missed in-place mutations (APPEND/SETRANGE/BITOP); the hook catches them at end-of-mutation.signalModifiedKey → compression-enqueue, not the reverse.entries[]array (review thread [design] Clarify worker concurrency: CDict* passed in job, registry is main-thread only #6). Pauses incremental rehashing → safe to free entry structs inline; finalhashtableEmpty(map, NULL)clears the bucket pointers.What this PR does NOT do (PR 3 of 3 follows)
lookupKey*and need explicitobjectGetUncompressedViewcalls.Test coverage
src/unit/test_compression_transient_view.cpp— 11 gtest cases:EmptyMapNotActivetransientViewActivereturns 0MaterializeRegistersAndFlipsMaterializeNoOpOnNonCompressedMaterializeFailsOnCorruptBufferDrainEmptiesMapAndFreesEverythingMultipleObjsTrackedIndependentlyPermanentlyDecompressFlipsAndFreesPermanentlyDecompressNoOpOnNonCompressedPermanentlyDecompressFailsOnCorruptMemoryCapFallsBackToPermanentMemoryCapStrictlyEnforcedEnd-to-end coverage of
beforeSleep's restore-vs-discard branches requires a live kvstore (not feasible in gtest); that lands as a Tcl integration test once S1.2 (training) enables real compression.Verified locally
make -j2 -C src SERVER_CFLAGS=-Werror— clean (BUILD_ZSTD=yesdefault)../runtest --single unit/type/compression→ 10/10 pass.f43f89531; the latest amend (review-iteration cleanup) is force-pushed and re-running.Diff stat