[S2.8] Out-of-process explicit decompression (3/3)#24
Merged
Conversation
PR 3 of 3 in the S2.8 split. Wires explicit objectGetUncompressedView calls into the two paths that bypass lookupKey() and thus do not benefit from PR #23's transient-view hook: - rioWriteBulkObject (aof.c) — AOF rewrite child iterates kvstore via kvstoreIteratorNext; AOF on-disk format is uncompressed RESP per R2.6.5. - rdbSaveStringObject (rdb.c) — same iteration pattern; per R2.6.5 (AOF preamble within RDB) and R2.6.8 (full-sync replication RDB), uncompressed bytes are mandatory. Both today panic on OBJ_ENCODING_COMPRESSED (rioWriteBulkObject: serverPanic("Unknown string encoding"); rdbSaveStringObject: serverAssertWithInfo on sdsEncodedObject). The crash is latent only because no dictionary training has merged yet (S1.2 pending) — once training lands, BGSAVE/BGREWRITEAOF on a compressed key crashes the server. PR 3 is a prerequisite for S1.2 keeping unstable continuously safe. R2.6.1 (compressed disk RDB encoding via RDB_ENC_COMPRESSED + AUX dict entries) is intentionally deferred to S3.1/S3.2 (@GilboaAWS). Until that lands, the conservative choice for ALL RDB targets is uncompressed, matching pre-feature behavior. Disk RDBs pay the decompression cost once at save time and re-acquire compression at load time via the post-S1.2 sweeper. Dict lifetime safety in the forked child: registry is a fork-time snapshot; DDicts are immutable; child uses its own copy of the registry pointers. No interaction with the parent's QSBR generation counters. Replication feed (feedReplicationBufferWithObject) does NOT need wiring — already verified to operate on argv arguments and synthetic SELECT robjs only, never on kvstore values. DUMP/RESTORE/MIGRATE (R2.6.7) go through lookupKey() which now decompresses transparently per S2.8 PR 2; nothing to change. Tests deferred to the post-S1.2 transparency Tcl harness — that's where real compressed values exist and BGSAVE/BGREWRITEAOF round-trips can be exercised end-to-end. Verified locally: make -j2 -C src SERVER_CFLAGS=-Werror clean both with BUILD_ZSTD=yes (default) and BUILD_ZSTD=no; compression Tcl tests 10/10 pass.
The S2.8 forward-reference in plan.md previously named the out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'. This PR closes that out: PR #24 IS that work. Update the entry to reference PR #24 and capture the as-implemented scope (which paths, which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding, which already covered by PR #23's transient-view hook). S2.8 (Read-path hook) is now fully landed across PRs #21, #22, #23, and #24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the @GilboaAWS track: S1.2 (training) — now safe to merge because S2.8 PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer panic when they encounter compressed values.
This was referenced Jun 9, 2026
ikolomi
added a commit
that referenced
this pull request
Jun 22, 2026
The COW-invariant test enables compression (master=compression,
sweeper, an imported dict) and leaves compressed values in the
keyspace. In --external shared-server mode this pollutes the server
for every subsequent test file. The next file, unit/other, runs
DEBUG DIGEST, whose computeDatasetDigest -> xorObjectDigest ->
mixStringObjectDigest path iterates the kvstore directly and calls
getDecodedObject() on the leftover compressed value, hitting
serverPanic("Unknown encoding type") (object.c) -> server crash ->
the external CI jobs (test-external-{standalone,cluster,nodebug})
fail with "I/O error reading reply".
Add an end-of-file cleanup test that flushes the keyspace (dropping
all compressed frames) and turns the feature off, leaving the shared
server clean. dict-max-versions is left at 16 (not reset to the
default 4) so later suites that import dicts into the shared registry
have headroom.
This is test hygiene only. The underlying DEBUG DIGEST crash on
compressed values is a real, pre-existing product bug (a kvstore-
direct reader that does not decompress, the same class PR #24 fixed
for rdbSaveStringObject / AOF rewrite but missed for the digest
path). It is tracked separately and must be fixed before transparency
mode (full corpus under --compression) lands.
ikolomi
added a commit
that referenced
this pull request
Jun 22, 2026
* test(compression): COW-invariant merge-blocker (S2.13)
Add tests/unit/compression-cow-invariant.tcl, the runtime form of the
R2.4.5 COW-audit checklist and a v1 merge blocker per design §7.2.
The feature relies on two correctness invariants when a value is, or is
being, compressed:
(1) Decompress-before-mutate. lookupKey(...,LOOKUP_WRITE) permanently
decompresses any OBJ_ENCODING_COMPRESSED value to RAW before the
command handler runs, so no in-place byte-mutating command ever
operates on a compressed frame (which would corrupt bytes or hit
getDecodedObject()'s serverPanic — COMPRESSED is neither
sdsEncodedObject nor INT).
(2) Worker-snapshot immutability (R2.4.4). While a worker reads a
value's sds bytes, the value is pinned (refcount>=2), so a
concurrent in-place mutation COWs via dbUnshareStringValue,
leaving the worker's bytes untouched.
Both are code-discipline, not type-enforced. The test exercises them at
runtime across every in-place string mutator (APPEND, SETRANGE, SETBIT,
BITFIELD SET, GETSET, GETDEL, SET-overwrite), a 200-iteration mutation
storm against the live worker pool, and a transient-view + write-path
interaction. Each asserts the result matches the value semantics
computed independently in Tcl, that compression_errors_total stays 0,
and that the mutated value re-compresses and round-trips.
The audit behind the test found no violators: t_string.c, bitops.c,
hyperloglog.c, module.c (DMA write / truncate / OpenKey), and debug.c
all honor the write-lookup decompress + dbUnshareStringValue discipline.
Auto-discovered via the tests/unit/*.tcl glob; skips cleanly under
BUILD_ZSTD=no (no gen-zstd-dict helper). Compression is reached via
COMPRESSION SWEEP FORCE (the deterministic operator trigger) rather than
async enqueue timing; startup raises compression-dict-max-versions and
flushes for --external shared-server headroom.
* test(compression): clean up shared-server state in COW-invariant test
The COW-invariant test enables compression (master=compression,
sweeper, an imported dict) and leaves compressed values in the
keyspace. In --external shared-server mode this pollutes the server
for every subsequent test file. The next file, unit/other, runs
DEBUG DIGEST, whose computeDatasetDigest -> xorObjectDigest ->
mixStringObjectDigest path iterates the kvstore directly and calls
getDecodedObject() on the leftover compressed value, hitting
serverPanic("Unknown encoding type") (object.c) -> server crash ->
the external CI jobs (test-external-{standalone,cluster,nodebug})
fail with "I/O error reading reply".
Add an end-of-file cleanup test that flushes the keyspace (dropping
all compressed frames) and turns the feature off, leaving the shared
server clean. dict-max-versions is left at 16 (not reset to the
default 4) so later suites that import dicts into the shared registry
have headroom.
This is test hygiene only. The underlying DEBUG DIGEST crash on
compressed values is a real, pre-existing product bug (a kvstore-
direct reader that does not decompress, the same class PR #24 fixed
for rdbSaveStringObject / AOF rewrite but missed for the digest
path). It is tracked separately and must be fixed before transparency
mode (full corpus under --compression) lands.
* test(compression): tag COW-invariant test external:skip
The previous cleanup-at-end approach did not fully solve the shared
external-server pollution: this test churns global compression state
and cannot restore a pristine registry (a dict that ever held frames
is not reliably reclaimable today, so it lingers). That broke
unit/type/compression's documented-defaults / clean-registry
assertions, which run after this file in --external mode.
Tag the file external:skip instead. The test deliberately reconfigures
the server globally (master switch, sweeper, imported dict, compressed
frames), which makes it a poor citizen on a shared, externally-managed
server. Its COW-correctness value is fully exercised in normal
(dedicated-server) mode, the primary CI mode, where the server is torn
down between files. This matches the precedent of other stateful tests
(dump, expire, introspection, ...).
Removes the now-unneeded end-of-file cleanup test.
* docs(plan): mark S2.13 (COW audit) + S6.1 (cow-invariant test) done
PR #41 delivers the S2.13 COW audit (no violators) and the S6.1
merge-blocker test tests/unit/compression-cow-invariant.tcl. Record the
audit outcome, the external:skip rationale, and the separately-tracked
DEBUG DIGEST / getDecodedObject crash discovered during the audit
(Phase-B blocker, follow-up PR). Update the Phase-A override note to
reflect S4.x (#39) and S2.13 (#41) as done.
* test(compression): fix stale comments after external:skip change
The startup and cow_configure comments still described running 'against
the same server as the integration suite in --external mode'. With the
file now tagged external:skip that's inaccurate — it runs only against a
fresh dedicated server. Reword to reflect that the startup flush + cap
headroom are defensive clean-slate setup, not shared-server handling.
ikolomi
added a commit
that referenced
this pull request
Jun 22, 2026
* fix(compression): decode compressed values in DEBUG DIGEST readers DEBUG DIGEST and DEBUG DIGEST-VALUE iterate the kvstore directly (computeDatasetDigest via kvstoreIteratorNext; DIGEST-VALUE via dbFind) so a debug command can digest logically-expired keys. They therefore bypass the lookupKey() transient-view decompression hook and observe raw OBJ_ENCODING_COMPRESSED robjs. Both funnel a string value through xorObjectDigest's OBJ_STRING branch into mixStringObjectDigest / xorStringObjectDigest, which called getDecodedObject() — that panics on any encoding other than RAW/EMBSTR/INT (serverPanic 'Unknown encoding type'), crashing the server on the first compressed value. Route the two digest helpers through objectGetUncompressedView (R2.5.2), the single decoder primitive PR #24 used for rdbSaveStringObject / rioWriteBulkObject. For a compressed value the helper decompresses into a scratch sds (read directly — not via getDecodedObject, since incrRefCount on the OBJ_STATIC_REFCOUNT view would panic); otherwise it falls back to getDecodedObject (still handles INT). No-op passthrough for non-compressed values and for BUILD_ZSTD=no. The digest is thus defined over logical (decompressed) bytes: identical whether a value is stored compressed or not. Pre-existing crash on unstable; a blocker for transparency mode (full Tcl corpus under --compression, which runs DEBUG DIGEST extensively). Test tests/unit/compression-debug-readers.tcl (external:skip): compressed values -> DEBUG DIGEST / DIGEST-VALUE don't crash, digest is byte-identical compressed vs decompressed, and a DEBUG RELOAD round-trip preserves both digest and values. * docs(compression): record S2.14 (DEBUG DIGEST reader fix) Mark S2.14 done in plan.md (the follow-up to the S2.13 audit finding) and list DEBUG DIGEST / DIGEST-VALUE among R2.5.7's explicit-decompression kvstore-direct readers in the design doc.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR 3 of 3 in the S2.8 split. Wires explicit
objectGetUncompressedViewcalls into the two paths that bypasslookupKey()and thus do not benefit from PR #23's transient-view hook:OBJ_ENCODING_COMPRESSEDsrc/aof.crioWriteBulkObjectserverPanic("Unknown string encoding")src/rdb.crdbSaveStringObjectserverAssertWithInfo(NULL, obj, sdsEncodedObject(obj))Both crash the server. The crash is latent only because no dictionary training has merged yet (S1.2 pending) — once training lands,
BGSAVE/BGREWRITEAOFon a compressed key crashes the server. PR 3 is a prerequisite for S1.2 keepingunstablecontinuously safe.Why these paths bypass
lookupKey()The transient-view model (R2.5.7, PR #23) intercepts inside
lookupKey*— every read path that funnels through it gets decompression for free. Three paths bypasslookupKey*:rewriteAppendOnlyFileRio) — forked child iterateskvstoredirectly viakvstoreIteratorNext. No parent's side-map.RDB_ENC_COMPRESSED. No decompression needed, but this encoding isn't yet implemented (S3.1/S3.2).Approach
Both new branches decompress on the fly via
objectGetUncompressedView(o, &scratch, &view)and write the uncompressed bytes. Per R2.6.5 (AOF) and R2.6.8 (full-sync replication RDB), uncompressed is mandatory.For RDB, this is correct for replication and correct-but-suboptimal for disk RDB — disk RDBs lose their compression on save and re-acquire it via the post-S1.2 sweeper at load time. R2.6.1's compressed disk RDB encoding (
RDB_ENC_COMPRESSED+ AUX dict entries) is the proper optimization, scheduled for S3.1/S3.2 (@GilboaAWS, persistence subsystem).Dict lifetime safety in the forked child
ZSTD_DDictinstances are immutable after creation.ZSTD_DCtxincompression.cis lazy-allocated per process — child gets its own.What this PR does NOT touch
feedReplicationBufferWithObject(replication feed) — already verified in PR [design] R2.5.7 — transient decompression model + Appendix E #21's E.2 audit to operate onargvarguments and syntheticSELECTrobjs only, never on kvstore values. No changes.DUMP/RESTORE/MIGRATE(R2.6.7) — go throughlookupKey()which now decompresses transparently per S2.8 PR 2.Testing
Deferred to the post-S1.2 transparency Tcl harness. The transparency test (
tests/unit/type/compression.tclonce S1.2 enables real compression) will exercise BGSAVE/BGREWRITEAOF on compressed values end-to-end. Until then, this PR's correctness is structural — both new branches mirror the encoder/decoder contract proven out in PRs #18, #22, #23.Verified locally
make -j2 -C src SERVER_CFLAGS=-Werrorclean (BUILD_ZSTD=yesdefault).make -j2 -C src SERVER_CFLAGS=-Werror BUILD_ZSTD=noclean (theOBJ_ENCODING_COMPRESSEDbranch is dead code in non-ZSTD builds sincecreateCompressedObjectitself is gated; the function call still compiles becauseobjectGetUncompressedViewis unconditionally declared)../runtest --single unit/type/compression→ 10/10 pass.Diff stat