Inline compression: S2.1 — header codec + compressed-object alloc/free#8
Merged
Conversation
Phase 1 / S2.1 (per .agents/planning/realtime-data-compression/
implementation/plan.md). Fills in the two stub bodies left by Phase 0
in src/compression_header.c (createCompressedObject and
freeCompressedObject) and reserves the OBJ_ENCODING_COMPRESSED slot
in the encoding enum so the rest of the hot path (S2.5–S2.8) can
build against a working data type.
What lands:
- server.h: OBJ_ENCODING_COMPRESSED = 12 (per design §5.1). The
next free slot below 16; 4-bit encoding field still has room.
- compression_header.c:
* createCompressedObject(buffer, len) — validates the buffer
starts with a recognized algorithm magic and that
len == COMPRESSION_HEADER_SIZE + compressed_len, then
installs the buffer into a fresh OBJ_STRING robj without a
memcpy. The zero-copy ownership contract documented in
compression_header.h is load-bearing for §2.11 R2.11.4
(workers never touch robj) and the §5.2 worker-shrink story.
Bumps the dictionary refcount via compressionRegistryIncRef
for ZSTD frames whose dict_id is not the no-dict sentinel.
On any rejection path returns NULL without consuming the
caller's buffer (caller retains ownership and reclaims via
zfree).
* freeCompressedObject(o) — symmetric: drops the dict
reference (R2.3.4), zfrees the buffer. Defensive against
header corruption: still releases storage, just cannot
identify which dict_id to DecRef.
- object.c: freeStringObject dispatches to freeCompressedObject
when encoding is OBJ_ENCODING_COMPRESSED. Adds
`#include "compression_header.h"` to make the helper visible.
- src/unit/test_compression_header.cpp (new): gtest unit tests
for the codec round-trip, alg_magic rejection, alloc/free
lifecycle. Covers boundary values (0, UINT32_MAX),
null-out-pointer decode, all-zero corruption pattern, and
each createCompressedObject rejection path.
Verified:
- `make -j` clean (server links).
- `./runtest --single unit/type/compression` 10/10 still passes.
- Runtime smoke: SET/GET/OBJECT ENCODING work; embstr/raw values
behave as before (compression path is dormant — the registry
has no active dict, so write-path never installs a compressed
robj yet).
Not verified locally (CI will validate on push):
- gtest unit tests — local environment lacks libgtest-dev /
libgmock-dev. The new test file follows the established
pattern from test_endianconv.cpp / test_object.cpp and uses
standard fixtures.
- clang-format-18 — not installed locally. New code follows the
same indentation and brace conventions as the surrounding
Phase 0 files in src/compression*.c; the test file follows
src/unit/.clang-format (LLVM-style with 4-space indent).
Branch: ikolomi/s2-hot-path
ikolomi
commented
May 25, 2026
CI failures (root causes):
- gtest unit test wouldn't compile because compression_header.h
used `_Static_assert` (C11-only). The header is included from
src/unit/test_compression_header.cpp which is C++. Replaced with
`static_assert` which is C11 (via <assert.h>) AND C++11 (keyword);
works in both languages. Added explicit `#include <assert.h>`.
- test_compression_header.cpp referenced COMPRESSION_DICT_ID_NONE
without including compression_registry.h. Added the include
inside the existing `extern "C"` block.
- clang-format-check: the multi-line trailing comment on the new
OBJ_ENCODING_COMPRESSED define disturbed column alignment in the
block. Restructured to a single-line trailing comment that points
to compression_header.h for the layout. The other defines in the
block now align to the COMPRESSED column (one extra space per
line — matches what clang-format-18 wants).
PR #8 review comments addressed:
1. (line 61, compression_header.c) "v1 only supports string
objects, why not reflect it in the function name?" Reviewer's
concern: createCompressedObject doesn't tell readers what kind
of object is created. Resolution chosen: make the API
type-polymorphic by adding `int type` as the first argument to
createCompressedObject, so the call site is self-documenting:
createCompressedObject(OBJ_STRING, buffer, len);
This addresses the readability concern (type is explicit at
the call site) without proliferating per-type functions
(createCompressedStringObject, createCompressedHashObject, ...)
which would all be identical except for the type constant.
v1 only ever emits with OBJ_STRING via the eligibility
predicate; v2+ can extend without API changes to this layer.
freeCompressedObject stays type-agnostic — the free path
(decode header → DecRef dict → zfree buffer) is independent
of the source object's type. Single helper services every
type using OBJ_ENCODING_COMPRESSED.
2. (line 64, test) EncodeDecodeBoundaryValues was redundant with
EncodeDecodeRoundTrip. Folded the boundary cases (zero,
UINT32_MAX) into the round-trip test as additional asserts. Net
-30 lines of test, no coverage loss.
3. (line 127, test) DecodeAcceptsNullOutPointer existed only to
verify a contract no real caller uses. Tightened the API:
compressionHeaderDecode now requires non-null `out`, enforced
via serverAssert. Test removed.
4. (line 104, compression_header.c) "Is silently ignoring decode
errors common practice?" No — Valkey style is to trust the
data structure (see freeStringObject's RAW path, which doesn't
defensively validate the sds pointer). Replaced the silent
fallthrough with serverAssert(rc == 0): if a header fails to
decode at free time, that's memory corruption and we want to
fail loud. Also dropped the redundant null/encoding checks in
freeCompressedObject and createCompressedObject — the caller
(freeStringObject) already gates by encoding, and a null robj
is a bug everywhere in Valkey, not specifically here.
5. (deferred per reviewer's own instruction in earlier review)
the `compressionJob.key` extra-lookup concern is documented
for a follow-up PR around the install-path wire-up
(S2.5/S2.7).
Verified locally:
- `make -j` builds clean.
- `./runtest --single unit/type/compression` 10/10 passes.
Not verified locally (CI will validate):
- gtest unit tests (no libgtest-dev/libgmock-dev locally).
- clang-format-18 spacing details (no clang-format-18 locally).
Follow-up to 65e4eec. Reviewer flagged that we used `serverAssert` for the freeCompressedObject decode failure but `return NULL` for several conditions in createCompressedObject — without a coherent rule for when each is appropriate. Scanned src/ to derive the convention and applied it consistently. Convention (documented in .agents/planning/realtime-data-compression/research/error-handling-conventions.md with full evidence and examples): - Programmer error / contract violation / unreachable code → serverAssert / serverPanic. Aborts loud. ~653 / ~231 sites in src/. - External untrusted input (RDB load, RESP parse, network) → return NULL/-1; caller logs via rdbReportCorruptRDB or sets a client error. - Resource failure (zmalloc OOM) → try*-prefixed function returns NULL; callee logs LL_WARNING. - Recoverable runtime condition (queue full, etc.) → return code + counter increment; optional log. - Trusted internal input (factory called from internal-only path) → no validation. Trust the caller. (Pattern in createObject, freeStringObject, lookupKey* etc.) Audit applied to S2.1: createCompressedObject(int type, void *buffer, size_t buffer_len) - buffer != NULL: serverAssert (was: return NULL) Caller-controlled input; never legitimately NULL. - buffer_len >= COMPRESSION_HEADER_SIZE: serverAssert (was: return NULL) Caller allocates HEADER + compressed_len; smaller is impossible by construction. - Bad alg_magic: keep return NULL (was: return NULL) This is the ONLY legitimate rejection path. Disk bytes can present any 4-byte sequence under corruption. RDB load callers wrap with rdbReportCorruptRDB; internal callers assert the result is non-NULL. - buffer_len == HEADER + h.compressed_len: serverAssert (was: return NULL) Caller wrote the header value to match its own allocation. Mismatch is impossible by construction. Test changes: - CreateCompressedObjectRejectsNull: removed (now hits serverAssert; would crash gtest process). - CreateCompressedObjectRejectsTooSmall: removed (same). - CreateCompressedObjectRejectsSizeMismatch: removed (same). - CreateCompressedObjectRejectsBadMagic: kept; added inline comment explaining why this is the only rejection-path test. 9 tests → 6 tests. The removed tests exercised programmer-error paths under the old convention; under the new convention those paths abort the process by design, so a test would crash rather than verify behavior. Documentation: - New file: research/error-handling-conventions.md (204 lines). Source-grounded analysis of the convention with examples from src/object.c, src/rdb.c, src/server.c. Section 5 has the application table for the compression feature, mapping each error site to the appropriate pattern. - compression_header.h: createCompressedObject doc-comment now states explicitly that contract violations raise serverAssert, and that NULL return is reserved for unrecognized alg_magic (the one corruption-prone field). Verified locally: - `make -j` builds clean. - `./runtest --single unit/type/compression` 10/10 passes.
Three CI failure root causes pulled out from logs on commit 9419b7a: 1. Spellcheck: error-handling-conventions.md quoted Valkey's rdb.c verbatim, including the `clen` identifier ("compressed length" abbreviation). The typos config already allowed `clen` in [type.c] files but not in markdown. Promoted `clen` to default.extend-words so it's accepted in code excerpts in any file type. 2. test-ubuntu-latest / build-32bit / code-coverage: `atomic_size_t does not name a type` from compression_registry.h:76. The Phase 0 stub registry header uses `<stdatomic.h>` and the `atomic_size_t` typedef, both C-only. test_compression_header.cpp included it from C++ via the `extern "C"` block; the C++ compiler still parses the header and fails on the typedef. Fix: drop the `#include "compression_registry.h"` from the test file. The test only used COMPRESSION_DICT_ID_NONE (a single `#define 0u`), which we now mirror locally as `kNoDictId = 0u` with a comment pointing at the registry header. Architectural note logged in the test's comment: compression_ registry.h needs to follow the queues.h pattern (use `_Atomic(T)` keyword syntax without `<stdatomic.h>` in the public header) before future gtest tests can include it. Tracked as a follow-up for S1's owner; out of scope for this PR. 3. build-macos-latest: same root cause — compression_registry.h included from C++ pulls in libc++'s `<stdatomic.h>` shim which conflicts with wrappers.h's `_Atomic` macro definition. Same fix resolves it. Plus a 32-bit build sign-compare error: `ASSERT_EQ(OBJ_STRING, o->type)` mixes a signed `int` constant (from #define) with an unsigned bitfield. Cast both sides to `(unsigned)` matching the existing pattern in test_object.cpp (ASSERT_EQ(o->encoding, (unsigned)OBJ_ENCODING_EMBSTR)). Verified locally: - `make -j` builds clean. - `./runtest --single unit/type/compression` 10/10 passes. Not verified locally (CI will validate): - gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).
Signed-off-by: ikolomi <ikolomin@amazon.com>
GilboaAWS
added a commit
that referenced
this pull request
Jun 21, 2026
…ount
Training metrics (INFO compression) — 9 fields so operators can
observe the pipeline and tests can assert on deterministic state:
compression_training_state / _scans_started / _samples_collected /
_successes / _failures / _last_duration_ms / _cooldown_until_ms /
_current_db / _dicts_retired
Also adds compression_training_debug_frame_refs (active retiring dict's
frame_refs) — used by the retirement test to assert the ref count is
exactly right.
Integration tests (tests/integration/compression-training.tcl):
- First-training trigger fires and produces a usable dict
- After training, new writes get compressed
- Training does NOT fire below min keys
- Training aborts + cooldown on ineligible values
- Training scans across multiple databases
- Dict retirement: compress 20 keys -> decompression mode ->
read all -> dict GC'd
Fix: install-path frame-ref double-count (compression_workers.c).
The retirement test surfaced a real bug: 20 compressed keys produced
40 frame_refs. createCompressedObject already takes the dict frame-ref
(reads dict_id from the header alg_meta, pairs with the DecRef in
freeCompressedObject / releaseCompressedBuffer). compressionInstall
was taking a SECOND ref via job->dict_id — a leftover from a design
checklist that predated PR #8 moving the IncRef into
createCompressedObject. Net effect: +2 on install, -1 on
free/decompress, leaking one ref per value and pinning the dict from
GC forever. Removed the redundant IncRef; the object-lifecycle model
(create/free keyed on header alg_meta) is the single source of truth.
Adds a periodic compressionRegistryTryGc() call in compressionCron so
retiring dicts whose frame_refs reached 0 are reclaimed even when the
last DecRef didn't itself trigger GC.
GilboaAWS
added a commit
that referenced
this pull request
Jun 21, 2026
…ount
Training metrics (INFO compression) — 9 fields so operators can
observe the pipeline and tests can assert on deterministic state:
compression_training_state / _scans_started / _samples_collected /
_successes / _failures / _last_duration_ms / _cooldown_until_ms /
_current_db / _dicts_retired
Also adds compression_training_debug_frame_refs (active retiring dict's
frame_refs) — used by the retirement test to assert the ref count is
exactly right.
Integration tests (tests/integration/compression-training.tcl):
- First-training trigger fires and produces a usable dict
- After training, new writes get compressed
- Training does NOT fire below min keys
- Training aborts + cooldown on ineligible values
- Training scans across multiple databases
- Dict retirement: compress 20 keys -> decompression mode ->
read all -> dict GC'd
Fix: install-path frame-ref double-count (compression_workers.c).
The retirement test surfaced a real bug: 20 compressed keys produced
40 frame_refs. createCompressedObject already takes the dict frame-ref
(reads dict_id from the header alg_meta, pairs with the DecRef in
freeCompressedObject / releaseCompressedBuffer). compressionInstall
was taking a SECOND ref via job->dict_id — a leftover from a design
checklist that predated PR #8 moving the IncRef into
createCompressedObject. Net effect: +2 on install, -1 on
free/decompress, leaking one ref per value and pinning the dict from
GC forever. Removed the redundant IncRef; the object-lifecycle model
(create/free keyed on header alg_meta) is the single source of truth.
Adds a periodic compressionRegistryTryGc() call in compressionCron so
retiring dicts whose frame_refs reached 0 are reclaimed even when the
last DecRef didn't itself trigger GC.
This was referenced Jun 21, 2026
GilboaAWS
added a commit
that referenced
this pull request
Jun 21, 2026
…ount
Training metrics (INFO compression) — 9 fields so operators can
observe the pipeline and tests can assert on deterministic state:
compression_training_state / _scans_started / _samples_collected /
_successes / _failures / _last_duration_ms / _cooldown_until_ms /
_current_db / _dicts_retired
Also adds compression_training_debug_frame_refs (active retiring dict's
frame_refs) — used by the retirement test to assert the ref count is
exactly right.
Integration tests (tests/integration/compression-training.tcl):
- First-training trigger fires and produces a usable dict
- After training, new writes get compressed
- Training does NOT fire below min keys
- Training aborts + cooldown on ineligible values
- Training scans across multiple databases
- Dict retirement: compress 20 keys -> decompression mode ->
read all -> dict GC'd
Fix: install-path frame-ref double-count (compression_workers.c).
The retirement test surfaced a real bug: 20 compressed keys produced
40 frame_refs. createCompressedObject already takes the dict frame-ref
(reads dict_id from the header alg_meta, pairs with the DecRef in
freeCompressedObject / releaseCompressedBuffer). compressionInstall
was taking a SECOND ref via job->dict_id — a leftover from a design
checklist that predated PR #8 moving the IncRef into
createCompressedObject. Net effect: +2 on install, -1 on
free/decompress, leaking one ref per value and pinning the dict from
GC forever. Removed the redundant IncRef; the object-lifecycle model
(create/free keyed on header alg_meta) is the single source of truth.
Adds a periodic compressionRegistryTryGc() call in compressionCron so
retiring dicts whose frame_refs reached 0 are reclaimed even when the
last DecRef didn't itself trigger GC.
GilboaAWS
added a commit
that referenced
this pull request
Jun 21, 2026
compressionInstall took a second dict frame-ref via job->dict_id on top of the IncRef that createCompressedObject already takes (reading dict_id from the frame header's alg_meta). The matching DecRef in freeCompressedObject / releaseCompressedBuffer only fires once, so every compressed value leaked one ref: +2 on install, -1 on free/decompress. The leaked ref kept the dict's frame_refs > 0 forever, so a retiring dict could never satisfy canFree() and was never reclaimed by GC. Root cause: an early design checklist listed an explicit install-time compressionRegistryIncRef as a step. That predated PR #8 moving the IncRef into createCompressedObject. When the real install path landed (S2.7) it followed the stale checklist and added the now-redundant second ref. Fix: remove the redundant IncRef. The object-lifecycle model is the single source of truth — createCompressedObject (sole producer of compressed robjs) takes the ref, freeCompressedObject / releaseCompressedBuffer release it, all keyed on the header alg_meta. Verified by the dict-retirement integration test (PR #40, stacked on this): 20 compressed keys now produce exactly 20 frame_refs (was 40), and the dict is reclaimed after all values decompress.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Phase 1 / S2.1 (per .agents/planning/realtime-data-compression/ implementation/plan.md). Fills in the two stub bodies left by Phase 0 in src/compression_header.c (createCompressedObject and freeCompressedObject) and reserves the OBJ_ENCODING_COMPRESSED slot in the encoding enum so the rest of the hot path (S2.5–S2.8) can build against a working data type.
What lands:
server.h: OBJ_ENCODING_COMPRESSED = 12 (per design §5.1). The next free slot below 16; 4-bit encoding field still has room.
compression_header.c:
object.c: freeStringObject dispatches to freeCompressedObject when encoding is OBJ_ENCODING_COMPRESSED. Adds
#include "compression_header.h"to make the helper visible.src/unit/test_compression_header.cpp (new): gtest unit tests for the codec round-trip, alg_magic rejection, alloc/free lifecycle. Covers boundary values (0, UINT32_MAX), null-out-pointer decode, all-zero corruption pattern, and each createCompressedObject rejection path.
Verified:
make -jclean (server links)../runtest --single unit/type/compression10/10 still passes.Not verified locally (CI will validate on push):
Branch: ikolomi/s2-hot-path