Ikolomi/s2 eligibility by ikolomi · Pull Request #9 · ikolomi/valkey

ikolomi · 2026-05-26T12:05:13Z

Inline compression: S2.2 — eligibility predicate
Implements R2.2 / Q6 compressionIsEligible(robj *o, const sds key),
replacing the Phase 0 stub that returned 0 for every value.

The predicate has six gates, evaluated in cheapest-first order so the
master switch short-circuits early when the feature is disabled:

Master switch (server.compression_enabled).
Type + encoding gate. STRING values only (R2.2, Q6c). Of the four
string encodings, only OBJ_ENCODING_RAW is a candidate:
- INT — already memory-optimal.
- EMBSTR — ≤44 B, header overhead erases any savings (Threads
  Wishlist valkey-io/valkey#17 / Improve slot migration reliability valkey-io/valkey#21 explicitly excluded).
- COMPRESSED — defense-in-depth no-op for double-compress.
Refcount gate. Shared RESP constants are never installed in a db
(lookupKey asserts this); we mirror the assertion as a safety
check.
Size bounds. compression-min-value-size (lower) prevents wasting
CPU on values too small to recoup the per-value header (~16 B).
compression-max-value-size (upper, 0 = disabled) caps worst-case
sync-decompression latency on the main thread (~1 µs/KB).
Hot-key skip — POLICY-AWARE per the corrected R2.2:
- LRU and noeviction: robj->lru is seconds-based. seconds-based access time.
  lru_getIdleSecs(o->lru) returns seconds-since-last-touch
  (read OR write — the lru field is touched on every access,
  gated only by LOOKUP_NOTOUCH and fork). v1 cannot
  distinguish read-recency from write-recency from this
  single signal, so a single threshold —
  compression-min-idle-seconds — gates eligibility on the
  "value has been quiet long enough to be worth compressing"
  property.
- LFU: robj->lru encodes a freq counter (no per-second
  timestamp). Time-based knobs are inactive in this mode.
  Apply compression-lfu-threshold against the freq counter via
  lfu_getFrequency(), which mirrors the standard Valkey
  decay-on-read pattern (objectGetLFUFrequency in src/object.c).
  The signature is robj *o (not const robj *) because of
  this in-place decay.
Incompressible-keys retry guard. Stubbed as "always retry-eligible"
in S2.2; S2.3 lands the side hashtable and wires
compressionRetryEligible(key) here.

Test coverage in src/unit/test_compression_eligibility.cpp (16 tests,
auto-discovered by src/unit/Makefile's wildcard *.cpp):

Master switch off / on.
Each rejection branch:
- non-STRING type
- INT, EMBSTR, COMPRESSED encodings
- shared refcount
- size below min, above max
Size-bound boundary cases (at exact min, at exact max,
max=0 disables upper bound).
LRU branch:
- cold key (idle >= min_idle) — accepted
- zero thresholds — accept immediately
noeviction policy: same code path as LRU per R2.2.
LFU branch:
- freq at threshold — rejected (>= comparison)
- freq above threshold — rejected
- freq below threshold — accepted
- time-based knobs are inactive even at INT_MAX values.

The fixture saves and restores server.compression_* and
server.maxmemory_policy, and re-syncs lrulfu's cached
is_using_lfu_policy boolean via lrulfu_updateClockAndPolicy() on
both setup and teardown so tests don't leak policy state into each
other.

Verified locally:

make -j builds clean.
./runtest --single unit/type/compression 10/10 passes (the Phase 0
integration fixture exercises feature-off semantics; with
compression-enabled still 0 by default, eligibility is never
consulted in the integration server).

Not verified locally (CI will validate):

gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).

…key skip Two corrections discovered while preparing S2.2 implementation: 1. The two design docs had drifted on minor wording (idea-honing.md said `obj->encoding == RAW` and `obj->refcount != SHARED`; detailed-design.md said `OBJ_ENCODING_RAW` / `OBJ_SHARED_REFCOUNT`). Predicates now match exactly across both docs, using the actual C constants from src/server.h. Per the agreed convention: keep the exact predicate inline in both docs (different audiences both need it readable in place) rather than a cross-reference. 2. The Thread #18/#19 walkthrough resolutions made a claim that doesn't match the existing Valkey source: "the existing LRU field already provides the signal for every eviction policy" / time-based checks "work uniformly across LRU, LFU, and noeviction" / `estimateObjectIdleTime()` is "a reliable read-hotness signal universally." Reading src/lrulfu.h: - LRU and noeviction policies: robj->lru is seconds-based. lru_idle_secs(o) returns real seconds. Time-based thresholds work as designed. - LFU policy: robj->lru encodes 16 bits "last eval time in minutes" + 8 bits approximate freq counter. There is no per-second access timestamp. lru_idle_secs(o) would misinterpret the bits. The function `estimateObjectIdleTime()` referenced by the design does not exist in current Valkey; the closest available helper is `lrulfu_getIdleness()` which returns `UINT8_MAX - freq` in LFU mode (a 0..255 freq-derived heuristic, NOT seconds). The fix is policy-aware checks, not policy-uniform: - LRU and noeviction: apply settle-seconds AND min-idle-seconds against lru_idle_secs(o). - LFU: skip the time-based thresholds entirely (the metric is wrong unit). Apply compression-lfu-threshold against the freq counter — already in the predicate as the LFU branch. The dual-knob operator surface (`compression-settle-seconds` and `compression-min-idle-seconds`) is preserved across modes; in LRU/noeviction both knobs apply to the same metric (since robj->lru is touched on every read AND write — v1 cannot distinguish source) so the effective threshold is `max(settle, min_idle)`. Operators get to express two intents. The Thread #18/#19 resolutions stand as written in DESIGN_TODO.md (audit trail; the decision to use the existing LRU field rather than add a new write-time field is unchanged); only the implementation interpretation in the live design docs is refined. Updated: - detailed-design.md §2.2 R2.2 predicate + new explanatory paragraph below it. - detailed-design.md §2.12 config table: settle-seconds and min-idle-seconds descriptions now correctly note "Inactive in LFU mode." - idea-honing.md Q6 baseline filter bullet (rewritten as "policy-aware" with sub-bullets per policy). - idea-honing.md Q6 consolidated predicate (now identical to detailed-design.md §2.2). - idea-honing.md Q6a answer: rewritten with the policy-aware framing; cross-references "S2.2 implementation review" so future readers can trace this refinement.

Implements R2.2 / Q6 `compressionIsEligible(robj *o, const sds key)`, replacing the Phase 0 stub that returned 0 for every value. The predicate has six gates, evaluated in cheapest-first order so the master switch short-circuits early when the feature is disabled: 1. Master switch (server.compression_enabled). 2. Type + encoding gate. STRING values only (R2.2, Q6c). Of the four string encodings, only OBJ_ENCODING_RAW is a candidate: - INT — already memory-optimal. - EMBSTR — ≤44 B, header overhead erases any savings (Threads #17 / #21 explicitly excluded). - COMPRESSED — defense-in-depth no-op for double-compress. 3. Refcount gate. Shared RESP constants are never installed in a db (lookupKey asserts this); we mirror the assertion as a safety check. 4. Size bounds. compression-min-value-size (lower) prevents wasting CPU on values too small to recoup the per-value header (~16 B). compression-max-value-size (upper, 0 = disabled) caps worst-case sync-decompression latency on the main thread (~1 µs/KB). 5. Hot-key skip — POLICY-AWARE per the corrected R2.2: - LRU and noeviction: robj->lru is seconds-based. Apply both compression-settle-seconds (recent-write proxy) and compression-min-idle-seconds (recent-read proxy) against lru_getIdleSecs(o->lru). v1 cannot distinguish source (robj->lru is touched on read AND write); the dual surface lets operators express two intents that share an underlying signal. Effective threshold is max(settle, min_idle). - LFU: robj->lru encodes a freq counter (no per-second timestamp). Time-based knobs are inactive in this mode. Apply compression-lfu-threshold against the freq counter via lfu_getFrequency(), which mirrors the standard Valkey decay-on-read pattern (objectGetLFUFrequency in src/object.c). The signature is `robj *o` (not `const robj *`) because of this in-place decay. 6. Incompressible-keys retry guard. Stubbed as "always retry-eligible" in S2.2; S2.3 lands the side hashtable and wires compressionRetryEligible(key) here. Test coverage in src/unit/test_compression_eligibility.cpp (16 tests, auto-discovered by src/unit/Makefile's `wildcard *.cpp`): - Master switch off / on. - Each rejection branch: * non-STRING type * INT, EMBSTR, COMPRESSED encodings * shared refcount * size below min, above max - Size-bound boundary cases (at exact min, at exact max, max=0 disables upper bound). - LRU branch: * recent touch (idle < settle) — rejected * idle between settle and min_idle (max wins) — rejected * cold key (idle >= max(settle, min_idle)) — accepted * zero thresholds — accept immediately - noeviction policy: same code path as LRU per R2.2. - LFU branch: * freq at threshold — rejected (>= comparison) * freq above threshold — rejected * freq below threshold — accepted * time-based knobs are inactive even at INT_MAX values. The fixture saves and restores `server.compression_*` and `server.maxmemory_policy`, and re-syncs lrulfu's cached `is_using_lfu_policy` boolean via `lrulfu_updateClockAndPolicy()` on both setup and teardown so tests don't leak policy state into each other. Verified locally: - `make -j` builds clean. - `./runtest --single unit/type/compression` 10/10 passes (the Phase 0 integration fixture exercises feature-off semantics; with compression-enabled still 0 by default, eligibility is never consulted in the integration server). Not verified locally (CI will validate): - gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally). S2.3 (incompressible-keys hashtable) wires the last branch and ships its own gtest coverage. After that, S2.4–S2.10 wire the rest of the hot path.

…ed form Per review feedback during S2.2 review: the previous form encoded the policy split using short-circuit booleans — && (lfu_mode || lru_idle_secs(obj) >= compression-settle-seconds) && (lfu_mode || lru_idle_secs(obj) >= compression-min-idle-seconds) && (!lfu_mode || lfu_freq(obj) < compression-lfu-threshold) — which is logically correct but reads awkwardly. Three lines mention `lfu_mode` (twice unprimed, once primed); the reader has to mentally short-circuit twice to see that line 1+2 fire only in LRU/noeviction and line 3 fires only in LFU. It also looks at first glance like the predicate might be using `compression-lfu-threshold` as an LRU-mode threshold. Replaced with a branched helper that mirrors how the C implementation's if/else branches: && hot_key_check(obj) // policy-aware where hot_key_check(obj) is: if lfu_mode: lfu_freq(obj) < compression-lfu-threshold else: lru_idle_secs(obj) >= compression-settle-seconds AND lru_idle_secs(obj) >= compression-min-idle-seconds Same behavior; the implementation in src/compression.c (`compressionIsEligible()`) already uses this exact branching shape — the docs now match it visually. Updated: - detailed-design.md §2.2 R2.2 predicate. - idea-honing.md Q6 consolidated predicate. Both docs were already aligned (per the previous predicate-alignment commit); they remain identical with this rewrite. The explanatory paragraph below the §2.2 predicate (LRU vs LFU lru-field encoding) already covers the rationale and is unchanged.

Removes one of the two time-based hot-key skip knobs. The eligibility predicate's LRU/noeviction branch now compares against a single threshold (`compression-min-idle-seconds`) instead of two. Why this matters ---------------- The dual-knob surface — `compression-settle-seconds` ("recent-write protection") and `compression-min-idle-seconds` ("recent-access protection") — was introduced via the Thread #18 walkthrough resolution on the theory that operators benefit from being able to express two different intents. In v1 reality this is documentation theater. Both knobs compare against the same metric (`lru_idle_secs(o)`) because Valkey's `robj->lru` field is touched on every read AND every write — gated only by `LOOKUP_NOTOUCH` and fork. v1 cannot distinguish read-recency from write-recency from this single signal. The math always reduces to: eligible iff lru_idle_secs(o) >= max(settle, min_idle) Setting `settle=10, min_idle=120` is identical to the single-knob `min_idle=120`. Tested every scenario I could construct (different time scales, sweep cron sequences, heterogeneous workloads, forward- compat with v2) — none give the dual surface operationally distinct behavior in v1. The only non-trivial argument for keeping both was forward-compat: if v2 adds a per-object write-time field, the dual knob becomes meaningful. But adding a config in v2 is non-breaking; existing operators on v1 see no change. Removing later is harder than adding later. Plus the dual surface is an active footgun: operators tuning the two knobs differently expecting different effects get a confusing no-difference outcome. PR #1 Thread #3 specifically pushed back on "too many knobs" — that pressure applies here. Per YAGNI, ship v1 with the single knob. v2 reintroduces a write-time-specific knob non-breakingly when per-object write-time tracking lands. Code changes ------------ - src/server.h: remove `compression_settle_seconds` field. - src/config.c: remove the `createIntConfig` registration. - src/compression.c: drop the second `idle_secs >= settle` check in `compressionIsEligible`'s LRU branch. Updated the comment block to reflect single-signal reality. - src/unit/test_compression_eligibility.cpp: - Drop `LruRejectsBetweenSettleAndMinIdle` (test of dual-knob max-wins behavior — no longer applicable). - Replace `LruRejectsRecentTouch` / `LruAcceptsBeyondBothThresholds` / `LruZeroThresholdsAcceptImmediately` with single-knob equivalents (`LruRejectsRecentTouch`, `LruAcceptsBeyondThreshold`, `LruAtThresholdAcceptsBoundary`, `LruZeroThresholdAcceptsImmediately`). - Drop `compression-settle-seconds` from `LfuTimeKnobsAreInactive` and rename to `LfuTimeKnobIsInactive`. - tests/unit/type/compression.tcl: drop the `compression-settle-seconds` config-default assertion; update comment from "Advanced (11)" to "Advanced (10)". Doc changes ----------- - detailed-design.md §2.2 R2.2 predicate: hot_key_check helper now has one comparison in the LRU/noeviction branch instead of two. The rationale paragraph below the predicate explains the v1 single- signal reality and the YAGNI motivation for dropping the second knob; future v2 reintroduction noted. - detailed-design.md §2.12 advanced config table: 11 → 10 knobs; `compression-settle-seconds` row removed; `compression-min-idle- seconds` description simplified. - detailed-design.md §7.1 transparency-mode harness config: drop the `--compression-settle-seconds 0` line so the harness doesn't pass an unknown option. - idea-honing.md Q6 baseline filter bullet: collapse the two-bullet LRU branch into a single bullet; add an _italicized rationale paragraph_ explaining why the second knob was dropped (preserves the historical thinking for future readers). - idea-honing.md Q6 consolidated predicate: matches detailed-design.md. - idea-honing.md Q6 config table: drop the `compression-settle-seconds` row. - idea-honing.md Q6a answer: rewrite to reflect single-knob reality with reference to "S2.2 implementation review" so future readers can trace this refinement chain (Thread #18 → Thread #19 → S2.2 refinement). - idea-honing.md §7.1 harness config: drop the `--compression-settle-seconds 0` line. - implementation/plan.md S2.2 description: simplify to "policy-aware hot-key skip" + the actual operator-facing knobs. - summary.md: update the eligibility table row + walkthrough- highlights bullet to reflect the policy-aware single-knob outcome. Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally unchanged — they capture decisions at a point in time. The walkthrough Thread #18/#19 resolutions stand as written; only the implementation interpretation in the live design docs is refined. Verified locally ---------------- - `make -j` builds clean. - `./runtest --single unit/type/compression` 10/10 passes (the Tcl fixture's config-default assertion was updated in lockstep with the C-side removal, so the integration test catches any drift between src/config.c and tests/unit/type/compression.tcl). Not verified locally (CI will validate): - gtest unit tests (no libgtest-dev locally). Test count delta ---------------- S2.2 gtest: 16 tests → 14 tests (dropped 2, simplified 2 to remove the dual-knob exercise paths).

Two small fixes to the previous commit's collateral: 1. proposal-issue.md was inadvertently committed via `git add -A` in the previous commit. The file is a working draft of the upstream issue (already tracked in the valkey-io issue tracker) and doesn't belong in the planning directory. Removing. 2. plan.md still showed S2.2 as `[ ]`. Implementation-complete state matches the S2.1 marking convention (`[x]` once the task ships); on merge to unstable the marking becomes definitive.

ikolomi added 2 commits May 26, 2026 12:57

ikolomi requested a review from GilboaAWS May 26, 2026 12:05

ikolomi added 3 commits May 27, 2026 12:39

ikolomi merged commit 32bc54e into unstable May 27, 2026
72 of 74 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ikolomi/s2 eligibility#9

Ikolomi/s2 eligibility#9
ikolomi merged 5 commits into
unstablefrom
ikolomi/s2-eligibility

ikolomi commented May 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ikolomi commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ikolomi commented May 26, 2026 •

edited

Loading