Ikolomi/s2 eligibility#9
Merged
Merged
Conversation
…key skip Two corrections discovered while preparing S2.2 implementation: 1. The two design docs had drifted on minor wording (idea-honing.md said `obj->encoding == RAW` and `obj->refcount != SHARED`; detailed-design.md said `OBJ_ENCODING_RAW` / `OBJ_SHARED_REFCOUNT`). Predicates now match exactly across both docs, using the actual C constants from src/server.h. Per the agreed convention: keep the exact predicate inline in both docs (different audiences both need it readable in place) rather than a cross-reference. 2. The Thread #18/#19 walkthrough resolutions made a claim that doesn't match the existing Valkey source: "the existing LRU field already provides the signal for every eviction policy" / time-based checks "work uniformly across LRU, LFU, and noeviction" / `estimateObjectIdleTime()` is "a reliable read-hotness signal universally." Reading src/lrulfu.h: - LRU and noeviction policies: robj->lru is seconds-based. lru_idle_secs(o) returns real seconds. Time-based thresholds work as designed. - LFU policy: robj->lru encodes 16 bits "last eval time in minutes" + 8 bits approximate freq counter. There is no per-second access timestamp. lru_idle_secs(o) would misinterpret the bits. The function `estimateObjectIdleTime()` referenced by the design does not exist in current Valkey; the closest available helper is `lrulfu_getIdleness()` which returns `UINT8_MAX - freq` in LFU mode (a 0..255 freq-derived heuristic, NOT seconds). The fix is policy-aware checks, not policy-uniform: - LRU and noeviction: apply settle-seconds AND min-idle-seconds against lru_idle_secs(o). - LFU: skip the time-based thresholds entirely (the metric is wrong unit). Apply compression-lfu-threshold against the freq counter — already in the predicate as the LFU branch. The dual-knob operator surface (`compression-settle-seconds` and `compression-min-idle-seconds`) is preserved across modes; in LRU/noeviction both knobs apply to the same metric (since robj->lru is touched on every read AND write — v1 cannot distinguish source) so the effective threshold is `max(settle, min_idle)`. Operators get to express two intents. The Thread #18/#19 resolutions stand as written in DESIGN_TODO.md (audit trail; the decision to use the existing LRU field rather than add a new write-time field is unchanged); only the implementation interpretation in the live design docs is refined. Updated: - detailed-design.md §2.2 R2.2 predicate + new explanatory paragraph below it. - detailed-design.md §2.12 config table: settle-seconds and min-idle-seconds descriptions now correctly note "Inactive in LFU mode." - idea-honing.md Q6 baseline filter bullet (rewritten as "policy-aware" with sub-bullets per policy). - idea-honing.md Q6 consolidated predicate (now identical to detailed-design.md §2.2). - idea-honing.md Q6a answer: rewritten with the policy-aware framing; cross-references "S2.2 implementation review" so future readers can trace this refinement.
Implements R2.2 / Q6 `compressionIsEligible(robj *o, const sds key)`,
replacing the Phase 0 stub that returned 0 for every value.
The predicate has six gates, evaluated in cheapest-first order so the
master switch short-circuits early when the feature is disabled:
1. Master switch (server.compression_enabled).
2. Type + encoding gate. STRING values only (R2.2, Q6c). Of the four
string encodings, only OBJ_ENCODING_RAW is a candidate:
- INT — already memory-optimal.
- EMBSTR — ≤44 B, header overhead erases any savings (Threads
#17 / #21 explicitly excluded).
- COMPRESSED — defense-in-depth no-op for double-compress.
3. Refcount gate. Shared RESP constants are never installed in a db
(lookupKey asserts this); we mirror the assertion as a safety
check.
4. Size bounds. compression-min-value-size (lower) prevents wasting
CPU on values too small to recoup the per-value header (~16 B).
compression-max-value-size (upper, 0 = disabled) caps worst-case
sync-decompression latency on the main thread (~1 µs/KB).
5. Hot-key skip — POLICY-AWARE per the corrected R2.2:
- LRU and noeviction: robj->lru is seconds-based. Apply both
compression-settle-seconds (recent-write proxy) and
compression-min-idle-seconds (recent-read proxy) against
lru_getIdleSecs(o->lru). v1 cannot distinguish source
(robj->lru is touched on read AND write); the dual surface
lets operators express two intents that share an underlying
signal. Effective threshold is max(settle, min_idle).
- LFU: robj->lru encodes a freq counter (no per-second
timestamp). Time-based knobs are inactive in this mode.
Apply compression-lfu-threshold against the freq counter via
lfu_getFrequency(), which mirrors the standard Valkey
decay-on-read pattern (objectGetLFUFrequency in src/object.c).
The signature is `robj *o` (not `const robj *`) because of
this in-place decay.
6. Incompressible-keys retry guard. Stubbed as "always retry-eligible"
in S2.2; S2.3 lands the side hashtable and wires
compressionRetryEligible(key) here.
Test coverage in src/unit/test_compression_eligibility.cpp (16 tests,
auto-discovered by src/unit/Makefile's `wildcard *.cpp`):
- Master switch off / on.
- Each rejection branch:
* non-STRING type
* INT, EMBSTR, COMPRESSED encodings
* shared refcount
* size below min, above max
- Size-bound boundary cases (at exact min, at exact max,
max=0 disables upper bound).
- LRU branch:
* recent touch (idle < settle) — rejected
* idle between settle and min_idle (max wins) — rejected
* cold key (idle >= max(settle, min_idle)) — accepted
* zero thresholds — accept immediately
- noeviction policy: same code path as LRU per R2.2.
- LFU branch:
* freq at threshold — rejected (>= comparison)
* freq above threshold — rejected
* freq below threshold — accepted
* time-based knobs are inactive even at INT_MAX values.
The fixture saves and restores `server.compression_*` and
`server.maxmemory_policy`, and re-syncs lrulfu's cached
`is_using_lfu_policy` boolean via `lrulfu_updateClockAndPolicy()` on
both setup and teardown so tests don't leak policy state into each
other.
Verified locally:
- `make -j` builds clean.
- `./runtest --single unit/type/compression` 10/10 passes (the Phase 0
integration fixture exercises feature-off semantics; with
compression-enabled still 0 by default, eligibility is never
consulted in the integration server).
Not verified locally (CI will validate):
- gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).
S2.3 (incompressible-keys hashtable) wires the last branch and ships
its own gtest coverage. After that, S2.4–S2.10 wire the rest of the
hot path.
…ed form
Per review feedback during S2.2 review: the previous form encoded the
policy split using short-circuit booleans —
&& (lfu_mode || lru_idle_secs(obj) >= compression-settle-seconds)
&& (lfu_mode || lru_idle_secs(obj) >= compression-min-idle-seconds)
&& (!lfu_mode || lfu_freq(obj) < compression-lfu-threshold)
— which is logically correct but reads awkwardly. Three lines mention
`lfu_mode` (twice unprimed, once primed); the reader has to mentally
short-circuit twice to see that line 1+2 fire only in LRU/noeviction
and line 3 fires only in LFU. It also looks at first glance like the
predicate might be using `compression-lfu-threshold` as an LRU-mode
threshold.
Replaced with a branched helper that mirrors how the C
implementation's if/else branches:
&& hot_key_check(obj) // policy-aware
where hot_key_check(obj) is:
if lfu_mode:
lfu_freq(obj) < compression-lfu-threshold
else:
lru_idle_secs(obj) >= compression-settle-seconds
AND lru_idle_secs(obj) >= compression-min-idle-seconds
Same behavior; the implementation in src/compression.c
(`compressionIsEligible()`) already uses this exact branching shape —
the docs now match it visually.
Updated:
- detailed-design.md §2.2 R2.2 predicate.
- idea-honing.md Q6 consolidated predicate.
Both docs were already aligned (per the previous predicate-alignment
commit); they remain identical with this rewrite. The explanatory
paragraph below the §2.2 predicate (LRU vs LFU lru-field encoding)
already covers the rationale and is unchanged.
Removes one of the two time-based hot-key skip knobs. The eligibility
predicate's LRU/noeviction branch now compares against a single
threshold (`compression-min-idle-seconds`) instead of two.
Why this matters
----------------
The dual-knob surface — `compression-settle-seconds` ("recent-write
protection") and `compression-min-idle-seconds` ("recent-access
protection") — was introduced via the Thread #18 walkthrough resolution
on the theory that operators benefit from being able to express two
different intents.
In v1 reality this is documentation theater. Both knobs compare against
the same metric (`lru_idle_secs(o)`) because Valkey's `robj->lru` field
is touched on every read AND every write — gated only by
`LOOKUP_NOTOUCH` and fork. v1 cannot distinguish read-recency from
write-recency from this single signal. The math always reduces to:
eligible iff lru_idle_secs(o) >= max(settle, min_idle)
Setting `settle=10, min_idle=120` is identical to the single-knob
`min_idle=120`. Tested every scenario I could construct (different
time scales, sweep cron sequences, heterogeneous workloads, forward-
compat with v2) — none give the dual surface operationally distinct
behavior in v1.
The only non-trivial argument for keeping both was forward-compat: if
v2 adds a per-object write-time field, the dual knob becomes
meaningful. But adding a config in v2 is non-breaking; existing
operators on v1 see no change. Removing later is harder than adding
later.
Plus the dual surface is an active footgun: operators tuning the two
knobs differently expecting different effects get a confusing
no-difference outcome. PR #1 Thread #3 specifically pushed back on
"too many knobs" — that pressure applies here.
Per YAGNI, ship v1 with the single knob. v2 reintroduces a
write-time-specific knob non-breakingly when per-object write-time
tracking lands.
Code changes
------------
- src/server.h: remove `compression_settle_seconds` field.
- src/config.c: remove the `createIntConfig` registration.
- src/compression.c: drop the second `idle_secs >= settle` check
in `compressionIsEligible`'s LRU branch. Updated the comment block
to reflect single-signal reality.
- src/unit/test_compression_eligibility.cpp:
- Drop `LruRejectsBetweenSettleAndMinIdle` (test of dual-knob
max-wins behavior — no longer applicable).
- Replace `LruRejectsRecentTouch` / `LruAcceptsBeyondBothThresholds`
/ `LruZeroThresholdsAcceptImmediately` with single-knob
equivalents (`LruRejectsRecentTouch`, `LruAcceptsBeyondThreshold`,
`LruAtThresholdAcceptsBoundary`, `LruZeroThresholdAcceptsImmediately`).
- Drop `compression-settle-seconds` from `LfuTimeKnobsAreInactive`
and rename to `LfuTimeKnobIsInactive`.
- tests/unit/type/compression.tcl: drop the
`compression-settle-seconds` config-default assertion; update
comment from "Advanced (11)" to "Advanced (10)".
Doc changes
-----------
- detailed-design.md §2.2 R2.2 predicate: hot_key_check helper now has
one comparison in the LRU/noeviction branch instead of two. The
rationale paragraph below the predicate explains the v1 single-
signal reality and the YAGNI motivation for dropping the second
knob; future v2 reintroduction noted.
- detailed-design.md §2.12 advanced config table: 11 → 10 knobs;
`compression-settle-seconds` row removed; `compression-min-idle-
seconds` description simplified.
- detailed-design.md §7.1 transparency-mode harness config: drop the
`--compression-settle-seconds 0` line so the harness doesn't pass
an unknown option.
- idea-honing.md Q6 baseline filter bullet: collapse the two-bullet
LRU branch into a single bullet; add an _italicized rationale
paragraph_ explaining why the second knob was dropped (preserves
the historical thinking for future readers).
- idea-honing.md Q6 consolidated predicate: matches detailed-design.md.
- idea-honing.md Q6 config table: drop the `compression-settle-seconds`
row.
- idea-honing.md Q6a answer: rewrite to reflect single-knob reality
with reference to "S2.2 implementation review" so future readers
can trace this refinement chain (Thread #18 → Thread #19 → S2.2
refinement).
- idea-honing.md §7.1 harness config: drop the
`--compression-settle-seconds 0` line.
- implementation/plan.md S2.2 description: simplify to "policy-aware
hot-key skip" + the actual operator-facing knobs.
- summary.md: update the eligibility table row + walkthrough-
highlights bullet to reflect the policy-aware single-knob outcome.
Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally
unchanged — they capture decisions at a point in time. The
walkthrough Thread #18/#19 resolutions stand as written; only the
implementation interpretation in the live design docs is refined.
Verified locally
----------------
- `make -j` builds clean.
- `./runtest --single unit/type/compression` 10/10 passes (the Tcl
fixture's config-default assertion was updated in lockstep with the
C-side removal, so the integration test catches any drift between
src/config.c and tests/unit/type/compression.tcl).
Not verified locally (CI will validate):
- gtest unit tests (no libgtest-dev locally).
Test count delta
----------------
S2.2 gtest: 16 tests → 14 tests (dropped 2, simplified 2 to remove
the dual-knob exercise paths).
Two small fixes to the previous commit's collateral: 1. proposal-issue.md was inadvertently committed via `git add -A` in the previous commit. The file is a working draft of the upstream issue (already tracked in the valkey-io issue tracker) and doesn't belong in the planning directory. Removing. 2. plan.md still showed S2.2 as `[ ]`. Implementation-complete state matches the S2.1 marking convention (`[x]` once the task ships); on merge to unstable the marking becomes definitive.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Inline compression: S2.2 — eligibility predicate
Implements R2.2 / Q6
compressionIsEligible(robj *o, const sds key),replacing the Phase 0 stub that returned 0 for every value.
The predicate has six gates, evaluated in cheapest-first order so the
master switch short-circuits early when the feature is disabled:
string encodings, only OBJ_ENCODING_RAW is a candidate:
Wishlist valkey-io/valkey#17 / Improve slot migration reliability valkey-io/valkey#21 explicitly excluded).
(lookupKey asserts this); we mirror the assertion as a safety
check.
CPU on values too small to recoup the per-value header (~16 B).
compression-max-value-size (upper, 0 = disabled) caps worst-case
sync-decompression latency on the main thread (~1 µs/KB).
lru_getIdleSecs(o->lru)returns seconds-since-last-touch(read OR write — the lru field is touched on every access,
gated only by LOOKUP_NOTOUCH and fork). v1 cannot
distinguish read-recency from write-recency from this
single signal, so a single threshold —
compression-min-idle-seconds— gates eligibility on the"value has been quiet long enough to be worth compressing"
property.
timestamp). Time-based knobs are inactive in this mode.
Apply compression-lfu-threshold against the freq counter via
lfu_getFrequency(), which mirrors the standard Valkey
decay-on-read pattern (objectGetLFUFrequency in src/object.c).
The signature is
robj *o(notconst robj *) because ofthis in-place decay.
in S2.2; S2.3 lands the side hashtable and wires
compressionRetryEligible(key) here.
Test coverage in src/unit/test_compression_eligibility.cpp (16 tests,
auto-discovered by src/unit/Makefile's
wildcard *.cpp):max=0 disables upper bound).
The fixture saves and restores
server.compression_*andserver.maxmemory_policy, and re-syncs lrulfu's cachedis_using_lfu_policyboolean vialrulfu_updateClockAndPolicy()onboth setup and teardown so tests don't leak policy state into each
other.
Verified locally:
make -jbuilds clean../runtest --single unit/type/compression10/10 passes (the Phase 0integration fixture exercises feature-off semantics; with
compression-enabled still 0 by default, eligibility is never
consulted in the integration server).
Not verified locally (CI will validate):