Inline compression: drop dict-scoped retry guard (S2.3 superseded) by ikolomi · Pull Request #10 · ikolomi/valkey

ikolomi · 2026-05-28T12:01:46Z

Drops the incompressibleKeys{} side hashtable design from S2.3 and the compression-retry-interval config knob; refines the existing drift signal in R2.3.5 to include rejected attempts in the compression_live_ratio_10m rolling average. No new config knob.

Why this matters

Under a fixed dict and a fixed workload, the probability that any given write at key K produces compressible bytes is constant — time- based throttling between retries doesn't shift this probability, only spaces attempts. CPU bounding is already provided at the sweep level (compression-sweep-max-cpu-pct); per-key throttling would be redundant and would add correctness burden (every mutating code path — dbOverwrite, APPEND, SETRANGE, BITOP, BITFIELD, module DMA writes — would need a Clear integration point to avoid suppressing compression of legitimately-changed values).

The right trigger for "the rejection might now compress" is dict change, which is handled at the system level. Updated definition: compression_live_ratio_10m is now the rolling ratio over compression attempts (successful + rejected) in the 10 min window, weighted by uncompressed bytes. Each successful compression contributes its actual compressed/uncompressed ratio. Each rejection contributes its actual measured ratio (typically in [0.9, 1.05] since the net-savings guard rejected it for being too close to 1.0). Worker errors are excluded — they're tracked separately via compression_errors_total.

Both forms of degradation feed the metric uniformly:

workload-content drift among compressible values, and
workload composition drift (more values compressing poorly enough to fail the net-savings guard).

A single drift trigger threshold catches both scenarios. Operators reason about ONE metric; the design has ONE config knob (compression-dict-drift-ratio) for the threshold.

This is a deliberate simplification over an earlier design (PR #10) which proposed a per-key incompressibleKeys side hashtable with compression-retry-interval time fallback. That approach added a module + a config knob + integration points in every mutating code path, but the underlying assumption (waiting between retries improves the per-attempt hit rate) is incorrect for fixed-distribution workloads under a fixed dict. The simpler model — retry on every sweep tick, let the drift signal handle systemic dict-fit issues — is correct and operationally cleaner.

Drift comparison direction fixed

R2.3.5's existing trigger had the comparison backwards. The ratio convention is compressed/uncompressed (lower = more savings, per R2.10.1). The previous formulation —

fires when live_ratio < drift_ratio × post_training_ratio

— fires when live is BETTER (lower) than 70% of post-training, i.e., when compression IMPROVES over post-training. That's not drift.

Corrected to:

fires when live_ratio > post_training_ratio / drift_ratio

With drift_ratio = 0.7 (70%): fires when live exceeds post-training by a factor of 1/0.7 ≈ 1.43 — i.e., compression efficiency has degraded by about 43%. Default value preserved; only the comparison direction changed.

Why "real measured ratio" not "1.0" for rejections --------------------------------------------------

An earlier draft of this commit substituted ratio = 1.0 for every rejected attempt, on the theory that we store such values uncompressed so the effective storage savings is zero. That was reviewed and revised: the metric's purpose is drift detection (workload- compressibility under this dict), not effective-storage tracking. A rejected value at 0.91 (just above the 90% net-savings threshold) and a truly incompressible value at 1.05 carry different information about dict-workload fit. Substituting 1.0 for both throws away that distinction; using the actual measured ratio preserves it. The data is right there — the worker reports the ratio when it returns, we just record what was measured.

Code changes

src/compression.c (compressionIsEligible):
- Drop branch 6 (the S2.2-stub "always retry-eligible" comment + the S2.3-anticipated lookup that never landed).
- Drop the key parameter from the signature — no longer needed now that no branch consumes it. Trivial to re-add when a future branch (e.g., S2.7 write-path hook) wants the key for logging.
- Update doc comment to reflect the simpler predicate.
src/compression.h: matching signature change.
src/server.h: drop compression_retry_interval field.
src/config.c: drop compression-retry-interval registration. Advanced knobs count goes 10 → 9.
src/unit/test_compression_eligibility.cpp: drop key parameter from all test calls.
tests/unit/type/compression.tcl: drop the compression-retry-interval config-default assertion; update comment from "Advanced (10)" to "Advanced (9)".

Doc changes

detailed-design.md §2.2:
- Predicate: drop retry_eligible(obj) line.
- Rationale paragraph rewritten to use the probability argument: time-throttling doesn't change per-attempt success probability, only spaces attempts; sweep pacing already bounds CPU. Per-key tracking is functionless and adds integration burden.
detailed-design.md §2.3 R2.3.5 drift triggers:
- Single trigger formulation, comparison direction fixed (< → >). Definition of compression_live_ratio_10m made explicit: rolling ratio over attempts (successful + rejected) using actual measured ratios; worker errors excluded.
detailed-design.md §2.12 advanced config table:
- compression-retry-interval removed (10 → 9).
- compression-dict-drift-ratio description updated to reflect corrected formula.
detailed-design.md §6.6 Net-savings guard failure: rewritten with the probability-argument rationale and real-ratio framing.
detailed-design.md §7.1 transparency-mode harness config: drop --compression-retry-interval 0 line.
idea-honing.md Q6 baseline filter "Skip post-compression" bullet: matching rewrite (probability argument, real-ratio framing).
idea-honing.md Q6 consolidated predicate: drop retry_eligible(obj) line and the where retry_eligible(obj) is... block.
idea-honing.md Q6 config table: drop compression-retry-interval row.
idea-honing.md Q6b answer: matching rewrite with reference to the S2.3 implementation review (PR Inline compression: drop dict-scoped retry guard (S2.3 superseded) #10 design discussion) so the trace is preserved.
idea-honing.md §7.1 harness: drop the same flag.
summary.md: update eligibility row + walkthrough-highlights bullet to note the post-walkthrough refinement.
plan.md S2 row description: drop "incompressible-keys hashtable" from S2's scope summary.
plan.md S2.2: drop the trailing "incompressible-keys hashtable check" claim.
plan.md S2.3: marked "(REMOVED)" with explanation; struck-through.
plan.md S1.4: extended description to include the rolling-ratio semantics extension (rejections contribute their actual measured ratio) — the drift-detection work absorbed from removed S2.3.

Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally unchanged — Thread #20's resolution stands as written; only the implementation interpretation in the live design docs is refined.

Branch ikolomi/s2-incompressible (the abandoned S2.3 implementation) should be closed/deleted on GitHub.

Verified locally

make -j builds clean.
./runtest --single unit/type/compression 10/10 passes.

Not verified locally (CI will validate):

gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).

Drops the `incompressibleKeys{}` side hashtable design from S2.3 and the `compression-retry-interval` config knob; refines the existing drift signal in R2.3.5 to include rejected attempts in the `compression_live_ratio_10m` rolling average. No new config knob. Why this matters ---------------- Under a fixed dict and a fixed workload, the probability that any given write at key K produces compressible bytes is constant — time- based throttling between retries doesn't shift this probability, only spaces attempts. CPU bounding is already provided at the sweep level (`compression-sweep-max-cpu-pct`); per-key throttling would be redundant and would add correctness burden (every mutating code path — `dbOverwrite`, APPEND, SETRANGE, BITOP, BITFIELD, module DMA writes — would need a `Clear` integration point to avoid suppressing compression of legitimately-changed values). The right trigger for "the rejection might now compress" is **dict change**, which is handled at the system level. Updated definition: `compression_live_ratio_10m` is now the rolling ratio over compression *attempts* (successful + rejected) in the 10 min window, weighted by uncompressed bytes. Each successful compression contributes its actual `compressed/uncompressed` ratio. Each rejection contributes its actual measured ratio (typically in `[0.9, 1.05]` since the net-savings guard rejected it for being too close to 1.0). Worker errors are excluded — they're tracked separately via `compression_errors_total`. Both forms of degradation feed the metric uniformly: - workload-content drift among compressible values, and - workload composition drift (more values compressing poorly enough to fail the net-savings guard). A single drift trigger threshold catches both scenarios. Operators reason about ONE metric; the design has ONE config knob (`compression-dict-drift-ratio`) for the threshold. This is a deliberate simplification over an earlier design (PR #10) which proposed a per-key `incompressibleKeys` side hashtable with `compression-retry-interval` time fallback. That approach added a module + a config knob + integration points in every mutating code path, but the underlying assumption (waiting between retries improves the per-attempt hit rate) is incorrect for fixed-distribution workloads under a fixed dict. The simpler model — retry on every sweep tick, let the drift signal handle systemic dict-fit issues — is correct and operationally cleaner. Drift comparison direction fixed -------------------------------- R2.3.5's existing trigger had the comparison backwards. The ratio convention is `compressed/uncompressed` (lower = more savings, per R2.10.1). The previous formulation — fires when live_ratio < drift_ratio × post_training_ratio — fires when live is BETTER (lower) than 70% of post-training, i.e., when compression IMPROVES over post-training. That's not drift. Corrected to: fires when live_ratio > post_training_ratio / drift_ratio With drift_ratio = 0.7 (70%): fires when live exceeds post-training by a factor of 1/0.7 ≈ 1.43 — i.e., compression efficiency has degraded by about 43%. Default value preserved; only the comparison direction changed. Why "real measured ratio" not "1.0" for rejections -------------------------------------------------- An earlier draft of this commit substituted `ratio = 1.0` for every rejected attempt, on the theory that we store such values uncompressed so the effective storage savings is zero. That was reviewed and revised: the metric's purpose is **drift detection** (workload- compressibility under this dict), not effective-storage tracking. A rejected value at 0.91 (just above the 90% net-savings threshold) and a truly incompressible value at 1.05 carry different information about dict-workload fit. Substituting 1.0 for both throws away that distinction; using the actual measured ratio preserves it. The data is right there — the worker reports the ratio when it returns, we just record what was measured. Code changes ------------ - src/compression.c (`compressionIsEligible`): - Drop branch 6 (the S2.2-stub "always retry-eligible" comment + the S2.3-anticipated lookup that never landed). - Drop the `key` parameter from the signature — no longer needed now that no branch consumes it. Trivial to re-add when a future branch (e.g., S2.7 write-path hook) wants the key for logging. - Update doc comment to reflect the simpler predicate. - src/compression.h: matching signature change. - src/server.h: drop `compression_retry_interval` field. - src/config.c: drop `compression-retry-interval` registration. Advanced knobs count goes 10 → 9. - src/unit/test_compression_eligibility.cpp: drop `key` parameter from all test calls. - tests/unit/type/compression.tcl: drop the `compression-retry-interval` config-default assertion; update comment from "Advanced (10)" to "Advanced (9)". Doc changes ----------- - detailed-design.md §2.2: - Predicate: drop `retry_eligible(obj)` line. - Rationale paragraph rewritten to use the probability argument: time-throttling doesn't change per-attempt success probability, only spaces attempts; sweep pacing already bounds CPU. Per-key tracking is functionless and adds integration burden. - detailed-design.md §2.3 R2.3.5 drift triggers: - Single trigger formulation, comparison direction fixed (< → >). Definition of `compression_live_ratio_10m` made explicit: rolling ratio over attempts (successful + rejected) using actual measured ratios; worker errors excluded. - detailed-design.md §2.12 advanced config table: - `compression-retry-interval` removed (10 → 9). - `compression-dict-drift-ratio` description updated to reflect corrected formula. - detailed-design.md §6.6 Net-savings guard failure: rewritten with the probability-argument rationale and real-ratio framing. - detailed-design.md §7.1 transparency-mode harness config: drop `--compression-retry-interval 0` line. - idea-honing.md Q6 baseline filter "Skip post-compression" bullet: matching rewrite (probability argument, real-ratio framing). - idea-honing.md Q6 consolidated predicate: drop `retry_eligible(obj)` line and the `where retry_eligible(obj) is...` block. - idea-honing.md Q6 config table: drop `compression-retry-interval` row. - idea-honing.md Q6b answer: matching rewrite with reference to the S2.3 implementation review (PR #10 design discussion) so the trace is preserved. - idea-honing.md §7.1 harness: drop the same flag. - summary.md: update eligibility row + walkthrough-highlights bullet to note the post-walkthrough refinement. - plan.md S2 row description: drop "incompressible-keys hashtable" from S2's scope summary. - plan.md S2.2: drop the trailing "incompressible-keys hashtable check" claim. - plan.md S2.3: marked "(REMOVED)" with explanation; struck-through. - plan.md S1.4: extended description to include the rolling-ratio semantics extension (rejections contribute their actual measured ratio) — the drift-detection work absorbed from removed S2.3. Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally unchanged — Thread #20's resolution stands as written; only the implementation interpretation in the live design docs is refined. Branch ikolomi/s2-incompressible (the abandoned S2.3 implementation) should be closed/deleted on GitHub. Verified locally ---------------- - `make -j` builds clean. - `./runtest --single unit/type/compression` 10/10 passes. Not verified locally (CI will validate): - gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).

ikolomi requested a review from GilboaAWS May 28, 2026 12:01

github-actions Bot assigned ikolomi May 28, 2026

ikolomi force-pushed the ikolomi/drop-incompressible-keys branch from b412122 to 0bb3556 Compare May 28, 2026 17:34

ikolomi merged commit a6fa453 into unstable May 28, 2026
79 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inline compression: drop dict-scoped retry guard (S2.3 superseded)#10

Inline compression: drop dict-scoped retry guard (S2.3 superseded)#10
ikolomi merged 1 commit into
unstablefrom
ikolomi/drop-incompressible-keys

ikolomi commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ikolomi commented May 28, 2026

Why this matters

Drift comparison direction fixed

Code changes

Doc changes

Verified locally

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant