[S4.1 / Phase A item 1] Wire INFO compression observability counters by ikolomi · Pull Request #39 · ikolomi/valkey

ikolomi · 2026-06-18T10:41:58Z

Summary

Phase A item 1 of the post-S2 strategy. Wires the previously stubbed-at-:0 observability fields in compressionRenderFields to live state, removes the TODO(S4.x) annotations from tests/integration/compression.tcl, and makes assert_no_compression_errors a real gate. Only the three rolling-window rate/ratio fields remain stubbed (they need EMA / per-second-rate machinery — a self-contained follow-up).

Along the way this also fixes two dict-retirement reclamation bugs that were keeping compression_known_dicts / compression_dict_cap_reached dishonest, and surfaces a third (deeper) frame-ref accounting bug that is left to the registry owner.

INFO field status

Wired to live state (12):

Field	Source
`compression_state`	derived: `master_switch` + active-dict presence
`compression_dict_cap_reached`	derived: `known_dicts >= compression-dict-max-versions`
`compression_compressed_objects`	new atomic counter
`compression_total_uncompressed_bytes`	existing getter
`compression_total_compressed_bytes`	existing getter
`compression_ratio`	derived: `compressed/uncompressed`, 4 decimal digits
`compression_net_saved_bytes`	`compressionGetSavingsBytes()`
`compression_candidates_pending`	new getter `compressionWorkersGetCandidatesPending()`
`compression_skipped_incompressible`	new atomic counter
`compression_errors_total`	new atomic counter
`compression_training_last_duration_ms`	new field on `compressionTrainState`, snapshotted on bio completion
`compression_training_last_sample_count`	same

(The master-switch / sweeper / back-pressure fields — compression_master_switch, compression_automatic_sweeper*, compression_sweeper_running, compression_active_dict_id, compression_known_dicts, compression_candidates_dropped_total, compression_sweep_backpressure_total, compression_sweep_pacing_sleeps_total, compression_outbox_backpressure_total — were already live from prior PRs.)

Still stubbed at 0, with explicit TODO(S4.x) at the emit site (3):

Field	Reason
`compression_live_ratio_10m`	needs rolling-EMA machinery (10-min window with rejected ratios folded in per R2.3.5)
`compression_compressions_per_sec`	needs rolling-window per-second rate machinery
`compression_decompressions_per_sec`	same

The rolling-rate machinery is its own sub-feature; folding it in would more than double this PR. It gets a follow-up after Phase A's S2.13 + S5.1.

`compression_state` derivation

compression_state is a derived operational rollup (design R2.10.1). The raw inputs remain rendered as their own fields (compression_master_switch, compression_active_dict_id), so an operator still reads the unrolled config independently.

`master_switch`	active dict?	`compression_state`
`off`	(any)	`disabled`
`compression`	yes	`active`
`compression`	no	`idle` (R2.1.7 third state)
`decompression`	(any)	`idle` (active dict retired per R2.1.5; reads still work via retiring dicts)

training is reserved for an in-flight training run and lands with the rolling-rate follow-up.

Counter wiring details

compression_compressed_objects matches the existing byte-counter lifecycle: +1 in createCompressedObject, −1 in freeCompressedObject AND releaseCompressedBuffer (the latter covers compressionPermanentlyDecompress and the transient-view discard). Balanced +1/−1 across all 5 frame-lifecycle paths.
compression_skipped_incompressible bumped at the net-savings-guard rejection site (compression_workers.c drain handler).
compression_errors_total bumped at the 5 decoder corruption-class sites in compression.c (bad alg_magic, missing dict, DCtx alloc, ZSTD error, size mismatch) and at the worker drain handler when job->err < 0 (real ZSTD error, not the benign no-dict policy state).
compression_training_last_*: compressionTrainState gains scan_start_ms (set on IDLE→SCANNING), last_duration_ms, last_sample_count (snapshotted on bio completion — both the success and abort-on-insufficient-samples paths). New getters in compression_train.h. (S1.x training landed via [S1.2/S1.3] Training sampler + BIO_COMPRESSION_TRAIN job #20/fix(S1.2): NULL-guard server.db[j] iteration in compressionTrainCron #25, so these are wired here rather than deferred.)

Dict-retirement reclamation fixes

Making known_dicts / dict_cap_reached honest exposed that retiring dicts weren't actually being reclaimed in steady state. Two bugs, both fixed:

GC never driven from cron. compressionRegistryTryGc() ran only inside compressionRegistryAdd/Retire. After a master=decompression drain, retiring dicts lingered forever and known_dicts / dict_cap_reached stayed pinned at their peak. Fixed by calling compressionRegistryTryGc() from compressionCron (cheap: scans ≤16 slots, no-op when nothing is retiring).
Idle-pool QSBR lost-wake. The one-shot grace-barrier wake in startRetirement can be missed when the worker pool is fully idle (a worker not parked at the broadcast instant never sees it, and with no compression jobs there is nothing to advance its quiescent generation). The dict's gen snapshot is then never passed and it never reclaims. Fixed by having the cron GC re-issue the wake whenever a retiring, frame-ref-free dict is blocked only on worker generation (blockedOnWorkerGen), so the broadcast eventually lands while the worker is parked. Refactored the shared gen-check into workersQuiescedPastRetire() so canFree and blockedOnWorkerGen don't duplicate the loop.

Known follow-up (not in this PR): frame-ref accounting leak

A dict that ever held compressed frames keeps frame_refs pinned even after every frame is freed (verified: after flushall, compressed_objects/total_*_bytes all reach 0 but the dict stays RETIRING frame_refs=N). So a dict that held frames can never retire. This is an incRef/decRef imbalance in the registry/worker refcount path, predates this PR, and needs the registry owner's analysis — @GilboaAWS is taking it as a separate S1.x item. The tests here are written to be robust to it (see below), so it doesn't block this PR.

Tests

Per review feedback that integration tests must not no-op when a precondition isn't met:

New helper compression_clear_active_dict reaches a clean precondition reliably (including in --external shared-server mode) using only invariants that always hold: master=decompression clears the active pointer deterministically, compression-dict-max-versions 16 guarantees registry headroom, and flushall zeroes the frame counters. It does not depend on a full drain to 0 (which the frame-ref leak above can prevent).
Removed all runnable/catch/skip guards from the INFO-fields, skipped-incompressible, and training tests — no test no-ops.
Each test sets the full eligibility config it depends on (notably compression-max-value-size 0) rather than inheriting state from a prior test in the shared server — a config-inheritance hazard that surfaced in --external mode (a prior test's compression-max-value-size 512 had made 1 KB keys ineligible).
dict_cap_reached test reads the live, stabilized known_dicts and asserts the flag relationship in both directions relative to it (cap above → 0, cap == → 1, cap above → 0 again), rather than assuming an exact count (uncontrollable in --external mode). Fails with a clear diagnostic if the registry can't offer headroom below the max cap.
Tightened assertions per review: compressed_objects == N exactly, net_saved == unc-comp exactly, training sample_count == population exactly. The gtest BioEndToEnd asserts training_last_* are 0 before training and last_sample_count == 500 exactly after.

Per-field integration coverage:

Field	Integration test
`compression_state`	INFO-fields lifecycle (disabled → active → idle)
`compression_dict_cap_reached`	dedicated cap test (both directions)
`compression_compressed_objects`	INFO-fields (== N populated, 0 after drain)
`compression_total_uncompressed_bytes` / `_compressed_bytes`	INFO-fields (> 0 / 0)
`compression_ratio`	INFO-fields (∈ (0,1) for compressible content)
`compression_net_saved_bytes`	INFO-fields (== unc − comp)
`compression_candidates_pending`	INFO-fields (field presence; gauge value is race-prone)
`compression_skipped_incompressible`	dedicated test (/dev/urandom incompressible bytes)
`compression_errors_total`	`assert_no_compression_errors` on every test (now a real gate)
`compression_training_last_*`	dedicated training test + gtest `BioEndToEnd`

Test-framework gotcha (documented inline)

Calling return from inside a test {} {body} block crashes the Tcl framework's per-test cleanup (can't unset ::valkey::attributes(N)). Tests use if/else / a runnable-free structure instead.

Verification

Builds clean with -Werror under both BUILD_ZSTD=yes and BUILD_ZSTD=no.
392 gtests pass.
Integration: 10/10 (normal mode); 38/38 in an --external shared-server simulation (both compression suites against one server) — the mode that previously caught the no-op / inherited-config issues.
ASan-clean (integration suite rebuilt with -fsanitize=address).

plan.md

S4.1 marked [~] in-progress (rolling-rate fields deferred). Phase A item 1 of the strategy doc; items 2 (S2.13 COW audit) and 3 (S5.1 benchmark scaffolding) come next.

ikolomi · 2026-06-21T07:33:44Z

+        set runnable [expr {$adid_pre == 0}]
+
+        if {!$runnable} {
+            # Active dict already present — auto-train trigger can't


also here, should the dict retirement already work, if so, why not to use this mechanic to retire dicts?

Good prompt — I dug into this. Retirement of the active dict works reliably: master=decompression auto-retires it (R2.1.5) and compression_active_dict_id clears to 0 every time. I now use exactly that mechanic in compression_clear_active_dict to reach the 'no active dict' precondition.

Full reclamation (known_dicts draining back down) was NOT working, though: compressionRegistryTryGc() only ran inside compressionRegistryAdd/Retire, never from cron — so after a drain, retiring dicts lingered and known_dicts/dict_cap_reached stayed pinned at their peak forever. Since this PR is about making those exact INFO fields honest, I added a compressionRegistryTryGc() call to compressionCron (cheap: scans ≤16 slots, no-op when nothing is retiring). With it, known_dicts drains as frames are released.

One residual gap remains that I did NOT fix here (it's registry/QSBR territory): when the last dict retires against a fully-idle worker pool, the QSBR grace-barrier wake can be missed, leaving that one dict un-reclaimed until the next worker activity. So a drain reaches a small stable residue rather than strictly 0. The cap test is written to tolerate that (it reads the live stabilized known_dicts and asserts the cap-flag relationship relative to it, rather than assuming 0/2). Flagging the residual reclaim edge for the registry owner as a follow-up — want me to file it against S1.x?

Update after digging into the root cause (option A — fix, not work around):

There were actually three layered issues behind 'dicts don't retire without compression work':

GC never driven from cron — compressionRegistryTryGc() ran only inside Add/Retire. Fixed: added it to compressionCron.

Idle-pool QSBR lost-wake — the one-shot grace-barrier wake in startRetirement can be missed when the worker pool is fully idle, so a retiring frame-free dict's worker-generation snapshot is never passed and it's never reclaimed. Fixed: the cron GC now re-issues the wake whenever a retiring, frame-ref-free dict is held up only on worker generation (blockedOnWorkerGen), so the broadcast eventually lands while the worker is parked. Verified: a registry with several frame-free retiring dicts now drains them (4 → 1 in my repro).

Frame-ref accounting leak (still open, NOT fixed here) — the residual '1' in that repro is a dict that held compressed frames. After flushall frees every frame (compressed_objects → 0, total_compressed_bytes → 0), that dict still reports frame_refs == 20. So a dict that ever held frames is permanently pinned and can never retire. The frame header dict_id and the install-time incRef id are both active->dict_id, so the mismatch is subtler (looks like an incRef/decRef imbalance per frame). This is in the registry/worker refcount path, predates this PR, and needs the registry owner's analysis — I don't think it's safe to patch blind from an observability PR.

The integration tests are written to be robust to #3: the cap test reads the live (stabilized) known_dicts and asserts the flag relationship relative to it, and compression_clear_active_dict only relies on the active pointer clearing (which is reliable) — so no test no-ops and none depends on a full drain to 0.

@GilboaAWS — want me to file #3 as a separate S1.x/registry issue with this repro?

Brings the stubbed-at-:0 fields in compressionRenderFields to live state, removing the TODO(S4.x) annotations from the integration test and making assert_no_compression_errors a real gate. Live now: compression_state, compression_dict_cap_reached, compression_compressed_objects, compression_total_uncompressed_bytes, compression_total_compressed_bytes, compression_ratio, compression_net_saved_bytes, compression_candidates_pending, compression_skipped_incompressible, compression_errors_total, compression_training_last_duration_ms, compression_training_last_sample_count. Still stubbed with explicit TODO(S4.x) (need rolling-window machinery): compression_live_ratio_10m, compression_compressions_per_sec, compression_decompressions_per_sec. Counter wiring: - compressed_objects: +1 createCompressedObject, -1 freeCompressedObject AND releaseCompressedBuffer (covers permanent-decompress + transient-view discard). Balanced across all 5 frame-lifecycle paths. - skipped_incompressible: net-savings-guard rejection site. - errors_total: 5 decoder corruption-class sites + worker job->err<0 (real ZSTD error, not the benign no-dict policy state). - training_last_*: compressionTrainState gains scan_start_ms (set on IDLE->SCANNING), last_duration_ms + last_sample_count (snapshotted on bio completion, success and abort paths). New getters in compression_train.h. Registry GC driven from cron (compression.c compressionCron): compressionRegistryTryGc() previously ran ONLY inside compressionRegistryAdd/Retire, so after a decompression drain the retiring dicts were never reclaimed in steady state — compression_known_dicts stayed pinned at its peak and compression_dict_cap_reached could never recover, i.e. both observable metrics were dishonest. Running GC each cron tick lets the registry shrink once frames drain, keeping both fields correct. (A residual idle-worker-pool QSBR reclaim edge — where the last dict retired against a quiescent pool isn't always reclaimed — remains; flagged separately for the registry/S1.x owner.) compression_state derivation (R2.10.1 enum): off->disabled, compression+dict->active, compression+nodict->idle (R2.1.7), decompression->idle. The raw compression_master_switch and compression_active_dict_id remain rendered as separate fields, so an operator still sees the unrolled config independently; compression_state is the design-mandated derived operational rollup. Tests — addressing review feedback that integration tests must not no-op when a precondition isn't met: - Added compression_clear_active_dict helper: reaches 'no active dict + registry headroom' reliably via master=decompression (the active pointer clears deterministically) + cap=16 + flushall. Used by the tests that need a clean dict baseline. No test no-ops anymore. - Removed the runnable/catch/skip guards from the INFO-fields, skipped-incompressible, and training tests. - Each test now sets the full eligibility config it depends on (notably compression-max-value-size 0) rather than inheriting state from a prior test in the shared server — a config-inheritance hazard that surfaced in --external mode. - dict_cap_reached test reads the live (stabilized) known_dicts and asserts the cap-flag relationship relative to it in both directions, instead of assuming an exact count (uncontrollable in --external shared-server mode). Fails with a clear diagnostic if the registry can't offer headroom below the max cap. - Tightened assertions per review: compressed_objects == N exactly, net_saved == unc-comp exactly, training sample_count == population exactly; gtest BioEndToEnd asserts training_last_* == 0 before training and == 500 (exact sample count) after. Verified: 392 gtests + integration (10 tests, normal mode) + unit/type pass; external-mode simulation (both compression suites against one shared server) passes 38/38; BUILD_ZSTD={yes,no} clean with -Werror; ASan-clean.

Adds training-subsystem observability to INFO compression and integration tests exercising the full training + retirement lifecycle. Metrics (additive to the last_duration_ms / last_sample_count fields wired by #39): compression_training_state idle/scanning/submitted/cooldown compression_training_scans_started total scans initiated compression_training_successes successful dict promotions compression_training_failures aborts + ZSTD errors compression_training_cooldown_until_ms compression_training_current_db compression_training_dicts_retired dicts that completed lifecycle (GC'd) compression_training_debug_frame_refs active retiring dict frame_refs Integration tests (tests/integration/compression-training.tcl): - First-training trigger fires and produces a usable dict - After training, new writes get compressed - Training does NOT fire below min keys - Training aborts + cooldown on ineligible values - Training scans across multiple databases - Dict retirement: compress 20 keys -> decompression mode -> read all -> dict GC'd (asserts frame_refs == 20, the regression guard for the double-count fix in #42) The retirement test is FLUSHALL-free: 200 training keys are SET before the dict exists (stay RAW, no refs); only 20 keys SET after the dict is active get compressed. Depends on #42 (install-path double-count fix), now merged to unstable.

PR #41 delivers the S2.13 COW audit (no violators) and the S6.1 merge-blocker test tests/unit/compression-cow-invariant.tcl. Record the audit outcome, the external:skip rationale, and the separately-tracked DEBUG DIGEST / getDecodedObject crash discovered during the audit (Phase-B blocker, follow-up PR). Update the Phase-A override note to reflect S4.x (#39) and S2.13 (#41) as done.

* test(compression): COW-invariant merge-blocker (S2.13) Add tests/unit/compression-cow-invariant.tcl, the runtime form of the R2.4.5 COW-audit checklist and a v1 merge blocker per design §7.2. The feature relies on two correctness invariants when a value is, or is being, compressed: (1) Decompress-before-mutate. lookupKey(...,LOOKUP_WRITE) permanently decompresses any OBJ_ENCODING_COMPRESSED value to RAW before the command handler runs, so no in-place byte-mutating command ever operates on a compressed frame (which would corrupt bytes or hit getDecodedObject()'s serverPanic — COMPRESSED is neither sdsEncodedObject nor INT). (2) Worker-snapshot immutability (R2.4.4). While a worker reads a value's sds bytes, the value is pinned (refcount>=2), so a concurrent in-place mutation COWs via dbUnshareStringValue, leaving the worker's bytes untouched. Both are code-discipline, not type-enforced. The test exercises them at runtime across every in-place string mutator (APPEND, SETRANGE, SETBIT, BITFIELD SET, GETSET, GETDEL, SET-overwrite), a 200-iteration mutation storm against the live worker pool, and a transient-view + write-path interaction. Each asserts the result matches the value semantics computed independently in Tcl, that compression_errors_total stays 0, and that the mutated value re-compresses and round-trips. The audit behind the test found no violators: t_string.c, bitops.c, hyperloglog.c, module.c (DMA write / truncate / OpenKey), and debug.c all honor the write-lookup decompress + dbUnshareStringValue discipline. Auto-discovered via the tests/unit/*.tcl glob; skips cleanly under BUILD_ZSTD=no (no gen-zstd-dict helper). Compression is reached via COMPRESSION SWEEP FORCE (the deterministic operator trigger) rather than async enqueue timing; startup raises compression-dict-max-versions and flushes for --external shared-server headroom. * test(compression): clean up shared-server state in COW-invariant test The COW-invariant test enables compression (master=compression, sweeper, an imported dict) and leaves compressed values in the keyspace. In --external shared-server mode this pollutes the server for every subsequent test file. The next file, unit/other, runs DEBUG DIGEST, whose computeDatasetDigest -> xorObjectDigest -> mixStringObjectDigest path iterates the kvstore directly and calls getDecodedObject() on the leftover compressed value, hitting serverPanic("Unknown encoding type") (object.c) -> server crash -> the external CI jobs (test-external-{standalone,cluster,nodebug}) fail with "I/O error reading reply". Add an end-of-file cleanup test that flushes the keyspace (dropping all compressed frames) and turns the feature off, leaving the shared server clean. dict-max-versions is left at 16 (not reset to the default 4) so later suites that import dicts into the shared registry have headroom. This is test hygiene only. The underlying DEBUG DIGEST crash on compressed values is a real, pre-existing product bug (a kvstore- direct reader that does not decompress, the same class PR #24 fixed for rdbSaveStringObject / AOF rewrite but missed for the digest path). It is tracked separately and must be fixed before transparency mode (full corpus under --compression) lands. * test(compression): tag COW-invariant test external:skip The previous cleanup-at-end approach did not fully solve the shared external-server pollution: this test churns global compression state and cannot restore a pristine registry (a dict that ever held frames is not reliably reclaimable today, so it lingers). That broke unit/type/compression's documented-defaults / clean-registry assertions, which run after this file in --external mode. Tag the file external:skip instead. The test deliberately reconfigures the server globally (master switch, sweeper, imported dict, compressed frames), which makes it a poor citizen on a shared, externally-managed server. Its COW-correctness value is fully exercised in normal (dedicated-server) mode, the primary CI mode, where the server is torn down between files. This matches the precedent of other stateful tests (dump, expire, introspection, ...). Removes the now-unneeded end-of-file cleanup test. * docs(plan): mark S2.13 (COW audit) + S6.1 (cow-invariant test) done PR #41 delivers the S2.13 COW audit (no violators) and the S6.1 merge-blocker test tests/unit/compression-cow-invariant.tcl. Record the audit outcome, the external:skip rationale, and the separately-tracked DEBUG DIGEST / getDecodedObject crash discovered during the audit (Phase-B blocker, follow-up PR). Update the Phase-A override note to reflect S4.x (#39) and S2.13 (#41) as done. * test(compression): fix stale comments after external:skip change The startup and cow_configure comments still described running 'against the same server as the integration suite in --external mode'. With the file now tagged external:skip that's inaccurate — it runs only against a fresh dedicated server. Reword to reflect that the startup flush + cap headroom are defensive clean-slate setup, not shared-server handling.

* feat(S1/S4): training lifecycle metrics + integration tests Adds training-subsystem observability to INFO compression and integration tests exercising the full training + retirement lifecycle. Metrics (additive to the last_duration_ms / last_sample_count fields wired by #39): compression_training_state idle/scanning/submitted/cooldown compression_training_scans_started total scans initiated compression_training_successes successful dict promotions compression_training_failures aborts + ZSTD errors compression_training_cooldown_until_ms compression_training_current_db compression_training_dicts_retired dicts that completed lifecycle (GC'd) compression_training_debug_frame_refs active retiring dict frame_refs Integration tests (tests/integration/compression-training.tcl): - First-training trigger fires and produces a usable dict - After training, new writes get compressed - Training does NOT fire below min keys - Training aborts + cooldown on ineligible values - Training scans across multiple databases - Dict retirement: compress 20 keys -> decompression mode -> read all -> dict GC'd (asserts frame_refs == 20, the regression guard for the double-count fix in #42) The retirement test is FLUSHALL-free: 200 training keys are SET before the dict exists (stay RAW, no refs); only 20 keys SET after the dict is active get compressed. Depends on #42 (install-path double-count fix), now merged to unstable. * test(compression): simplify first-training dict assertions The first-training test had a confusing assertion: assert {$dict_after != $dict_before || $dict_before > 0} The '|| dict_before > 0' clause made it vacuously true whenever a dict already existed — the opposite of the intent. It was leftover from shared-server (external) handling, but the test is external:skip and always runs on a fresh instance where dict_before == 0. Replaced with a direct fresh-instance check (active_dict_id == 0 before, > 0 after). Also dropped an unused dict_before capture in the 'does NOT fire' test (only known_dicts is asserted there). * test(compression): reorder training tests so trigger gates are real The old order trained a dict in test 1, leaving an active dict for the rest of the shared-instance block. That made the negative-trigger tests (below-min, ineligible-abort) pass vacuously: with a dict already active, first-training can never fire regardless of key count or eligibility (the trigger is gated on !active). Reordered so all no-active-dict checks run first, then training fires once: 1. Below min keys -> no trigger (asserts scans_started == 0) 2. Enough keys but none eligible -> scan fires + aborts (scans_started == 1, failures == 1) — metric-driven, not timing 3. First-training across db0+db1 -> dict promoted (merges the old multi-DB test; only one first-training can fire per active dict) 4. After training -> new writes compressed 5. Active dict blocks a new first-training (new test for the 'no retrain while a dict is active' invariant) Test 2 arms the 30s cooldown; test 3 uses a 60s wait_for_trained_dict to absorb it. Negative tests now assert on scans_started / failures / successes counters instead of sleeping and hoping.

github-actions Bot assigned ikolomi Jun 18, 2026

ikolomi requested a review from GilboaAWS June 18, 2026 11:07

ikolomi force-pushed the ikolomi/s4-1-counter-wiring branch 5 times, most recently from 2133fee to 5ab5d86 Compare June 18, 2026 19:43

ikolomi commented Jun 18, 2026

View reviewed changes

Comment thread src/compression.c