[S4.1 / Phase A item 1] Wire INFO compression observability counters#39
Conversation
2133fee to
5ab5d86
Compare
| set runnable [expr {$adid_pre == 0}] | ||
|
|
||
| if {!$runnable} { | ||
| # Active dict already present — auto-train trigger can't |
There was a problem hiding this comment.
also here, should the dict retirement already work, if so, why not to use this mechanic to retire dicts?
There was a problem hiding this comment.
Good prompt — I dug into this. Retirement of the active dict works reliably: master=decompression auto-retires it (R2.1.5) and compression_active_dict_id clears to 0 every time. I now use exactly that mechanic in compression_clear_active_dict to reach the 'no active dict' precondition.
Full reclamation (known_dicts draining back down) was NOT working, though: compressionRegistryTryGc() only ran inside compressionRegistryAdd/Retire, never from cron — so after a drain, retiring dicts lingered and known_dicts/dict_cap_reached stayed pinned at their peak forever. Since this PR is about making those exact INFO fields honest, I added a compressionRegistryTryGc() call to compressionCron (cheap: scans ≤16 slots, no-op when nothing is retiring). With it, known_dicts drains as frames are released.
One residual gap remains that I did NOT fix here (it's registry/QSBR territory): when the last dict retires against a fully-idle worker pool, the QSBR grace-barrier wake can be missed, leaving that one dict un-reclaimed until the next worker activity. So a drain reaches a small stable residue rather than strictly 0. The cap test is written to tolerate that (it reads the live stabilized known_dicts and asserts the cap-flag relationship relative to it, rather than assuming 0/2). Flagging the residual reclaim edge for the registry owner as a follow-up — want me to file it against S1.x?
There was a problem hiding this comment.
Update after digging into the root cause (option A — fix, not work around):
There were actually three layered issues behind 'dicts don't retire without compression work':
-
GC never driven from cron —
compressionRegistryTryGc()ran only inside Add/Retire. Fixed: added it tocompressionCron. -
Idle-pool QSBR lost-wake — the one-shot grace-barrier wake in
startRetirementcan be missed when the worker pool is fully idle, so a retiring frame-free dict's worker-generation snapshot is never passed and it's never reclaimed. Fixed: the cron GC now re-issues the wake whenever a retiring, frame-ref-free dict is held up only on worker generation (blockedOnWorkerGen), so the broadcast eventually lands while the worker is parked. Verified: a registry with several frame-free retiring dicts now drains them (4 → 1 in my repro). -
Frame-ref accounting leak (still open, NOT fixed here) — the residual '1' in that repro is a dict that held compressed frames. After
flushallfrees every frame (compressed_objects → 0, total_compressed_bytes → 0), that dict still reportsframe_refs == 20. So a dict that ever held frames is permanently pinned and can never retire. The frame header dict_id and the install-time incRef id are bothactive->dict_id, so the mismatch is subtler (looks like an incRef/decRef imbalance per frame). This is in the registry/worker refcount path, predates this PR, and needs the registry owner's analysis — I don't think it's safe to patch blind from an observability PR.
The integration tests are written to be robust to #3: the cap test reads the live (stabilized) known_dicts and asserts the flag relationship relative to it, and compression_clear_active_dict only relies on the active pointer clearing (which is reliable) — so no test no-ops and none depends on a full drain to 0.
@GilboaAWS — want me to file #3 as a separate S1.x/registry issue with this repro?
2677c9f to
38dd3c5
Compare
Brings the stubbed-at-:0 fields in compressionRenderFields to live
state, removing the TODO(S4.x) annotations from the integration test
and making assert_no_compression_errors a real gate.
Live now:
compression_state, compression_dict_cap_reached,
compression_compressed_objects, compression_total_uncompressed_bytes,
compression_total_compressed_bytes, compression_ratio,
compression_net_saved_bytes, compression_candidates_pending,
compression_skipped_incompressible, compression_errors_total,
compression_training_last_duration_ms,
compression_training_last_sample_count.
Still stubbed with explicit TODO(S4.x) (need rolling-window machinery):
compression_live_ratio_10m, compression_compressions_per_sec,
compression_decompressions_per_sec.
Counter wiring:
- compressed_objects: +1 createCompressedObject, -1
freeCompressedObject AND releaseCompressedBuffer (covers
permanent-decompress + transient-view discard). Balanced across
all 5 frame-lifecycle paths.
- skipped_incompressible: net-savings-guard rejection site.
- errors_total: 5 decoder corruption-class sites + worker
job->err<0 (real ZSTD error, not the benign no-dict policy state).
- training_last_*: compressionTrainState gains scan_start_ms (set
on IDLE->SCANNING), last_duration_ms + last_sample_count
(snapshotted on bio completion, success and abort paths). New
getters in compression_train.h.
Registry GC driven from cron (compression.c compressionCron):
compressionRegistryTryGc() previously ran ONLY inside
compressionRegistryAdd/Retire, so after a decompression drain the
retiring dicts were never reclaimed in steady state —
compression_known_dicts stayed pinned at its peak and
compression_dict_cap_reached could never recover, i.e. both
observable metrics were dishonest. Running GC each cron tick lets
the registry shrink once frames drain, keeping both fields correct.
(A residual idle-worker-pool QSBR reclaim edge — where the last
dict retired against a quiescent pool isn't always reclaimed —
remains; flagged separately for the registry/S1.x owner.)
compression_state derivation (R2.10.1 enum): off->disabled,
compression+dict->active, compression+nodict->idle (R2.1.7),
decompression->idle. The raw compression_master_switch and
compression_active_dict_id remain rendered as separate fields, so an
operator still sees the unrolled config independently; compression_state
is the design-mandated derived operational rollup.
Tests — addressing review feedback that integration tests must not
no-op when a precondition isn't met:
- Added compression_clear_active_dict helper: reaches 'no active
dict + registry headroom' reliably via master=decompression (the
active pointer clears deterministically) + cap=16 + flushall.
Used by the tests that need a clean dict baseline. No test no-ops
anymore.
- Removed the runnable/catch/skip guards from the INFO-fields,
skipped-incompressible, and training tests.
- Each test now sets the full eligibility config it depends on
(notably compression-max-value-size 0) rather than inheriting
state from a prior test in the shared server — a config-inheritance
hazard that surfaced in --external mode.
- dict_cap_reached test reads the live (stabilized) known_dicts and
asserts the cap-flag relationship relative to it in both
directions, instead of assuming an exact count (uncontrollable in
--external shared-server mode). Fails with a clear diagnostic if
the registry can't offer headroom below the max cap.
- Tightened assertions per review: compressed_objects == N exactly,
net_saved == unc-comp exactly, training sample_count == population
exactly; gtest BioEndToEnd asserts training_last_* == 0 before
training and == 500 (exact sample count) after.
Verified: 392 gtests + integration (10 tests, normal mode) +
unit/type pass; external-mode simulation (both compression suites
against one shared server) passes 38/38; BUILD_ZSTD={yes,no} clean
with -Werror; ASan-clean.
38dd3c5 to
f32a800
Compare
Adds training-subsystem observability to INFO compression and integration tests exercising the full training + retirement lifecycle. Metrics (additive to the last_duration_ms / last_sample_count fields wired by #39): compression_training_state idle/scanning/submitted/cooldown compression_training_scans_started total scans initiated compression_training_successes successful dict promotions compression_training_failures aborts + ZSTD errors compression_training_cooldown_until_ms compression_training_current_db compression_training_dicts_retired dicts that completed lifecycle (GC'd) compression_training_debug_frame_refs active retiring dict frame_refs Integration tests (tests/integration/compression-training.tcl): - First-training trigger fires and produces a usable dict - After training, new writes get compressed - Training does NOT fire below min keys - Training aborts + cooldown on ineligible values - Training scans across multiple databases - Dict retirement: compress 20 keys -> decompression mode -> read all -> dict GC'd (asserts frame_refs == 20, the regression guard for the double-count fix in #42) The retirement test is FLUSHALL-free: 200 training keys are SET before the dict exists (stay RAW, no refs); only 20 keys SET after the dict is active get compressed. Depends on #42 (install-path double-count fix), now merged to unstable.
Adds training-subsystem observability to INFO compression and integration tests exercising the full training + retirement lifecycle. Metrics (additive to the last_duration_ms / last_sample_count fields wired by #39): compression_training_state idle/scanning/submitted/cooldown compression_training_scans_started total scans initiated compression_training_successes successful dict promotions compression_training_failures aborts + ZSTD errors compression_training_cooldown_until_ms compression_training_current_db compression_training_dicts_retired dicts that completed lifecycle (GC'd) compression_training_debug_frame_refs active retiring dict frame_refs Integration tests (tests/integration/compression-training.tcl): - First-training trigger fires and produces a usable dict - After training, new writes get compressed - Training does NOT fire below min keys - Training aborts + cooldown on ineligible values - Training scans across multiple databases - Dict retirement: compress 20 keys -> decompression mode -> read all -> dict GC'd (asserts frame_refs == 20, the regression guard for the double-count fix in #42) The retirement test is FLUSHALL-free: 200 training keys are SET before the dict exists (stay RAW, no refs); only 20 keys SET after the dict is active get compressed. Depends on #42 (install-path double-count fix), now merged to unstable.
PR #41 delivers the S2.13 COW audit (no violators) and the S6.1 merge-blocker test tests/unit/compression-cow-invariant.tcl. Record the audit outcome, the external:skip rationale, and the separately-tracked DEBUG DIGEST / getDecodedObject crash discovered during the audit (Phase-B blocker, follow-up PR). Update the Phase-A override note to reflect S4.x (#39) and S2.13 (#41) as done.
* test(compression): COW-invariant merge-blocker (S2.13)
Add tests/unit/compression-cow-invariant.tcl, the runtime form of the
R2.4.5 COW-audit checklist and a v1 merge blocker per design §7.2.
The feature relies on two correctness invariants when a value is, or is
being, compressed:
(1) Decompress-before-mutate. lookupKey(...,LOOKUP_WRITE) permanently
decompresses any OBJ_ENCODING_COMPRESSED value to RAW before the
command handler runs, so no in-place byte-mutating command ever
operates on a compressed frame (which would corrupt bytes or hit
getDecodedObject()'s serverPanic — COMPRESSED is neither
sdsEncodedObject nor INT).
(2) Worker-snapshot immutability (R2.4.4). While a worker reads a
value's sds bytes, the value is pinned (refcount>=2), so a
concurrent in-place mutation COWs via dbUnshareStringValue,
leaving the worker's bytes untouched.
Both are code-discipline, not type-enforced. The test exercises them at
runtime across every in-place string mutator (APPEND, SETRANGE, SETBIT,
BITFIELD SET, GETSET, GETDEL, SET-overwrite), a 200-iteration mutation
storm against the live worker pool, and a transient-view + write-path
interaction. Each asserts the result matches the value semantics
computed independently in Tcl, that compression_errors_total stays 0,
and that the mutated value re-compresses and round-trips.
The audit behind the test found no violators: t_string.c, bitops.c,
hyperloglog.c, module.c (DMA write / truncate / OpenKey), and debug.c
all honor the write-lookup decompress + dbUnshareStringValue discipline.
Auto-discovered via the tests/unit/*.tcl glob; skips cleanly under
BUILD_ZSTD=no (no gen-zstd-dict helper). Compression is reached via
COMPRESSION SWEEP FORCE (the deterministic operator trigger) rather than
async enqueue timing; startup raises compression-dict-max-versions and
flushes for --external shared-server headroom.
* test(compression): clean up shared-server state in COW-invariant test
The COW-invariant test enables compression (master=compression,
sweeper, an imported dict) and leaves compressed values in the
keyspace. In --external shared-server mode this pollutes the server
for every subsequent test file. The next file, unit/other, runs
DEBUG DIGEST, whose computeDatasetDigest -> xorObjectDigest ->
mixStringObjectDigest path iterates the kvstore directly and calls
getDecodedObject() on the leftover compressed value, hitting
serverPanic("Unknown encoding type") (object.c) -> server crash ->
the external CI jobs (test-external-{standalone,cluster,nodebug})
fail with "I/O error reading reply".
Add an end-of-file cleanup test that flushes the keyspace (dropping
all compressed frames) and turns the feature off, leaving the shared
server clean. dict-max-versions is left at 16 (not reset to the
default 4) so later suites that import dicts into the shared registry
have headroom.
This is test hygiene only. The underlying DEBUG DIGEST crash on
compressed values is a real, pre-existing product bug (a kvstore-
direct reader that does not decompress, the same class PR #24 fixed
for rdbSaveStringObject / AOF rewrite but missed for the digest
path). It is tracked separately and must be fixed before transparency
mode (full corpus under --compression) lands.
* test(compression): tag COW-invariant test external:skip
The previous cleanup-at-end approach did not fully solve the shared
external-server pollution: this test churns global compression state
and cannot restore a pristine registry (a dict that ever held frames
is not reliably reclaimable today, so it lingers). That broke
unit/type/compression's documented-defaults / clean-registry
assertions, which run after this file in --external mode.
Tag the file external:skip instead. The test deliberately reconfigures
the server globally (master switch, sweeper, imported dict, compressed
frames), which makes it a poor citizen on a shared, externally-managed
server. Its COW-correctness value is fully exercised in normal
(dedicated-server) mode, the primary CI mode, where the server is torn
down between files. This matches the precedent of other stateful tests
(dump, expire, introspection, ...).
Removes the now-unneeded end-of-file cleanup test.
* docs(plan): mark S2.13 (COW audit) + S6.1 (cow-invariant test) done
PR #41 delivers the S2.13 COW audit (no violators) and the S6.1
merge-blocker test tests/unit/compression-cow-invariant.tcl. Record the
audit outcome, the external:skip rationale, and the separately-tracked
DEBUG DIGEST / getDecodedObject crash discovered during the audit
(Phase-B blocker, follow-up PR). Update the Phase-A override note to
reflect S4.x (#39) and S2.13 (#41) as done.
* test(compression): fix stale comments after external:skip change
The startup and cow_configure comments still described running 'against
the same server as the integration suite in --external mode'. With the
file now tagged external:skip that's inaccurate — it runs only against a
fresh dedicated server. Reword to reflect that the startup flush + cap
headroom are defensive clean-slate setup, not shared-server handling.
* feat(S1/S4): training lifecycle metrics + integration tests Adds training-subsystem observability to INFO compression and integration tests exercising the full training + retirement lifecycle. Metrics (additive to the last_duration_ms / last_sample_count fields wired by #39): compression_training_state idle/scanning/submitted/cooldown compression_training_scans_started total scans initiated compression_training_successes successful dict promotions compression_training_failures aborts + ZSTD errors compression_training_cooldown_until_ms compression_training_current_db compression_training_dicts_retired dicts that completed lifecycle (GC'd) compression_training_debug_frame_refs active retiring dict frame_refs Integration tests (tests/integration/compression-training.tcl): - First-training trigger fires and produces a usable dict - After training, new writes get compressed - Training does NOT fire below min keys - Training aborts + cooldown on ineligible values - Training scans across multiple databases - Dict retirement: compress 20 keys -> decompression mode -> read all -> dict GC'd (asserts frame_refs == 20, the regression guard for the double-count fix in #42) The retirement test is FLUSHALL-free: 200 training keys are SET before the dict exists (stay RAW, no refs); only 20 keys SET after the dict is active get compressed. Depends on #42 (install-path double-count fix), now merged to unstable. * test(compression): simplify first-training dict assertions The first-training test had a confusing assertion: assert {$dict_after != $dict_before || $dict_before > 0} The '|| dict_before > 0' clause made it vacuously true whenever a dict already existed — the opposite of the intent. It was leftover from shared-server (external) handling, but the test is external:skip and always runs on a fresh instance where dict_before == 0. Replaced with a direct fresh-instance check (active_dict_id == 0 before, > 0 after). Also dropped an unused dict_before capture in the 'does NOT fire' test (only known_dicts is asserted there). * test(compression): reorder training tests so trigger gates are real The old order trained a dict in test 1, leaving an active dict for the rest of the shared-instance block. That made the negative-trigger tests (below-min, ineligible-abort) pass vacuously: with a dict already active, first-training can never fire regardless of key count or eligibility (the trigger is gated on !active). Reordered so all no-active-dict checks run first, then training fires once: 1. Below min keys -> no trigger (asserts scans_started == 0) 2. Enough keys but none eligible -> scan fires + aborts (scans_started == 1, failures == 1) — metric-driven, not timing 3. First-training across db0+db1 -> dict promoted (merges the old multi-DB test; only one first-training can fire per active dict) 4. After training -> new writes compressed 5. Active dict blocks a new first-training (new test for the 'no retrain while a dict is active' invariant) Test 2 arms the 30s cooldown; test 3 uses a 60s wait_for_trained_dict to absorb it. Negative tests now assert on scans_started / failures / successes counters instead of sleeping and hoping.
Summary
Phase A item 1 of the post-S2 strategy. Wires the previously stubbed-at-
:0observability fields incompressionRenderFieldsto live state, removes theTODO(S4.x)annotations fromtests/integration/compression.tcl, and makesassert_no_compression_errorsa real gate. Only the three rolling-window rate/ratio fields remain stubbed (they need EMA / per-second-rate machinery — a self-contained follow-up).Along the way this also fixes two dict-retirement reclamation bugs that were keeping
compression_known_dicts/compression_dict_cap_reacheddishonest, and surfaces a third (deeper) frame-ref accounting bug that is left to the registry owner.INFO field status
Wired to live state (12):
compression_statemaster_switch+ active-dict presencecompression_dict_cap_reachedknown_dicts >= compression-dict-max-versionscompression_compressed_objectscompression_total_uncompressed_bytescompression_total_compressed_bytescompression_ratiocompressed/uncompressed, 4 decimal digitscompression_net_saved_bytescompressionGetSavingsBytes()compression_candidates_pendingcompressionWorkersGetCandidatesPending()compression_skipped_incompressiblecompression_errors_totalcompression_training_last_duration_mscompressionTrainState, snapshotted on bio completioncompression_training_last_sample_count(The master-switch / sweeper / back-pressure fields —
compression_master_switch,compression_automatic_sweeper*,compression_sweeper_running,compression_active_dict_id,compression_known_dicts,compression_candidates_dropped_total,compression_sweep_backpressure_total,compression_sweep_pacing_sleeps_total,compression_outbox_backpressure_total— were already live from prior PRs.)Still stubbed at 0, with explicit
TODO(S4.x)at the emit site (3):compression_live_ratio_10mcompression_compressions_per_seccompression_decompressions_per_secThe rolling-rate machinery is its own sub-feature; folding it in would more than double this PR. It gets a follow-up after Phase A's S2.13 + S5.1.
compression_statederivationcompression_stateis a derived operational rollup (design R2.10.1). The raw inputs remain rendered as their own fields (compression_master_switch,compression_active_dict_id), so an operator still reads the unrolled config independently.master_switchcompression_stateoffdisabledcompressionactivecompressionidle(R2.1.7 third state)decompressionidle(active dict retired per R2.1.5; reads still work via retiring dicts)trainingis reserved for an in-flight training run and lands with the rolling-rate follow-up.Counter wiring details
compression_compressed_objectsmatches the existing byte-counter lifecycle: +1 increateCompressedObject, −1 infreeCompressedObjectANDreleaseCompressedBuffer(the latter coverscompressionPermanentlyDecompressand the transient-view discard). Balanced +1/−1 across all 5 frame-lifecycle paths.compression_skipped_incompressiblebumped at the net-savings-guard rejection site (compression_workers.cdrain handler).compression_errors_totalbumped at the 5 decoder corruption-class sites incompression.c(bad alg_magic, missing dict, DCtx alloc, ZSTD error, size mismatch) and at the worker drain handler whenjob->err < 0(real ZSTD error, not the benign no-dict policy state).compression_training_last_*:compressionTrainStategainsscan_start_ms(set on IDLE→SCANNING),last_duration_ms,last_sample_count(snapshotted on bio completion — both the success and abort-on-insufficient-samples paths). New getters incompression_train.h. (S1.x training landed via [S1.2/S1.3] Training sampler + BIO_COMPRESSION_TRAIN job #20/fix(S1.2): NULL-guard server.db[j] iteration in compressionTrainCron #25, so these are wired here rather than deferred.)Dict-retirement reclamation fixes
Making
known_dicts/dict_cap_reachedhonest exposed that retiring dicts weren't actually being reclaimed in steady state. Two bugs, both fixed:GC never driven from cron.
compressionRegistryTryGc()ran only insidecompressionRegistryAdd/Retire. After amaster=decompressiondrain, retiring dicts lingered forever andknown_dicts/dict_cap_reachedstayed pinned at their peak. Fixed by callingcompressionRegistryTryGc()fromcompressionCron(cheap: scans ≤16 slots, no-op when nothing is retiring).Idle-pool QSBR lost-wake. The one-shot grace-barrier wake in
startRetirementcan be missed when the worker pool is fully idle (a worker not parked at the broadcast instant never sees it, and with no compression jobs there is nothing to advance its quiescent generation). The dict's gen snapshot is then never passed and it never reclaims. Fixed by having the cron GC re-issue the wake whenever a retiring, frame-ref-free dict is blocked only on worker generation (blockedOnWorkerGen), so the broadcast eventually lands while the worker is parked. Refactored the shared gen-check intoworkersQuiescedPastRetire()socanFreeandblockedOnWorkerGendon't duplicate the loop.Known follow-up (not in this PR): frame-ref accounting leak
A dict that ever held compressed frames keeps
frame_refspinned even after every frame is freed (verified: afterflushall,compressed_objects/total_*_bytesall reach 0 but the dict staysRETIRING frame_refs=N). So a dict that held frames can never retire. This is an incRef/decRef imbalance in the registry/worker refcount path, predates this PR, and needs the registry owner's analysis — @GilboaAWS is taking it as a separate S1.x item. The tests here are written to be robust to it (see below), so it doesn't block this PR.Tests
Per review feedback that integration tests must not no-op when a precondition isn't met:
compression_clear_active_dictreaches a clean precondition reliably (including in--externalshared-server mode) using only invariants that always hold:master=decompressionclears the active pointer deterministically,compression-dict-max-versions 16guarantees registry headroom, andflushallzeroes the frame counters. It does not depend on a full drain to 0 (which the frame-ref leak above can prevent).compression-max-value-size 0) rather than inheriting state from a prior test in the shared server — a config-inheritance hazard that surfaced in--externalmode (a prior test'scompression-max-value-size 512had made 1 KB keys ineligible).dict_cap_reachedtest reads the live, stabilizedknown_dictsand asserts the flag relationship in both directions relative to it (cap above → 0, cap == → 1, cap above → 0 again), rather than assuming an exact count (uncontrollable in--externalmode). Fails with a clear diagnostic if the registry can't offer headroom below the max cap.compressed_objects == Nexactly,net_saved == unc-compexactly, trainingsample_count == populationexactly. The gtestBioEndToEndassertstraining_last_*are0before training andlast_sample_count == 500exactly after.Per-field integration coverage:
compression_statecompression_dict_cap_reachedcompression_compressed_objectscompression_total_uncompressed_bytes/_compressed_bytescompression_ratiocompression_net_saved_bytescompression_candidates_pendingcompression_skipped_incompressiblecompression_errors_totalassert_no_compression_errorson every test (now a real gate)compression_training_last_*BioEndToEndTest-framework gotcha (documented inline)
Calling
returnfrom inside atest {} {body}block crashes the Tcl framework's per-test cleanup (can't unset ::valkey::attributes(N)). Tests useif/else/ arunnable-free structure instead.Verification
-Werrorunder bothBUILD_ZSTD=yesandBUILD_ZSTD=no.--externalshared-server simulation (both compression suites against one server) — the mode that previously caught the no-op / inherited-config issues.-fsanitize=address).plan.md
S4.1 marked
[~]in-progress (rolling-rate fields deferred). Phase A item 1 of the strategy doc; items 2 (S2.13 COW audit) and 3 (S5.1 benchmark scaffolding) come next.