Skip to content

[S1/S4] Training metrics (INFO compression) + integration tests#40

Merged
ikolomi merged 3 commits into
unstablefrom
gilboa/s1-training-metrics-and-integ-tests
Jun 23, 2026
Merged

[S1/S4] Training metrics (INFO compression) + integration tests#40
ikolomi merged 3 commits into
unstablefrom
gilboa/s1-training-metrics-and-integ-tests

Conversation

@GilboaAWS

@GilboaAWS GilboaAWS commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Summary

Training-subsystem observability + integration tests for the S1 dictionary-lifecycle track. Rebased onto unstable (now includes #42 double-count fix and #39 observability counters).

1. Training metrics (INFO compression)

Additive to the last_duration_ms / last_sample_count fields wired by #39:

Field Type Purpose
compression_training_state string idle / scanning / submitted / cooldown
compression_training_scans_started counter Scans initiated
compression_training_successes counter Successful dict promotions
compression_training_failures counter Aborts (insufficient samples) + ZSTD errors
compression_training_cooldown_until_ms gauge Cooldown expiry timestamp (0 = not cooling)
compression_training_current_db gauge DB the scan is currently on
compression_training_dicts_retired counter Dicts that completed the full lifecycle (freed by GC)

A periodic compressionRegistryTryGc() in compressionCron (already added by #39) keeps dicts_retired / known_dicts honest after a drain.

2. Integration tests

tests/integration/compression-training.tcl — real training machinery, no pre-imported dicts. Own start_server blocks, external:skip, skipped under BUILD_ZSTD=no.

  1. First-training trigger fires and produces a usable dict
  2. After training, new writes get compressed (round-trip)
  3. Training does NOT fire below min keys
  4. Training aborts + cooldown on ineligible values
  5. Training scans across multiple databases
  6. Dict retirement lifecycle — compress 20 keys → flip to decompression → read all → dict GC'd

The retirement test is FLUSHALL-free: 200 training keys are SET before the dict exists (stay RAW, hold no refs); only the 20 keys SET after the dict is active get compressed. It asserts compression_compressed_objects == 20 (the regression guard for #42's double-count fix), then == 0 after reads, then dicts_retired increments.

Note on the frame-ref signal: uses the existing compression_compressed_objects counter rather than a per-dict gauge. Each compressed robj holds exactly one dict frame-ref (createCompressedObject +1 / freeCompressedObject+releaseCompressedBuffer −1), so the counter is the global frame-ref total — well-defined across multiple retiring dicts. (An earlier revision had a debug_frame_refs field that only read dicts[0] — removed as misleading.)

Testing

  • ./runtest --single integration/compression-training6/6 pass
  • make test-unit BUILD_ZSTD=yes392/392 pass
  • Clean build with BUILD_ZSTD={yes,no}, no duplicate INFO fields.

@GilboaAWS GilboaAWS force-pushed the gilboa/s1-training-metrics-and-integ-tests branch 4 times, most recently from a9394f7 to dbf9385 Compare June 21, 2026 12:24
@GilboaAWS GilboaAWS force-pushed the gilboa/s1-training-metrics-and-integ-tests branch from dbf9385 to 2241fa8 Compare June 21, 2026 12:30
@GilboaAWS GilboaAWS changed the base branch from unstable to gilboa/fix-install-double-count June 21, 2026 12:30
GilboaAWS added a commit that referenced this pull request Jun 21, 2026
compressionInstall took a second dict frame-ref via job->dict_id on
top of the IncRef that createCompressedObject already takes (reading
dict_id from the frame header's alg_meta). The matching DecRef in
freeCompressedObject / releaseCompressedBuffer only fires once, so
every compressed value leaked one ref: +2 on install, -1 on
free/decompress. The leaked ref kept the dict's frame_refs > 0
forever, so a retiring dict could never satisfy canFree() and was
never reclaimed by GC.

Root cause: an early design checklist listed an explicit install-time
compressionRegistryIncRef as a step. That predated PR #8 moving the
IncRef into createCompressedObject. When the real install path landed
(S2.7) it followed the stale checklist and added the now-redundant
second ref.

Fix: remove the redundant IncRef. The object-lifecycle model is the
single source of truth — createCompressedObject (sole producer of
compressed robjs) takes the ref, freeCompressedObject /
releaseCompressedBuffer release it, all keyed on the header alg_meta.

Verified by the dict-retirement integration test (PR #40, stacked on
this): 20 compressed keys now produce exactly 20 frame_refs (was 40),
and the dict is reclaimed after all values decompress.
@GilboaAWS GilboaAWS changed the base branch from gilboa/fix-install-double-count to unstable June 21, 2026 12:52
@GilboaAWS GilboaAWS force-pushed the gilboa/s1-training-metrics-and-integ-tests branch from 2241fa8 to f2d8cde Compare June 21, 2026 12:52
Adds training-subsystem observability to INFO compression and
integration tests exercising the full training + retirement lifecycle.

Metrics (additive to the last_duration_ms / last_sample_count fields
wired by #39):
  compression_training_state          idle/scanning/submitted/cooldown
  compression_training_scans_started  total scans initiated
  compression_training_successes      successful dict promotions
  compression_training_failures       aborts + ZSTD errors
  compression_training_cooldown_until_ms
  compression_training_current_db
  compression_training_dicts_retired  dicts that completed lifecycle (GC'd)
  compression_training_debug_frame_refs  active retiring dict frame_refs

Integration tests (tests/integration/compression-training.tcl):
  - First-training trigger fires and produces a usable dict
  - After training, new writes get compressed
  - Training does NOT fire below min keys
  - Training aborts + cooldown on ineligible values
  - Training scans across multiple databases
  - Dict retirement: compress 20 keys -> decompression mode ->
    read all -> dict GC'd (asserts frame_refs == 20, the regression
    guard for the double-count fix in #42)

The retirement test is FLUSHALL-free: 200 training keys are SET before
the dict exists (stay RAW, no refs); only 20 keys SET after the dict
is active get compressed.

Depends on #42 (install-path double-count fix), now merged to unstable.
@GilboaAWS GilboaAWS force-pushed the gilboa/s1-training-metrics-and-integ-tests branch from f2d8cde to b4d79f2 Compare June 21, 2026 13:37
The first-training test had a confusing assertion:
  assert {$dict_after != $dict_before || $dict_before > 0}
The '|| dict_before > 0' clause made it vacuously true whenever a
dict already existed — the opposite of the intent. It was leftover
from shared-server (external) handling, but the test is external:skip
and always runs on a fresh instance where dict_before == 0.

Replaced with a direct fresh-instance check (active_dict_id == 0
before, > 0 after). Also dropped an unused dict_before capture in the
'does NOT fire' test (only known_dicts is asserted there).
The old order trained a dict in test 1, leaving an active dict for the
rest of the shared-instance block. That made the negative-trigger tests
(below-min, ineligible-abort) pass vacuously: with a dict already
active, first-training can never fire regardless of key count or
eligibility (the trigger is gated on !active).

Reordered so all no-active-dict checks run first, then training fires
once:

  1. Below min keys -> no trigger (asserts scans_started == 0)
  2. Enough keys but none eligible -> scan fires + aborts
     (scans_started == 1, failures == 1) — metric-driven, not timing
  3. First-training across db0+db1 -> dict promoted (merges the old
     multi-DB test; only one first-training can fire per active dict)
  4. After training -> new writes compressed
  5. Active dict blocks a new first-training (new test for the
     'no retrain while a dict is active' invariant)

Test 2 arms the 30s cooldown; test 3 uses a 60s wait_for_trained_dict
to absorb it. Negative tests now assert on scans_started / failures /
successes counters instead of sleeping and hoping.
@GilboaAWS GilboaAWS marked this pull request as ready for review June 22, 2026 13:05
@GilboaAWS GilboaAWS requested a review from ikolomi June 23, 2026 08:45

@ikolomi ikolomi left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ikolomi ikolomi merged commit 21d7384 into unstable Jun 23, 2026
80 checks passed
@GilboaAWS GilboaAWS deleted the gilboa/s1-training-metrics-and-integ-tests branch June 23, 2026 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants