[S2.9 / C3] Sweeper engine + COMPRESSION SWEEP FORCE#31
Merged
Conversation
Implements the third and final piece of the declarative master-switch
model from PR-B: a state-machine sweeper engine plus the operator's
catch-up affordance, COMPRESSION SWEEP FORCE.
State machine
-------------
DISABLED ←→ IDLE ←→ SCANNING ←→ SLEEPING
Triggers (R2.1.5 + R2.1.4):
- master-switch direction change (off→compression, compression→
decompression, etc.) with sweeper=enabled → reset cursor, enter
SCANNING. Even mid-pass: the sweeper takes its direction from the
current master switch on every cron tick.
- sweeper config flips disabled→enabled (with master ≠ off) →
enter SCANNING.
- SLEEPING with `compression-active-sweeper-interval > 0` and
interval elapsed → enter SCANNING.
- COMPRESSION SWEEP FORCE pending → enter SCANNING regardless of
sweeper config (operator-driven; rejected when master=off).
After a pass completes:
- sweeper=disabled (force-pass committed mid-flight) → DISABLED.
- sweeper=enabled, interval > 0 → SLEEPING.
- sweeper=enabled, interval = 0 → IDLE (no periodic re-runs;
R2.1.3 default — minimizes idle-keyspace iteration).
The cardinal correctness property: with interval=0 and no direction
change / FORCE / config flip, IDLE stays IDLE — the sweeper does NOT
loop. Verified by gtest IdleWithoutDirectionChangeStaysIdle (10 ticks
with no change, pass count remains 1).
Per-key dispatch (sweepScanCallback)
------------------------------------
Direction taken from master-switch at scan time:
- master == compression: enqueue eligible RAW value via
compressionEnqueueCandidate (worker pool compresses).
- master == decompression: call compressionPermanentlyDecompress
directly on the main thread (worker pool unused for
drain-direction work, per R2.1.2).
- master == off: callback unreachable (cron returns early).
Iteration is paced via tickBudgetUs() = (1_000_000 / hz) ×
compression-sweep-max-cpu-pct / 100, with a 1ms floor. Default
hz=10 + 25% pacing = 25ms/tick. The kvstoreScan loop checks
budget after every bucket; cursor + current_db persist across
ticks so a multi-tick pass resumes where it left off.
Sub-tick edge bug + fix
-----------------------
Apply hooks fire synchronously on every CONFIG SET, but the cron
polls server.compression_master_switch every ~100ms. A sequence
like compression→off→compression that completes inside a single
tick window is invisible to the cron — every observed value is
COMPRESSION, dir_chg=0 forever, no pass triggers.
Fix: compressionSweepNotifyMasterSwitchChanged() called from
applyCompressionMasterSwitch on every transition. The cron tick
ORs this edge-trigger with the polled comparison, so neither
source of edge is missed. Without this, the Tcl test
"Sweeper state transitions: interval > 0 enters sleeping after
pass" fails — earlier rapid CONFIG SET sequences (from previous
tests in the same fixture) hide the master-switch transition.
Non-functional state warning
----------------------------
master=compression + threads=0: the worker pool refuses every
enqueue. The sweeper's pass would burn CPU iterating the keyspace
while every candidate drops. Sweeper skips the pass with a
rate-limited LL_WARNING (once per minute); cursor is preserved
so a future threads>0 setting picks up where we left off.
master=decompression + threads=0 is functional — the pool isn't
used for permanent-decompress work.
COMPRESSION SWEEP FORCE
-----------------------
New subcommand. Rejected when master=off (no direction).
Otherwise sets force_pending = 1; the next cron tick consumes
the flag and transitions to SCANNING. Allowed even with
sweeper=disabled (the catch-up affordance — operators can leave
sweeper=disabled to avoid idle-keyspace iteration cost and run
SWEEP FORCE on demand).
Idempotent: multiple FORCE calls before the next cron tick
collapse into a single pass.
Verification
------------
- src/Makefile + cmake/Modules/SourceFiles.cmake registered.
- BUILD_ZSTD=yes and BUILD_ZSTD=no clean with -Werror.
- 15 new gtest cases under src/unit/test_compression_sweep.cpp;
total 382 unit tests (was 367), all pass.
- 9 new Tcl test cases under tests/unit/type/compression.tcl;
total 22 tests, all pass.
- Smoke test: master=off + sweeper=enabled = idle (no looping);
master→compression triggers exactly one pass (interval=0);
rapid compression↔decompression sub-tick toggles trigger
one pass each (sub-tick edge fix).
Restructures S2.9 from a single bullet into the C1/C2/C3 split that matched the actual rollout (mirrors how S2.8 was tracked). C1 and C2 already merged; C3 is this PR.
ikolomi
commented
Jun 15, 2026
Two related changes from review discussion on PR #31: 1. Rename compression-active-sweeper → compression-automatic-sweeper. 2. Collapse the 4-state machine to 2 booleans + a timestamp. The rename clarifies the config's role: it gates AUTOMATIC scheduling (master-switch transitions, periodic re-runs). Manual FORCE bypasses it. The original "active-sweeper" suggested "is the sweeper engine active right now" — which under the simplified model is just scan_in_progress, an internal flag rather than an operator-facing config. Touches (rename): compression-active-sweeper → compression-automatic- sweeper across configs, struct fields, enums, INFO fields, design doc, Tcl tests. compression-active-sweeper-interval renamed for consistency. Simplification -------------- The earlier draft (4 states: DISABLED/IDLE/SCANNING/SLEEPING) ran into a sub-tick edge bug: cron polls server.compression_master_switch every ~100ms, but apply hooks fire synchronously per CONFIG SET. A sequence like compression→off→compression that completes inside a single tick window was invisible to the cron — every observed value was COMPRESSION, dir_chg=0 forever, no pass triggered. The earlier fix was a sweep_master_switch_changed flag set by the apply hook; the cron consumed it as an edge signal. This commit eliminates that whole tension by inverting control: apply hooks **directly** mutate sweeper state. Two helpers: resetScanStateAndEnableOnce() cursor=0, scan_in_progress=0, enable_once=1. Aborts in-flight pass and schedules a fresh one. abortScan() cursor=0, scan_in_progress=0, enable_once=0. Aborts without scheduling a replacement. Apply hooks (compression.c → notify callbacks → these helpers): master-switch → off abortScan() master-switch → {comp, decomp} resetScanStateAndEnableOnce() when sweeper=enabled, else nothing sweeper disabled → enabled resetScanStateAndEnableOnce() when master ≠ off, else nothing sweeper enabled → disabled abortScan() FORCE (master ≠ off) resetScanStateAndEnableOnce() (preempts in-flight scan; that is the user-facing semantic of "do a pass now, from scratch") Cron tick (~25 lines, was ~110): if master == off: return if !enable_once && !scan_in_progress && !nextIntervalElapsed(): return enable_once = 0 if !scan_in_progress: scan_in_progress = 1, log "starting" if master == compression && threads == 0: rate-limited warning, return (cursor preserved) completed = runSweepBudget(master) if completed: scan_in_progress = 0 last_completion_ms = mstime() passes_completed++ nextIntervalElapsed(): iff sweeper=enabled AND interval>0 AND last_completion_ms != 0 AND (now - last_completion_ms) >= interval*1000. Boot: compressionSweepInit() arms enable_once if sweeper=enabled and master ∈ {compression, decompression} at boot — same logic as the disabled→enabled apply hook would do at runtime, since apply hooks don't fire during config-load. INFO field ---------- compression_active_sweeper_state (4 names) → compression_sweeper_ running (0/1). The original 4 names always collapsed semantically: DISABLED/IDLE/SLEEPING all mean "not currently scanning, won't start until a trigger". The new field reports the only state that ever needed distinguishing — is a scan currently advancing? External observers wanting "will a scan trigger soon?" can derive it from compression_automatic_sweeper + compression_automatic_ sweeper_interval + compression_sweeper_running. Tests ----- gtest: 18 cases (was 15). Existing cases for direction-change / FORCE / interval / threads-zero all survive (test by behavior, not by state name); added cases for the apply-hook semantics (SweeperEnabledKicksScan, SweeperDisabledAbortsAndDoesNotResume, MasterChangeWithSweeperDisabledNoTrigger, InitArmsEnableOnce*). Tcl: 22 cases (unchanged count). Field-name assertions updated from compression_automatic_sweeper_state:idle/sleeping/disabled to compression_sweeper_running:0. The "Sweeper: enabled + master= compression runs exactly one pass with interval=0" test asserts the cardinal correctness property (no looping with interval=0). Verified clean with -Werror under both BUILD_ZSTD=yes and BUILD_ZSTD=no.
…f sweeper config Bug case: sweeper=disabled + force-pass in flight + master switch flips compression<->decompression. Without this fix, the in-flight scan continues from its current cursor with the new direction — DBs scanned before the change got the old direction's work, DBs after the change got the new direction's work. The keyspace ends up in a mixed state inconsistent with the operator's declared state. Fix: notify hook now calls resetScanStateAndEnableOnce() when either of the following is true: - sweep.scan_in_progress (force-pass or automatic in flight), OR - sweeper config = enabled (the existing automatic trigger) The 'no in-flight scan + sweeper=disabled' case stays a no-op: operator hasn't asked for any work; direction change is just configuration; FORCE will schedule when they're ready. New gtest case ForcePassInFlightDirectionChangeRestarts uses the master=compression + threads=0 path to leave scan_in_progress=1 across cron ticks (simulating mid-flight), then verifies that a direction change clears in-flight state and a subsequent tick runs a fresh pass with the new direction. End-to-end smoke verified: FORCE with threads=0 holds sweeper_running=1, master flip resets it to 0, next tick logs a fresh decompression pass.
…tics Cross-pass after the in-flight-scan reset bug fix to align comments and the design doc with the implemented behavior: - design/detailed-design.md R2.1.5: rewrote the master-switch transition table. Drop "wake the sweeper from sleeping/idle" (artifact of the deprecated 4-state machine). State the bug-fix semantic explicitly: direction change to a productive direction resets the cursor and reschedules iff (a) sweeper=enabled OR (b) a scan is in flight; with sweeper=disabled and no in-flight scan, direction change is just configuration. Master→off branch separated as its own bullet. - src/compression_sweep.h: notify-callback dispatch table updated to show the in-flight-scan branch. Drops "state-machine pseudocode" wording (the cron is plain pseudocode, no state machine). - src/compression_sweep.c: drop "During SCANNING" reference in the cursor-state-struct comment. - src/compression.c: applyCompressionMasterSwitch comment no longer describes the obsolete polling-vs-edge rationale; explains the apply-hook-driven model directly. Cron-tick callsite comment drops "state machine" framing. Build clean with -Werror under both BUILD_ZSTD=yes and BUILD_ZSTD=no. Same test set passes (no behavior change — comments only).
The compression-active-sweeper → compression-automatic-sweeper rename widened the longest identifier in the compression config block, shifting the alignment column for the trailing /* ... */ comments. clang-format-18 wants the surrounding fields re-aligned; manual fix matches the diff exactly. Whitespace only.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Third and final piece of the C1/C2/C3 split implementing the declarative master-switch model from PR-B (#28). After this lands, the multi-PR realtime-compression v1 design is implementation-complete and PR #26 (the older sweep approach) can be closed.
Builds on:
compression-master-switch+compression-automatic-sweeper+compression-automatic-sweeper-intervalconfigs + apply hooks (auto-retire active dict on transitions INTOdecompression).master ∈ {compression, off}, PERMANENT-DECOMPRESS inmaster == decompression).This PR delivers the engine that drives those configs:
src/compression_sweep.{c,h}(new)COMPRESSION SWEEP FORCEsrc/commands/compression-sweep.json+compressionCommandhandlersrc/compression.c(Init/Cron/Shutdown + INFO field + apply-hook notifications)src/Makefile,cmake/Modules/SourceFiles.cmakesrc/unit/test_compression_sweep.cpp(18 gtest cases),tests/unit/type/compression.tcl(9 new)Architecture (apply-hook driven)
The branch went through a design simplification mid-review. The original implementation used a 4-state machine driven by cron-tick polling; review surfaced a sub-tick edge bug (cron polls at ~100ms but
CONFIG SETcan run faster, so transitions likecompression→off→compressioninside one tick window were invisible). The simplified model inverts control: apply hooks mutate state synchronously, the cron just consumes it.Internal state (compression_sweep.c):
Two helpers:
Apply-hook dispatch table:
abortScan()abortScan()resetScanStateAndEnableOnce()if scan in flight, else nothingresetScanStateAndEnableOnce()resetScanStateAndEnableOnce()(when master ≠ off)abortScan()resetScanStateAndEnableOnce()resetScanStateAndEnableOnce()-ERR-ERRresetScanStateAndEnableOnce()(init handles since apply hooks don't fire at boot)FORCE preempts in-flight scans. Calling
resetScanStateAndEnableOnce()while a scan is in progress sets cursor=0 and re-arms enable_once, so the next cron tick starts a fresh pass from the top with current direction. That's the user-facing semantic of "do a pass now, from scratch, regardless of what was running before".Direction change preserves scan integrity. A direction change while a scan is in flight (force-pass or automatic) always resets the cursor regardless of sweeper config. Without that, the cron would resume from the saved cursor under the new direction, leaving the keyspace half-old / half-new.
Cron (the consumer side):
Per-key dispatch
Direction taken from master-switch at scan time:
master == compression:compressionEnqueueCandidate(worker pool compresses).master == decompression:compressionPermanentlyDecompressdirectly on main thread (worker pool unused for drain work, per R2.1.2).master == off: callback unreachable (cron returns early).Iteration paced via
(1_000_000 / hz) × compression-sweep-max-cpu-pct / 100µs (1ms floor). Defaulthz=10 + 25%= 25ms/tick. Cursor + current_db persist across ticks so a multi-tick pass resumes correctly.Non-functional-state warning
master=compression + threads=0: the worker pool refuses every enqueue, so a sweep pass would burn CPU iterating the keyspace while every candidate drops. Sweeper skips the scan with a rate-limitedLL_WARNING(once per minute); cursor is preserved so a futurethreads>0setting picks up where we left off.master=decompression + threads=0is functional — the pool isn't used for permanent-decompress work.COMPRESSION SWEEP FORCENew subcommand:
master=offwith-ERR compression is off; cannot sweep without a direction.resetScanStateAndEnableOnce(); next cron tick starts a fresh pass.compression-automatic-sweeper— the operator can leave sweeper disabled (no idle iteration cost) and runSWEEP FORCEon demand.Bare
COMPRESSION SWEEP(no FORCE arg) returns the standard arity error.Config rename
compression-active-sweeper→compression-automatic-sweeper(and the-intervalknob in step). The original "active-sweeper" name suggested "is the sweeper engine active right now?" — which under the simplified model is justscan_in_progress, an internal flag rather than an operator-facing config. The new name says what the config actually does: gate automatic scheduling. Manual FORCE bypasses it.Touches: configs, struct fields (
server.compression_active_sweeper→server.compression_automatic_sweeper), enum (COMPRESSION_ACTIVE_SWEEPER_*→COMPRESSION_AUTOMATIC_SWEEPER_*), apply hook (applyCompressionActiveSweeper→applyCompressionAutomaticSweeper), helper (activeSweeperName→automaticSweeperName), INFO field (compression_active_sweeper:enabled/disabled→compression_automatic_sweeper:enabled/disabled), design doc, all tests. Since unstable doesn't ship, no compat alias.INFO
compression_sweeper_running:0/1replaces an earliercompression_active_sweeper_state:disabled/idle/scanning/sleeping. The four state names always collapsed semantically (disabled/idle/sleeping all mean "not currently scanning"), and the only state that ever needed distinguishing for operators is "is a scan currently advancing?" External observers wanting "will a scan trigger soon?" can derive it fromcompression_automatic_sweeper+compression_automatic_sweeper_interval+compression_sweeper_running.Verification
make -j2 -C src SERVER_CFLAGS=-Werrorclean (BUILD_ZSTD=yes default).make -j2 -C src SERVER_CFLAGS=-Werror BUILD_ZSTD=noclean.CompressionSweepTest).master=off + sweeper=enabled→compression_sweeper_running:0.master→compressionwithinterval=0→ 1 pass, then idle. No looping.master→decompressiondirection change → 1 fresh pass.FORCEwithsweeper=disabled→ 1 pass.interval=2s→ +1 pass on enable, then sleep, then +1 pass on interval expiry.compression→decompression→compressionapply-hook edges → 2 passes (no edge missed).threads=0+ direction change → in-flight scan reset + fresh pass with new direction.Commit history (review chronology)
The branch preserves the discussion-driven evolution as separate commits — reviewer can read each as an iteration:
e3bf52afb— original C3 (4-state machine + cron polling + sub-tick edge fix flag).18343c94c— plan.md update.dcddc0bf8— simplification + rename (collapse to 2 booleans + apply-hook driven).e174c9e72— bug fix: master change must reset cursor regardless of sweeper config.29152284f— comments + design doc R2.1.5 alignment for the bug fix.80840b24f— clang-format whitespace alignment after rename.After this lands
s2-sweepapproach) will be manually closed with a comment pointing forward.