Skip to content

[design] Adopt QSBR grace-period model for dictionary lifecycle#7

Merged
ikolomi merged 2 commits into
unstablefrom
gilboa/design-qsbr-dict-lifecycle
May 24, 2026
Merged

[design] Adopt QSBR grace-period model for dictionary lifecycle#7
ikolomi merged 2 commits into
unstablefrom
gilboa/design-qsbr-dict-lifecycle

Conversation

@GilboaAWS

Copy link
Copy Markdown
Collaborator

Summary

Replaces the per-job refcount model for dictionary lifetime management with QSBR (Quiescent-State-Based Reclamation) — a grace-period pattern used in the Linux kernel's RCU subsystem.

Background

Two approaches were evaluated for managing dictionary pointer lifetime across the main thread and compression workers:

Approach A: Per-job refcount

Main thread increments a refcount before enqueue, passes CDict* in the job struct, decrements on every completion path. Dict freed at refcount == 0.

  • Pro: Deterministic reclamation
  • Con: Passes a lifetime-managed pointer across thread boundaries — a pattern that memory-safe languages (Rust) explicitly prohibit without Arc<T>. In C, correctness depends on manual discipline: every completion path (success, error, version-mismatch discard) must decRef. A missed decRef leaks; an extra decRef causes use-after-free.

Approach B: QSBR grace-period (chosen)

Workers atomically load the active dict pointer when ready to compress. After use, they report a quiescent state (advance a generation counter). The main thread retires dicts by snapshotting worker generations; a dict is freed only after all workers have advanced past the snapshot AND frame-refs == 0.

  • Pro: No lifetime-managed pointers cross thread boundaries. Worker contract is minimal (load, use, report done). Failure mode is safe-directional (delayed reclamation, not use-after-free). Complexity is concentrated in the registry rather than distributed across job completion paths.
  • Con: Deferred reclamation (bounded by grace barriers to idle workers). Requires understanding the QSBR pattern.

What changed in the design doc

Section Change
§4.4 Full QSBR model: struct updates (retire_worker_gen[], atomic active pointer, retiring list), ownership table, step-by-step flow, comparison table with rationale
§4.6 compressionJob Removed CDict* from job; dict_id filled by worker at compress time; concurrency notes updated
R2.11.4 Worker contract: load active pointer, use immutable CDict, report quiescent
§3 architecture Updated bullet to match

No code changes

Design doc only.

Replace per-job refcount model with QSBR (Quiescent-State-Based
Reclamation) for dictionary lifetime management between main thread
and compression workers.

Key changes:
- §4.4: full QSBR model description with struct updates
  (retire_worker_gen[], atomic active pointer, retiring list, GC)
- §4.4: comparison table of both approaches with rationale
- §4.6: compressionJob no longer carries CDict*; worker loads active
  dict atomically at compress time; reports quiescent after use
- R2.11.4: updated worker contract (load, use, report quiescent)
- §3: architecture bullet updated to match

Rationale: QSBR avoids passing lifetime-managed pointers across thread
boundaries — a pattern that memory-safe languages (Rust) prohibit
without Arc<T> and that in C relies on manual refcount discipline
across all control paths. The QSBR model concentrates complexity in
the registry and gives workers a minimal, hard-to-misuse contract.
@GilboaAWS GilboaAWS marked this pull request as ready for review May 19, 2026 15:44
Comment thread .agents/planning/realtime-data-compression/design/detailed-design.md Outdated
Comment thread .agents/planning/realtime-data-compression/design/detailed-design.md Outdated

**How it works:**

1. **Promotion:** Main thread creates a new `compressionDict`, atomically stores it as `registry->active`. The previous active dict moves to the retiring list via `compressionDictStartRetirement()`.

@ikolomi ikolomi May 24, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should clarify how this queue is bounded. The promotion/training should be blocked if retiring queue is full and warnings/errors emited

Comment thread .agents/planning/realtime-data-compression/design/detailed-design.md Outdated
Comment thread .agents/planning/realtime-data-compression/design/detailed-design.md Outdated

```c
typedef struct compressionJob {
robj *key; /* key name, used to resolve robj later */

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

robj         *key;         /* key name, used to resolve robj later */

Not directly related to this PR, but does this imply that the main thread should perform additional lookup when installing the compressed value? If so, this mean that for each compression-eligible write command, the main thread will need to perform 2 lookups, which is expensive. We need to rethink it in a follow up PR

Addresses 5 of 6 review comments on the QSBR design. Comment #6
(`compressionJob.key` extra-lookup concern) is explicitly deferred to
a follow-up PR per reviewer guidance.

Comment #1 (line 428) and #5 (line 544) — drop language-comparison
framing:
  Removed all references to Rust / `Arc<T>` / "memory-safe languages"
  / `shared_ptr` from §4.4 intro, the "Why QSBR" bullet list, and the
  §4.6 "Why the worker loads the active dict itself" paragraph. The
  rationale now stands on its own technical merit (decoupling the
  registry from worker hot paths; minimal worker contract;
  safe-directional failure modes) rather than via comparison to
  another language's type system. C with explicit protocols is the
  right tool for this problem; the comparison added rhetorical
  weight without adding signal.

Comment #2 (line 326) — duplication with R2.11.4:
  §3.3 Separation invariants restated the worker contract that
  R2.11.4 already specifies authoritatively. Slimmed the §3.3 bullet
  to a one-liner that points at §2.11 R2.11.4 and §4.4. Eliminates
  drift risk between the two places.

Comment #3 (line 439) — bound the retiring list, block on cap:
  Added new step 7 to the QSBR section explaining the cap interaction
  with R2.3.3. The retiring list is a subset of `dicts[]`, capped at
  `compression-dict-max-versions`. When grace-barrier draining cannot
  keep up (worker starvation, persistent `frame_refs > 0`), the cap
  is reached and BOTH training AND promotion are refused per R2.3.3:
  `LL_WARNING` log entry, `compression_dict_cap_reached` set in INFO,
  operator intervention required (raise cap or run COMPRESSION SWEEP).

Comment #4 (line 449) — grace-barrier wake-up via cond_broadcast:
  The original step 6 proposed enqueueing barrier jobs into the SPMC
  inbox to force idle workers to advance generations. This doesn't
  actually work: under work-stealing semantics a single fast worker
  can drain all barrier jobs while siblings stay asleep on the cond
  var. Rewrote step 6 to use a wake-all primitive built on
  `pthread_cond_broadcast`, and added a "Wake-all primitive"
  paragraph to §4.6 that describes extending `mutexqueue.h` with two
  new APIs: a broadcast wake-all (for QSBR grace barriers, config
  changes, etc.) and a shutdown-signal variant (for pool teardown).
  Step 6 cross-references §4.6 for the mechanism.

Comment #6 (line 513) — DEFERRED:
  Reviewer flagged that `compressionJob.key` (a `robj *` carried in
  the job) implies the main thread does an additional lookup at
  install time, doubling the per-write lookup cost. The reviewer
  explicitly tagged this as "follow up PR" — addressing it would
  require a redesign of the install-side data flow and is out of
  scope for the QSBR design change. Tracked as an open item; will
  be addressed before code lands for the install path (S2.7 in
  the implementation plan).
@ikolomi ikolomi merged commit 21773fb into unstable May 24, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants