Skip to content

[plan] Reassign S1 to @GilboaAWS, S6 to @ikolomi, mark Phase 0 complete#4

Merged
ikolomi merged 2 commits into
unstablefrom
gilboa/s1-dict-lifecycle
May 18, 2026
Merged

[plan] Reassign S1 to @GilboaAWS, S6 to @ikolomi, mark Phase 0 complete#4
ikolomi merged 2 commits into
unstablefrom
gilboa/s1-dict-lifecycle

Conversation

@GilboaAWS

Copy link
Copy Markdown
Collaborator

Summary

Ownership update for the realtime-data-compression implementation plan:

What changed

  • Team summary table updated
  • Subsystems table ownership column updated
  • Phase 1 and Phase 2 task tracks reorganized under correct owners
  • Phase 0 checkboxes marked [x] with PR Inline compression: Phase 0 skeleton #3 attribution

Context

@GilboaAWS is picking up the core dictionary lifecycle work (S1) in addition
to persistence/observability/benchmarks/infra. @ikolomi retains the
concurrency-critical hot path (S2) and takes on integration tests (S6)
which are tightly coupled to the COW audit.

No code changes — planning doc only.

…omplete

- Move S1 (dictionary lifecycle) ownership from @ikolomi to @GilboaAWS
- Move S6 (integration tests) ownership from @GilboaAWS to @ikolomi
- Update team summary table and Phase 1/Phase 2 task tracks accordingly
- Mark all Phase 0 tasks as completed (PR #3 — phase-0-skeleton merged)

@ikolomi ikolomi left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change @ikolomi to own S2 and S5

|---|---|---|
| **@ikolomi** | Lead engineer | S1, S2 (concurrency-critical, COW audit owner) |
| **@GilboaAWS** | Co-owner | S3, S4, S5, S6, S7 |
| **@ikolomi** | Lead engineer | S2, S6 (concurrency-critical, COW audit owner) |

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not S6 - i am doing perf tests (S5)

Swap benchmark suite (S5) to @ikolomi and integration tests (S6) to
@GilboaAWS. Update team summary, subsystems table, Phase 1/Phase 2
task tracks, and risk table accordingly.
@ikolomi ikolomi merged commit 78783ff into unstable May 18, 2026
3 checks passed
ikolomi added a commit that referenced this pull request May 24, 2026
Addresses 5 of 6 review comments on the QSBR design. Comment #6
(`compressionJob.key` extra-lookup concern) is explicitly deferred to
a follow-up PR per reviewer guidance.

Comment #1 (line 428) and #5 (line 544) — drop language-comparison
framing:
  Removed all references to Rust / `Arc<T>` / "memory-safe languages"
  / `shared_ptr` from §4.4 intro, the "Why QSBR" bullet list, and the
  §4.6 "Why the worker loads the active dict itself" paragraph. The
  rationale now stands on its own technical merit (decoupling the
  registry from worker hot paths; minimal worker contract;
  safe-directional failure modes) rather than via comparison to
  another language's type system. C with explicit protocols is the
  right tool for this problem; the comparison added rhetorical
  weight without adding signal.

Comment #2 (line 326) — duplication with R2.11.4:
  §3.3 Separation invariants restated the worker contract that
  R2.11.4 already specifies authoritatively. Slimmed the §3.3 bullet
  to a one-liner that points at §2.11 R2.11.4 and §4.4. Eliminates
  drift risk between the two places.

Comment #3 (line 439) — bound the retiring list, block on cap:
  Added new step 7 to the QSBR section explaining the cap interaction
  with R2.3.3. The retiring list is a subset of `dicts[]`, capped at
  `compression-dict-max-versions`. When grace-barrier draining cannot
  keep up (worker starvation, persistent `frame_refs > 0`), the cap
  is reached and BOTH training AND promotion are refused per R2.3.3:
  `LL_WARNING` log entry, `compression_dict_cap_reached` set in INFO,
  operator intervention required (raise cap or run COMPRESSION SWEEP).

Comment #4 (line 449) — grace-barrier wake-up via cond_broadcast:
  The original step 6 proposed enqueueing barrier jobs into the SPMC
  inbox to force idle workers to advance generations. This doesn't
  actually work: under work-stealing semantics a single fast worker
  can drain all barrier jobs while siblings stay asleep on the cond
  var. Rewrote step 6 to use a wake-all primitive built on
  `pthread_cond_broadcast`, and added a "Wake-all primitive"
  paragraph to §4.6 that describes extending `mutexqueue.h` with two
  new APIs: a broadcast wake-all (for QSBR grace barriers, config
  changes, etc.) and a shutdown-signal variant (for pool teardown).
  Step 6 cross-references §4.6 for the mechanism.

Comment #6 (line 513) — DEFERRED:
  Reviewer flagged that `compressionJob.key` (a `robj *` carried in
  the job) implies the main thread does an additional lookup at
  install time, doubling the per-write lookup cost. The reviewer
  explicitly tagged this as "follow up PR" — addressing it would
  require a redesign of the install-side data flow and is out of
  scope for the QSBR design change. Tracked as an open item; will
  be addressed before code lands for the install path (S2.7 in
  the implementation plan).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants