v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR #1314) #1324

Merged
garrytan merged 5 commits into master from garrytan/pr1314-production-ready on May 23, 2026

@garrytan

Owner

Summary

Lands community PR #1314 (parallel gbrain sync --all + --status dashboard concept) with the structural fixes Codex's outside-voice review caught. The original PR's lock-id change only fired inside the --all parallel path, which would have introduced a worse race than the global-lock contention it set out to fix (sync --all on per-source lock racing against sync --source foo on the still-global lock). The shipped version makes the per-source lock the invariant for every source-scoped sync, paired with withRefreshingLock for sources that exceed 30 minutes.
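
A minimal sketch of the lock invariant described above, assuming the `gbrain-sync:<sourceId>` id format named in the commit message. The helper name `syncLockId`, the `SyncOptsSketch` type, and the legacy lock id value are hypothetical; this is not the shipped `performSync` code. The point is that the id depends only on `sourceId`, so `sync --all` and `sync --source foo` contend on the same lock for the same source.

```typescript
// Hypothetical stand-in for SyncOpts; only sourceId matters for the lock id.
interface SyncOptsSketch {
  sourceId?: string;
}

// Assumed value for illustration. The PR only names the constant SYNC_LOCK_ID.
const LEGACY_SYNC_LOCK_ID = "gbrain-sync";

// The lock id is derived from sourceId alone, never from the entry path
// (--all fan-out vs. a single --source run).
function syncLockId(opts: SyncOptsSketch): string {
  return opts.sourceId
    ? `gbrain-sync:${opts.sourceId}` // per-source lock
    : LEGACY_SYNC_LOCK_ID; // legacy single-default-source path
}

// Both entry paths resolve to the same id for source "foo".
const fromAllFanOut = syncLockId({ sourceId: "foo" });
const fromSingleSource = syncLockId({ sourceId: "foo" });
console.log(fromAllFanOut === fromSingleSource);
```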

What's new for users:

  • gbrain sync --all now fans out across sources by default (continuous worker pool, no head-of-line blocking from wave dispatch). 4-source brain on Postgres goes from ~4m11s sequential to ~1m17s parallel per cron tick.
  • gbrain sources status [--json] — new read-only per-source dashboard (last sync, staleness, page count, embedding coverage, brain-wide unacked failures). Sibling to sources list/add/remove/archive so reads + writes don't share a verb.
  • Per-source [<source-id>] line prefix via AsyncLocalStorage on every line emitted under parallel sync (kubectl-style; grep '[media-corpus]' actually works).
  • Stable --json envelope {schema_version:1, sources, parallel, ok_count, error_count} on stdout with banners routed to stderr via a humanSink helper so jq parses cleanly.
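
For readers unfamiliar with the AsyncLocalStorage approach behind the line prefix, here is a rough sketch. The names `withSourcePrefix`, `getSourcePrefix`, `slog`, and `serr` appear in the test coverage table, but their signatures and the `formatLine` helper below are assumptions, not the contents of src/core/console-prefix.ts.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// One store per process; each parallel source run gets its own prefix value.
const prefixStore = new AsyncLocalStorage<string>();

// Run fn with "[<sourceId>]" attached to everything logged beneath it,
// including work that continues after an await.
function withSourcePrefix<T>(sourceId: string, fn: () => T): T {
  return prefixStore.run(`[${sourceId}]`, fn);
}

function getSourcePrefix(): string | undefined {
  return prefixStore.getStore();
}

// Hypothetical helper: prefix every line of a possibly multi-line message.
// Outside any withSourcePrefix call the message passes through unchanged.
function formatLine(message: string): string {
  const prefix = getSourcePrefix();
  if (prefix === undefined) return message;
  return message
    .split("\n")
    .map((line) => `${prefix} ${line}`)
    .join("\n");
}

function slog(message: string): void {
  console.log(formatLine(message));
}

function serr(message: string): void {
  console.error(formatLine(message));
}

// Usage: lines emitted inside the callback carry the source id.
withSourcePrefix("media-corpus", () => slog("sync started"));
```

Because the prefix lives in async context rather than in a global variable, concurrent source runs do not overwrite each other's prefix.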

The 3 P0 structural fixes Codex caught:

  1. Lock asymmetry (load-bearing) — per-source lock is now the invariant for any SyncOpts.sourceId; withRefreshingLock handles 30-min TTL refresh for long sources.
  2. Broken dashboard SQL — original PR referenced chunks ch JOIN ON page_slug. Actual schema is content_chunks ch JOIN ON page_id. Tests stubbed executeRaw so the SQL never ran. Shipped version uses canonical SQL + pages.deleted_at IS NULL filter + active embedding column from the registry + errors propagate (no swallow-catch).
  3. Connection budget 2× undercount — each per-file worker opens its own pool with poolSize=2, so --parallel 4 --workers 4 = 32 connections not 16. Warning math + message text now include the × 2 per-file pool factor.
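
The corrected arithmetic from item 3 can be sketched as follows. The pool factor of 2 and the threshold of 16 come from this PR's text; the function names and the warning wording are hypothetical, not the shipped message.

```typescript
// From the PR: each per-file worker opens its own pool with poolSize=2.
const PER_FILE_POOL_SIZE = 2;
// From the commit message: warn when parallel × workers × 2 > 16.
const WARN_THRESHOLD = 16;

function estimateConnections(parallel: number, workers: number): number {
  return parallel * workers * PER_FILE_POOL_SIZE;
}

// Returns a warning string for stderr, or null when within budget.
// The message text is illustrative only.
function connectionBudgetWarning(parallel: number, workers: number): string | null {
  const total = estimateConnections(parallel, workers);
  if (total <= WARN_THRESHOLD) return null;
  return (
    `warning: --parallel ${parallel} x --workers ${workers} x ` +
    `${PER_FILE_POOL_SIZE} (per-file pool) = ${total} connections ` +
    `(threshold ${WARN_THRESHOLD})`
  );
}

const warning = connectionBudgetWarning(4, 4);
if (warning !== null) console.error(warning);
```

With `--parallel 4 --workers 4` the estimate is 4 × 4 × 2 = 32, which is the case the original formula undercounted as 16.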

Plus: --skip-failed/--retry-failed reject under --parallel > 1 (sync-failures.jsonl is brain-global today; source-scoping filed as v0.41+ follow-up).

Closes #1314 (community PR, attribution preserved via Co-Authored-By trailer + CHANGELOG credit to @garrytan-agents).

Test Coverage

NEW MODULES                                           CASES
[+] src/core/console-prefix.ts                        8 (test/console-prefix.test.ts)
  ├── withSourcePrefix / getSourcePrefix              [★★★] propagation + nested wrap
  └── slog / serr                                     [★★★] prefix + bare + multi-line

[+] src/commands/sync.ts new exports                  16 (test/sync-all-parallel.test.ts)
  ├── resolveParallelism                              [★★★] PGLite + explicit + auto + zero
  ├── per-source lock id format                       [★★★] D8 invariant + D13 injection guard
  ├── buildSyncStatusReport                           [★★★] staleness + coverage + envelope + error
  └── connection-budget math                          [★★★] D10 corrected formula

[+] IRON RULE E2E                                     5 (test/e2e/sync-status-pglite.test.ts)
  └── Real PGLite seeds 2 sources × pages × chunks    [★★★] soft-deleted excluded
       + archived source excluded + registry-resolved      + correct embedding column
       embedding column + errors propagate                 + Q2 sub-fix verified

COVERAGE: 29 new cases, all paths covered for the new exports + IRON RULE regression
QUALITY: ★★★:29 ★★:0 ★:0  |  GAPS: 0

Pre-Landing Review

The plan went through /plan-eng-review (16 decisions D1–D16, 0 unresolved) + /codex consult mode (11 findings: 3 P0 + 5 P1 + 3 P2, all resolved). The eng-review report lives at ~/.claude/plans/system-instruction-you-are-working-fluttering-grove.md.

No issues found in the pre-landing review of the shipped code beyond what was already addressed during plan-stage review.

Plan Completion

11/11 implementation tasks (T1–T11) completed:

  • T1: src/core/console-prefix.ts AsyncLocalStorage helper ✓
  • T2: src/commands/sync.ts core rewrite (lock invariant + worker pool + JSON envelope) ✓
  • T3: buildSyncStatusReport SQL fix + Q2 catch + Expansion 2/3 ✓
  • T4: sources.ts status subcommand ✓
  • T5: full-lake slog/serr migration (embed.ts + progress.ts) ✓
  • T6: test/sync-all-parallel.test.ts (16 cases) ✓
  • T7: test/e2e/sync-status-pglite.test.ts IRON RULE regression (5 cases) ✓
  • T8: CLAUDE.md annotations for all 6 modified files ✓
  • T9: CHANGELOG + VERSION + package.json bump ✓
  • T10: TODOS.md v0.41+ follow-up entry ✓
  • T11: Closed PR feat(sync): parallelize sync --all + add --status source dashboard #1314 with thank-you to @garrytan-agents

Test plan

  • Typecheck clean (bun run typecheck)
  • All 4 CI pre-checks pass (check:jsonb, check:privacy, check:progress, check:wasm)
  • New tests pass: 29/29 across test/console-prefix.test.ts, test/sync-all-parallel.test.ts, test/e2e/sync-status-pglite.test.ts
  • All 148 sync-adjacent tests pass (existing sync.test.ts, sync-parallel.test.ts, sync-concurrency.test.ts, sync-failures.test.ts, plus the new suites)
  • bun run build:llms regenerated; test/build-llms.test.ts (drift check) passes 7/7
  • Hand-test: gbrain sources status --json | jq returns valid JSON envelope
  • Hand-test: gbrain sync --all --parallel 4 --workers 4 triggers the 32-connection warning on stderr
  • Hand-test: gbrain sync --all --parallel 4 --skip-failed refuses with paste-ready hint
  • Production validation on a 4+ source brain — verify cron-tick wall time improvement

Compatibility

  • No schema changes. No new dependencies.
  • Single-source / non-`--all` paths: bit-for-bit identical to v0.40.3.
  • PGLite users get serial behavior (single-connection engine).
  • Version slot skips 0.40.4 + 0.40.5 to leave room for parallel in-flight work.

🤖 Generated with Claude Code

Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>

garrytan and others added 5 commits May 23, 2026 09:46
… + sources status dashboard (productionized from PR #1314)

Lands the community-authored PR #1314 with the structural fixes Codex's
outside-voice review caught: the original PR's lock-id change only fired
inside the --all parallel path, which would have introduced a worse race
than the global-lock contention it fixed (sync --all on per-source lock
racing against sync --source foo on the still-global lock). The landed
version makes the per-source lock the invariant for every source-scoped
sync, paired with withRefreshingLock for sources that exceed 30 minutes.

What's new
- gbrain sync --all parallel fan-out via continuous worker pool (D2);
  --parallel N flag, default min(sourceCount, --workers, 4); per-source
  [<source-id>] line prefix via AsyncLocalStorage (D6 + D12 + D13);
  stable --json envelope {schema_version:1, ...} on stdout with banners
  on stderr (D4 + D14); --skip-failed/--retry-failed reject under
  --parallel > 1 (D15 — sync-failures.jsonl is brain-global today;
  source-scoping filed as v0.40.4 TODO).
- gbrain sources status [--json] read-only dashboard (D3 — sibling to
  sources list/add/remove/archive, not a sync flag, so reads + writes
  don't share a verb). Counts pages + chunks + embedding coverage per
  source. Active embedding column resolved via the registry (D16) so
  Voyage / multimodal brains see the right column. Archived sources
  excluded by caller filter.
- Connection-budget stderr warning when parallel × workers × 2 > 16 with
  the formula in the message text (D1 + D10 — Codex P0 #3: each per-file
  worker opens its own PostgresEngine with poolSize=2, so the
  multiplication factor is 2, not 1).
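
The `--parallel` default and the continuous worker pool from the first item can be sketched roughly as below. `defaultParallelism` and `runPool` are hypothetical names, the lower bound of 1 is an assumption, and PGLite and explicit `--parallel N` handling are left out; this is not the shipped `resolveParallelism`.

```typescript
// Default from the commit message: min(sourceCount, --workers, 4).
// Clamping to at least 1 is an assumption made for this sketch.
function defaultParallelism(sourceCount: number, workers: number): number {
  return Math.max(1, Math.min(sourceCount, workers, 4));
}

// Continuous pool: each worker pulls the next item as soon as it finishes,
// so one slow source never holds back a whole wave of queued sources.
async function runPool<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS runs callbacks one at a time

  async function worker(): Promise<void> {
    while (true) {
      const i = next++;
      if (i >= items.length) return;
      try {
        results[i] = { status: "fulfilled", value: await task(items[i]) };
      } catch (reason) {
        // A failing source is recorded; the other sources keep going.
        results[i] = { status: "rejected", reason };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: fan out over source ids with the default parallelism.
const sourceIds = ["notes", "media-corpus", "wiki", "mail"];
runPool(sourceIds, defaultParallelism(sourceIds.length, 4), async (id) => {
  return `${id}: ok`; // stand-in for a real per-source sync
}).then((settled) => console.log(settled.length));
```

Counting fulfilled and rejected entries in the returned array would give values in the spirit of the envelope's ok_count and error_count.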

The load-bearing structural fix
- performSync defaults to per-source lock id (gbrain-sync:<sourceId>)
  whenever opts.sourceId is set + wraps in withRefreshingLock. Legacy
  single-default-source brains keep the bare tryAcquireDbLock(SYNC_LOCK_ID)
  path for back-compat.
- Dashboard SQL is the canonical content_chunks ch JOIN pages pg ON
  pg.id = ch.page_id WHERE pg.deleted_at IS NULL shape — the original PR
  shipped chunks ch JOIN ON page_slug, which would have crashed on PGLite
  parse and silently zeroed on Postgres via a swallow-catch. Errors from
  the dashboard SQL propagate (no silent zero-counts on real DB errors).

Tests
- New test/console-prefix.test.ts — 8 cases pinning ALS propagation,
  nested wraps, embedded-newline prefixing, back-compat fast path.
- New test/sync-all-parallel.test.ts (replaces PR's stubbed tests) —
  16 cases covering resolveParallelism, per-source lock format,
  buildSyncStatusReport SQL math + error propagation + envelope shape,
  connection-budget math, per-source prefix routing.
- New test/e2e/sync-status-pglite.test.ts — IRON RULE regression: real
  PGLite seeds 2 sources × pages × chunks (mixed embedded/unembedded,
  1 soft-deleted, 1 archived source). Validates SQL excludes both AND
  the active embedding column is the one used. This is the case that
  would have caught the PR's original broken SQL.

Compatibility
- No schema changes. No new dependencies.
- Single-source / non-`--all` paths: bit-for-bit identical to v0.40.2.
- PGLite users get serial behavior (single-connection engine).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>
…duction-ready

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/commands/sources.ts
…ight work)

Reserves v0.40.4 + v0.40.5 slots for parallel waves (salem's graph-signals
work and any other in-flight branches) and lands this PR's parallel-sync
work at v0.40.6.0. No code change beyond the version triple and the
TODOS / CLAUDE.md / CHANGELOG cross-references which were updated from
"v0.40.4" to "v0.41+" to match the new follow-up version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…duction-ready

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…duction-ready

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/commands/sources.ts
#	src/commands/sync.ts
garrytan merged commit 677142a into master on May 23, 2026
8 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master: (22 commits)
  v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377)
  v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403)
  v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364)
  v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352)
  v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367)
  v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351)
  v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350)
  v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345)
  v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313)
  v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333)
  v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327)
  v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324)
  v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322)
  v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300)
  v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323)
  v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296)
  v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298)
  v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128)
  v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308)
  v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295)
  ...

1 participant