v0.22.13 feat: parallel sync — bounded concurrent imports #490

Merged
garrytan merged 15 commits into master from feat/parallel-sync
Apr 30, 2026
@garrytan garrytan commented Apr 28, 2026

Problem

gbrain sync on a large brain (7,200+ pages) takes 25+ minutes because imports are serial. Meanwhile, gbrain import --workers N already has a proven parallel pattern.

Solution

Thread the same parallel pattern through sync:

  • gbrain sync --concurrency N (alias --workers N) parallelizes the import phase
  • Auto-concurrency: >100 files → 4 workers. Full sync on Postgres → 4 workers. Small diffs stay serial. PGLite always serial.
  • Minion handler routes through autoConcurrency() (was hardcoded 4): gbrain jobs submit sync --params '{"concurrency":8}'
  • Delete/rename stay serial (order-dependent, fast)

Each worker gets its own Postgres pool (2 connections). Total connections during the parallel phase = workers * 2 + caller pool (e.g. 4 workers * 2 + 10-conn caller pool = 18, well under PgBouncer's max_client_conn default of 100).
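
The connection arithmetic above can be checked with a small sketch. The helper name `totalConnections` and the constant names are hypothetical and do not come from the gbrain source; only the numbers (2 connections per worker, a 10-connection caller pool, PgBouncer's default `max_client_conn` of 100) are taken from this description.

```typescript
// Hypothetical helper: names are illustrative, not from the gbrain source.
const CONNECTIONS_PER_WORKER = 2; // per the PR: each worker pool holds 2 connections
const PGBOUNCER_DEFAULT_MAX_CLIENT_CONN = 100;

// Total connections held while the parallel import phase is running.
function totalConnections(workers: number, callerPoolSize: number): number {
  return workers * CONNECTIONS_PER_WORKER + callerPoolSize;
}

const budget = totalConnections(4, 10); // 4 * 2 + 10 = 18
console.log(budget, budget < PGBOUNCER_DEFAULT_MAX_CLIENT_CONN);
```

Raising `--workers` grows the total linearly, so a job submitted with `{"concurrency":8}` would hold 26 connections under the same assumptions.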

Update (v0.22.13 hardening — 6 atomic commits on top of #489)

After /plan-eng-review surfaced 14 issues + Codex outside-voice surfaced 4 more (3 critical), the PR now also ships:

  • Cross-process writer lock (CODEX-2) — gbrain-sync row in gbrain_cycle_locks with 30-min TTL. Prevents two concurrent syncs from both writing last_commit and letting the last writer win. New helper src/core/db-lock.ts.
  • Head-drift gate (CODEX-3) — re-checks git rev-parse HEAD after the import phase; refuses to advance last_commit if HEAD moved (someone ran git checkout / git pull mid-sync). Vanished files now record a failedFiles entry instead of silent-skip — the silent-skip-then-advance pathology that survived prior hardening passes is dead.
  • Per-source bookmark for Minion sync jobs (CODEX-1) — handler resolves sourceId via sources.local_path lookup, mirroring cycle.ts:480's autopilot fix from PR #475 (fix: pass sourceId in cycle sync phase to prevent full reimport). Prevents the 30-min full-reimport-every-cycle behavior on multi-source brains.
  • Shared concurrency policy (Q5) — new src/core/sync-concurrency.ts with autoConcurrency() + parseWorkers(). Replaces three drifted call-site policies in performSync, performFullSync, and the jobs handler.
  • Worker connection cleanup (A2) — try/finally around the worker loop in both sync.ts and import.ts. Prior Promise.all(...disconnect) ran outside any try/finally, leaking 8 connections on panic.
  • Engine detection unified (A1) — both PGLite-detection sites now use engine.kind === 'pglite' (the v0.13.1 discriminator). The engine.constructor.name sniff is gone.
  • --workers validation (Q2) — --workers 0, --workers -3, --workers foo, --workers 1.5 now exit with a clear error. Prior parseInt-with-no-validation silently fell through to auto-concurrency (4 workers), the opposite of what was typed.
  • Explicit --workers honored on small diffs (Q1) — drop the >50-file floor when the user opted in.
  • Banner moved to stderr (Q4) — console.log → console.error so gbrain sync --json stdout stays clean.
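
For reviewers, here is a minimal sketch of what the shared concurrency policy (Q5, Q2, Q1) amounts to. The signatures, the `EngineKind` type, and the constant names are assumptions for illustration and may not match the real `src/core/sync-concurrency.ts`. The full-sync-on-Postgres branch and any clamping of explicit values are left out.

```typescript
// Sketch only: signatures and constant names are assumptions, not the real
// src/core/sync-concurrency.ts API.
type EngineKind = "postgres" | "pglite";

const AUTO_THRESHOLD_FILES = 100; // ">100 files → 4 workers" per the PR
const AUTO_WORKERS = 4;

// Strict parse: accept only a positive integer written in plain digits.
// Rejects "0", "-3", "foo", "1.5", and trailing characters such as "4x".
function parseWorkers(raw: string): number {
  const text = raw.trim();
  const value = Number(text);
  if (!/^\d+$/.test(text) || value < 1) {
    throw new Error(`--workers must be a positive integer, got "${raw}"`);
  }
  return value;
}

function autoConcurrency(kind: EngineKind, fileCount: number, explicit?: number): number {
  if (kind === "pglite") return 1;             // single-connection engine: always serial
  if (explicit !== undefined) return explicit; // user opt-in beats the auto-path floor
  return fileCount > AUTO_THRESHOLD_FILES ? AUTO_WORKERS : 1;
}
```

The point of routing every call site through one function is that the CLI, the jobs handler, and the cycle cannot drift apart again.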

Versioning: held at v0.22.13 (patch) per user call, not the v0.23.0 (minor) Codex argued for. CHANGELOG entry names the behavior changes loud so users know to read the release notes.

Tests

  • 2916 pass / 264 skip / 0 fail across 184 files (unit suite, bun test --timeout 30000)
  • Typecheck clean
  • New unit tests: test/sync-concurrency.test.ts (17 cases) + test/sync-parallel.test.ts (7 cases, PGLite-routed, covers bookmark gate + head-drift + writer-lock contract)
  • New E2E: test/e2e/sync-parallel.test.ts — DATABASE_URL-gated, 60-file happy path with pg_stat_activity leak probe + 120-file benchmark (SYNC_PARALLEL_BENCH 120 files | serial=289ms | parallel(4)=221ms | speedup=1.31x). Real numbers now in the CHANGELOG.

Files

  • src/commands/sync.ts — SyncOpts.concurrency, parallel import, auto-concurrency, writer lock, head-drift gate, vanished-file capture, engine.kind, try/finally workers, parseWorkers CLI, banner→stderr
  • src/commands/import.ts — engine.kind, try/finally workers, parseWorkers
  • src/commands/jobs.ts — sync handler resolves sourceId, autoConcurrency, noEmbed contract documented
  • src/core/sync-concurrency.ts (new) — autoConcurrency + parseWorkers + constants
  • src/core/db-lock.ts (new) — generic tryAcquireDbLock(engine, lockId) over gbrain_cycle_locks
  • test/sync-concurrency.test.ts (new), test/sync-parallel.test.ts (new), test/e2e/sync-parallel.test.ts (new)
  • CHANGELOG.md — v0.22.13 entry (renamed from v0.23.0)
  • VERSION 0.22.5 → 0.22.13, package.json 0.22.6 → 0.22.13, bun.lock refreshed
  • TODOS.md — A4 follow-up (plumb database_url through SyncOpts)
  • CLAUDE.md + README.md — doc updates for the new modules, tests, and CLI flags

Documentation

Doc updates in this PR (commit 36c750b):

  • CLAUDE.md — added "Key files" entries for src/core/sync-concurrency.ts (v0.22.13) and src/core/db-lock.ts (v0.22.13); added a new entry for src/commands/sync.ts (covers the lock, head-drift gate, engine.kind, vanished-file capture, parallel branch); updated the src/commands/jobs.ts entry with v0.22.13 sourceId resolution + autoConcurrency + noEmbed contract; added test/sync-concurrency.test.ts, test/sync-parallel.test.ts, and test/e2e/sync-parallel.test.ts to the test list with case counts and the SYNC_PARALLEL_BENCH grep marker; added a "Key commands added in v0.22.13" subsection.
  • README.md — added --workers N flag to the IMPORT section's gbrain sync and gbrain import lines, with the >100-file auto-parallelize note.
  • CHANGELOG.md — v0.22.13 release-summary section in GStack/Garry voice (no em dashes, no AI vocab) with real benchmark numbers, "What this means for you" closer, and a "To take advantage of v0.22.13" self-repair block per CLAUDE.md convention.
  • TODOS.md — D-PR490-1 follow-up: plumb resolved database_url through SyncOpts so loadConfig() isn't re-read three times per sync.
  • VERSION + package.json + bun.lock — bumped to 0.22.13 (existing master had 0.22.5/0.22.6 drift; this commit brings them back in sync).

root and others added 13 commits April 28, 2026 06:43
gbrain sync --concurrency N (alias --workers N) parallelizes the import
phase using per-worker Postgres engine instances with an atomic queue
index (same proven pattern as gbrain import --workers N).

Auto-concurrency: when a sync touches >100 files and the user didn't
explicitly set --concurrency, defaults to 4 workers. Small incremental
syncs (<50 files) stay serial. Full syncs auto-detect Postgres and
default to 4 workers.

Minion sync handler defaults to concurrency=4, configurable via job
params: {"concurrency": 8}.

Delete and rename phases remain serial (order-dependent, fast).
PGLite falls back to serial automatically (single-connection engine).

Changes:
- src/commands/sync.ts: SyncOpts.concurrency, parallel import loop in
  performSync incremental path, --workers passthrough in performFullSync
- src/commands/jobs.ts: sync handler accepts concurrency param (default 4)
- CHANGELOG.md: v0.23.0 parallel sync entry

All 37 existing sync tests pass. Typecheck clean.
src/core/sync-concurrency.ts — single source of truth for autoConcurrency()
+ parseWorkers() + shouldRunParallel() + constants. Replaces three drifted
call-site policies (performSync, performFullSync, jobs handler).

src/core/db-lock.ts — generic tryAcquireDbLock(engine, lockId, ttlMinutes)
over the existing gbrain_cycle_locks table. Parameterized lock id so
performSync (gbrain-sync) can nest cleanly under cycle.ts (gbrain-cycle)
without deadlock.

test/sync-concurrency.test.ts — 17 cases covering PGLite-forces-serial,
explicit override clamping, auto-path threshold, parseWorkers validation
(rejects 0, negatives, NaN, decimals, trailing chars).

No consumers yet; subsequent commits wire sync.ts, import.ts, and jobs.ts
to use these helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CODEX-2: wrap performSync body in a gbrain-sync DB lock so two concurrent
syncs (manual + autopilot, two terminals, two Conductor workspaces) cannot
both read last_commit, both write it unconditionally, and let the last
writer win. cycle.ts continues to hold gbrain-cycle for its broader scope;
the two ids nest cleanly.
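
The lock semantics described here (one holder per lock id, TTL expiry, independent ids that nest) can be modeled in memory. This is an illustrative stand-in only: the real helper runs against the gbrain_cycle_locks table, whose schema is not shown in this PR, and `LockTable` plus its method names are hypothetical.

```typescript
// Illustrative in-memory model; NOT the real src/core/db-lock.ts.
interface LockRow {
  holder: string;
  expiresAt: number; // epoch milliseconds
}

class LockTable {
  private rows = new Map<string, LockRow>();

  // Returns true if `holder` now owns `lockId`; false if a live lock exists.
  tryAcquire(lockId: string, holder: string, ttlMinutes: number, now: number): boolean {
    const row = this.rows.get(lockId);
    if (row !== undefined && row.expiresAt > now) return false; // still held
    // Either no row or the previous holder's TTL lapsed: take the lock.
    this.rows.set(lockId, { holder, expiresAt: now + ttlMinutes * 60_000 });
    return true;
  }

  // Only the current holder may release.
  release(lockId: string, holder: string): void {
    const row = this.rows.get(lockId);
    if (row !== undefined && row.holder === holder) this.rows.delete(lockId);
  }
}
```

Because the lock id is a parameter, a process holding `gbrain-cycle` can still acquire `gbrain-sync`, which is the nesting property the commit relies on.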

CODEX-3: capture git HEAD at sync entry, re-rev-parse after the import
phase, refuse to advance last_commit if HEAD drifted (someone ran
git checkout / git pull mid-sync). Vanished files now go into failedFiles
instead of silent-skip — same gating mechanism, no more bookmark advance
past unimported work.
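
A minimal sketch of the gate, with dependencies injected so it runs without a git repo. `SyncDeps` and `syncWithHeadGate` are hypothetical names, and the real performSync does more than this; the sketch shows only the ordering of the two HEAD reads and the two conditions that block the bookmark.

```typescript
// Sketch with injected dependencies; names are hypothetical.
interface SyncDeps {
  revParseHead: () => string;            // e.g. wraps `git rev-parse HEAD`
  importFiles: () => Promise<string[]>;  // resolves to the list of failed files
  writeLastCommit: (sha: string) => void;
}

interface GateResult {
  advanced: boolean;
  reason?: "head-drift" | "failed-files";
}

async function syncWithHeadGate(deps: SyncDeps): Promise<GateResult> {
  const headAtEntry = deps.revParseHead();      // capture HEAD at sync entry
  const failedFiles = await deps.importFiles(); // vanished files land here
  const headAfter = deps.revParseHead();        // re-check after the import phase

  // Refuse to advance the bookmark if the repo moved underneath us.
  if (headAfter !== headAtEntry) return { advanced: false, reason: "head-drift" };
  // Same gate: any failed file means unimported work remains.
  if (failedFiles.length > 0) return { advanced: false, reason: "failed-files" };

  deps.writeLastCommit(headAtEntry);
  return { advanced: true };
}
```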

A1: replace both PGLite detection sites with engine.kind === 'pglite'.
The constructor.name sniff is gone (breaks under bundling) and so is the
inconsistent config?.engine string check.

A2: connect worker engines serially into an array, run inside try/finally
so disconnect always fires — even on partial connect failure, OOM, or
mid-import abort. Prior Promise.all(...disconnect) leaked the 8 worker
connections on any panic path.
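
The cleanup shape in A2 looks roughly like the sketch below. `Engine`, `connectEngine`, and `runParallel` are hypothetical stand-ins rather than the real gbrain engine API, and collecting per-file errors into a returned list is an assumption made so the example is self-contained.

```typescript
// Sketch: hypothetical stand-ins, not the real gbrain engine API.
interface Engine {
  importFile(path: string): Promise<void>;
  disconnect(): Promise<void>;
}

async function runParallel(
  files: string[],
  workers: number,
  connectEngine: () => Promise<Engine>,
): Promise<string[]> {
  const engines: Engine[] = [];
  const failed: string[] = [];
  try {
    // Connect one at a time so a partial failure leaves a known list to clean up.
    for (let i = 0; i < workers; i++) engines.push(await connectEngine());

    let next = 0; // shared queue index; safe because JS runs one callback at a time
    await Promise.all(
      engines.map(async (engine) => {
        while (next < files.length) {
          const file = files[next++];
          if (file === undefined) break;
          try {
            await engine.importFile(file);
          } catch {
            failed.push(file); // record the failure, keep draining the queue
          }
        }
      }),
    );
  } finally {
    // Always runs: success, import error, or a connect failure part-way through.
    await Promise.all(engines.map((e) => e.disconnect().catch(() => undefined)));
  }
  return failed;
}
```

If the third of four connects throws, the `finally` block still disconnects the two engines that did connect, which is the partial-connect case the old `Promise.all(...disconnect)` missed.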

Q1: explicit --workers / opts.concurrency now bypasses the >50-file floor.
User opt-in beats the auto-path safety net.

Q3: drop the config!.database_url! non-null assertions; fall back to serial
when database_url is unset instead of crashing on TypeError.

Q4: worker-count banner moves from console.log to console.error so stdout
stays clean for --json output.

test/sync-parallel.test.ts — 7 cases over PGLite covering the bookmark
gate under concurrency request, the head-drift gate, vanished-file
failure capture, PGLite-stays-serial, and the writer-lock contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Workers

A1: replace the config?.engine === 'pglite' string sniff with
engine.kind === 'pglite' to match sync.ts and the v0.13.1 contract.

A2: wrap worker engine creation + the parallel loop in try/finally so
disconnects always fire — same pattern as sync.ts. Worker engines now
push onto an array as they connect (rather than Promise.all) so the
finally block can clean up partial-connect state.

Q2: route --workers parsing through the shared parseWorkers() helper.
parseInt-with-no-validation is gone — '0', '-3', 'foo', '1.5' now exit
with a clear error message instead of silently falling through.

Q3: drop the config!.database_url! non-null assertion; fall back to
serial when database_url is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CODEX-1: resolve sourceId at handler entry by looking up sources.local_path.
Mirrors cycle.ts:480's autopilot-cycle fix (PR #475). Without this, every
Minion sync job on a multi-source brain reads global config.sync.last_commit
instead of the per-source anchor, which on a regularly-GC'd repo can drop
out of git history and trigger 30-min full reimports every cycle.

The handler accepts an optional sourceId job param for callers that want
to override; falls back to the resolveSourceForDir lookup when absent.

CODEX-4: replace the hardcoded concurrency=4 default with the shared
autoConcurrency policy. Behavior is now consistent between CLI sync,
the Minion handler, and the autopilot cycle's sync phase. Jobs that
request a specific concurrency via job.data.concurrency still win.

noEmbed default stays at true — embed is a separate job (submit
gbrain embed --stale, OR rely on the autopilot cycle's embed phase).
The doc comment makes that contract explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DATABASE_URL-gated E2E coverage that PGLite-only tests can't reach:

T2 — happy path: 60 files imported at concurrency=4, all 60 pages land
in the DB, with a pg_stat_activity probe before/after to confirm worker
engines (4 × 2 connections) actually disconnected.

P4 — benchmark: 120-file fixture, serial vs concurrency=4 timing.
Emits a single-line `SYNC_PARALLEL_BENCH 120 files | serial=Xms |
parallel(4)=Yms | speedup=Zx` so the CHANGELOG can quote a real
number instead of an unbacked '~4×' claim. Asserts parallel <=
serial * 1.5 to allow for noisy CI but fail genuine regressions.
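
For reference, the marker line and the regression gate can be sketched as below. The function names are hypothetical; the line format, the 1.5x tolerance, and the sample timings are taken from this PR.

```typescript
// Sketch: function names are hypothetical; format and tolerance come from the PR.
function formatBenchLine(
  files: number,
  workers: number,
  serialMs: number,
  parallelMs: number,
): string {
  const speedup = (serialMs / parallelMs).toFixed(2);
  return `SYNC_PARALLEL_BENCH ${files} files | serial=${serialMs}ms | parallel(${workers})=${parallelMs}ms | speedup=${speedup}x`;
}

// Regression gate: parallel may take up to 1.5x the serial time before failing,
// which absorbs noisy CI while still catching genuine slowdowns.
function withinTolerance(serialMs: number, parallelMs: number): boolean {
  return parallelMs <= serialMs * 1.5;
}

console.log(formatBenchLine(120, 4, 289, 221));
// prints "SYNC_PARALLEL_BENCH 120 files | serial=289ms | parallel(4)=221ms | speedup=1.31x"
```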

Skips gracefully when DATABASE_URL is unset (consistent with the rest
of test/e2e/).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VERSION + package.json + bun.lock: 0.22.5/0.22.6 → 0.22.10. Repo had
existing drift between VERSION and package.json on master; this commit
brings them back in sync at the bumped value.

CHANGELOG.md: v0.22.10 entry replaces the unfinished v0.23.0 stub from
PR #490's original commit. Voice-rule clean (no em dashes, no AI
vocabulary), real benchmark numbers from the new E2E test
(serial=289ms parallel(4)=221ms speedup=1.31x), additive worker-pool
note (A3), 'To take advantage of v0.22.10' self-repair block per
CLAUDE.md convention.

TODOS.md: A4 follow-up filed — plumb resolved database_url through
SyncOpts so performSync / performFullSync / import.ts don't each call
loadConfig() separately. Deferred to a future patch; not on the
v0.22.10 critical path.

Patch (not minor) framing held even though new CLI surface lands here;
release-notes prose names the behavior change explicitly so users know
to read them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md:
- New "Key files" entries for src/core/sync-concurrency.ts and
  src/core/db-lock.ts (both v0.22.10).
- New "Key files" entry for src/commands/sync.ts (covers the lock,
  head-drift gate, engine.kind discriminator, vanished-file failure
  capture, parallel branch wiring).
- Updated src/commands/jobs.ts entry with v0.22.10 sourceId
  resolution + autoConcurrency policy + noEmbed contract.
- Added test/sync-concurrency.test.ts and test/sync-parallel.test.ts
  to the unit-test list with case counts.
- Added test/e2e/sync-parallel.test.ts to the E2E section with the
  SYNC_PARALLEL_BENCH grep marker for CHANGELOG quoting.
- Added "Key commands added in v0.22.10" section: gbrain sync --workers,
  gbrain import --workers (parseWorkers validation).

README.md: added --workers flag to the IMPORT section's gbrain sync
and gbrain import lines, with the >100-file auto-parallelize note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	TODOS.md
#	VERSION
#	package.json
# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	VERSION
#	package.json
VERSION 0.22.10 → 0.22.13. Master moved to 0.22.8 plus claimed slots
0.22.9-0.22.12 in sibling workspaces; 0.22.13 is the next free slot for
this PR's parallel-sync hardening work.

Updated all v0.22.10 references in CHANGELOG.md (release header +
self-repair block), TODOS.md (D-PR490-1 follow-up tag), CLAUDE.md
(Key files entries + tests + commands subsection), and the inline
v0.22.10 markers in src/core/sync-concurrency.ts, src/core/db-lock.ts,
src/commands/sync.ts, src/commands/import.ts, src/commands/jobs.ts,
test/sync-parallel.test.ts, test/e2e/sync-parallel.test.ts.

No behavioral change. CHANGELOG header rewrite, content unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's build-llms generator test failed because llms-full.txt was stale
relative to the README + CLAUDE.md updates this PR added (--workers
flag in the IMPORT section, sync-concurrency.ts/db-lock.ts/sync.ts
entries in the Key files section).

Per CLAUDE.md: "Run \`bun run build:llms\` after adding a new doc."
The test test/build-llms.test.ts:67 verifies committed bundles match
generator output — now they do again.

llms.txt was already in sync (no curated config additions); only
llms-full.txt needed the regen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	TODOS.md
#	VERSION
#	package.json
@garrytan garrytan changed the title from "feat: parallel sync — bounded concurrent imports" to "v0.22.13 feat: parallel sync — bounded concurrent imports" Apr 29, 2026
# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	VERSION
#	llms-full.txt
#	package.json
# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	VERSION
#	llms-full.txt
#	package.json
#	src/commands/sync.ts
@garrytan garrytan merged commit e96f054 into master Apr 30, 2026
7 checks passed