fix: pass sourceId in cycle sync phase to prevent full reimport#475
Merged
fix: pass sourceId in cycle sync phase to prevent full reimport#475
Conversation
cycle.ts calls performSync without sourceId, so it always reads the global config.sync.last_commit key instead of the per-source sources.last_commit. When the global anchor gets garbage-collected (after a force push or rebase), sync falls back to a full reimport of all files — on a large brain this takes 30+ minutes and blocks the autopilot cycle. The fix resolves the source id from the brain directory by querying the sources table. When a matching source exists, sync reads the per-source anchor which is updated on every successful sync and stays in sync with the repo history. Falls back gracefully to the global config path for pre-v0.18 brains without a sources table.
ramybarsoum
added a commit
to ramybarsoum/RBrain
that referenced
this pull request
Apr 27, 2026
Adds 6 regression tests in test/core/cycle.test.ts pinning the new resolveSourceForDir() helper added to src/core/cycle.ts in this PR: 1. Seeded sources row → performSync receives matching sourceId 2. No matching row → sourceId=undefined (falls through to global key) 3. Different brainDir than registered source → undefined (no cross-match) 4. sources table missing (very old brain) → catch returns undefined, sync still runs. Uses a fresh PGLiteEngine because initSchema() only re-runs PENDING migrations; DROP TABLE on the shared engine would leave it permanently degraded for every later test in the file. (Codex review caught this landmine.) 5. Multiple rows with same local_path → resolver returns one matching id (non-deterministic; SQL has no ORDER BY). Documents the contract for the v0.23 UNIQUE-constraint follow-up. 6. Empty-string id row → resolver propagates "" (defensive case Codex flagged: schema PK prevents NULL but '' can be inserted). Extends the performSync mock at line 51-65 to also capture sourceId. Bumps: - VERSION: 0.22.4 → 0.22.5 - package.json: 0.22.4 → 0.22.5 - CHANGELOG.md: new [0.22.5] entry following v0.22.4 voice (release summary + numbers table + behavior matrix + To-take-advantage block + itemized changes + for-contributors) - CLAUDE.md: annotates src/core/cycle.ts entry with v0.22.5 (#475) note - llms-full.txt: regenerated via bun run build:llms Test results: - Unit: 28 pass / 0 fail in test/core/cycle.test.ts (22 prior + 6 new) - Full unit suite: pass (exit 0) - E2E: 236 pass / 0 fail across 26 files Plan + codex outside-voice review at: ~/.claude/plans/whimsical-bubbling-goose.md Follow-up TODOs filed for v0.23: - Normalize brainDir + sources.local_path before SQL compare - Add UNIQUE index on sources.local_path - Narrow resolveSourceForDir's catch to PG 42P01 (undefined_table) - Add doctor check for config.sync.last_commit / sources divergence Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
|
Adding tests + v0.22.5 version bump on top of this PR (commit What I added:
Test results:
Plan + outside-voice review: Follow-up TODOs filed for v0.23:
🤖 Generated with Claude Code |
CI typecheck failed because `toContain()` on `string[]` rejects the
`string | undefined` produced by `syncCalls.at(-1)?.sourceId`'s optional
chain. Tests 1, 4, and 6 use `toBe()` which accepts `string | undefined`
through its overload, but `toContain()` is stricter.
Fix: pull the value into a typed variable, assert it's defined, then
check membership. Makes the contract explicit ("resolver returned a
defined sourceId, and it was one of the matching ids") instead of
relying on a silent undefined → no-match-in-array assertion.
Locally:
- bun run typecheck: clean
- bun test test/core/cycle.test.ts: 28 pass / 0 fail (75 expect calls)
- All CI gate scripts: OK (jsonb, progress-to-stdout, wasm-embedded)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #475's Tier 1 (Mechanical) CI job hit a 5000.09ms beforeAll hook timeout in `E2E: Tags > (unnamed)`. Cause: scripts/run-e2e.sh invokes `bun test "$f"` without a --timeout flag, falling back to bun's 5s default. setupDB() does TRUNCATE CASCADE on ~30 tables, and on a CI runner under load that can exceed 5s. Match what the unit suite uses (--timeout=60000 in package.json's "test" script). Same 1m ceiling, no behavior change for healthy runs; just removes the artificial 5s floor on hooks. Verified locally: bun test --timeout=60000 test/e2e/mechanical.test.ts runs 78 pass / 0 fail in 27.99s against a fresh pgvector pg16 docker container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan
added a commit
that referenced
this pull request
Apr 28, 2026
Merges 5 master commits since last merge: v0.22.1 autopilot fix wave (#447), v0.22.2 minions worker reliability (#458), v0.22.4 frontmatter-guard (#448), sourceId in cycle sync phase (#475), and post-migration schema verification (#488). Conflict resolutions: - VERSION: kept this branch's reserved 0.27.0 slot (master at 0.22.6). - CHANGELOG.md: kept v0.27.0 entry at top, then master's v0.22.6 → v0.21.0 entries below in order. - CLAUDE.md: merged the v0.27 cycle bullet (8 phases, synthesize, patterns, transcript-discovery, dream CLI flags) with master's v0.22.1/v0.22.5 cycle additions (signal: AbortSignal, willRunExtractPhase, resolveSourceForDir). - src/core/cycle.ts: kept v0.27 yieldDuringPhase + synthInputFile/synthDate/synthFrom/synthTo CycleOpts fields AND added master's v0.22.1 signal: AbortSignal field. Both coexist. - llms-full.txt: regenerated against the merged tree. The dream_verdicts schema migration moved v25 → v30 in the prior merge. Master ended at v29 (cathedral_ii_code_edges_rls); v30 is uncontested. Tests pass post-merge: 105/105 dream + cycle tests across 9 files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan
added a commit
that referenced
this pull request
Apr 28, 2026
CODEX-1: resolve sourceId at handler entry by looking up sources.local_path. Mirrors cycle.ts:480's autopilot-cycle fix (PR #475). Without this, every Minion sync job on a multi-source brain reads global config.sync.last_commit instead of the per-source anchor, which on a regularly-GC'd repo can drop out of git history and trigger 30-min full reimports every cycle. The handler accepts an optional sourceId job param for callers that want to override; falls back to the resolveSourceForDir lookup when absent. CODEX-4: replace the hardcoded concurrency=4 default with the shared autoConcurrency policy. Behavior is now consistent between CLI sync, the Minion handler, and the autopilot cycle's sync phase. Jobs that request a specific concurrency via job.data.concurrency still win. noEmbed default stays at true — embed is a separate job (submit gbrain embed --stale, OR rely on the autopilot cycle's embed phase). The doc comment makes that contract explicit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 task
garrytan
added a commit
that referenced
this pull request
Apr 30, 2026
* feat: parallel sync — bounded concurrent imports (#489) gbrain sync --concurrency N (alias --workers N) parallelizes the import phase using per-worker Postgres engine instances with an atomic queue index (same proven pattern as gbrain import --workers N). Auto-concurrency: when a sync touches >100 files and the user didn't explicitly set --concurrency, defaults to 4 workers. Small incremental syncs (<50 files) stay serial. Full syncs auto-detect Postgres and default to 4 workers. Minion sync handler defaults to concurrency=4, configurable via job params: {"concurrency": 8}. Delete and rename phases remain serial (order-dependent, fast). PGLite falls back to serial automatically (single-connection engine). Changes: - src/commands/sync.ts: SyncOpts.concurrency, parallel import loop in performSync incremental path, --workers passthrough in performFullSync - src/commands/jobs.ts: sync handler accepts concurrency param (default 4) - CHANGELOG.md: v0.23.0 parallel sync entry All 37 existing sync tests pass. Typecheck clean. * feat: shared concurrency policy + db-lock primitive src/core/sync-concurrency.ts — single source of truth for autoConcurrency() + parseWorkers() + shouldRunParallel() + constants. Replaces three drifted call-site policies (performSync, performFullSync, jobs handler). src/core/db-lock.ts — generic tryAcquireDbLock(engine, lockId, ttlMinutes) over the existing gbrain_cycle_locks table. Parameterized lock id so performSync (gbrain-sync) can nest cleanly under cycle.ts (gbrain-cycle) without deadlock. test/sync-concurrency.test.ts — 17 cases covering PGLite-forces-serial, explicit override clamping, auto-path threshold, parseWorkers validation (rejects 0, negatives, NaN, decimals, trailing chars). No consumers yet; subsequent commits wire sync.ts, import.ts, and jobs.ts to use these helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: harden performSync — writer lock, head-drift gate, engine.kind CODEX-2: wrap performSync body in a gbrain-sync DB lock so two concurrent syncs (manual + autopilot, two terminals, two Conductor workspaces) cannot both read last_commit, both write it unconditionally, and let the last writer win. cycle.ts continues to hold gbrain-cycle for its broader scope; the two ids nest cleanly. CODEX-3: capture git HEAD at sync entry, re-rev-parse after the import phase, refuse to advance last_commit if HEAD drifted (someone ran git checkout / git pull mid-sync). Vanished files now go into failedFiles instead of silent-skip — same gating mechanism, no more bookmark advance past unimported work. A1: replace both PGLite detection sites with engine.kind === 'pglite'. The constructor.name sniff is gone (breaks under bundling) and so is the inconsistent config?.engine string check. A2: connect worker engines serially into an array, run inside try/finally so disconnect always fires — even on partial connect failure, OOM, or mid-import abort. Prior Promise.all(...disconnect) leaked the 8 worker connections on any panic path. Q1: explicit --workers / opts.concurrency now bypasses the >50-file floor. User opt-in beats the auto-path safety net. Q3: drop the config!.database_url! non-null assertions; fall back to serial when database_url is unset instead of crashing on TypeError. Q4: worker-count banner moves from console.log to console.error so stdout stays clean for --json output. test/sync-parallel.test.ts — 7 cases over PGLite covering the bookmark gate under concurrency request, the head-drift gate, vanished-file failure capture, PGLite-stays-serial, and the writer-lock contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: import.ts — engine.kind discriminator, worker try/finally, parseWorkers A1: replace the config?.engine === 'pglite' string sniff with engine.kind === 'pglite' to match sync.ts and the v0.13.1 contract. A2: wrap worker engine creation + the parallel loop in try/finally so disconnects always fire — same pattern as sync.ts. Worker engines now push onto an array as they connect (rather than Promise.all) so the finally block can clean up partial-connect state. Q2: route --workers parsing through the shared parseWorkers() helper. parseInt-with-no-validation is gone — '0', '-3', 'foo', '1.5' now exit with a clear error message instead of silently falling through. Q3: drop the config!.database_url! non-null assertion; fall back to serial when database_url is unset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: jobs.ts sync handler — resolve sourceId, autoConcurrency CODEX-1: resolve sourceId at handler entry by looking up sources.local_path. Mirrors cycle.ts:480's autopilot-cycle fix (PR #475). Without this, every Minion sync job on a multi-source brain reads global config.sync.last_commit instead of the per-source anchor, which on a regularly-GC'd repo can drop out of git history and trigger 30-min full reimports every cycle. The handler accepts an optional sourceId job param for callers that want to override; falls back to the resolveSourceForDir lookup when absent. CODEX-4: replace the hardcoded concurrency=4 default with the shared autoConcurrency policy. Behavior is now consistent between CLI sync, the Minion handler, and the autopilot cycle's sync phase. Jobs that request a specific concurrency via job.data.concurrency still win. noEmbed default stays at true — embed is a separate job (submit gbrain embed --stale, OR rely on the autopilot cycle's embed phase). The doc comment makes that contract explicit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: e2e parallel sync against real Postgres + benchmark DATABASE_URL-gated E2E coverage that PGLite-only tests can't reach: T2 — happy path: 60 files imported at concurrency=4, all 60 pages land in the DB, with a pg_stat_activity probe before/after to confirm worker engines (4 × 2 connections) actually disconnected. P4 — benchmark: 120-file fixture, serial vs concurrency=4 timing. Emits a single-line `SYNC_PARALLEL_BENCH 120 files | serial=Xms | parallel(4)=Yms | speedup=Zx` so the CHANGELOG can quote a real number instead of an unbacked '~4×' claim. Asserts parallel <= serial * 1.5 to allow for noisy CI but fail genuine regressions. Skips gracefully when DATABASE_URL is unset (consistent with the rest of test/e2e/). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: v0.22.10 release notes + sync follow-up TODO VERSION + package.json + bun.lock: 0.22.5/0.22.6 → 0.22.10. Repo had existing drift between VERSION and package.json on master; this commit brings them back in sync at the bumped value. CHANGELOG.md: v0.22.10 entry replaces the unfinished v0.23.0 stub from PR #490's original commit. Voice-rule clean (no em dashes, no AI vocabulary), real benchmark numbers from the new E2E test (serial=289ms parallel(4)=221ms speedup=1.31x), additive worker-pool note (A3), 'To take advantage of v0.22.10' self-repair block per CLAUDE.md convention. TODOS.md: A4 follow-up filed — plumb resolved database_url through SyncOpts so performSync / performFullSync / import.ts don't each call loadConfig() separately. Deferred to a future patch; not on the v0.22.10 critical path. Patch (not minor) framing held even though new CLI surface lands here; release-notes prose names the behavior change explicitly so users know to read them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update CLAUDE.md + README for v0.22.10 sync hardening CLAUDE.md: - New "Key files" entries for src/core/sync-concurrency.ts and src/core/db-lock.ts (both v0.22.10). - New "Key files" entry for src/commands/sync.ts (covers the lock, head-drift gate, engine.kind discriminator, vanished-file failure capture, parallel branch wiring). - Updated src/commands/jobs.ts entry with v0.22.10 sourceId resolution + autoConcurrency policy + noEmbed contract. - Added test/sync-concurrency.test.ts and test/sync-parallel.test.ts to the unit-test list with case counts. - Added test/e2e/sync-parallel.test.ts to the E2E section with the SYNC_PARALLEL_BENCH grep marker for CHANGELOG quoting. - Added "Key commands added in v0.22.10" section: gbrain sync --workers, gbrain import --workers (parseWorkers validation). README.md: added --workers flag to the IMPORT section's gbrain sync and gbrain import lines, with the >100-file auto-parallelize note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version slot to v0.22.13 VERSION 0.22.10 → 0.22.13. Master moved to 0.22.8 plus claimed slots 0.22.9-0.22.12 in sibling workspaces; 0.22.13 is the next free slot for this PR's parallel-sync hardening work. Updated all v0.22.10 references in CHANGELOG.md (release header + self-repair block), TODOS.md (D-PR490-1 follow-up tag), CLAUDE.md (Key files entries + tests + commands subsection), and the inline v0.22.10 markers in src/core/sync-concurrency.ts, src/core/db-lock.ts, src/commands/sync.ts, src/commands/import.ts, src/commands/jobs.ts, test/sync-parallel.test.ts, test/e2e/sync-parallel.test.ts. No behavioral change. CHANGELOG header rewrite, content unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate llms-full.txt for v0.22.13 doc updates CI's build-llms generator test failed because llms-full.txt was stale relative to the README + CLAUDE.md updates this PR added (--workers flag in the IMPORT section, sync-concurrency.ts/db-lock.ts/sync.ts entries in the Key files section). Per CLAUDE.md: "Run \`bun run build:llms\` after adding a new doc." The test test/build-llms.test.ts:67 verifies committed bundles match generator output — now they do again. llms.txt was already in sync (no curated config additions); only llms-full.txt needed the regen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: root <root@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
cycle.ts:runPhaseSync()callsperformSync(engine, { repoPath: brainDir, ... })without passingsourceId. This means the sync phase always reads the globalconfig.sync.last_commitkey instead of the per-sourcesources.last_commitanchor.The global anchor can drift out of git history when:
When
git cat-file -t <anchor>fails, sync concludes there was a force push and triggers a full reimport of all files — on a production brain with 78K+ files this takes 30+ minutes and blocks the entire autopilot cycle.Meanwhile, the per-source anchor in
sources.last_commitis updated on every successful sync and stays aligned with repo history. It was 668 commits behind HEAD but still valid. The system had the right answer in the right place — the cycle just wasn't reading it.Timeline
config.sync.last_commitpointed to commit472f34d5sources.default.last_commithad a valid ancestor (00a62e50) the whole timeError Log
The DB state showed the divergence:
What We Tried
--verbose, saw it scanning 78K files (not stuck, just slow)configandsourcestablesgit cat-file -tsucceeded,merge-base --is-ancestorconfirmed itup_to_datein <1ssourceIdso cycle reads the right anchorSolution
1 file changed:
src/core/cycle.tsAdded
resolveSourceForDir()— queries thesourcestable for a row whoselocal_pathmatches the brain directory. Returns the source id when found,undefinedotherwise.runPhaseSync()now calls this resolver and passes the result assourceIdtoperformSync(). When a source exists, sync readssources.last_commit(per-source, always updated). When no source matches (pre-v0.18 brains), it falls through to the globalconfig.sync.last_commitkey — backward compatible.Behavior matrix
default)"default"sources.last_commit✅undefinedconfig.sync.last_commitundefined(catch)config.sync.last_commitResults
Before fix:
After fix:
--fullflagTesting
bun build src/core/cycle.ts --no-bundlecompiles cleanresolveSourceForDirreturns"default"for the production brain pathundefined)