v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity #1519
Merged
…lamp wrapper

T1 + T2 of the v0.41.16.0 workers cathedral.

New src/core/worker-pool.ts is the canonical primitive every --workers N bulk command in this wave (and future bulk commands) builds on. Atomic-claim invariant enforced by scripts/check-worker-pool-atomicity.sh (wired into bun run verify). BudgetExhausted bypass + AbortSignal composition baked into the helper so budget caps are a structural ceiling under concurrency, not a per-caller convention.

The new resolveWorkersWithClamp wrapper composes existing autoConcurrency with PGLite-clamp + per-(command, requested) stderr dedup. Deliberately NOT a modification to shared autoConcurrency (silent today, used by sync + import); embed.ts keeps GBRAIN_EMBED_CONCURRENCY || 20 default per codex #13.

23 + 12 + 9 = 44 hermetic tests pin every contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
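To make the atomic-claim and must-abort behaviour described above concrete, here is a minimal sketch of a sliding pool. It is not the code in src/core/worker-pool.ts: the `PoolOptions` and `PoolFailure` shapes and the exact signature of `runSlidingPool` are assumptions made for illustration, and AbortSignal composition is left out for brevity.

```typescript
// Hypothetical sketch; names echo the commit message, but the real helper may differ.
class BudgetExhausted extends Error {
  readonly tag = "BudgetExhausted";
}

interface PoolFailure {
  idx: number; // position of the failed item, not the item itself
  label: string; // short human-readable label, e.g. a page slug
  error: unknown;
}

interface PoolOptions<T> {
  workers: number;
  failureLabel: (item: T, idx: number) => string;
  onError?: (failure: PoolFailure) => void;
}

async function runSlidingPool<T>(
  items: readonly T[],
  task: (item: T, idx: number) => Promise<void>,
  opts: PoolOptions<T>,
): Promise<{ completed: number; failures: PoolFailure[] }> {
  let nextIdx = 0;
  let completed = 0;
  let aborted = false;
  let fatal: unknown = undefined;
  const failures: PoolFailure[] = [];

  async function worker(): Promise<void> {
    while (!aborted) {
      // Atomic claim: the read and the increment are one synchronous step
      // with no await in between, so two workers never claim the same index.
      const idx = nextIdx++;
      if (idx >= items.length) return;
      const item = items[idx] as T;
      try {
        await task(item, idx);
        completed++;
      } catch (error) {
        if (error instanceof BudgetExhausted) {
          // Must-abort errors bypass onError and stop all further claims.
          aborted = true;
          fatal = error;
          return;
        }
        const failure: PoolFailure = { idx, label: opts.failureLabel(item, idx), error };
        failures.push(failure);
        opts.onError?.(failure);
      }
    }
  }

  const n = Math.max(1, Math.min(opts.workers, items.length));
  await Promise.all(Array.from({ length: n }, () => worker()));
  if (aborted) throw fatal;
  return { completed, failures };
}
```

In this sketch, tasks already in flight when `BudgetExhausted` is thrown still run to completion; only new claims stop. A cap enforced this way can therefore be exceeded by roughly one call per worker.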
- test/embed-helper-migration.test.ts (T3): asserts embed.ts's two sliding-pool sites are migrated to runSlidingPool, pre-migration shapes (let nextIdx = 0, Promise.all(Array.from(...))) are gone, GBRAIN_EMBED_CONCURRENCY || 20 default preserved, failureLabel threads page.slug. Per codex #16/#17 these are invariant assertions, not byte-equality on progress event ORDERING.
- test/embedding-dim-check-facts.test.ts (T6): readFactsEmbeddingDim covers vector(N) + halfvec(N), halfvec-before-vector regex ordering pinned (codex #19), buildFactsAlterRecipe emits DROP INDEX + ALTER USING + CREATE INDEX (codex #18, not bare REINDEX), FactsEmbeddingDimMismatchError tagged class shape, assertFactsEmbeddingDimMatchesConfig PGLite skip + Postgres absent-column skip, doctor check + insert-cast wiring assertions.
- test/extract-conversation-facts-workers.test.ts (T5): helper exports (extractConversationFactsLockId, PER_PAGE_LOCK_TTL_MINUTES), structural wiring (runSlidingPool, resolveWorkersWithClamp, withRefreshingLock, LockUnavailableError, delete-orphans-first before segment loop, preflight before pool, exit 3 when lock_skipped > 0), Minion handler round-trip.
- test/extract-workers.test.ts (T7): --workers wiring on all 3 inner fs-walk loops (extractForSlugs, extractLinksFromDir, extractTimelineFromDir) + CLI parse + opts threading through runExtractCore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
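The facts dimension checks listed for T6 can be pictured with a short sketch. Everything here is an assumption made for illustration: the function names `parseEmbeddingColumnType`, `assertDimMatches` and `castSuffixFor` are hypothetical, the input is assumed to be a formatted column type string such as `vector(1280)`, and only the error class name is taken from the commit message. It is not the repository's readFactsEmbeddingDim.

```typescript
// Hypothetical sketch of parsing a pgvector column type and checking its width.
type VectorKind = "vector" | "halfvec";

interface EmbeddingColumn {
  kind: VectorKind;
  dim: number;
}

function parseEmbeddingColumnType(formatted: string): EmbeddingColumn | null {
  const text = formatted.trim();
  // halfvec is tried first, mirroring the ordering the tests are said to pin.
  const half = /^halfvec\((\d+)\)$/.exec(text);
  if (half) return { kind: "halfvec", dim: Number(half[1]) };
  const full = /^vector\((\d+)\)$/.exec(text);
  if (full) return { kind: "vector", dim: Number(full[1]) };
  return null; // not a sized vector column
}

class FactsEmbeddingDimMismatchError extends Error {
  readonly tag = "FactsEmbeddingDimMismatchError";
  readonly actual: number;
  readonly expected: number;

  constructor(actual: number, expected: number) {
    super(`facts.embedding is ${actual}-d but the configured model is ${expected}-d`);
    this.name = "FactsEmbeddingDimMismatchError";
    this.actual = actual;
    this.expected = expected;
  }
}

// Preflight: throw before the first insert when the column width disagrees with config.
function assertDimMatches(column: EmbeddingColumn | null, configuredDim: number): void {
  if (column === null) return; // absent column: nothing to check
  if (column.dim !== configuredDim) {
    throw new FactsEmbeddingDimMismatchError(column.dim, configuredDim);
  }
}

// Insert paths can pick the cast suffix from the probed column kind.
function castSuffixFor(column: EmbeddingColumn): string {
  return column.kind === "halfvec" ? "::halfvec" : "::vector";
}
```

Running the check before any insert means a width mismatch surfaces as one tagged error at startup rather than as a pgvector dimension error partway through a bulk run.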
PR #1510 (garrytan/dynamic-regex-conversation-formats) claimed v0.41.16.0 on master in parallel. Advancing this wave to v0.41.17.0 so both can land cleanly.

Pure mechanical version bump:
- VERSION + package.json → 0.41.17.0
- CHANGELOG.md header + "To take advantage of v0.41.17.0" block
- TODOS.md section header + v0.41.18+ forward references
- CLAUDE.md inline version tags
- Regenerated llms-full.txt / llms.txt

No code changes. The actual workers cathedral feature set is unchanged from the two prior commits in this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aam-v1

# Conflicts:
# CHANGELOG.md
# TODOS.md
# VERSION
# package.json
# scripts/run-verify-parallel.sh
# src/commands/reindex-code.ts
# src/commands/reindex-multimodal.ts
# src/commands/reindex.ts
CI shard 5 failed on `searchVector column routing (v0.27.1)` with:

  error: expected 1280 dimensions, not 1536

The test had a hardcoded `fakeText1536` helper that seeded chunks at 1536-d vectors. Master's default embedding model switched from OpenAI text-embedding-3-large (1536) to ZeroEntropy zembed-1 (1280) so a fresh PGLite brain on CI now sizes content_chunks.embedding at 1280; the test's 1536-d INSERT trips pgvector's CheckExpectedDim.

Fix: probe `content_chunks.embedding` width via `readContentChunksEmbeddingDim(engine)` in `beforeAll`, store in `TEXT_DIM`, and build `fakeTextDefault(seed)` at that width. The test now passes regardless of which default ships (the model has flipped twice and may flip again). Local dev (1536 from older config) and CI fresh-install (1280 from new default) both pass.

Image-side vectors stay at 1024 (matches Voyage multimodal-3 + the column's fixed width on the image side).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
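A sketch of the width-agnostic fixture idea follows. `makeFakeEmbedding` and `toPgVectorLiteral` are hypothetical helpers, and the constant stands in for the value the commit says is probed with `readContentChunksEmbeddingDim(engine)` inside `beforeAll`; the real test helper may be built differently.

```typescript
// Hypothetical helper: a deterministic, unit-length fake embedding of any width.
function makeFakeEmbedding(dim: number, seed: number): Float32Array {
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new RangeError(`dim must be a positive integer, got ${dim}`);
  }
  const raw: number[] = [];
  let sumSquares = 0;
  for (let i = 0; i < dim; i++) {
    // Cheap deterministic pseudo-values; the same seed always gives the same vector.
    const x = Math.sin(seed * 31 + (i + 1) * 0.37);
    raw.push(x);
    sumSquares += x * x;
  }
  const length = Math.sqrt(sumSquares) || 1; // guard against an all-zero vector
  return Float32Array.from(raw, (x) => x / length);
}

// pgvector accepts vectors in the text form "[v1,v2,...]".
function toPgVectorLiteral(v: Float32Array): string {
  return `[${Array.from(v).join(",")}]`;
}

// Stand-in for the probed column width; in the test it would be assigned in beforeAll.
const TEXT_DIM = 1280;
const fakeTextDefault = (seed: number): Float32Array => makeFakeEmbedding(TEXT_DIM, seed);
```

Because the width is an input rather than a constant baked into the helper name, the same fixture works against a 1536-d local brain and a 1280-d fresh install.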
facts-anti-loop.test.ts and ingest-capture.test.ts were timing out in CI
shard 4 with "beforeEach/afterEach hook timed out" after the v0.41.16.0
master merge brought migration count to 99. When these files run deep in
a shard process that has already created ~20 PGLite engines, the WASM
cold-start + 95-migration replay legitimately exceeds bun's 5s default
hook timeout (observed 5.6s and 7.3s locally when reproducing).
Bun's --timeout=60000 from scripts/test-shard.sh covers TEST timeouts
but NOT hook timeouts; those default to 5s and must be set per-hook via
the optional 2nd arg to beforeAll/afterAll.
Reproduced locally by running the first 21 shard-4 files via
head -21 /tmp/shard4-list.txt | xargs bun test
→ 179 pass, 2 fail (both with hook-timeout error)
After fix:
→ 198 pass, 0 fail (the 4 anti-loop + 15 ingest-capture tests recover)
Full shard 4 with fix: 955 pass, 0 fail.
Full shard 5 with fix: 1261 pass, 0 fail.
Also added a defensive diagnostic to the two put_page tests: if
facts_backstop is missing in the response payload, throw with the full
payload + isError so future failures surface the actual handler error
instead of a bare "expected {...} got undefined" assertion. No-op when
the test passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
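The defensive diagnostic described in this commit could look roughly like the sketch below. The `ToolResponse` shape and the name `requireFactsBackstop` are hypothetical; only the `facts_backstop` and `isError` fields are taken from the message above.

```typescript
// Hypothetical response shape; the real put_page handler payload may differ.
interface ToolResponse {
  isError?: boolean;
  payload: Record<string, unknown>;
}

// Returns facts_backstop, or throws with the whole payload so the failure
// shows the handler's actual error rather than a bare "got undefined".
function requireFactsBackstop(res: ToolResponse): unknown {
  const backstop = res.payload["facts_backstop"];
  if (backstop === undefined) {
    throw new Error(
      `facts_backstop missing from put_page response ` +
        `(isError=${String(res.isError)}): ${JSON.stringify(res.payload)}`,
    );
  }
  return backstop;
}
```

When the field is present the helper simply hands it back, so it adds nothing to a passing test.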
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
Summary
You can now run `extract-conversation-facts`, `extract`, `edges-backfill`, `reindex-multimodal`, `reindex --markdown`, `reindex-code`, and `reindex-frontmatter` with `--workers N`. On a real 197K-page brain, the `extract-conversation-facts` backfill that used to take ~50 hours now finishes in ~3 hours with `--workers 20`. Productionizes the RFC in PR #1473 (which is being closed).

Architecture (worker-pool foundation):
- `src/core/worker-pool.ts` (~270 LOC): canonical sliding pool + bounded semaphore. Atomic-claim invariant pinned by `scripts/check-worker-pool-atomicity.sh` (wired into `bun run verify`).
- `BudgetExhausted` (and any future `MUST_ABORT_ERROR_TAGS`) bypass `onError` and hard-abort the pool, so budget caps are a structural ceiling under concurrency.
- `failures[]` stores `{idx, label, error}`, not full items, which keeps memory bounded on 197K-page brains.

Bulk commands wired (P0→P3 from the RFC):
- `extract-conversation-facts` (P0 motivator) + per-page advisory lock via `withRefreshingLock` + delete-orphans-first replay safety + extraction-startup dim preflight.
- `extract`, `edges-backfill`, `reindex-multimodal`, `reindex --markdown`, `reindex-code`, `reindex-frontmatter` (P1-P3 batch).
- `embed.ts` migrated to the shared helper (existing `GBRAIN_EMBED_CONCURRENCY || 20` default preserved per codex #13).
- `eval-cross-modal.ts` inline `runWithLimit` deleted; callers route through the shared helper.

Facts dim-mismatch doctor parity (secondary bug):
- `readFactsEmbeddingDim` covers both `vector(N)` and `halfvec(N)` (codex #19: migration v40 falls back on pgvector < 0.7).
- `assertFactsEmbeddingDimMatchesConfig` preflight throws before the first fact insert (catches the bug class new users hit before they ever run doctor).
- `facts_embedding_width_consistency` surfaces drift with a paste-ready DROP INDEX → ALTER USING → CREATE INDEX recipe (codex #18: NOT bare REINDEX).
- `postgres-engine.ts` insert paths now match the cast suffix to the actual column type (probed once per engine, cached).

Test Coverage
163 wave-specific tests across 10 new/extended test files. Full unit suite: 11,256 pass, 0 fail across 4 parallel shards + 47 serial files (1148s wallclock). Typecheck clean. Worker-pool atomicity CI guard intact.
New test files:
- `test/worker-pool.test.ts` (23 cases): atomicity, abort, onError, failures[], BudgetExhausted bypass
- `test/pglite-workers-clamp.test.ts` (12 cases): PGLite-clamp + per-(command, requested) dedup
- `test/scripts/check-worker-pool-atomicity.test.ts` (9 cases): the CI guard's own regression
- `test/embed-helper-migration.test.ts` (8 cases): structural invariants for the embed migration
- `test/embedding-dim-check-facts.test.ts` (18 cases): facts dim drift + ALTER recipe
- `test/extract-conversation-facts-workers.test.ts` (17 cases): wiring + lock semantics + helper exports
- `test/extract-workers.test.ts` (10 cases): `--workers` threading through `runExtractCore`

Pre-Landing Review
Absorbed into `/plan-eng-review` + codex outside-voice during implementation (21 captured D-decisions: D1 scope, D2 per-page lock, D3 budget overshoot, D4 gateway-internal backoff, D5 atomicity invariant, D6 lock-busy skip, D7 failures shape, D8 test coverage, D9 PGLite clamp, D10 outside voice, D11 delete-orphans-first, D12 refreshing lock, D13 BudgetExhausted bypass, D14 drop dream, D15 doctor + preflight, D16/D17 pull P2/P3 into wave, D18 skip auto-ALTER, D19/D20/D21 file as TODOs). Every codex finding either became a D-decision or was absorbed as an inline plan adjustment.

Plan + decisions persisted at `~/.claude/plans/system-instruction-you-are-working-fancy-creek.md`.

Version collision note (rebumped twice)
Master shipped v0.41.15.0 (sync `--timeout` + `--max-age` wave, #1506) while this wave was in flight, so this wave was rebumped to v0.41.16.0. PR #1510 (conversation parser cathedral) also claimed v0.41.16.0 in parallel, so this wave was rebumped again to v0.41.17.0 to land cleanly behind it. Both waves coexist with zero overlap: v0.41.15.0 covers sync robustness; v0.41.17.0 covers bulk-command parallelism.

Things to watch after merge
- Budget overshoot under `--workers > 1`. The overshoot documented in D3 is `N_workers × avg_per_call_cost`. Pin `--workers 1` if you need exact-ceiling compliance.
- `--workers` is in-process. The existing `GBRAIN_ANTHROPIC_MAX_INFLIGHT` is for subagent loops only, NOT bulk paths. Rely on gateway-internal 429 backoff for provider throttling.
- Concurrent `extract-conversation-facts --workers 20` invocations against the same source converge correctly via per-page lock + delete-orphans-first. Exit 3 surfaces when `pages_lock_skipped > 0` so the next run picks them up.
- `facts.embedding` dim drift now warns. `gbrain doctor` surfaces the mismatch with a paste-ready ALTER recipe. Preflight catches new users before the first insert.

Plan Completion
All 15 tasks (T1-T15) complete. Every D-series decision (D1-D21) implemented. 4 follow-ups filed in TODOS.md for v0.41.18+: dream queue recoupling, AIMD auto-tune, BudgetTracker mutex, sync-integration hook parity. 1 TODO filed for v0.42+: reactive auto-ALTER on facts dim drift.
PR #1473 closed with attribution comment (handled in T15).
Test plan
- `bun run typecheck`
- `bash scripts/check-worker-pool-atomicity.sh`
- `gbrain extract-conversation-facts --workers 5 --limit 100 --dry-run` on a real brain (post-merge)
- Run with `--workers 20`; confirm projected ~3hr completion on the 197K-page brain (post-merge)

🤖 Generated with Claude Code