v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes #1479) by garrytan · Pull Request #1542 · garrytan/gbrain

garrytan · 2026-05-27T06:36:49Z

Summary

Your brain runs on a real taxonomy now. Not 94 types of cruft. Fifteen canonical types you can name, plus a catch-all for the long tail.

A real production brain (186K pages) had accreted 94 distinct pages.type values in 9 clusters of redundancy. tweet / tweet-thread / tweet-bundle / tweet-single all coexisting. 5.5K concept-redirect pages bloating orphan counts. atom-partner-link pages that should be real link rows. company / yc-company / product / organization all fighting for the same idea. The type system is the foundation for schema packs, search filtering, extract behavior, enrichment routing, and expert routing. When types are noisy, every downstream feature degrades.

This release ships the cathedral that collapses 94 → 14 canonical types (plus note as the catch-all = 15 total) on any brain that opts in.

What you can do that you couldn't before:

gbrain init defaults to gbrain-base-v2 (15 canonical types).
gbrain onboard --check --explain shows the per-cluster narrative for the gbrain-base → v2 migration.
gbrain jobs submit unify-types --allow-protected --params '{"target_pack":"gbrain-base-v2"}' runs the full migration end-to-end: retypes pages, creates alias rows, converts edge-shaped pages into real link rows, then atomically flips the active pack.
Reversible via 72h soft-delete TTL on alias/link pages + frontmatter.legacy_type preservation on retyped pages.
--type article queries keep working (D14 back-compat: alias-expands to media subtype=article at SQL build time).

What's New

Schema-pack primitives (atomic per-page transactions, source-scoped):

runRetypeCore — chunked UPDATE with frontmatter.legacy_type always-stamped + subtype_field strict allowlist (D9)
runPageToLinkCore — converts edge-shaped pages into real links rows (atomic per-page txn, F7)
runPageToAliasCore — concept-redirect pages → slug_aliases rows (D15: alias table IS the resolver, NO rewriteLinks)
rewriteLinksBatch — N-pair atomic FK rewrite
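
A minimal in-memory sketch of the retype step described above, assuming a simplified page shape. The `RetypeRule` fields, the allowlist contents, and the `tweet-thread` → `tweet` mapping are hypothetical illustrations; the real runRetypeCore performs chunked SQL UPDATEs and is not shown in this PR text.

```typescript
// Hypothetical in-memory model of a page; not the project's actual schema.
interface PageSketch {
  slug: string;
  type: string;
  frontmatter: Record<string, string>;
}

// Hypothetical rule shape: retype `from` to `to`, optionally writing a subtype.
interface RetypeRule {
  from: string;
  to: string;
  subtypeField?: string;
  subtype?: string;
}

// Assumed allowlist of frontmatter keys a rule may write (D9 calls it strict).
const SUBTYPE_FIELD_ALLOWLIST: ReadonlySet<string> = new Set(["subtype"]);

function retypePageSketch(page: PageSketch, rule: RetypeRule): PageSketch {
  // Pages that do not match are returned untouched, so a re-run is a no-op.
  if (page.type !== rule.from) return page;
  if (rule.subtypeField !== undefined && !SUBTYPE_FIELD_ALLOWLIST.has(rule.subtypeField)) {
    throw new Error(`subtype_field not allowed: ${rule.subtypeField}`);
  }
  // Always stamp the previous type so the retype can be reversed later.
  const frontmatter: Record<string, string> = { ...page.frontmatter, legacy_type: page.type };
  if (rule.subtypeField !== undefined && rule.subtype !== undefined) {
    frontmatter[rule.subtypeField] = rule.subtype;
  }
  return { ...page, type: rule.to, frontmatter };
}
```

Because the matched type changes on the first pass, applying the same rule again leaves the page and its legacy_type stamp unchanged.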

Schema-pack manifest extensions:

subtypes: array per page_type drives inferTypeAndSubtypeFromPack (ReDoS-guarded)
mapping_rules: discriminated union over retype / page_to_link / page_to_alias
migration_from: field declares successor relationship; powers findPackSuccessors version-range walker
expandTypeFilter — --type X legacy alias expands to canonical+subtype at SQL build time (D14)
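
The D14 back-compat expansion can be sketched as follows. This is an illustration only: the alias table is hard-coded here (the PR derives it from the pack's mapping_rules), and the `frontmatter->>'subtype'` column expression is an assumption about where the subtype is stored.

```typescript
interface TypeFilterSketch {
  type: string;
  subtype?: string;
}

// Hypothetical alias table. Only the article -> media/article pair is named in the PR.
const LEGACY_TYPE_ALIASES: ReadonlyMap<string, TypeFilterSketch> = new Map([
  ["article", { type: "media", subtype: "article" }],
]);

// Expand a legacy --type value to canonical type + subtype; pass others through.
function expandTypeFilterSketch(requested: string): TypeFilterSketch {
  return LEGACY_TYPE_ALIASES.get(requested) ?? { type: requested };
}

// Build a parameterized WHERE fragment from the expanded filter.
function typeFilterToWhere(filter: TypeFilterSketch): { sql: string; params: string[] } {
  if (filter.subtype === undefined) {
    return { sql: "type = $1", params: [filter.type] };
  }
  return {
    sql: "type = $1 AND frontmatter->>'subtype' = $2", // assumed storage location
    params: [filter.type, filter.subtype],
  };
}
```

Doing the expansion at SQL build time means callers keep passing the legacy name and never see the canonical form.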

Migration v104 slug_aliases — forward-bootstrap probed on both Postgres + PGLite.

Engine method: resolveSlugWithAlias(slug, sourceOrSources) on both engines with multi-source ambiguity warning (F10).
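
A rough in-memory sketch of alias-aware slug resolution with the multi-source ambiguity warning. The row shape, the `source:slug` key format, and the rule that a real page wins over an alias are all assumptions made for illustration; the actual engine method queries the database.

```typescript
interface AliasRowSketch {
  alias: string;
  target: string;
  sourceId: string;
}

interface SlugResolution {
  slug: string;
  viaAlias: boolean;
  warning?: string;
}

function resolveSlugWithAliasSketch(
  slug: string,
  sourceOrSources: string | readonly string[],
  pageKeys: ReadonlySet<string>, // assumed key format: `${sourceId}:${slug}`
  aliases: readonly AliasRowSketch[],
): SlugResolution | null {
  const sources = typeof sourceOrSources === "string" ? [sourceOrSources] : sourceOrSources;
  // Assumption: an existing page takes precedence over any alias row.
  for (const s of sources) {
    if (pageKeys.has(`${s}:${slug}`)) return { slug, viaAlias: false };
  }
  const hits = aliases.filter((a) => a.alias === slug && sources.includes(a.sourceId));
  const [first] = hits;
  if (first === undefined) return null;
  const distinctTargets = new Set(hits.map((h) => h.target));
  if (distinctTargets.size > 1) {
    // Several sources map the same alias to different targets: resolve, but warn.
    return {
      slug: first.target,
      viaAlias: true,
      warning: `alias "${slug}" is ambiguous across ${distinctTargets.size} targets`,
    };
  }
  return { slug: first.target, viaAlias: true };
}
```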

Three new onboard checks (registered in runAllOnboardChecks):

pack_upgrade_available (manual_only via render.ts allowlist per D17)
type_proliferation (pack-aware ratio, D16)
dangling_aliases (source-scoped per F12)
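
The type_proliferation thresholds can be sketched as below. The warn threshold (declared + 5) comes from a test name quoted in a later commit message on this PR; the fail threshold of twice the declared count is only inferred from one example in that same message (declared 15 gives a fail threshold of 30) and should be treated as an assumption.

```typescript
type CheckStatusSketch = "ok" | "warn" | "fail";

// Pack-aware check: compare distinct pages.type values against the pack's declared count.
function typeProliferationStatusSketch(distinctTypes: number, declaredTypes: number): CheckStatusSketch {
  if (distinctTypes > declaredTypes * 2) return "fail"; // inferred threshold, see lead-in
  if (distinctTypes > declaredTypes + 5) return "warn";
  return "ok";
}
```

With 32 distinct types this yields a warning against 25 declared types and a failure against 15, which matches the flaky-test diagnosis described later in this PR.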

unify-types PROTECTED Minion handler: a 4-phase lifecycle (retype-explicit → retype-catch-all → page-to-link → page-to-alias), followed by a final sync and the atomic active-pack flip.
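
The ordering guarantee in that lifecycle can be illustrated with a small sketch. The phase names follow the PR text; the step callbacks and the synchronous runner are hypothetical stand-ins for the real Minion handler.

```typescript
// Phase names as listed in the PR; the flip is deliberately last.
const UNIFY_PHASES = [
  "retype-explicit",
  "retype-catch-all",
  "page-to-link",
  "page-to-alias",
  "final-sync",
  "active-pack-flip",
] as const;

type UnifyPhase = (typeof UNIFY_PHASES)[number];

// Run every phase in order and report which ones completed.
function runUnifyTypesSketch(steps: Record<UnifyPhase, () => void>): UnifyPhase[] {
  const completed: UnifyPhase[] = [];
  for (const phase of UNIFY_PHASES) {
    steps[phase](); // a throw here aborts the run before any later phase starts
    completed.push(phase);
  }
  return completed;
}
```

Keeping the pack flip as the final step means a failure in any earlier phase leaves the previously active pack in place.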

Search: alias_resolved 1.05x post-fusion boost stage. KNOBS_HASH_VERSION bumped 5→6 (one-time cache miss spike on upgrade, self-healing in TTL).
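
As a sketch of what a post-fusion boost stage does, assuming hits carry a fused score and a flag saying whether they were reached through an alias (both field names are hypothetical):

```typescript
interface FusedHitSketch {
  slug: string;
  score: number;
  aliasResolved: boolean;
}

const ALIAS_RESOLVED_BOOST = 1.05; // factor stated in the PR

// Multiply alias-resolved hits by the boost, then re-rank by score descending.
function applyAliasResolvedBoostSketch(hits: readonly FusedHitSketch[]): FusedHitSketch[] {
  return hits
    .map((h) => (h.aliasResolved ? { ...h, score: h.score * ALIAS_RESOLVED_BOOST } : h))
    .sort((a, b) => b.score - a.score);
}
```

A 5% multiplier only reorders hits whose fused scores were already close, so it acts as a tie-breaker rather than an override.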

ELIGIBLE_TYPES for facts extraction extended with v2 canonicals (media, tweet, atom, concept, analysis) — codex F-ELIGIBLE caught the original deferred-to-v0.43 plan as a blocker; undeferred.

Test Coverage

79 new unit/integration cases across 11 new test files (retype, page-to-link, page-to-alias, rewrite-links-batch, resolve-slug-with-alias engine parity, find-pack-successors, infer-type-and-subtype, expand-type-filter, onboard-pack-upgrade-checks, search-alias-resolved-boost, unify-types-handler)
3 E2E cases in test/e2e/type-unification-full-flow.test.ts covering all 9 production clusters end-to-end: 94 → ≤16 distinct types, alias rows created, link rows inserted, active pack flipped, idempotent re-run
208-test verification suite confirms KNOBS_HASH_VERSION 5→6, build-llms regeneration, cycle test correctness in isolation, and full v0.41.22 test suite. 0 failures.
Typecheck clean

Plan Completion

All 9 lanes (A1, A2, B, C, D, E, F, G, H) from the plan are DONE. 16 locked decisions (D1-D17) implemented. 12 baseline fixes (F7-F21) absorbed from codex outside voice. Plan + GSTACK REVIEW REPORT at ~/.claude/plans/system-instruction-you-are-working-transient-elephant.md.

Pre-Landing Review

Cycle test passed in isolation (5/5) — initial parallel-test resource contention with a sibling worktree, not a regression. KNOBS_HASH_VERSION cache invalidation comments expanded to document the 5→6 bump rationale per CDX2-F13 append-only convention.

Migration

Schema migration v104 (slug_aliases table) with forward-bootstrap probed on both Postgres and PGLite. Atomic per-page transactions (codex F7). NO rewriteLinks for page-to-alias (D15 — alias table IS the resolver). Source-scoping throughout (F9, F10, F12).

To take advantage of v0.41.22.0

If you're a NEW user: gbrain init defaults to gbrain-base-v2. Done.

If you're an EXISTING user on gbrain-base:

gbrain upgrade — pulls v0.41.22 binaries + applies migration v104.
gbrain onboard --check --explain — preview the migration.
gbrain jobs submit unify-types --allow-protected --params '{"target_pack":"gbrain-base-v2"}' — run it. ~10 min on a 186K-page brain.
gbrain jobs follow <job_id> — watch progress per phase.
gbrain onboard --check — verify pack_upgrade_available and type_proliferation report ok.

Test plan

Typecheck clean
208/208 critical tests pass (KNOBS_HASH_VERSION fixes + cycle + build-llms + all 79 new v0.42 unit cases)
All 3 E2E cases pass against PGLite
Schema migration v104 applies cleanly on both Postgres and PGLite
Re-verified via second /ship pass

Closes #1479

🤖 Generated with Claude Code

Resolve VERSION, package.json, CHANGELOG conflicts with v0.41.22.0 on top, preserving master's v0.41.19.0 entry below.

…closes #1479)

Ships gbrain-base-v2 as the new install default (15 canonical types: 14 + note catch-all) and the unify-types PROTECTED Minion handler that runs the gbrain-base→v2 migration end-to-end on existing brains.

What this delivers:

- gbrain-base-v2.yaml standalone schema pack (no extends:) with 14 canonical page_types + 9 cluster mapping_rules + catch-all sentinel
- 3 new schema-pack primitives: runRetypeCore (chunked UPDATE with legacy_type stamping), runPageToLinkCore (edge-shaped pages → link rows), runPageToAliasCore (concept-redirect → slug_aliases)
- rewriteLinksBatch for N-pair atomic FK rewrite
- Migration v104 slug_aliases table (forward-bootstrap probed on both engines for safe upgrade chain)
- New engine method resolveSlugWithAlias(slug, sourceOrSources) on both Postgres + PGLite with multi-source ambiguity warning
- inferTypeAndSubtypeFromPack overload + subtypes: + mapping_rules: + migration_from: schema-pack manifest extensions
- findPackSuccessors version-range walker (1.x / 1.0.x / exact match)
- expandTypeFilter for --type back-compat (D14): legacy aliases route through mapping_rules → canonical+subtype before the SQL filter fires
- 3 new onboard checks: pack_upgrade_available, type_proliferation, dangling_aliases (source-scoped per F12)
- unify-types Minion handler (PROTECTED, manual_only via render.ts allowlist per D17): retype-explicit → retype-catch-all → page-to-link → page-to-alias → final sync → active-pack flip
- alias_resolved 1.05x post-fusion search boost stage; KNOBS_HASH_VERSION bumped 5→6 (one-time cache miss on upgrade, self-healing in TTL)
- ELIGIBLE_TYPES for facts extraction extended with v2 canonicals (codex F-ELIGIBLE: blocker not v0.43 follow-up)

Tests: 79 new unit/integration cases + 3 E2E cases covering all 9 production clusters end-to-end. 124-case verification on the cache-key + build-llms fixes. KNOBS_HASH_VERSION assertions updated in 3 tests.
Plan: ~/.claude/plans/system-instruction-you-are-working-transient-elephant.md (16 locked decisions D1-D17, 12 baseline fixes F7-F21 absorbed from codex outside voice). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
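
The version-range walker mentioned in this commit (1.x / 1.0.x / exact match) could work roughly like the sketch below. The manifest field names (`migrationFrom`, `pack`, `versions`) are hypothetical; only the three range forms and the gbrain-base → gbrain-base-v2 relationship come from the PR.

```typescript
interface PackManifestSketch {
  name: string;
  migrationFrom?: { pack: string; versions: string };
}

// Supports "1.x", "1.0.x", and exact versions such as "1.0.0".
function matchesVersionRangeSketch(version: string, range: string): boolean {
  const v = version.split(".");
  const r = range.split(".");
  for (let i = 0; i < r.length; i++) {
    if (r[i] === "x") return true; // wildcard: any remaining segments match
    if (r[i] !== v[i]) return false;
  }
  return v.length === r.length; // no wildcard, so this must be an exact match
}

// Return the names of packs that declare themselves successors of the active pack.
function findPackSuccessorsSketch(
  activeName: string,
  activeVersion: string,
  registry: readonly PackManifestSketch[],
): string[] {
  return registry
    .filter(
      (p) =>
        p.migrationFrom !== undefined &&
        p.migrationFrom.pack === activeName &&
        matchesVersionRangeSketch(activeVersion, p.migrationFrom.versions),
    )
    .map((p) => p.name);
}
```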

VERSION + package.json + CHANGELOG header + CLAUDE.md cluster annotation all moved from 0.42.0.0 to 0.41.23.0. Body text updated in-place: every "v0.42" / "v0.43+" reference inside this entry's release notes now reads "v0.41.23" or "follow-up release" as appropriate. Same scope shipping — the three-wave extract operator surface stays intact. Just lands in the patch-channel queue (.20/.21/.23 free; .22 is PR #1542's type-unification cathedral) instead of the minor-channel bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…fy manifest registration

Two CI failures on PR #1542:

1. check:system-of-record flagged page-to-link.ts:207 addLinksBatch as a direct write to a derived table. The call IS the reconcile surface for page_to_link mapping_rules — it converts edge-shaped pages into canonical link rows under the PROTECTED unify-types Minion handler, source-scoped, atomic per-rule. Added the canonical `// gbrain-allow-direct-insert: <reason>` comment on the same line.
2. check:resolver emitted 11 orphan_trigger warnings for `schema-unify` because the skill was added to skills/RESOLVER.md without a corresponding entry in skills/manifest.json. Added the registration under the existing skills[] array.

bun run verify: 28/28 checks pass locally.

…sion

Six test failures across shards 2 + 10 on PR #1542:

1. resolver.test.ts: round-trip parser requires frontmatter triggers to be quoted (`- "..."` or `- '...'`). schema-unify shipped with bare YAML strings; quoted the 10 triggers to round-trip correctly.
2. skills-conformance.test.ts (×3): schema-unify SKILL.md was missing the required Contract, Anti-Patterns, and Output Format sections that every conformant skill must declare. Added all three:
   - Contract: inputs / outputs / side effects / failure modes
   - Anti-Patterns: 5 DON'Ts including the autopilot trust boundary
   - Output Format: per-phase stderr lines + celebration summary + JSON envelope shape
3. facts-eligibility.test.ts (×2): the v0.41.22 ELIGIBLE_TYPES expansion added `concept` to the eligible list, but the existing test suite pins concept as rejected (it's `extractable: true` in the schema pack but the v0.41.11 contract documented this as "cosmetic on the backstop path because backstop uses hardcoded ELIGIBLE_TYPES"). Removed `concept` from the expansion; other v2 canonicals (media, tweet, atom, analysis) stay. Comment updated to document the deliberate omission.

All 6 failing tests now pass locally (370/370 across the 3 affected files). bun run verify: 28/28 checks green.

Resolve VERSION, package.json, CHANGELOG conflicts. v0.41.22.0 stays on top (higher than master's v0.41.20.0); master's v0.41.20.0 entry preserved below in CHANGELOG order. Brought in master's gbrain status + doctor --scope=brain (PR #1544) and the v0.41.19.0 Supavisor Retry Cathedral (PR #1537) cleanly via auto-merge. Typecheck clean. bun run verify: 28/28 checks pass.

CI shard 8 reported 1 fail (1.00ms — too fast for any real loadActivePack file I/O) on `finds gbrain-base-v2 as successor of gbrain-base@1.0.0`. Local triple-run passes 9/9 in isolation. Root cause: the existing afterEach reset clears the module-level pack cache AFTER each test, but the FIRST test in the file inherits whatever state sibling files in the same bun shard process left behind. With 24+ schema-pack tests in shard 8 (mutate, mutate-audit, best-effort, registry-reload, manifest-v041_2, etc.) running before this file, the first test can read a poisoned cache. Fix: add `beforeEach(_resetPackCacheForTests)`. Two-sided reset guarantees clean state regardless of file ordering within the shard. bun run verify: 28/28 checks pass.

CI shard 1 + shard 8 each surfaced one intermittent failure:

shard 1: buildBrainTools > execute() on put_page with valid namespace
shard 8: findPackSuccessors > finds gbrain-base-v2 as successor

Both pass cleanly in isolation. Both are concurrency races against shared in-shard state:

- brain-allowlist.test.ts shares a singleton PGLiteEngine across 18 tests with a beforeEach DELETE FROM pages. With max-concurrency=4, two put_page tests can interleave their TRUNCATE + write phases, so the auto-link/extract sub-steps inside put_page race against the sibling test's DELETE.
- schema-pack-find-pack-successors.test.ts reads bundled YAML packs via loadActivePack. The module-level pack cache is shared across parallel tests in the same shard; the previous beforeEach reset helped but didn't fully isolate against concurrent file reads under CI load.

Fix per CLAUDE.md test-isolation lint rule R2 (concurrency-fragile files belong in the .serial.test.ts quarantine): rename both files to *.serial.test.ts. Serial runner picks them up at max-concurrency=1.

49/49 serial files pass locally. 28/28 verify checks pass.

Master advanced to v0.41.21.0 (5 daily-driver ops pains wave, PR #1545). Resolved 4 conflicts:

- VERSION + package.json: 0.41.22.0 stays (higher than master's 0.41.21.0)
- CHANGELOG: my v0.41.22.0 entry on top, master's v0.41.21.0 entry below in proper version-descending order
- src/core/migrate.ts: MIGRATION VERSION COLLISION — master's v0.41.21 claimed v104 for `pages_atom_source_hash_idx`. Bumped my slug_aliases migration to v105 (the canonical "claim next available slot" pattern).

Updated all slug_aliases-related v104 doc/comment references:

- src/core/postgres-engine.ts: "Pre-v104 brain" → "Pre-v105 brain"
- src/core/onboard/checks.ts: "pre-v104 brains" → "pre-v105 brains"
- src/core/search/hybrid.ts: "pre-v104 brains" → "pre-v105 brains"
- docs/architecture/pack-upgrade-mechanism.md: migrate.ts:104 → :105
- CHANGELOG.md (my entry): v104 → v105 with rebump rationale
- CHANGELOG.md "To take advantage" block: migration v105

Master's v104 (atom source-hash index, `pages_atom_source_hash_idx`) preserved verbatim. Both migrations now coexist correctly.

Typecheck clean. bun run verify: 28/28 checks pass. 33/33 slug_aliases-touching tests pass (page-to-alias, resolve-slug-with-alias, unify-types-handler, onboard checks, search boost, E2E full flow).
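
The "claim next available slot" pattern named in this commit can be stated as a tiny helper. This is a hypothetical illustration, not code from src/core/migrate.ts; in the commit the renumbering was done by hand.

```typescript
// Given the versions already claimed on master, find the first free slot at or after `wanted`.
function nextFreeMigrationVersionSketch(claimed: readonly number[], wanted: number): number {
  const taken = new Set(claimed);
  let candidate = wanted;
  while (taken.has(candidate)) candidate += 1;
  return candidate;
}
```

For the collision described above, master had claimed v104, so a migration that wanted v104 lands on v105.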

CI shard 9 reported 6 failures, all from the embedStaleForSource describe block, all ~120-150ms each — classic shared-engine concurrency race shape. Passes 7/7 locally in isolation. Root cause: embed-stale.test.ts shares a singleton PGLiteEngine across 7 tests with beforeEach resetPgliteState. Under bun's max-concurrency=4 in the parallel shard, two tests can interleave their TRUNCATE + seedPage + upsertChunks + embedStaleForSource flow, so one test's stale-chunk count sees another test's mid-flight writes. Same fix as brain-allowlist.serial.test.ts and schema-pack-find-pack-successors.serial.test.ts: rename to *.serial.test.ts so the serial runner picks it up at max-concurrency=1. bun run verify: 28/28 checks pass. 7/7 embed-stale tests pass via serial.

…#1541)

* Wave A: schema + receipts foundation for v0.42 extract operator surfaces

Foundation layer for the pack-driven extractables + receipt-as-brain-memory + operator-discoverability cathedral. Five atomic pieces ship together because their schema + helpers + module dependencies are tight-coupled:

A1. Widen pack manifest's `extractable` from `boolean` to `boolean | ExtractableSpec`. ExtractableSpec carries prompt_template, fixture_corpus, eval_dimensions, benchmark_min_recall, and reserves verifier_path for v0.43+ pack-shipped verifier code (REFUSE at runtime in v0.42 per plan D-EXTRACT-37). Back-compat: every pre-v0.42 pack with `extractable: true` continues parsing unchanged. Three new helpers: extractableSpecsFromPack(), getExtractableSpec(), refuseVerifierPathInV042().

A2. New page type `extract_receipt` in ALL_PAGE_TYPES. Source-boost map adds `extracts/` prefix at factor 0.3 — receipts surface in search when extraction-relevant but never dominate user content (D-EXTRACT-42).

A3. New module src/core/extract/receipt-writer.ts (~190 LOC) exporting writeReceipt(engine, input). Canonical slug shape extracts/{date}/{kind}/{source_id}/{run_id_short}/round-{N} per D-EXTRACT-17. Frontmatter belt+suspenders per D-EXTRACT-19: BOTH type:extract_receipt AND dream_generated:true stamped on every receipt, regardless of caller, so the eligibility predicate's anti-loop guards reject the receipt page from any future extraction sweep (single-flag bypass requires breaking two unrelated checks). Idempotent on resume — same run_id+round overwrites cleanly.

A4. Migration v104 creates extract_rollup_7d table (per-day rollup of extract events keyed on kind+source_id+day). Audit JSONL stays the SOURCE OF TRUTH per F-OUT-19; this table is a best-effort cache for doctor's <100ms read budget. Per-day rows mean the 7-day window auto-evicts on every read. v100 was deliberately skipped on master (renumbered out during a prior wave); v101/v102/v103 also taken; v104 is the next clean slot.

A5. Doctor `extract_health` check reads extract_rollup_7d for last 7 days and emits per-kind aggregates: cost_7d_usd, eval_pass_count, eval_fail_count, halt_count, round_completed_count, halt_rate. 3-state: OK when rollup empty (pre-v0.42 brain or fresh init), WARN when any per-kind halt rate > 10% (top-3 named in message), WARN when rollup_write_failures > 0 (audit JSONL is SoT but operator deserves to know the DB cache is degraded). Pre-v104 brains stay quiet — the missing-table error path is caught and treated as OK so doctor doesn't warn during the upgrade window.

Tests added:

- test/extractable-spec-widening.test.ts (22 cases) — back-compat with boolean shape, new struct parsing, verifier_path REFUSE contract.
- test/extract/receipt-writer.test.ts (12 cases) — slug shape, frontmatter belt+suspenders, idempotent resume, body human-readability.
- test/doctor-extract-health.test.ts (8 cases) — empty rollup OK, halt rate WARN, rollup_write_failures WARN, 7-day window inclusion at boundary, multi-kind top-3 message ordering.

Plus the canonical bootstrap-coverage test passes with the new v104 migration cleanly applied through both engines.

Plan: ~/.claude/plans/system-instruction-you-are-working-stateless-dragonfly.md Wave A scope. Wave B (hook receipts into existing extractors) follows.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Wave B: hook receipts + rollup row into the 5 shipped extractors

Each LLM-backed extractor surface now records its run in two places when something actually happened:

1. An extract receipt PAGE at extracts/{date}/{kind}/{source_id}/{run_id_short}/round-{N} (queryable via gbrain search, citable, surfaces in cross-modal contradiction probes per the Wave A foundation). Only written when `total_rows > 0` so no-op runs don't bloat the brain.
2. An UPSERT row in extract_rollup_7d (DB-backed best-effort cache per F-OUT-19) so the doctor extract_health check from Wave A reads per-kind aggregates without scanning JSONL.

New module src/core/extract/rollup-writer.ts (~120 LOC) exports upsertExtractRollup() with PostgreSQL ON CONFLICT DO UPDATE on the (kind, source_id, day) PK. Concurrency-safe per F-OUT-14 design. Failure path is best-effort — bumps rollup_write_failures in the table itself, stderr-warns once per (kind, day, error-class), and NEVER fails the parent extraction operation. JSONL remains source of truth.

Wired into 5 extractors:

- extract-conversation-facts (kind: facts.conversation) — both success path AND BudgetExhausted halt path write receipt+rollup so partial runs are still observable.
- extract_atoms cycle phase (kind: atoms)
- synthesize_concepts cycle phase (kind: concepts, source_id: default because concepts are brain-global)
- propose_takes cycle phase (kind: takes.proposed) — scope-aware source_id from the read scope.
- extract_facts cycle phase (kind: facts.fence) — deterministic (no LLM cost) but still records reconcile activity so doctor sees the cycle is alive.

Receipt frontmatter belt+suspenders (D-EXTRACT-19) reused from Wave A: every receipt stamps BOTH `type: extract_receipt` AND `dream_generated: true` so the eligibility predicate's anti-loop guards reject the receipt page from any future extraction sweep.

Test surgery in test/propose-takes.test.ts — one existing assertion tightened from "no INSERTs" to "no INSERT INTO take_proposals" so the new rollup UPSERT doesn't falsely fail the cache-hit case test.

Run regression: 85/85 tests pass across extract-conversation-facts, extract-atoms-synthesize-concepts, extract-facts-phase, propose-takes.

Plan: ~/.claude/plans/system-instruction-you-are-working-stateless-dragonfly.md Wave B scope. Wave C (pack-author scaffolding + benchmark) follows.
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Wave C+D: pack-author scaffolding + operator surfaces for v0.42 extract

Wave C: pack-author authoring loop

- scaffold-extractable mutation primitive declares a kind as extractable on a pack manifest in one verb (wires through updateTypeOnPack from the v0.41 mutate library); generates 5 placeholder fixtures + a pack-supplied prompt template stub
- schema CLI wires gbrain schema scaffold-extractable <type> --pack <pack>
- extract benchmark CLI loads a pack's fixture corpus through strict D-EXTRACT-21 path validation (rejects absolute paths, .. traversal, null bytes, symlinks resolving outside pack root); v0.42 ships as a stub reporter (LLM dispatch deferred to Wave E)

Wave D: operator surfaces

- extract status CLI reads extract_rollup_7d for the last 7 days, sorts by (halt_rate desc, cost desc); kubectl-style right-aligned table, top-5 + "more rows" hint by default, --verbose shows all; stable schema_version: 1 JSON envelope for monitoring pipelines
- extract --explain <kind> CLI prints the active pack's resolution chain: declaration source (pack-declared vs built-in cycle phase), prompt_template + fixture_corpus paths with existence checks, eval_dimensions, benchmark_min_recall, and the last 7d rollup
- extract.ts gains a lifecycle-grouped help text (Extraction / Inspection / Status) per the original D3 plan goal

Tests:

- test/schema-pack/scaffold-extractable.test.ts (15 cases) including explicit privacy-rule assertions guarding against real-name leakage
- test/extract/benchmark.test.ts (17 cases) covering path validation rejections + JSONL fixture parsing
- test/extract/status.test.ts (15 cases) over pure aggregation + formatting

Housekeeping:

- test/extract/receipt-writer.test.ts refactored to the canonical PGLite block (beforeAll/afterAll/resetPgliteState in beforeEach) per CLAUDE.md test-isolation R3+R4; runtime drops from ~30s of 99-migration replay per test to <6s for all 12 cases together

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.42.0.0: extract operator surfaces + pack-driven extractables

Bump VERSION + package.json to 0.42.0.0. CHANGELOG entry covers the three-wave shipped scope (receipts + rollup + doctor check; receipts hooked into all 5 shipped extractors; pack-author scaffolding + benchmark stub-reporter; status + --explain dashboards + lifecycle help). CLAUDE.md Key Files gains a v0.42 cluster annotation. llms.txt regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.41.23.0: re-tag from 0.42.0.0 (patch-channel slot, no scope change)

VERSION + package.json + CHANGELOG header + CLAUDE.md cluster annotation all moved from 0.42.0.0 to 0.41.23.0. Body text updated in-place: every "v0.42" / "v0.43+" reference inside this entry's release notes now reads "v0.41.23" or "follow-up release" as appropriate.

Same scope shipping — the three-wave extract operator surface stays intact. Just lands in the patch-channel queue (.20/.21/.23 free; .22 is PR #1542's type-unification cathedral) instead of the minor-channel bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: add extract_receipt to gbrain-base.yaml page_types (CI parity gate)

CI shard 5 caught the drift: test/regressions/gbrain-base-equivalence.test.ts asserts every ALL_PAGE_TYPES seed has a matching page_type entry in the gbrain-base.yaml pack. Wave A added `extract_receipt` to ALL_PAGE_TYPES but didn't seed it in the base pack manifest.

Adds the entry under the `annotation` primitive with `extracts/` path prefix (matches the source-boost demote site) and `extractable: false` (receipts are written by the framework, never extracted from). Comment documents the belt+suspenders D-EXTRACT-19 invariant so future readers understand why receipts carry both `dream_generated: true` AND `type: extract_receipt`.
Closes the CI gate without changing runtime behavior — the pack-aware read paths already had the prefix demote wired in src/core/search/source-boost.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: bump gbrain-base page-type count 24→25 in schema-cli test

CI shard 4 caught the second drift from the same root cause as the prior parity-gate fix: v0.41.23's `extract_receipt` addition bumped gbrain-base.yaml from 24 to 25 page types. The schema-cli smoke test was pinned at 24 (the count after v0.41.11.0 added `conversation` + `atom`); update to 25 and note v0.41.23's contribution alongside the prior version stamp.

Verified hermetic: running test/schema-cli.test.ts with a clean GBRAIN_HOME tempdir produces 12/12 pass (the local-machine 'schema active' fail is from a real ~/.gbrain pinning gbrain-base-v2; not a shipped-code issue, doesn't repro on CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): pack-locator stub leak between shard 6 test files

CI shard 6 caught three flaky failures in test/onboard-pack-upgrade-checks.test.ts:

- checkPackUpgradeAvailable > fires on gbrain-base brain with gbrain-base-v2
- checkPackUpgradeAvailable > manual_only routing via render.ts allowlist (D17)
- checkTypeProliferation > warns when distinct types exceed declared+5

Root cause: test/schema-pack-sync.test.ts calls `__setPackLocatorForTests(...)` to stub the disk-loader, but doesn't restore in afterAll. Bun's CI shard 6 loads multiple test files into one process; when sync.test.ts runs before onboard-pack-upgrade-checks.test.ts, the stubbed locator persists at module scope. `loadActivePack` for gbrain-base / gbrain-base-v2 then returns null and:

- findPackSuccessors returns [] → status='ok' instead of 'warn' (F1+F2)
- declared falls back to 15 → fail threshold becomes 30, 32 > 30 → 'fail' instead of 'warn' (F3)

Local single-file runs pass because the locator starts at its default.

Two-layer fix:

1. test/schema-pack-sync.test.ts afterAll calls `_resetPackLocatorForTests()` to undo the mutation (the canonical fix at the source).
2. test/onboard-pack-upgrade-checks.test.ts beforeEach calls the same reset (defense-in-depth against any future test file in the shard that forgets to restore).

Reproduced locally: running the three shard-6 schema-pack files together fails 3 tests pre-fix and passes 30/30 post-fix. Full shard 6 sweep (77 files, 1232 tests) now green; bun run verify still 28/28.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): pglite-engine — dim-agnostic chunk-embedding test data

CI shard 6 caught two flaky failures in test/pglite-engine.test.ts:

- PGLiteEngine: Chunks > getChunksWithEmbeddings returns embedding data
- PGLiteEngine: stale chunk pagination > countStaleChunks counts chunks with NULL embedding only

Both failed with `expected 1280 dimensions, not 1536` at the upsert site.

Root cause: pglite-engine.ts:287 initSchema() reads embedding dim from gw.getEmbeddingDimensions() if the gateway is configured (potentially left in that state by another shard-6 test file in the same bun process), falling back to DEFAULT_EMBEDDING_DIMENSIONS otherwise — which is 1280 since v0.36+ when the ZE default landed (zeroentropyai:zembed-1). Pre-v0.36 defaults were OpenAI's 1536; my test data was pinned to that stale literal.

The two outcomes that pass:

- gateway happens to be configured for 1536-dim (e.g. master shard 6 run 26515999465 — these tests passed at 20ms + 24ms with no "dimensions" error)
- gateway happens to be configured for 1280-dim AND test data is 1280

The outcome that fails:

- gateway configured for 1280-dim AND test data hardcoded to 1536

Fix: capture the actual column width after initSchema (probe pg_attribute.atttypmod for content_chunks.embedding) and use that captured `CHUNK_EMBED_DIM` constant at the three Float32Array sites. Test data now matches whatever width the column was created at, regardless of which shard-6 file ran first.

Local repro: full shard 6 (77 files, 1232 tests, ~6min) green; this file standalone (100 tests) green; bun run verify 28/28.

Broader pattern: 9 other test files use the same Float32Array(1536) literal. None land in shard 6 today (so they don't flake), but the fix shape here can be lifted into a shared helper if the bug class surfaces elsewhere — filed as a v0.42+ follow-up rather than a preemptive sweep, since each file's setup shape is slightly different.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
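
The fixture-path validation described for the extract benchmark CLI in Wave C (D-EXTRACT-21) can be sketched with pure string checks. The function name and result shape are hypothetical, and the symlink check the commit mentions is omitted because it needs filesystem access.

```typescript
type PathCheckResult = { ok: true } | { ok: false; reason: string };

// Reject null bytes, absolute paths, and ".." traversal in a pack-relative path.
function validateFixturePathSketch(relPath: string): PathCheckResult {
  if (relPath.includes("\0")) return { ok: false, reason: "null byte" };
  // POSIX absolute paths and Windows drive-letter paths.
  if (relPath.startsWith("/") || /^[A-Za-z]:[\\/]/.test(relPath)) {
    return { ok: false, reason: "absolute path" };
  }
  // Any ".." segment could escape the pack root.
  if (relPath.split(/[\\/]/).includes("..")) {
    return { ok: false, reason: "path traversal" };
  }
  return { ok: true };
}
```

A real implementation would also resolve the path on disk and confirm the result still sits under the pack root, which is what catches symlinks pointing outside it.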

…1/v0.42 reality

Two stale E2E assertion files surfaced by a full local E2E run against real Postgres (the gbrain-test-pg container on port 5434). Neither file is in the CI E2E job (CI only runs mechanical.test.ts + mcp.test.ts + skills.test.ts + zeroentropy-live.test.ts), so the drift has been latent.

1. `test/e2e/dream-cycle-phase-order-pglite.test.ts` EXPECTED_PHASES was missing 4 phases that landed in master since the list was last revised:
   - extract_atoms (v0.41 T9 — atom extraction, after extract_facts)
   - synthesize_concepts (v0.41 T9 — concept synthesis, after patterns)
   - conversation_facts_backfill (v0.41.11.0, after calibration_profile)
   - skillopt (v0.42.0.0 — self-evolving skills, between conversation_facts_backfill and embed)

   Updated to 21 entries in the actual runtime dispatch order (matches ALL_PHASES exactly). 5/5 tests in the file pass after.

2. `test/e2e/onboard-full-flow.test.ts` `runAllOnboardChecks` shape test asserted exactly 4 checks; v0.42's type-unification cathedral (PR #1542, T13-T15) added 3 more (`pack_upgrade_available`, `type_proliferation`, `dangling_aliases`) for a total of 7. And `empty brain returns 0 remediations` regressed because `pack_upgrade_available` can emit a manual_only remediation on brains where gbrain-base@1.x is active and gbrain-base-v2 is registered as a successor. Tightened that assertion to `total <= 1` AND kept a per-check guard asserting takes_count remediations stay 0 (the original test's load-bearing claim — A12 two-gate consent). 13/13 tests in the file pass after.

Honest scope: 4 other E2E files still fail locally after this commit (cycle.test.ts, dream.test.ts, phantom-redirect.test.ts, sync-lock-recovery.test.ts), each for a distinct pre-existing master bug unrelated to v0.42 skillopt work:

- cycle.test.ts (5 fails): PostgresEngine.getConfig falls back to db.getConnection() singleton via the `get sql()` getter when no poolSize is set; the new conversation_facts_backfill phase chain hits this fallback even though the test's setupDB() connects both the singleton AND the engine. Race condition between the test's singleton lifecycle and the phase's getConfig call. Deeper fix needed in PostgresEngine.getConfig (use this._sql directly with explicit fallback only on user-driven CLI paths).
- dream.test.ts (1 fail): expects "concepts/testing" slug to appear in dream cycle output, gets empty array. Related to v0.42 concept type-unification semantics.
- phantom-redirect.test.ts (2 fails): concurrent-sync race + postgres-js text-string embedding survival. Master-level data-path bug; would need its own fix wave.
- sync-lock-recovery.test.ts (1 fail): `gbrain sync --break-lock --all` exits 0 but test expects 1 with a shell-loop hint. CLI behavior changed in a master commit; need to either restore the refusal behavior or update the assertion.

None of these 4 block CI (E2E job doesn't run them). Filed as a TODOS.md entry for a follow-up wave; the 2 in this commit are the ones that mirror v0.42 work landing.

Local: 130/136 E2E files green, 927/940 tests pass (was 925/940 before these fixes; the 2 files this commit fixes added 7 newly-passing tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* upstream/master:
  - v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  - v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  - v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  - v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  - v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  - v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  - v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  - v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  - v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  - feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  - v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  - v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  - v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  - v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)

…#1563)

* feat(skillopt): foundation modules — types, lr-schedule, benchmark, score, audit, lock
* feat(skillopt): edit primitives — apply-edits (D5+D9), rejected-buffer LRU, version-store (D8 history-intent-first)
* feat(skillopt): rollout (D2 gateway.toolLoop + D13 read-only allowlist), reflect (D7 two calls), validate-gate (D12 median+epsilon, D4 parallel), preflight (D3), bundled-skill-gate (D16)
* feat(skillopt): orchestrator (D6 slow-update, D10 ASCII diagrams, D11 caching), checkpoint, bootstrap (D15 sentinel), CLI dispatch + help
* feat(skillopt): cycle phase (F1 dream-loop wiring), PROTECTED_JOB_NAMES + MCP op (F6 admin scope + allowlist) + Minion handler (F7 --background)
* feat(skillopt): full cathedral — --all batch (F4), --target-models fleet (F5), write-capture (F10), held-out scaffold (F11), adversarial suite 41 cases (F2), E2E PGLite (F3), meta-skill bundle (T7), reflect+judge evals (F8+F9), docs (T10)
* chore: bump version to v0.42.0.0 (MINOR — significant new feature)
* fix(skillopt): wire trajectories from forward gate to reflect + fix parseEditsResponse parser misuse

Two related v0.42.0.0 bugs that conspired to make `runSkillOpt` structurally unable to accept any candidate edit. Either alone would have killed self-evolution; together they made the loop a no-op for every input.

**Bug 1 (orchestrator gap):** `runOptimizationLoop` in orchestrator.ts called `runReflect({successes: [], failures: []})` with hardcoded empty arrays. The forward gate's `scoredRollouts` were computed then voided. `runReflect` short-circuits both modes when their batches are empty, so the optimizer was never asked to propose an edit. Every step hit the no_edits_applied branch.

Fix: add `scoredRollouts: ScoredRollout[]` to `GateResult` and `runsPerTask?: number` to `ValidateGateOpts`. Forward pass uses `runsPerTask: 1`; orchestrator partitions returned rollouts by `score >= 0.5` and threads real successes + failures into `runReflect`.
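The partition step in the Bug 1 fix can be sketched as follows. This is an illustration under assumptions, not the orchestrator's code: the `ScoredRollout` shape here is a hypothetical minimal stand-in (only `score` is confirmed by the commit message), and `partitionRollouts` is an invented name.

```typescript
// Hypothetical minimal shape; the real ScoredRollout likely carries more fields.
interface ScoredRollout {
  taskId: string; // assumed field, for illustration only
  score: number;
}

// Threshold taken from the commit message: `score >= 0.5` counts as a success.
const SUCCESS_THRESHOLD = 0.5;

function partitionRollouts(rollouts: ScoredRollout[]): {
  successes: ScoredRollout[];
  failures: ScoredRollout[];
} {
  const successes: ScoredRollout[] = [];
  const failures: ScoredRollout[] = [];
  for (const rollout of rollouts) {
    if (rollout.score >= SUCCESS_THRESHOLD) {
      successes.push(rollout);
    } else {
      failures.push(rollout);
    }
  }
  return { successes, failures };
}
```

The point of the fix is that both arrays are derived from real gate output. With hardcoded empty arrays, a reflect step that skips empty batches never proposes an edit.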
**Bug 2 (parser misuse):** `parseEditsResponse` in reflect.ts routed every optimizer response through `parseJudgeJson` first. `parseJudgeJson` looks for a `score` key (it's a judge-output parser, not an edits parser) and returns null for any JSON without one, including the well-formed `{"edits": [...]}` the optimizer is contractually required to emit. The function then early-returned `[]` and the actual `tryExtractEdits` path on the next line was unreachable dead code.

Fix: drop the wrong-typed guard. `parseEditsResponse` now calls `tryExtractEdits` directly. Export it so `reflect.test.ts` can pin the contract independently of the chat transport.

**Why this slipped through 152 prior skillopt tests:** zero unit coverage of `parseEditsResponse` or `runReflect`. The existing E2E `all-reject` case asserted no_improvement (which was true for the wrong reason: empty edits, not gate rejection). Both bugs were structurally invisible to the existing test surface.

**New coverage:**

- `test/skillopt/reflect.test.ts` (15 cases):
  - 8 `parseEditsResponse` cases including the IRON-RULE regression pin for the v0.42.0.1 fix (`{"edits": [...]}` JSON must survive the parser).
  - 7 `runReflect` D7 contract cases: both modes fire, empty-batch skips, additive token usage, one-mode-throws-other-still-works, rejected-buffer flows into anti-bias prompt.
  - Documents the trailing-comma limitation as an explicit out-of-scope pin (so a future tightening of `tryExtractEdits` lights this test up intentionally).
- `test/e2e/skillopt-loop.serial.test.ts` (7 cases):
  - HAPPY PATH: stubbed `gateway.chat` acts as both target agent (emits sections based on skill content) and optimizer (proposes a real add-Citations edit). Drives `runSkillOpt` end-to-end against PGLite. Asserts outcome=accepted, SKILL.md mutated with new section, frontmatter preserved (D5), history has one committed row, best.md mirrors disk, delta > epsilon, receipt fields populated.
  - 5 broken cases (each isolates a distinct orchestrator-visible failure):
    1. Below-baseline regression: optimizer proposes a destructive edit; gate rejects with reason=below_baseline; SKILL.md unchanged; rejected-buffer captures the bad edit for anti-bias context.
    2. Malformed reflect JSON: orchestrator degrades gracefully to no_improvement without crashing.
    3. Anchor-not-found: applyEditBatch rejects all; sel gate skipped; rejected-buffer captures with reason=apply_failed.
    4. Budget exhausted mid-step: outcome=aborted, no pending rows survive.
    5. Converged-skill re-run: starting from already-perfect skill → no_improvement (no thrash on a well-tuned starting point).
  - IDEMPOTENT RE-RUN: drive runSkillOpt twice in sequence. Run 1 accepts. Run 2 sees improved baseline, no failures, returns no_improvement. SKILL.md byte-identical to post-run-1; history still has exactly 1 committed row. Proves stability at the fixed point.

All hermetic (no DATABASE_URL, no API keys). PGLite in-memory engine, tempdir SKILL.md + benchmark, stubbed gateway.chat via `__setChatTransportForTests`. `.serial.test.ts` because the stub installs module state and the loop walks shared disk state across epochs.

Test counts after fix: 174 skillopt-surface tests pass (149 pre-existing unit + 15 new reflect unit + 3 existing E2E + 7 new E2E). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cycle): align ALL_PHASES skillopt position with actual dispatch order

v0.42.0.0 added skillopt to ALL_PHASES right after `patterns` (line 127), but the dispatch block in runCycle (line ~1912) actually runs skillopt between `conversation_facts_backfill` and `embed`. The two were inconsistent, and the serial test `report.phases.map(p => p.phase)).toEqual(ALL_PHASES)` was failing on master because of it.
A second pre-existing failure: the two phase-count assertions in `test/core/cycle.serial.test.ts` still said `toBe(20)` even though ALL_PHASES grew to 21 when skillopt was added. The author bumped the array but forgot the test.

Two fixes, one commit:

1. Move `'skillopt'` in ALL_PHASES from after `patterns` to between `conversation_facts_backfill` and `embed`, matching where runCycle actually dispatches it. Runtime behavior is unchanged; only the declaration order moves. Updated the surrounding comment to call out the position invariant and reference the test that pins it.
2. Update both `toBe(20)` assertions in cycle.serial.test.ts to `toBe(21)` with a v0.42.0.0 history line in the running comments.

Why declaration follows runtime (not the other way around): the comment intent ("Runs AFTER patterns — graph-fresh") is still satisfied because "after the entire main graph-mutating cluster" is strictly fresher than "right after patterns". No design intent is lost.

Test result: cycle.serial.test.ts is now 28/28 (was 27/28 on master + my prior commit). Skillopt suite still 174/174.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): bump PHASE_SCOPE assertion to 21 + fix skill-optimizer Anti-Patterns case

Two CI failures pre-existing on this branch since the v0.42.0.0 skillopt cathedral landed; master is green because skillopt didn't exist there yet.

1. test/phase-scope-coverage.test.ts asserted ALL_PHASES.length === 20. skillopt is the 21st phase. Bumped to 21 with v0.42.0.0 history line in the comment chain. Sibling fix to the cycle.serial.test.ts bump in commit 08ad246.
2. skills/skill-optimizer/SKILL.md had `## Anti-patterns` (lowercase p). skills-conformance.test.ts asserts `## Anti-Patterns` (capital P) as the required section header. Single-character rename.

Local: 174 skillopt-surface tests + 6 phase-scope tests + 249 skills-conformance tests all green. Typecheck clean.
Remaining CI delta: 5 put_page facts backstop failures in shard 10 that reproduce only on Linux CI, not locally even with empty env / cleared HOME / max-concurrency=1. The error surface is `r.isError === true` with no further detail captured in the bun:test output. Pushing these 2 fixes first to narrow the CI signal; will instrument if the 5 persist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(e2e): align dream-cycle-phase-order + onboard-full-flow with v0.41/v0.42 reality

Two stale E2E assertion files surfaced by a full local E2E run against real Postgres (the gbrain-test-pg container on port 5434). Neither file is in the CI E2E job (CI only runs mechanical.test.ts + mcp.test.ts + skills.test.ts + zeroentropy-live.test.ts), so the drift has been latent.

1. `test/e2e/dream-cycle-phase-order-pglite.test.ts` EXPECTED_PHASES was missing 4 phases that landed in master since the list was last revised:
   - extract_atoms (v0.41 T9: atom extraction, after extract_facts)
   - synthesize_concepts (v0.41 T9: concept synthesis, after patterns)
   - conversation_facts_backfill (v0.41.11.0, after calibration_profile)
   - skillopt (v0.42.0.0: self-evolving skills, between conversation_facts_backfill and embed)

   Updated to 21 entries in the actual runtime dispatch order (matches ALL_PHASES exactly). 5/5 tests in the file pass after.

2. `test/e2e/onboard-full-flow.test.ts` `runAllOnboardChecks` shape test asserted exactly 4 checks; v0.42's type-unification cathedral (PR #1542, T13-T15) added 3 more (`pack_upgrade_available`, `type_proliferation`, `dangling_aliases`) for a total of 7. And `empty brain returns 0 remediations` regressed because `pack_upgrade_available` can emit a manual_only remediation on brains where gbrain-base@1.x is active and gbrain-base-v2 is registered as a successor.
   Tightened that assertion to `total <= 1` AND kept a per-check guard asserting takes_count remediations stay 0 (the original test's load-bearing claim: A12 two-gate consent). 13/13 tests in the file pass after.

Honest scope: 4 other E2E files still fail locally after this commit (cycle.test.ts, dream.test.ts, phantom-redirect.test.ts, sync-lock-recovery.test.ts), each for a distinct pre-existing master bug unrelated to v0.42 skillopt work:

- cycle.test.ts (5 fails): PostgresEngine.getConfig falls back to db.getConnection() singleton via the `get sql()` getter when no poolSize is set; the new conversation_facts_backfill phase chain hits this fallback even though the test's setupDB() connects both the singleton AND the engine. Race condition between the test's singleton lifecycle and the phase's getConfig call. Deeper fix needed in PostgresEngine.getConfig (use this._sql directly with explicit fallback only on user-driven CLI paths).
- dream.test.ts (1 fail): expects "concepts/testing" slug to appear in dream cycle output, gets empty array. Related to v0.42 concept type-unification semantics.
- phantom-redirect.test.ts (2 fails): concurrent-sync race + postgres-js text-string embedding survival. Master-level data-path bug; would need its own fix wave.
- sync-lock-recovery.test.ts (1 fail): `gbrain sync --break-lock --all` exits 0 but test expects 1 with a shell-loop hint. CLI behavior changed in a master commit; need to either restore the refusal behavior or update the assertion.

None of these 4 block CI (E2E job doesn't run them). Filed as a TODOS.md entry for a follow-up wave; the 2 in this commit are the ones that mirror v0.42 work landing.

Local: 130/136 E2E files green, 927/940 tests pass (was 925/940 before these fixes; the 2 files this commit fixes added 7 newly-passing tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): quarantine query-cache-knobs-hash.test.ts to serial runner

CI shard 10 (commit 4d72107) failed 5 tests in the `SemanticQueryCache cross-mode isolation (CDX-4 hotfix)` describe block, all ~7-34ms each, all expecting writes/reads to round-trip through one shared PGLite engine + a `beforeEach DELETE FROM query_cache`. Passes 9/9 locally; fails 5/9 on Linux CI under bun's default in-file max-concurrency=4.

Classic intra-file concurrency race shape: test A's `beforeEach` clears the table → test A's `store` writes a row → test B's `beforeEach` (concurrent with A's `store`) clears the table → test A's follow-up COUNT query returns 0. Same root cause that quarantined `embed-stale.test.ts`, `brain-allowlist.test.ts`, and `schema-pack-find-pack-successors.test.ts` to the serial runner in prior fix waves (documented in v0.41.22.0 CI fix wave).

Fix: rename to `query-cache-knobs-hash.serial.test.ts` so the v0.26.7 serial-tests runner picks it up at `max-concurrency=1`. Tests still exercise the actual cache logic: no test deleted, no production code changed. The describe block's `beforeAll` engine + `beforeEach` TRUNCATE pattern works correctly at serial concurrency.

Local: 12/12 in this file + 52/52 in the serial runner. Production SemanticQueryCache code is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(heavy): frontmatter_scan_wallclock — opt into --no-embedding so CI runners work

Heavy tests workflow run 26542447602 (commit 483a557) failed on the first heavy script:

    [fm_wallclock] FAIL: gbrain init exited non-zero
    No embedding provider configured. Set one of: OPENAI_API_KEY / ZEROENTROPY_API_KEY / VOYAGE_API_KEY
    Or defer setup: gbrain init --pglite --no-embedding

The v0.37 D9 hard-require landed in init.ts: `gbrain init --pglite` now refuses to proceed without an embedding provider configured.
The heavy-tests GitHub workflow doesn't pipe any embedding API keys (deliberate: the heavy tests measure ops shape, not LLM behavior), so every CI invocation now blocks at step 2 of this script.

The script's whole purpose is measuring `gbrain doctor`'s frontmatter-scan wallclock: it never embeds, never calls `gbrain embed`, never queries vectors. The right fix is to opt out of the provider requirement via the same `--no-embedding` flag init.ts already exposes for this exact "deferred setup" case.

Verified locally:

    TMP=$(mktemp -d); GBRAIN_HOME="$TMP" \
      bun run src/cli.ts init --pglite --yes --no-embedding
    # exit 0, brain initialized.

No production code change. One-line + comment in the script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(heavy): sync_lock_regression — pass --no-embed so CI runs measure lock contention, not key absence

Heavy tests workflow run 26542545802 (commit 7962d31, after the previous fm_wallclock fix) failed at the next heavy script in the chain:

    [sync_lock_regression] outcomes: winners=0 losers=0 unknown=4
    [sync_lock_regression] FAIL: expected 1 winner, got 0
    [sync_lock_regression] FAIL: expected 3 lock-busy losers, got 0

Each of the 4 parallel `gbrain sync` invocations failed for the same reason; none of them ever even got to the lock-acquire step:

    Embedding model "zeroentropyai:zembed-1" requires ZEROENTROPY_API_KEY.
    Re-run with --no-embed to import-only and embed later once the key is set.

The CI runner doesn't pipe any embedding-provider API keys (deliberate: heavy tests measure ops shape, not LLM behavior), and sync now hard-fails when its embed step can't reach a configured provider.

This script measures the writer-lock race shape: `gbrain-sync` row in `gbrain_cycle_locks`, exactly-one-winner semantics, N-1 fail-fast losers with "Another sync is in progress", zero leaked rows post-run. It never needed embeddings; the original write predates the hard-require landing.
Fix: pass `--no-embed` to the sync invocation. Same kind of fix as fm_wallclock (commit 7962d31) but on the sync side rather than init.

No production code touched. One-line change in the bash script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(heavy): sync_lock_regression — register source via psql + use --repo + tolerate doctor warns

Heavy tests run 26542638471 (commit 60145ee, after the --no-embed fix) failed at the same script but at a downstream step:

    > Source "default" has no local_path. Run: gbrain sources add default --path <path>

Three independent bugs in the script that all surfaced at once after v0.41's source-registry landed:

1. `gbrain config set sync.repo_path` is the legacy way; sync now reads `sources.local_path` first. Replaced with an upsert into the sources table via psql:

       INSERT INTO sources (id, name, local_path)
       VALUES ('default', 'default', $BRAIN_DIR)
       ON CONFLICT (id) DO UPDATE SET local_path = EXCLUDED.local_path

   Kept the legacy `config set sync.repo_path` line too as belt-and-suspenders for any downstream caller that still reads it.
2. `gbrain sync --dir <path>` is silently ignored; sync's CLI parser recognizes `--repo`, not `--dir`. Switched to `--repo`.
3. `bun run src/cli.ts doctor --json` at the top (used to apply migrations as a side effect) exits non-zero whenever ANY check warns, including the new "no embedding provider configured" warning on a fresh CI runner. The script's `set -e` aborted at line 53 before reaching any of the sync invocations. Added `|| true` since the migration runs regardless of doctor's exit verdict.

Verified locally. `DATABASE_URL=... bash tests/heavy/sync_lock_regression.sh` output:

    [sync 1] rc= (lock-busy: 'Another sync is in progress')
    [sync 2] rc=0 (winner)
    [sync 3] rc= (lock-busy: 'Another sync is in progress')
    [sync 4] rc= (lock-busy: 'Another sync is in progress')
    outcomes: winners=1 losers=3 unknown=0
    post-run gbrain_cycle_locks(gbrain-sync) row count: 0
    OK — 1 winner, 3 lock-busy losers, no leaked lock rows.

Production code untouched. All three fixes are in the bash script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(skillopt): hands-on tutorial for auto-improving a skill + discoverability

There was no tutorial for skillopt, only a reference guide (docs/guides/skillopt.md) that opens at --bootstrap-from-routing and assumes you already understand benchmarks, and an agent-facing SKILL.md. README had ZERO skillopt mention. The one thing a user must hand-author (the benchmark JSONL) was taught nowhere with a worked example.

New: docs/tutorials/improving-skills-with-skillopt.md, a Diataxis tutorial (learning-oriented), copy-pasteable end to end:

1. mental model in two sentences (SKILL.md is the trainable param, the agent is frozen)
2. write your first benchmark from scratch: a complete 15-task rule-judge starter you paste and run, with the full check-op table (contains/regex/section_present/max_chars/min_citations/tool_called/tool_not_called)
3. --dry-run cost preview (and that it exits 2 by convention, not failure)
4. real run + reading accepted(0)/no_improvement(1)/aborted(2) with the actual stderr output shape
5. where output lands (best.md, versions/, history.json, rejected.json, audit jsonl)
6. accept/reject: bundled vs user skills, --no-mutate vs --allow-mutate-bundled
7. iterate by sharpening the benchmark

The load-bearing fix the tutorial makes that the reference guide got wrong: the DEFAULT --split 4:1:5 needs ~50 tasks before it runs (sel = N/10, floor 5). A first-time author writing 10-15 tasks hits `D_sel has N task(s) (need >=5)` and bounces.
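The split arithmetic behind that error can be worked through in a few lines. This is a sketch under stated assumptions, not the real `splitBench`: it assumes the middle component of `--split a:b:c` is the sel partition and that the share is rounded down. `selSize` and `splitRuns` are invented names; only the minimum of 5 and the 4:1:5 and 1:1:1 ratios come from the commit message.

```typescript
// Minimum sel-partition size, per the error text `need >=5`.
const MIN_SEL = 5;

// Parses "a:b:c" into three positive numbers; throws on anything else.
function parseSplit(spec: string): [number, number, number] {
  const parts = spec.split(":").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isFinite(n) || n <= 0)) {
    throw new Error(`bad split spec: ${spec}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Assumption: sel is the middle component and its share is rounded down.
function selSize(totalTasks: number, spec: string): number {
  const [a, b, c] = parseSplit(spec);
  return Math.floor((totalTasks * b) / (a + b + c));
}

// True when the sel partition is large enough for the run to proceed.
function splitRuns(totalTasks: number, spec: string): boolean {
  return selSize(totalTasks, spec) >= MIN_SEL;
}
```

Under these assumptions a 15-task benchmark gets a sel partition of 1 with 4:1:5 and of 5 with 1:1:1, and 4:1:5 first reaches 5 at 50 tasks, which matches the "~50 tasks" figure quoted above.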
The tutorial ships 15 tasks + `--split 1:1:1` (clean 5/5/5) so the copy-paste path actually works. Verified against the real loadBenchmark + splitBench: the exact shipped block parses 15 unique tasks and splits 5/5/5 with sel>=5; the system's own error message confirms "need ~50 total for 4:1:5".

Discoverability (Diataxis cross-linking):

- README.md tutorials section: new entry (was zero skillopt mention)
- docs/tutorials/README.md: added under ## Shipped
- docs/guides/skillopt.md: "New to this? Start with the tutorial" callout

Every claim devex-verified against source: exit-code map from skillopt.ts (accepted:0/no_improvement:1/aborted:2/errored:2), stderr format from skillopt.ts:286-292, check ops from score.ts, output paths from SKILL.md, split math from benchmark.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: regenerate llms-full.txt after skillopt tutorial + README edit

Refreshes the inlined doc bundle so the committed llms-full.txt matches fresh `bun run build:llms` output (test/build-llms.test.ts drift guard). Picks up the README tutorials-section edit from c39dbdb. The new tutorial file itself isn't curated into scripts/llms-config.ts (the bundle curates a fixed doc set, not every tutorial); this is purely the README delta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): stop embed-preflight leaking gateway config into facts-backstop shard

CI shard 10 failed 5 `put_page facts backstop` tests with:

    [embed(openai:text-embedding-3-small)] Incorrect API key provided: sk-test

(captured by the diagnostic stderr added in a prior commit). Root cause is a cross-file module-state leak, not a logic bug:

- `embed-preflight.test.ts` calls `configureGateway({env:{OPENAI_API_KEY: 'sk-test'}})` to drive credential-validation scenarios. It resets the gateway `beforeEach` but never AFTER its last test, so it leaves the gateway configured with `sk-test`.
- bun runs every file in a shard inside ONE process.
  The residual config bleeds into the next file. When `facts-backstop-gating.test.ts` lands in the same shard, its put_page calls see `isAvailable('embedding') === true` (the key is *present*, just invalid), so put_page attempts a real embed and 401s before the backstop gating even runs.
- It's intermittent across master merges because shard bin-packing changes which files co-locate. (It "resolved" after the v107 merge earlier for exactly this reason, then came back.)

R1/R2 test-isolation lint doesn't catch this: it's `configureGateway` module state, not `process.env` or `mock.module`.

Two fixes, both using the gateway's own `resetGateway()` seam (no process.env, R-compliant):

1. embed-preflight.test.ts: `afterAll(() => resetGateway())` so the leaker cleans up after the whole file. Primary fix; also protects any OTHER shard-mate that reads gateway state.
2. facts-backstop-gating.test.ts: `beforeEach(() => resetGateway())` so the suite is deterministic regardless of ambient gateway config. Defense in depth: isAvailable('embedding') is now reliably false → put_page uses noEmbed → the import never embeds → only the backstop gating (the suite's actual subject) is exercised.

Verified: running leaker+victim in one process (the shard repro) goes 16/16; full shard 10 goes 1208/1208 (was 5 fail in CI). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(skillopt): make benchmark authoring an agent job, not a human chore

The prior tutorial taught a human to hand-write a 15-task benchmark, but nobody does that. The real workflow is: user says "make skill X better," the AGENT authors the benchmark and runs the optimizer. The agent-facing dispatcher didn't actually cover that.
Gap found: skill-optimizer/SKILL.md documented exactly one authoring path, `--bootstrap-from-routing`, which (a) requires a pre-existing routing-eval.jsonl (bootstrap-benchmark.ts:57-63 refuses without it) and (b) generates tasks from ROUTING fixtures, which test dispatch ("does this phrasing pick this skill"), not output quality. So an agent told to improve a skill with no benchmark had no documented way to author a *quality* benchmark; it'd have to reinvent the JSONL format the human tutorial teaches.

Two fixes:

1. skills/skill-optimizer/SKILL.md: new "Authoring the benchmark yourself (the common case)" section: read the target SKILL.md, generate ~15 realistic tasks, attach rule judges (contains/max_chars/min_citations/section_present/regex/tool_called), write the JSONL, run with `--split 1:1:1` (the default 4:1:5 needs ~50 tasks). Decision-tree row "New skill, no benchmark" now says "Author one" instead of pointing at bootstrap-from-routing; the bootstrap row is reframed as a head-start that only applies when routing fixtures exist and notes routing tasks test dispatch, not quality.
2. docs/tutorials/improving-skills-with-skillopt.md: new "The easiest path: ask your agent" section up top. Tells humans to just tell their agent "improve my X skill — write a benchmark first," and frames the manual walkthrough as "read this when you want to understand or hand-curate what the agent is doing."

Verified: conformance 249/0, resolver 99/0, build-llms drift guard 7/0, cross-link resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(skillopt): --bootstrap-from-skill starter benchmark generator

Generate a quality benchmark from a skill's SKILL.md directly, no routing-eval.jsonl required. One LLM call emits JSONL tasks (each with rule judges) that the agent reviews + strengthens before optimizing.
- runBootstrapFromSkill: JSONL output parsed line-by-line with skip-bad-line salvage (a truncated final line drops, the rest survive); a task is kept only when >=2 valid rule checks survive; provider errors propagate instead of collapsing to bootstrap_empty.
- --bootstrap-tasks N (default 15, cap 50); maxTokens scales with the count.
- Extracted assertBenchmarkAbsent + readSkillBodyOrThrow shared with the routing bootstrap; hardened runBootstrap's routing-eval parse to skip malformed lines.
- CLI: --bootstrap-from-skill short-circuit + 6-way mutual exclusion; parseFlags exported for unit tests. The benchmark-not-found hint + --help now point here.
- The generator's REVIEW line prints the paste-ready `--bootstrap-reviewed --split 1:1:1` next command (the default 4:1:5 split refuses a 15-task starter at D_sel >= 5).
- 20 hermetic cases incl. round-trip into loadBenchmark + splitBench(1:1:1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(skillopt): make --bootstrap-from-skill the primary no-benchmark path

The agent runs --bootstrap-from-skill, strengthens the generated judges (they are weak drafts), deletes the sentinel, then runs --bootstrap-reviewed --split 1:1:1. Freehand authoring is demoted to the fallback for the rare skill the generator can't draft well. Updates the Iron Law, decision tree, and anti-patterns to cover both bootstrap modes and the 15-task / --split 1:1:1 gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(release): v0.42.1.0 --bootstrap-from-skill

VERSION + package.json -> 0.42.1.0, CHANGELOG entry, CLAUDE.md skillopt annotation, regenerated llms-full.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: surface --bootstrap-from-skill in README + skillopt reference

- docs/guides/skillopt.md: 30-second pitch leads with --bootstrap-from-skill; flag table adds --bootstrap-from-skill + --bootstrap-tasks rows.
- README.md: skillopt tutorial pointer mentions generating a starter benchmark.
- Regenerated llms-full.txt (README is in the bundle).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ci): bump FULL_SIZE_BUDGET 700KB→750KB for legitimate CLAUDE.md growth

The skillopt wave annotations + merged v0.41.34-36 master releases pushed llms-full.txt to 700,423 bytes (423 over the 700KB cap), failing the build-llms size-budget test on CI shard 6. CLAUDE.md is ~540KB (77% of the bundle) and is the whole point of the one-fetch artifact, so it stays inlined; the budget tracks its per-release growth. 750KB still fits 200k+ context models.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
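The skip-bad-line salvage described in the `--bootstrap-from-skill` commit above can be sketched roughly as follows. This is an illustration, not `runBootstrapFromSkill`: the task shape (`id`, `checks`, `op`) and the `salvageTasks` name are assumptions. Only the check-op names and the rule that a task needs at least 2 valid checks come from the commit messages.

```typescript
// Check ops listed in the tutorial commit message.
const VALID_OPS = new Set([
  "contains", "regex", "section_present", "max_chars",
  "min_citations", "tool_called", "tool_not_called",
]);

// Hypothetical shapes; the real benchmark schema may differ.
interface Check { op: string; [key: string]: unknown }
interface Task { id: string; checks: Check[] }

function isValidCheck(value: unknown): value is Check {
  return typeof value === "object" && value !== null &&
    typeof (value as Check).op === "string" && VALID_OPS.has((value as Check).op);
}

// Parses JSONL line by line. A line that fails JSON.parse (for example a
// truncated final line) is skipped; the remaining lines still survive.
function salvageTasks(raw: string, minChecks = 2): Task[] {
  const kept: Task[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // malformed line: drop it, keep going
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const candidate = parsed as { id?: unknown; checks?: unknown };
    if (typeof candidate.id !== "string" || !Array.isArray(candidate.checks)) continue;
    const checks = candidate.checks.filter(isValidCheck);
    // Keep the task only when enough valid rule checks survive.
    if (checks.length >= minChecks) kept.push({ id: candidate.id, checks });
  }
  return kept;
}
```

Dropping invalid checks before counting means one hallucinated op does not discard an otherwise usable task, while a task left with a single check is still rejected as too weak.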

garrytan and others added 2 commits May 26, 2026 23:19

Merge branch 'master' into garrytan/type-taxonomy-unification

b46260a

Resolve VERSION, package.json, CHANGELOG conflicts with v0.41.22.0 on top, preserving master's v0.41.19.0 entry below.

garrytan mentioned this pull request May 27, 2026

v0.41.23.0 feat: extract operator surfaces + pack-driven extractables #1541

Merged

6 tasks

garrytan added 7 commits May 26, 2026 23:39

garrytan merged commit 5d42f32 into master May 27, 2026
21 checks passed

garrytan merged 9 commits into master from garrytan/type-taxonomy-unification

garrytan commented May 27, 2026 (edited)