v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes #1479)#1542

Merged
garrytan merged 9 commits into master from garrytan/type-taxonomy-unification
May 27, 2026

garrytan commented May 27, 2026

Summary

Your brain runs on a real taxonomy now. Not 94 types of cruft. Fifteen canonical types you can name, plus a catch-all for the long tail.

A real production brain (186K pages) had accreted 94 distinct pages.type values in 9 clusters of redundancy. tweet / tweet-thread / tweet-bundle / tweet-single all coexisting. 5.5K concept-redirect pages bloating orphan counts. atom-partner-link pages that should be real link rows. company / yc-company / product / organization all fighting for the same idea. The type system is the foundation for schema packs, search filtering, extract behavior, enrichment routing, and expert routing. When types are noisy, every downstream feature degrades.

This release ships the cathedral that collapses 94 → 14 canonical types (plus note as the catch-all = 15 total) on any brain that opts in.

What you can do that you couldn't before:

  • gbrain init defaults to gbrain-base-v2 (15 canonical types).
  • gbrain onboard --check --explain shows the per-cluster narrative for the gbrain-base → v2 migration.
  • gbrain jobs submit unify-types --allow-protected --params '{"target_pack":"gbrain-base-v2"}' runs the full migration end-to-end: retypes pages, creates alias rows, converts edge-shaped pages into real link rows, then atomically flips the active pack.
  • Reversible via 72h soft-delete TTL on alias/link pages + frontmatter.legacy_type preservation on retyped pages.
  • --type article queries keep working (D14 back-compat: alias-expands to media subtype=article at SQL build time).

What's New

Schema-pack primitives (atomic per-page transactions, source-scoped):

  • runRetypeCore — chunked UPDATE with frontmatter.legacy_type always-stamped + subtype_field strict allowlist (D9)
  • runPageToLinkCore — converts edge-shaped pages into real links rows (atomic per-page txn, F7)
  • runPageToAliasCore — concept-redirect pages → slug_aliases rows (D15: alias table IS the resolver, NO rewriteLinks)
  • rewriteLinksBatch — N-pair atomic FK rewrite
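
As a rough illustration of the retype primitive described above, here is a minimal in-memory sketch of a chunked retype that always stamps `frontmatter.legacy_type`. It is not the real `runRetypeCore`: the `Page` and `RetypeRule` shapes, the `retypeInChunks` name, the `subtype` frontmatter key, and the default chunk size are all assumptions made for illustration, and a real implementation would issue a SQL UPDATE per chunk inside a transaction.

```typescript
// Hypothetical shapes; the real schema-pack types may differ.
interface Page {
  slug: string;
  type: string;
  frontmatter: Record<string, unknown>;
}

interface RetypeRule {
  from: string; // legacy type, e.g. "article"
  to: string; // canonical type, e.g. "media"
  subtype?: string; // optional subtype recorded on the page
}

// Walk the pages in fixed-size chunks and retype every page that matches the rule.
// The previous type is always preserved in frontmatter.legacy_type so the change
// can be traced or reversed later.
function retypeInChunks(pages: Page[], rule: RetypeRule, chunkSize = 1000): number {
  let changed = 0;
  for (let start = 0; start < pages.length; start += chunkSize) {
    const chunk = pages.slice(start, start + chunkSize);
    for (const page of chunk) {
      if (page.type !== rule.from) continue;
      page.frontmatter.legacy_type = page.type;
      if (rule.subtype !== undefined) {
        page.frontmatter.subtype = rule.subtype; // assumed key name
      }
      page.type = rule.to;
      changed++;
    }
  }
  return changed;
}
```

Because a retyped page no longer matches `rule.from`, running the same rule a second time changes nothing, which is one simple way to get the idempotent re-run behavior the E2E cases check for.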

Schema-pack manifest extensions:

  • subtypes: array per page_type drives inferTypeAndSubtypeFromPack (ReDoS-guarded)
  • mapping_rules: discriminated union over retype / page_to_link / page_to_alias
  • migration_from: field declares successor relationship; powers findPackSuccessors version-range walker
  • expandTypeFilter: a legacy --type X alias expands to canonical+subtype at SQL build time (D14)
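
The manifest extensions above can be pictured with a small sketch: a discriminated union for `mapping_rules` and a lookup that expands a legacy `--type` value into canonical type plus subtype. The field names (`kind`, `from`, `to`, `subtype`) and the function name `expandTypeFilterSketch` are assumptions for illustration, not the actual manifest schema or the real `expandTypeFilter` signature.

```typescript
// Hypothetical rule shapes; only the three rule kinds come from the release notes.
type MappingRule =
  | { kind: "retype"; from: string; to: string; subtype?: string }
  | { kind: "page_to_link"; from: string }
  | { kind: "page_to_alias"; from: string };

interface TypeFilter {
  type: string;
  subtype?: string;
}

// Expand a requested type through the retype rules. Types with no matching
// rule pass through unchanged, so canonical types keep working as before.
function expandTypeFilterSketch(requested: string, rules: MappingRule[]): TypeFilter {
  for (const rule of rules) {
    if (rule.kind === "retype" && rule.from === requested) {
      return rule.subtype !== undefined
        ? { type: rule.to, subtype: rule.subtype }
        : { type: rule.to };
    }
  }
  return { type: requested };
}
```

With a rule such as `{ kind: "retype", from: "article", to: "media", subtype: "article" }`, a `--type article` query would be rewritten to filter on `media` with subtype `article`, which matches the D14 behavior described in the summary.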

Migration v104 slug_aliases — forward-bootstrap probed on both Postgres + PGLite.

Engine method: resolveSlugWithAlias(slug, sourceOrSources) on both engines with multi-source ambiguity warning (F10).
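
A minimal sketch of what alias-aware slug resolution with a multi-source ambiguity warning might look like, using in-memory data instead of an engine. Everything here is an assumption for illustration: the `AliasRow` and `Resolution` shapes, the rule that a direct page hit wins over an alias, and the choice to return the first alias match when several sources match.

```typescript
// Hypothetical row shape for the slug_aliases table.
interface AliasRow {
  source: string;
  alias: string;
  target: string;
}

interface Resolution {
  slug: string;
  source: string;
  viaAlias: boolean;
  warning?: string;
}

// pages holds keys of the form `${source}:${slug}`.
function resolveSlugSketch(
  slug: string,
  sourceOrSources: string | string[],
  pages: Set<string>,
  aliases: AliasRow[],
): Resolution | null {
  const sources = Array.isArray(sourceOrSources) ? sourceOrSources : [sourceOrSources];

  // A real page with this slug wins over any alias.
  for (const source of sources) {
    if (pages.has(`${source}:${slug}`)) return { slug, source, viaAlias: false };
  }

  // Otherwise look for alias rows scoped to the requested sources.
  const hits = aliases.filter((a) => a.alias === slug && sources.includes(a.source));
  const first = hits[0];
  if (first === undefined) return null;

  const resolution: Resolution = { slug: first.target, source: first.source, viaAlias: true };
  if (hits.length > 1) {
    resolution.warning = `alias "${slug}" matched ${hits.length} sources; using "${first.source}"`;
  }
  return resolution;
}
```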

Three new onboard checks (registered in runAllOnboardChecks):

  • pack_upgrade_available (manual_only via render.ts allowlist per D17)
  • type_proliferation (pack-aware ratio, D16)
  • dangling_aliases (source-scoped per F12)

unify-types PROTECTED Minion handler: four migration phases (retype-explicit → retype-catch-all → page-to-link → page-to-alias), followed by a final sync and an atomic active-pack flip.

Search: alias_resolved 1.05x post-fusion boost stage. KNOBS_HASH_VERSION bumped 5→6 (one-time cache-miss spike on upgrade; self-heals within the cache TTL).
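
To make the boost stage concrete, here is a small sketch of a post-fusion multiplier applied to hits that were reached through an alias. The `Hit` shape and the `applyAliasBoost` name are hypothetical; only the 1.05 factor comes from the release notes.

```typescript
// Hypothetical search-hit shape after rank fusion.
interface Hit {
  slug: string;
  score: number;
  aliasResolved: boolean;
}

const ALIAS_RESOLVED_BOOST = 1.05;

// Multiply the fused score of alias-resolved hits, then re-sort by score.
// Returns new objects so the input hits are left untouched.
function applyAliasBoost(hits: Hit[]): Hit[] {
  return hits
    .map((hit) => (hit.aliasResolved ? { ...hit, score: hit.score * ALIAS_RESOLVED_BOOST } : hit))
    .sort((a, b) => b.score - a.score);
}
```

A 5% multiplier is small enough that it only reorders hits whose fused scores were already close, which is presumably why changing it warrants a cache-key version bump rather than a larger ranking change.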

ELIGIBLE_TYPES for facts extraction extended with v2 canonicals (media, tweet, atom, concept, analysis) — codex F-ELIGIBLE caught the original deferred-to-v0.43 plan as a blocker; undeferred.

Test Coverage

  • 79 new unit/integration cases across 11 new test files (retype, page-to-link, page-to-alias, rewrite-links-batch, resolve-slug-with-alias engine parity, find-pack-successors, infer-type-and-subtype, expand-type-filter, onboard-pack-upgrade-checks, search-alias-resolved-boost, unify-types-handler)
  • 3 E2E cases in test/e2e/type-unification-full-flow.test.ts covering all 9 production clusters end-to-end: 94 → ≤16 distinct types, alias rows created, link rows inserted, active pack flipped, idempotent re-run
  • 208-test verification suite confirms KNOBS_HASH_VERSION 5→6, build-llms regeneration, cycle test correctness in isolation, and full v0.41.22 test suite. 0 failures.
  • Typecheck clean

Plan Completion

All 9 lanes (A1, A2, B, C, D, E, F, G, H) from the plan are DONE. 16 locked decisions (D1-D17) implemented. 12 baseline fixes (F7-F21) absorbed from codex outside voice. Plan + GSTACK REVIEW REPORT at ~/.claude/plans/system-instruction-you-are-working-transient-elephant.md.

Pre-Landing Review

The cycle test passed in isolation (5/5); the initial failure was parallel-test resource contention with a sibling worktree, not a regression. KNOBS_HASH_VERSION cache invalidation comments expanded to document the 5→6 bump rationale per CDX2-F13 append-only convention.

Migration

Schema migration v104 (slug_aliases table) with forward-bootstrap probed on both Postgres and PGLite. Atomic per-page transactions (codex F7). NO rewriteLinks for page-to-alias (D15 — alias table IS the resolver). Source-scoping throughout (F9, F10, F12).

To take advantage of v0.41.22.0

If you're a NEW user: gbrain init defaults to gbrain-base-v2. Done.

If you're an EXISTING user on gbrain-base:

  1. gbrain upgrade — pulls v0.41.22 binaries + applies migration v104.
  2. gbrain onboard --check --explain — preview the migration.
  3. gbrain jobs submit unify-types --allow-protected --params '{"target_pack":"gbrain-base-v2"}' — run it. ~10 min on a 186K-page brain.
  4. gbrain jobs follow <job_id> — watch progress per phase.
  5. gbrain onboard --check — verify pack_upgrade_available and type_proliferation report ok.

Test plan

  • Typecheck clean
  • 208/208 critical tests pass (KNOBS_HASH_VERSION fixes + cycle + build-llms + all 79 new v0.42 unit cases)
  • All 3 E2E cases pass against PGLite
  • Schema migration v104 applies cleanly on both Postgres and PGLite
  • Re-verified via second /ship pass

Closes #1479

🤖 Generated with Claude Code

garrytan and others added 2 commits May 26, 2026 23:19
Resolve VERSION, package.json, CHANGELOG conflicts with v0.41.22.0
on top, preserving master's v0.41.19.0 entry below.

…closes #1479)

Ships gbrain-base-v2 as the new install default (15 canonical types: 14
+ note catch-all) and the unify-types PROTECTED Minion handler that
runs the gbrain-base→v2 migration end-to-end on existing brains.

What this delivers:
- gbrain-base-v2.yaml standalone schema pack (no extends:) with 14
  canonical page_types + 9 cluster mapping_rules + catch-all sentinel
- 3 new schema-pack primitives: runRetypeCore (chunked UPDATE with
  legacy_type stamping), runPageToLinkCore (edge-shaped pages →
  link rows), runPageToAliasCore (concept-redirect → slug_aliases)
- rewriteLinksBatch for N-pair atomic FK rewrite
- Migration v104 slug_aliases table (forward-bootstrap probed on both
  engines for safe upgrade chain)
- New engine method resolveSlugWithAlias(slug, sourceOrSources) on
  both Postgres + PGLite with multi-source ambiguity warning
- inferTypeAndSubtypeFromPack overload + subtypes: + mapping_rules:
  + migration_from: schema-pack manifest extensions
- findPackSuccessors version-range walker (1.x / 1.0.x / exact match)
- expandTypeFilter for --type back-compat (D14): legacy aliases route
  through mapping_rules → canonical+subtype before the SQL filter fires
- 3 new onboard checks: pack_upgrade_available, type_proliferation,
  dangling_aliases (source-scoped per F12)
- unify-types Minion handler (PROTECTED, manual_only via render.ts
  allowlist per D17): retype-explicit → retype-catch-all →
  page-to-link → page-to-alias → final sync → active-pack flip
- alias_resolved 1.05x post-fusion search boost stage; KNOBS_HASH_VERSION
  bumped 5→6 (one-time cache miss on upgrade, self-healing in TTL)
- ELIGIBLE_TYPES for facts extraction extended with v2 canonicals
  (codex F-ELIGIBLE: blocker not v0.43 follow-up)
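
The version-range forms named above (1.x / 1.0.x / exact match) can be sketched as a small matcher. This is an illustrative helper under the assumption that versions are plain `major.minor.patch` strings; `matchesVersionRange` is a hypothetical name and not part of `findPackSuccessors`.

```typescript
// Returns true when `version` (major.minor.patch) satisfies `range`,
// where `range` is an exact version or ends in an "x" wildcard segment.
function matchesVersionRange(version: string, range: string): boolean {
  const v = version.split(".");
  const r = range.split(".");
  if (v.length !== 3) return false; // only major.minor.patch is supported here

  for (let i = 0; i < r.length; i++) {
    if (r[i] === "x") return true; // wildcard: everything from here on matches
    if (r[i] !== v[i]) return false;
  }
  // No wildcard seen: require a full exact match.
  return r.length === v.length;
}
```

A successor pack declaring a range of `1.x` would then apply to an installed `gbrain-base@1.0.0`, while a `2.0.0` install would not match.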

Tests: 79 new unit/integration cases + 3 E2E cases covering all 9
production clusters end-to-end. 124-case verification on the cache-key
+ build-llms fixes. KNOBS_HASH_VERSION assertions updated in 3 tests.

Plan: ~/.claude/plans/system-instruction-you-are-working-transient-elephant.md
(16 locked decisions D1-D17, 12 baseline fixes F7-F21 absorbed from
codex outside voice).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 27, 2026
VERSION + package.json + CHANGELOG header + CLAUDE.md cluster annotation
all moved from 0.42.0.0 to 0.41.23.0. Body text updated in-place: every
"v0.42" / "v0.43+" reference inside this entry's release notes now reads
"v0.41.23" or "follow-up release" as appropriate.

Same scope shipping — the three-wave extract operator surface stays
intact. Just lands in the patch-channel queue (.20/.21/.23 free; .22 is
PR #1542's type-unification cathedral) instead of the minor-channel bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added 7 commits May 26, 2026 23:39
…fy manifest registration

Two CI failures on PR #1542:

1. check:system-of-record flagged page-to-link.ts:207 addLinksBatch as
   a direct write to a derived table. The call IS the reconcile surface
   for page_to_link mapping_rules — it converts edge-shaped pages into
   canonical link rows under the PROTECTED unify-types Minion handler,
   source-scoped, atomic per-rule. Added the canonical
   `// gbrain-allow-direct-insert: <reason>` comment on the same line.

2. check:resolver emitted 11 orphan_trigger warnings for `schema-unify`
   because the skill was added to skills/RESOLVER.md without a
   corresponding entry in skills/manifest.json. Added the registration
   under the existing skills[] array.

bun run verify: 28/28 checks pass locally.

…sion

Six test failures across shards 2 + 10 on PR #1542:

1. resolver.test.ts: round-trip parser requires frontmatter triggers to
   be quoted (`- "..."` or `- '...'`). schema-unify shipped with bare
   YAML strings; quoted the 10 triggers to round-trip correctly.

2. skills-conformance.test.ts (×3): schema-unify SKILL.md was missing
   the required Contract, Anti-Patterns, and Output Format sections
   that every conformant skill must declare. Added all three:
   - Contract: inputs / outputs / side effects / failure modes
   - Anti-Patterns: 5 DON'Ts including the autopilot trust boundary
   - Output Format: per-phase stderr lines + celebration summary +
     JSON envelope shape

3. facts-eligibility.test.ts (×2): the v0.41.22 ELIGIBLE_TYPES
   expansion added `concept` to the eligible list, but the existing
   test suite pins concept as rejected (it's `extractable: true` in
   the schema pack but the v0.41.11 contract documented this as
   "cosmetic on the backstop path because backstop uses hardcoded
   ELIGIBLE_TYPES"). Removed `concept` from the expansion; other v2
   canonicals (media, tweet, atom, analysis) stay. Comment updated
   to document the deliberate omission.

All 6 failing tests now pass locally (370/370 across the 3 affected
files). bun run verify: 28/28 checks green.

Resolve VERSION, package.json, CHANGELOG conflicts. v0.41.22.0 stays
on top (higher than master's v0.41.20.0); master's v0.41.20.0 entry
preserved below in CHANGELOG order. Brought in master's gbrain status
+ doctor --scope=brain (PR #1544) and the v0.41.19.0 Supavisor Retry
Cathedral (PR #1537) cleanly via auto-merge.

Typecheck clean. bun run verify: 28/28 checks pass.

CI shard 8 reported 1 fail (1.00ms — too fast for any real loadActivePack
file I/O) on `finds gbrain-base-v2 as successor of gbrain-base@1.0.0`.
Local triple-run passes 9/9 in isolation.

Root cause: the existing afterEach reset clears the module-level pack
cache AFTER each test, but the FIRST test in the file inherits whatever
state sibling files in the same bun shard process left behind. With
24+ schema-pack tests in shard 8 (mutate, mutate-audit, best-effort,
registry-reload, manifest-v041_2, etc.) running before this file, the
first test can read a poisoned cache.

Fix: add `beforeEach(_resetPackCacheForTests)`. Two-sided reset
guarantees clean state regardless of file ordering within the shard.
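
The two-sided reset idea can be shown without a test framework. This sketch uses a stand-in module-level cache and a wrapper that plays the role of `beforeEach` and `afterEach`; `packCache`, `loadPackSketch`, `resetPackCacheSketch`, and `withCleanCache` are all hypothetical names, not the project's helpers.

```typescript
// Module-level cache, standing in for the pack cache described above.
const packCache = new Map<string, string>();

// Load through the cache; only call the reader on a miss.
function loadPackSketch(name: string, readFromDisk: (n: string) => string): string {
  const cached = packCache.get(name);
  if (cached !== undefined) return cached;
  const loaded = readFromDisk(name);
  packCache.set(name, loaded);
  return loaded;
}

function resetPackCacheSketch(): void {
  packCache.clear();
}

// Run a test body with the cache cleared on both sides, so the body never
// sees state left by an earlier file and never leaks state to a later one.
function withCleanCache<T>(body: () => T): T {
  resetPackCacheSketch(); // the "beforeEach" side
  try {
    return body();
  } finally {
    resetPackCacheSketch(); // the "afterEach" side
  }
}
```

Note that this only protects against stale state from tests that ran earlier; it does nothing for tests that run at the same time against the same cache.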

bun run verify: 28/28 checks pass.

CI shard 1 + shard 8 each surfaced one intermittent failure:

shard 1: buildBrainTools > execute() on put_page with valid namespace
shard 8: findPackSuccessors > finds gbrain-base-v2 as successor

Both pass cleanly in isolation. Both are concurrency races against
shared in-shard state:

- brain-allowlist.test.ts shares a singleton PGLiteEngine across 18
  tests with a beforeEach DELETE FROM pages. With max-concurrency=4,
  two put_page tests can interleave their TRUNCATE + write phases,
  so the auto-link/extract sub-steps inside put_page race against
  the sibling test's DELETE.
- schema-pack-find-pack-successors.test.ts reads bundled YAML packs
  via loadActivePack. The module-level pack cache is shared across
  parallel tests in the same shard; the previous beforeEach reset
  helped but didn't fully isolate against concurrent file reads
  under CI load.

Fix per CLAUDE.md test-isolation lint rule R2 (concurrency-fragile
files belong in the .serial.test.ts quarantine): rename both files
to *.serial.test.ts. Serial runner picks them up at max-concurrency=1.
49/49 serial files pass locally. 28/28 verify checks pass.

Master advanced to v0.41.21.0 (5 daily-driver ops pains wave, PR #1545).
Resolved 4 conflicts:

- VERSION + package.json: 0.41.22.0 stays (higher than master's 0.41.21.0)
- CHANGELOG: my v0.41.22.0 entry on top, master's v0.41.21.0 entry below
  in proper version-descending order
- src/core/migrate.ts: MIGRATION VERSION COLLISION — master's v0.41.21
  claimed v104 for `pages_atom_source_hash_idx`. Bumped my slug_aliases
  migration to v105 (the canonical "claim next available slot" pattern).
  Updated all slug_aliases-related v104 doc/comment references:
    - src/core/postgres-engine.ts: "Pre-v104 brain" → "Pre-v105 brain"
    - src/core/onboard/checks.ts: "pre-v104 brains" → "pre-v105 brains"
    - src/core/search/hybrid.ts: "pre-v104 brains" → "pre-v105 brains"
    - docs/architecture/pack-upgrade-mechanism.md: migrate.ts:104 → :105
    - CHANGELOG.md (my entry): v104 → v105 with rebump rationale
    - CHANGELOG.md "To take advantage" block: migration v105

Master's v104 (atom source-hash index, `pages_atom_source_hash_idx`)
preserved verbatim. Both migrations now coexist correctly.

Typecheck clean. bun run verify: 28/28 checks pass. 33/33
slug_aliases-touching tests pass (page-to-alias, resolve-slug-with-alias,
unify-types-handler, onboard checks, search boost, E2E full flow).

CI shard 9 reported 6 failures, all from the embedStaleForSource describe
block, all ~120-150ms each — classic shared-engine concurrency race shape.
Passes 7/7 locally in isolation.

Root cause: embed-stale.test.ts shares a singleton PGLiteEngine across 7
tests with beforeEach resetPgliteState. Under bun's max-concurrency=4 in
the parallel shard, two tests can interleave their TRUNCATE + seedPage +
upsertChunks + embedStaleForSource flow, so one test's stale-chunk count
sees another test's mid-flight writes.

Same fix as brain-allowlist.serial.test.ts and
schema-pack-find-pack-successors.serial.test.ts: rename to *.serial.test.ts
so the serial runner picks it up at max-concurrency=1.

bun run verify: 28/28 checks pass. 7/7 embed-stale tests pass via serial.

garrytan merged commit 5d42f32 into master May 27, 2026
21 checks passed
garrytan added a commit that referenced this pull request May 27, 2026
…#1541)

* Wave A: schema + receipts foundation for v0.42 extract operator surfaces

Foundation layer for the pack-driven extractables + receipt-as-brain-memory
+ operator-discoverability cathedral. Five atomic pieces ship together
because their schema + helpers + module dependencies are tight-coupled:

A1. Widen pack manifest's `extractable` from `boolean` to
    `boolean | ExtractableSpec`. ExtractableSpec carries prompt_template,
    fixture_corpus, eval_dimensions, benchmark_min_recall, and reserves
    verifier_path for v0.43+ pack-shipped verifier code (REFUSE at
    runtime in v0.42 per plan D-EXTRACT-37). Back-compat: every pre-v0.42
    pack with `extractable: true` continues parsing unchanged. Three new
    helpers: extractableSpecsFromPack(), getExtractableSpec(),
    refuseVerifierPathInV042().
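
    The widening in A1 can be sketched as a TypeScript union plus a normalizer. The field names come from the description above, but their types and optionality are assumptions, and `normalizeExtractable` is a hypothetical helper rather than one of the three helpers the commit adds.

    ```typescript
    // Assumed field types; the commit message names the fields but not their shapes.
    interface ExtractableSpec {
      prompt_template?: string;
      fixture_corpus?: string;
      eval_dimensions?: string[];
      benchmark_min_recall?: number;
      verifier_path?: string; // reserved; refused at runtime in this sketch
    }

    type Extractable = boolean | ExtractableSpec;

    // Collapse the widened field into "not extractable" (null) or a spec object.
    // `extractable: true` from older packs becomes an empty spec, so they keep parsing.
    function normalizeExtractable(value: Extractable | undefined): ExtractableSpec | null {
      if (value === undefined || value === false) return null;
      if (value === true) return {};
      if (value.verifier_path !== undefined) {
        throw new Error("verifier_path is reserved and not supported in this release");
      }
      return value;
    }
    ```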

A2. New page type `extract_receipt` in ALL_PAGE_TYPES. Source-boost map
    adds `extracts/` prefix at factor 0.3 — receipts surface in search
    when extraction-relevant but never dominate user content (D-EXTRACT-42).

A3. New module src/core/extract/receipt-writer.ts (~190 LOC) exporting
    writeReceipt(engine, input). Canonical slug shape
    extracts/{date}/{kind}/{source_id}/{run_id_short}/round-{N} per
    D-EXTRACT-17. Frontmatter belt+suspenders per D-EXTRACT-19: BOTH
    type:extract_receipt AND dream_generated:true stamped on every
    receipt, regardless of caller, so the eligibility predicate's
    anti-loop guards reject the receipt page from any future extraction
    sweep (single-flag bypass requires breaking two unrelated checks).
    Idempotent on resume — same run_id+round overwrites cleanly.
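
    A small sketch of the slug shape and the two frontmatter flags from A3. The `ReceiptInput` shape, the ISO date format, and shortening the run id to its first 8 characters are assumptions for illustration; the real `writeReceipt` may derive these differently.

    ```typescript
    // Hypothetical input shape for building a receipt slug.
    interface ReceiptInput {
      date: string; // assumed ISO date, e.g. "2026-05-27"
      kind: string; // e.g. "atoms"
      sourceId: string;
      runId: string;
      round: number;
    }

    // extracts/{date}/{kind}/{source_id}/{run_id_short}/round-{N}
    function buildReceiptSlug(input: ReceiptInput): string {
      const runIdShort = input.runId.slice(0, 8); // assumption: first 8 characters
      return `extracts/${input.date}/${input.kind}/${input.sourceId}/${runIdShort}/round-${input.round}`;
    }

    // Both flags are always stamped, whatever the caller passes in, so either
    // one alone is enough for an eligibility check to reject the page.
    function receiptFrontmatter(extra: Record<string, unknown> = {}): Record<string, unknown> {
      return { ...extra, type: "extract_receipt", dream_generated: true };
    }
    ```

    Since the slug is a pure function of the run id and round, writing the same round twice targets the same page, which is consistent with the idempotent-resume behavior described above.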

A4. Migration v104 creates extract_rollup_7d table (per-day rollup of
    extract events keyed on kind+source_id+day). Audit JSONL stays the
    SOURCE OF TRUTH per F-OUT-19; this table is a best-effort cache for
    doctor's <100ms read budget. Per-day rows mean the 7-day window
    auto-evicts on every read. v100 was deliberately skipped on master
    (renumbered out during a prior wave); v101/v102/v103 also taken;
    v104 is the next clean slot.

A5. Doctor `extract_health` check reads extract_rollup_7d for last 7
    days and emits per-kind aggregates: cost_7d_usd, eval_pass_count,
    eval_fail_count, halt_count, round_completed_count, halt_rate.
    3-state: OK when rollup empty (pre-v0.42 brain or fresh init), WARN
    when any per-kind halt rate > 10% (top-3 named in message), WARN
    when rollup_write_failures > 0 (audit JSONL is SoT but operator
    deserves to know the DB cache is degraded). Pre-v104 brains stay
    quiet — the missing-table error path is caught and treated as
    OK so doctor doesn't warn during the upgrade window.
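
    The three outcomes in A5 can be sketched as a pure function over per-kind aggregates. The `KindAggregate` shape and the definition of halt rate as `halt_count / (halt_count + round_completed_count)` are assumptions; the commit message gives the 10% threshold and the top-3 naming but not the formula.

    ```typescript
    // Hypothetical per-kind aggregate over the last 7 days.
    interface KindAggregate {
      kind: string;
      haltCount: number;
      roundCompletedCount: number;
      rollupWriteFailures: number;
    }

    interface CheckResult {
      status: "ok" | "warn";
      message: string;
    }

    function extractHealthSketch(rows: KindAggregate[]): CheckResult {
      // Empty rollup (old brain or fresh init) is treated as healthy.
      if (rows.length === 0) {
        return { status: "ok", message: "no extract activity in the last 7 days" };
      }

      // Assumed halt-rate formula: halts over halts plus completed rounds.
      const rated = rows.map((r) => {
        const total = r.haltCount + r.roundCompletedCount;
        return { kind: r.kind, haltRate: total === 0 ? 0 : r.haltCount / total };
      });
      const offenders = rated
        .filter((r) => r.haltRate > 0.1)
        .sort((a, b) => b.haltRate - a.haltRate)
        .slice(0, 3); // name at most the top three kinds
      if (offenders.length > 0) {
        const names = offenders.map((o) => `${o.kind} (${(o.haltRate * 100).toFixed(0)}%)`).join(", ");
        return { status: "warn", message: `high halt rate: ${names}` };
      }

      const failures = rows.reduce((sum, r) => sum + r.rollupWriteFailures, 0);
      if (failures > 0) {
        return { status: "warn", message: `${failures} rollup write failure(s); audit JSONL remains the source of truth` };
      }
      return { status: "ok", message: "extract health ok" };
    }
    ```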

Tests added:
- test/extractable-spec-widening.test.ts (22 cases) — back-compat with
  boolean shape, new struct parsing, verifier_path REFUSE contract.
- test/extract/receipt-writer.test.ts (12 cases) — slug shape, frontmatter
  belt+suspenders, idempotent resume, body human-readability.
- test/doctor-extract-health.test.ts (8 cases) — empty rollup OK, halt
  rate WARN, rollup_write_failures WARN, 7-day window inclusion at
  boundary, multi-kind top-3 message ordering.

Plus the canonical bootstrap-coverage test passes with the new v104
migration cleanly applied through both engines.

Plan: ~/.claude/plans/system-instruction-you-are-working-stateless-dragonfly.md
Wave A scope. Wave B (hook receipts into existing extractors) follows.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Wave B: hook receipts + rollup row into the 5 shipped extractors

Each LLM-backed extractor surface now records its run in two places when
something actually happened:

1. An extract receipt PAGE at extracts/{date}/{kind}/{source_id}/{run_id_short}/round-{N}
   (queryable via gbrain search, citable, surfaces in cross-modal
   contradiction probes per the Wave A foundation). Only written when
   `total_rows > 0` so no-op runs don't bloat the brain.

2. An UPSERT row in extract_rollup_7d (DB-backed best-effort cache
   per F-OUT-19) so the doctor extract_health check from Wave A reads
   per-kind aggregates without scanning JSONL.

New module src/core/extract/rollup-writer.ts (~120 LOC) exports
upsertExtractRollup() with PostgreSQL ON CONFLICT DO UPDATE on the
(kind, source_id, day) PK. Concurrency-safe per F-OUT-14 design.
Failure path is best-effort — bumps rollup_write_failures in the
table itself, stderr-warns once per (kind, day, error-class), and
NEVER fails the parent extraction operation. JSONL remains source
of truth.

Wired into 5 extractors:
- extract-conversation-facts (kind: facts.conversation) — both
  success path AND BudgetExhausted halt path write receipt+rollup
  so partial runs are still observable.
- extract_atoms cycle phase (kind: atoms)
- synthesize_concepts cycle phase (kind: concepts, source_id: default
  because concepts are brain-global)
- propose_takes cycle phase (kind: takes.proposed) — scope-aware
  source_id from the read scope.
- extract_facts cycle phase (kind: facts.fence) — deterministic
  (no LLM cost) but still records reconcile activity so doctor sees
  the cycle is alive.

Receipt frontmatter belt+suspenders (D-EXTRACT-19) reused from
Wave A: every receipt stamps BOTH `type: extract_receipt` AND
`dream_generated: true` so the eligibility predicate's anti-loop
guards reject the receipt page from any future extraction sweep.

Test surgery in test/propose-takes.test.ts — one existing assertion
tightened from "no INSERTs" to "no INSERT INTO take_proposals" so
the new rollup UPSERT doesn't falsely fail the cache-hit case test.

Run regression: 85/85 tests pass across extract-conversation-facts,
extract-atoms-synthesize-concepts, extract-facts-phase, propose-takes.

Plan: ~/.claude/plans/system-instruction-you-are-working-stateless-dragonfly.md
Wave B scope. Wave C (pack-author scaffolding + benchmark) follows.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Wave C+D: pack-author scaffolding + operator surfaces for v0.42 extract

Wave C: pack-author authoring loop
  - scaffold-extractable mutation primitive declares a kind as extractable
    on a pack manifest in one verb (wires through updateTypeOnPack from
    the v0.41 mutate library); generates 5 placeholder fixtures + a
    pack-supplied prompt template stub
  - schema CLI wires gbrain schema scaffold-extractable <type> --pack <pack>
  - extract benchmark CLI loads a pack's fixture corpus through strict
    D-EXTRACT-21 path validation (rejects absolute paths, .. traversal,
    null bytes, symlinks resolving outside pack root); v0.42 ships as a
    stub reporter (LLM dispatch deferred to Wave E)
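
The four path rejections listed for the benchmark CLI (absolute paths, `..` traversal, null bytes, symlinks resolving outside the pack root) can be sketched with Node's `path` and `fs` modules. `resolveInsidePackRoot` is a hypothetical helper written for illustration, not the project's D-EXTRACT-21 validator.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Resolve a pack-relative path, throwing if it could escape the pack root.
function resolveInsidePackRoot(packRoot: string, relPath: string): string {
  if (relPath.includes("\0")) throw new Error("null byte in path");
  if (path.isAbsolute(relPath)) throw new Error("absolute paths are not allowed");
  if (relPath.split(/[\\/]/).includes("..")) throw new Error("path traversal is not allowed");

  const root = fs.realpathSync(packRoot);
  const candidate = path.join(root, relPath);
  // realpathSync follows symlinks; a file that does not exist yet cannot be a
  // symlink, so its joined path is used as-is.
  const real = fs.existsSync(candidate) ? fs.realpathSync(candidate) : candidate;

  const rel = path.relative(root, real);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error("path resolves outside the pack root");
  }
  return real;
}
```

Checking the resolved path as well as the raw string matters because a symlink inside the pack can point anywhere even when the relative path itself looks harmless.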

Wave D: operator surfaces
  - extract status CLI reads extract_rollup_7d for the last 7 days,
    sorts by (halt_rate desc, cost desc); kubectl-style right-aligned
    table, top-5 + "more rows" hint by default, --verbose shows all;
    stable schema_version: 1 JSON envelope for monitoring pipelines
  - extract --explain <kind> CLI prints the active pack's resolution
    chain: declaration source (pack-declared vs built-in cycle phase),
    prompt_template + fixture_corpus paths with existence checks,
    eval_dimensions, benchmark_min_recall, and the last 7d rollup
  - extract.ts gains a lifecycle-grouped help text (Extraction /
    Inspection / Status) per the original D3 plan goal

Tests:
  - test/schema-pack/scaffold-extractable.test.ts (15 cases) including
    explicit privacy-rule assertions guarding against real-name leakage
  - test/extract/benchmark.test.ts (17 cases) covering path validation
    rejections + JSONL fixture parsing
  - test/extract/status.test.ts (15 cases) over pure aggregation +
    formatting

Housekeeping:
  - test/extract/receipt-writer.test.ts refactored to the canonical
    PGLite block (beforeAll/afterAll/resetPgliteState in beforeEach)
    per CLAUDE.md test-isolation R3+R4; runtime drops from ~30s of
    99-migration replay per test to <6s for all 12 cases together

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.42.0.0: extract operator surfaces + pack-driven extractables

Bump VERSION + package.json to 0.42.0.0. CHANGELOG entry covers the
three-wave shipped scope (receipts + rollup + doctor check; receipts
hooked into all 5 shipped extractors; pack-author scaffolding +
benchmark stub-reporter; status + --explain dashboards + lifecycle
help). CLAUDE.md Key Files gains a v0.42 cluster annotation. llms.txt
regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.41.23.0: re-tag from 0.42.0.0 (patch-channel slot, no scope change)

VERSION + package.json + CHANGELOG header + CLAUDE.md cluster annotation
all moved from 0.42.0.0 to 0.41.23.0. Body text updated in-place: every
"v0.42" / "v0.43+" reference inside this entry's release notes now reads
"v0.41.23" or "follow-up release" as appropriate.

Same scope shipping — the three-wave extract operator surface stays
intact. Just lands in the patch-channel queue (.20/.21/.23 free; .22 is
PR #1542's type-unification cathedral) instead of the minor-channel bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: add extract_receipt to gbrain-base.yaml page_types (CI parity gate)

CI shard 5 caught the drift: test/regressions/gbrain-base-equivalence.test.ts
asserts every ALL_PAGE_TYPES seed has a matching page_type entry in the
gbrain-base.yaml pack. Wave A added `extract_receipt` to ALL_PAGE_TYPES
but didn't seed it in the base pack manifest.

Adds the entry under the `annotation` primitive with `extracts/` path
prefix (matches the source-boost demote site) and `extractable: false`
(receipts are written by the framework, never extracted from). Comment
documents the belt+suspenders D-EXTRACT-19 invariant so future readers
understand why receipts carry both `dream_generated: true` AND
`type: extract_receipt`.

Closes the CI gate without changing runtime behavior — the pack-aware
read paths already had the prefix demote wired in src/core/search/source-boost.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: bump gbrain-base page-type count 24→25 in schema-cli test

CI shard 4 caught the second drift from the same root cause as the
prior parity-gate fix: v0.41.23's `extract_receipt` addition bumped
gbrain-base.yaml from 24 to 25 page types. The schema-cli smoke test
was pinned at 24 (the count after v0.41.11.0 added `conversation` +
`atom`); update to 25 and note v0.41.23's contribution alongside the
prior version stamp.

Verified hermetic: running test/schema-cli.test.ts with a clean
GBRAIN_HOME tempdir produces 12/12 pass (the local-machine 'schema
active' fail is from a real ~/.gbrain pinning gbrain-base-v2; not a
shipped-code issue, doesn't repro on CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): pack-locator stub leak between shard 6 test files

CI shard 6 caught three flaky failures in test/onboard-pack-upgrade-checks.test.ts:
  - checkPackUpgradeAvailable > fires on gbrain-base brain with gbrain-base-v2
  - checkPackUpgradeAvailable > manual_only routing via render.ts allowlist (D17)
  - checkTypeProliferation > warns when distinct types exceed declared+5

Root cause: test/schema-pack-sync.test.ts calls
`__setPackLocatorForTests(...)` to stub the disk-loader, but doesn't
restore in afterAll. Bun's CI shard 6 loads multiple test files into
one process; when sync.test.ts runs before onboard-pack-upgrade-checks.test.ts,
the stubbed locator persists at module scope. `loadActivePack` for
gbrain-base / gbrain-base-v2 then returns null and:
  - findPackSuccessors returns [] → status='ok' instead of 'warn' (F1+F2)
  - declared falls back to 15 → fail threshold becomes 30, 32 > 30 → 'fail'
    instead of 'warn' (F3)
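
The threshold arithmetic behind that 'fail' can be sketched as follows. The warn rule (distinct types exceed declared + 5) is taken from the test name above; the fail rule of twice the declared count is inferred from "declared falls back to 15 → fail threshold becomes 30" and may not match the real check exactly. `typeProliferationStatus` is a hypothetical name.

```typescript
type Status = "ok" | "warn" | "fail";

// distinct: number of distinct pages.type values found in the brain
// declared: number of page types the active pack declares
function typeProliferationStatus(distinct: number, declared: number): Status {
  if (distinct > declared * 2) return "fail"; // inferred threshold: 2x declared
  if (distinct > declared + 5) return "warn";
  return "ok";
}
```

With 32 distinct types, an illustrative declared count of 25 yields 'warn' (32 > 30 but not > 50), while the fallback of 15 yields 'fail' (32 > 30), which is the flip the leaked stub caused.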

Local single-file runs pass because the locator starts at its default.

Two-layer fix:
  1. test/schema-pack-sync.test.ts afterAll calls
     `_resetPackLocatorForTests()` to undo the mutation (the canonical
     fix at the source).
  2. test/onboard-pack-upgrade-checks.test.ts beforeEach calls the same
     reset (defense-in-depth against any future test file in the shard
     that forgets to restore).

Reproduced locally: running the three shard-6 schema-pack files together
fails 3 tests pre-fix and passes 30/30 post-fix. Full shard 6 sweep
(77 files, 1232 tests) now green; bun run verify still 28/28.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): pglite-engine — dim-agnostic chunk-embedding test data

CI shard 6 caught two flaky failures in test/pglite-engine.test.ts:
  - PGLiteEngine: Chunks > getChunksWithEmbeddings returns embedding data
  - PGLiteEngine: stale chunk pagination > countStaleChunks counts chunks
    with NULL embedding only

Both failed with `expected 1280 dimensions, not 1536` at the upsert site.

Root cause: pglite-engine.ts:287 initSchema() reads embedding dim from
gw.getEmbeddingDimensions() if the gateway is configured (potentially
left in that state by another shard-6 test file in the same bun process),
falling back to DEFAULT_EMBEDDING_DIMENSIONS otherwise — which is 1280
since v0.36+ when the ZE default landed (zeroentropyai:zembed-1).
Pre-v0.36 defaults were OpenAI's 1536; my test data was pinned to that
stale literal.

The two outcomes that pass:
  - gateway happens to be configured for 1536-dim (e.g. master shard 6
    run 26515999465 — these tests passed at 20ms + 24ms with no
    "dimensions" error)
  - gateway happens to be configured for 1280-dim AND test data is 1280

The outcome that fails:
  - gateway configured for 1280-dim AND test data hardcoded to 1536

Fix: capture the actual column width after initSchema (probe
pg_attribute.atttypmod for content_chunks.embedding) and use that
captured `CHUNK_EMBED_DIM` constant at the three Float32Array sites.
Test data now matches whatever width the column was created at,
regardless of which shard-6 file ran first.

Local repro: full shard 6 (77 files, 1232 tests, ~6min) green; this
file standalone (100 tests) green; bun run verify 28/28.

Broader pattern: 9 other test files use the same Float32Array(1536)
literal. None land in shard 6 today (so they don't flake), but the
fix shape here can be lifted into a shared helper if the bug class
surfaces elsewhere — filed as a v0.42+ follow-up rather than a
preemptive sweep, since each file's setup shape is slightly different.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 27, 2026
…1/v0.42 reality

Two stale E2E assertion files surfaced by a full local E2E run against
real Postgres (the gbrain-test-pg container on port 5434). Neither file
is in the CI E2E job (CI only runs mechanical.test.ts + mcp.test.ts +
skills.test.ts + zeroentropy-live.test.ts), so the drift has been latent.

1. `test/e2e/dream-cycle-phase-order-pglite.test.ts`
   EXPECTED_PHASES was missing 4 phases that landed in master since the
   list was last revised:
     - extract_atoms (v0.41 T9 — atom extraction, after extract_facts)
     - synthesize_concepts (v0.41 T9 — concept synthesis, after patterns)
     - conversation_facts_backfill (v0.41.11.0, after calibration_profile)
     - skillopt (v0.42.0.0 — self-evolving skills, between
       conversation_facts_backfill and embed)
   Updated to 21 entries in the actual runtime dispatch order (matches
   ALL_PHASES exactly). 5/5 tests in the file pass after.

2. `test/e2e/onboard-full-flow.test.ts`
   `runAllOnboardChecks` shape test asserted exactly 4 checks; v0.42's
   type-unification cathedral (PR #1542, T13-T15) added 3 more
   (`pack_upgrade_available`, `type_proliferation`, `dangling_aliases`)
   for a total of 7. And `empty brain returns 0 remediations` regressed
   because `pack_upgrade_available` can emit a manual_only remediation
   on brains where gbrain-base@1.x is active and gbrain-base-v2 is
   registered as a successor. Tightened that assertion to `total <= 1`
   AND kept a per-check guard asserting takes_count remediations stay 0
   (the original test's load-bearing claim — A12 two-gate consent).
   13/13 tests in the file pass after.

Honest scope: 4 other E2E files still fail locally after this commit
(cycle.test.ts, dream.test.ts, phantom-redirect.test.ts,
sync-lock-recovery.test.ts), each for a distinct pre-existing master
bug unrelated to v0.42 skillopt work:
  - cycle.test.ts (5 fails): PostgresEngine.getConfig falls back to
    db.getConnection() singleton via the `get sql()` getter when no
    poolSize is set; the new conversation_facts_backfill phase chain
    hits this fallback even though the test's setupDB() connects both
    the singleton AND the engine. Race condition between the test's
    singleton lifecycle and the phase's getConfig call. Deeper fix
    needed in PostgresEngine.getConfig (use this._sql directly with
    explicit fallback only on user-driven CLI paths).
  - dream.test.ts (1 fail): expects "concepts/testing" slug to appear
    in dream cycle output, gets empty array. Related to v0.42 concept
    type-unification semantics.
  - phantom-redirect.test.ts (2 fails): concurrent-sync race +
    postgres-js text-string embedding survival. Master-level data-path
    bug; would need its own fix wave.
  - sync-lock-recovery.test.ts (1 fail): `gbrain sync --break-lock
    --all` exits 0 but test expects 1 with a shell-loop hint. CLI
    behavior changed in a master commit; need to either restore the
    refusal behavior or update the assertion.

None of these 4 block CI (E2E job doesn't run them). Filed as a
TODOS.md entry for a follow-up wave; the 2 in this commit are the
ones that mirror v0.42 work landing.

Local: 130/136 E2E files green, 927/940 tests pass (was 925/940
before these fixes; the 2 files this commit fixes added 7 newly-
passing tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
garrytan added a commit that referenced this pull request May 31, 2026
…#1563)

* feat(skillopt): foundation modules — types, lr-schedule, benchmark, score, audit, lock

* feat(skillopt): edit primitives — apply-edits (D5+D9), rejected-buffer LRU, version-store (D8 history-intent-first)

* feat(skillopt): rollout (D2 gateway.toolLoop + D13 read-only allowlist), reflect (D7 two calls), validate-gate (D12 median+epsilon, D4 parallel), preflight (D3), bundled-skill-gate (D16)

* feat(skillopt): orchestrator (D6 slow-update, D10 ASCII diagrams, D11 caching), checkpoint, bootstrap (D15 sentinel), CLI dispatch + help

* feat(skillopt): cycle phase (F1 dream-loop wiring), PROTECTED_JOB_NAMES + MCP op (F6 admin scope + allowlist) + Minion handler (F7 --background)

* feat(skillopt): full cathedral — --all batch (F4), --target-models fleet (F5), write-capture (F10), held-out scaffold (F11), adversarial suite 41 cases (F2), E2E PGLite (F3), meta-skill bundle (T7), reflect+judge evals (F8+F9), docs (T10)

* chore: bump version to v0.42.0.0 (MINOR — significant new feature)

* fix(skillopt): wire trajectories from forward gate to reflect + fix parseEditsResponse parser misuse

Two related v0.42.0.0 bugs that conspired to make `runSkillOpt` structurally
unable to accept any candidate edit. Either alone would have killed self-evolution;
together they made the loop a no-op for every input.

**Bug 1 (orchestrator gap):** `runOptimizationLoop` in orchestrator.ts called
`runReflect({successes: [], failures: []})` with hardcoded empty arrays. The
forward gate's `scoredRollouts` were computed then voided. `runReflect`
short-circuits both modes when their batches are empty, so the optimizer was
never asked to propose an edit. Every step hit the no_edits_applied branch.

Fix: add `scoredRollouts: ScoredRollout[]` to `GateResult` and
`runsPerTask?: number` to `ValidateGateOpts`. Forward pass uses
`runsPerTask: 1`; orchestrator partitions returned rollouts by `score >= 0.5`
and threads real successes + failures into `runReflect`.
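
A minimal sketch of the partition step described above, assuming a pared-down
`ScoredRollout` shape (the real type likely carries more fields) and a
hypothetical helper name `partitionRollouts`. Only the `score >= 0.5` threshold
comes from this commit message; everything else is illustrative.

```typescript
// Hypothetical, reduced shape: the real ScoredRollout is not shown in this commit message.
interface ScoredRollout {
  taskId: string;
  score: number; // assumed to be normalized to [0, 1]
}

// Threshold taken from the commit message: score >= 0.5 counts as a success.
const SUCCESS_THRESHOLD = 0.5;

// Split forward-pass rollouts into the two batches that reflect consumes.
function partitionRollouts(rollouts: ScoredRollout[]): {
  successes: ScoredRollout[];
  failures: ScoredRollout[];
} {
  const successes: ScoredRollout[] = [];
  const failures: ScoredRollout[] = [];
  for (const r of rollouts) {
    (r.score >= SUCCESS_THRESHOLD ? successes : failures).push(r);
  }
  return { successes, failures };
}

const partitioned = partitionRollouts([
  { taskId: "t1", score: 0.9 },
  { taskId: "t2", score: 0.5 }, // boundary value lands in successes
  { taskId: "t3", score: 0.2 },
]);
console.log(partitioned.successes.length, partitioned.failures.length);
```

With non-empty batches on at least one side, a reflect step that short-circuits
on empty input would have something to work with.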

**Bug 2 (parser misuse):** `parseEditsResponse` in reflect.ts routed every
optimizer response through `parseJudgeJson` first. `parseJudgeJson` looks for
a `score` key (it's a judge-output parser, not an edits parser) and returns
null for any JSON without one — including the well-formed `{"edits": [...]}`
the optimizer is contractually required to emit. The function then early-
returned `[]` and the actual `tryExtractEdits` path on the next line was
unreachable dead code.

Fix: drop the wrong-typed guard. `parseEditsResponse` now calls
`tryExtractEdits` directly. Export it so `reflect.test.ts` can pin the
contract independently of the chat transport.
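
The guard misuse can be reproduced in isolation. The sketch below uses stand-in
parsers (`parseJudgeJsonSketch`, `tryExtractEditsSketch`) whose bodies are
assumptions written for illustration; they are not the real reflect.ts
implementations. It only shows why a judge-shaped guard swallows a well-formed
edits payload.

```typescript
// Hypothetical edit shape; the real edit type is not shown in this commit message.
interface Edit {
  op: string;
  [key: string]: unknown;
}

// Stand-in for a judge-output parser: null unless a numeric `score` key exists.
function parseJudgeJsonSketch(raw: string): { score: number } | null {
  try {
    const v = JSON.parse(raw);
    return v !== null && typeof v === "object" && typeof v.score === "number" ? v : null;
  } catch {
    return null;
  }
}

// Stand-in for the edits extractor: accepts {"edits": [...]}, otherwise returns [].
function tryExtractEditsSketch(raw: string): Edit[] {
  try {
    const v = JSON.parse(raw);
    return v !== null && typeof v === "object" && Array.isArray(v.edits) ? v.edits : [];
  } catch {
    return [];
  }
}

// Buggy shape: the judge guard returns early for any payload without `score`.
function parseEditsBuggy(raw: string): Edit[] {
  if (parseJudgeJsonSketch(raw) === null) return [];
  return tryExtractEditsSketch(raw);
}

// Fixed shape: call the extractor directly.
function parseEditsFixed(raw: string): Edit[] {
  return tryExtractEditsSketch(raw);
}

const payload = '{"edits": [{"op": "add_section", "title": "Citations"}]}';
console.log(parseEditsBuggy(payload).length, parseEditsFixed(payload).length);
```

The buggy variant yields zero edits for the same payload that the fixed variant
parses into one, which is the regression the new unit cases pin.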

**Why this slipped through 152 prior skillopt tests:** zero unit coverage
of `parseEditsResponse` or `runReflect`. The existing E2E `all-reject` case
asserted no_improvement (which was true for the wrong reason — empty edits,
not gate rejection). Both bugs were structurally invisible to the existing
test surface.

**New coverage:**

- `test/skillopt/reflect.test.ts` (15 cases):
  - 8 `parseEditsResponse` cases including the IRON-RULE regression pin
    for the v0.42.0.1 fix (`{"edits": [...]}` JSON must survive the parser).
  - 7 `runReflect` D7 contract cases: both modes fire, empty-batch skips,
    additive token usage, one-mode-throws-other-still-works, rejected-buffer
    flows into anti-bias prompt.
  - Documents the trailing-comma limitation as an explicit out-of-scope pin
    (so a future tightening of `tryExtractEdits` lights this test up
    intentionally).

- `test/e2e/skillopt-loop.serial.test.ts` (7 cases):
  - HAPPY PATH: stubbed `gateway.chat` acts as both target agent (emits
    sections based on skill content) and optimizer (proposes a real
    add-Citations edit). Drives `runSkillOpt` end-to-end against PGLite.
    Asserts outcome=accepted, SKILL.md mutated with new section,
    frontmatter preserved (D5), history has one committed row,
    best.md mirrors disk, delta > epsilon, receipt fields populated.
  - 5 broken cases (each isolates a distinct orchestrator-visible failure):
    1. Below-baseline regression: optimizer proposes a destructive edit;
       gate rejects with reason=below_baseline; SKILL.md unchanged;
       rejected-buffer captures the bad edit for anti-bias context.
    2. Malformed reflect JSON: orchestrator degrades gracefully to
       no_improvement without crashing.
    3. Anchor-not-found: applyEditBatch rejects all; sel gate skipped;
       rejected-buffer captures with reason=apply_failed.
    4. Budget exhausted mid-step: outcome=aborted, no pending rows survive.
    5. Converged-skill re-run: starting from already-perfect skill →
       no_improvement (no thrash on a well-tuned starting point).
  - IDEMPOTENT RE-RUN: drive runSkillOpt twice in sequence. Run 1 accepts.
    Run 2 sees improved baseline, no failures, returns no_improvement.
    SKILL.md byte-identical to post-run-1; history still has exactly 1
    committed row. Proves stability at the fixed point.

All hermetic (no DATABASE_URL, no API keys). PGLite in-memory engine,
tempdir SKILL.md + benchmark, stubbed gateway.chat via
`__setChatTransportForTests`. `.serial.test.ts` because the stub installs
module state and the loop walks shared disk state across epochs.

Test counts after fix: 174 skillopt-surface tests pass (149 pre-existing
unit + 15 new reflect unit + 3 existing E2E + 7 new E2E). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cycle): align ALL_PHASES skillopt position with actual dispatch order

v0.42.0.0 added skillopt to ALL_PHASES right after `patterns` (line 127), but
the dispatch block in runCycle (line ~1912) actually runs skillopt between
`conversation_facts_backfill` and `embed`. The two were inconsistent, and the
serial test `report.phases.map(p => p.phase)).toEqual(ALL_PHASES)` was failing
on master because of it.

A second pre-existing failure: the two phase-count assertions in
`test/core/cycle.serial.test.ts` still said `toBe(20)` even though
ALL_PHASES grew to 21 when skillopt was added. The author bumped the array
but forgot the test.

Two fixes, one commit:

1. Move `'skillopt'` in ALL_PHASES from after `patterns` to between
   `conversation_facts_backfill` and `embed`, matching where runCycle
   actually dispatches it. Runtime behavior is unchanged — only the
   declaration order moves. Updated the surrounding comment to call out
   the position invariant and reference the test that pins it.

2. Update both `toBe(20)` assertions in cycle.serial.test.ts to `toBe(21)`
   with a v0.42.0.0 history line in the running comments.
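
A rough illustration of the position invariant, assuming a four-entry subset of
the phase list (the real ALL_PHASES has 21 entries) and a hypothetical helper
`firstMismatch`. It shows the kind of declared-versus-dispatched comparison the
serial test performs, not the test itself.

```typescript
// Illustrative subset only; the real ALL_PHASES has 21 entries.
const DECLARED_BEFORE = ["patterns", "skillopt", "conversation_facts_backfill", "embed"];
const DECLARED_AFTER = ["patterns", "conversation_facts_backfill", "skillopt", "embed"];
// Order in which the cycle actually runs these phases, per the commit message.
const DISPATCHED = ["patterns", "conversation_facts_backfill", "skillopt", "embed"];

// Index of the first position where the two orders disagree, or -1 if identical.
function firstMismatch(declared: string[], dispatched: string[]): number {
  const n = Math.max(declared.length, dispatched.length);
  for (let i = 0; i < n; i++) {
    if (declared[i] !== dispatched[i]) return i;
  }
  return -1;
}

console.log(firstMismatch(DECLARED_BEFORE, DISPATCHED)); // 1
console.log(firstMismatch(DECLARED_AFTER, DISPATCHED)); // -1
```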

Why declaration follows runtime (not the other way around): the comment
intent ("Runs AFTER patterns — graph-fresh") is still satisfied because
"after the entire main graph-mutating cluster" is strictly fresher than
"right after patterns". No design intent is lost.

Test result: cycle.serial.test.ts is now 28/28 (was 27/28 on master + my
prior commit). Skillopt suite still 174/174.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): bump PHASE_SCOPE assertion to 21 + fix skill-optimizer Anti-Patterns case

Two CI failures pre-existing on this branch since the v0.42.0.0 skillopt
cathedral landed; master is green because skillopt didn't exist there yet.

1. test/phase-scope-coverage.test.ts asserted ALL_PHASES.length === 20.
   skillopt is the 21st phase. Bumped to 21 with v0.42.0.0 history line
   in the comment chain. Sibling fix to the cycle.serial.test.ts bump
   in commit 08ad246.

2. skills/skill-optimizer/SKILL.md had `## Anti-patterns` (lowercase p).
   skills-conformance.test.ts asserts `## Anti-Patterns` (capital P) as
   the required section header. Single-character rename.

Local: 174 skillopt-surface tests + 6 phase-scope tests + 249 skills-
conformance tests all green. Typecheck clean.

Remaining CI delta: 5 put_page facts backstop failures in shard 10 that
reproduce only on Linux CI, not locally even with empty env / cleared
HOME / max-concurrency=1. The error surface is `r.isError === true` with
no further detail captured in the bun:test output. Pushing these 2 fixes
first to narrow the CI signal; will instrument if the 5 persist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(e2e): align dream-cycle-phase-order + onboard-full-flow with v0.41/v0.42 reality

* fix(ci): quarantine query-cache-knobs-hash.test.ts to serial runner

CI shard 10 (commit 4d72107) failed 5 tests in the
`SemanticQueryCache cross-mode isolation (CDX-4 hotfix)` describe block,
all ~7-34ms each, all expecting writes/reads to round-trip through one
shared PGLite engine + a `beforeEach DELETE FROM query_cache`. Passes
9/9 locally; fails 5/9 on Linux CI under bun's default in-file
max-concurrency=4.

Classic intra-file concurrency race shape: test A's `beforeEach`
clears the table → test A's `store` writes a row → test B's
`beforeEach` (concurrent with A's `store`) clears the table → test A's
follow-up COUNT query returns 0. Same root cause that quarantined
`embed-stale.test.ts`, `brain-allowlist.test.ts`, and
`schema-pack-find-pack-successors.test.ts` to the serial runner in
prior fix waves (documented in v0.41.22.0 CI fix wave).
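
The interleaving above can be replayed deterministically. This sketch swaps the
shared PGLite table for an in-memory array (an assumption made so the example
is self-contained) and walks the steps in the racy order and in the serial
order; the helper names are hypothetical.

```typescript
// In-memory stand-in for the query_cache table; the real tests use a shared PGLite engine.
const table: string[] = [];
const clearTable = (): void => { table.length = 0; }; // models the beforeEach cleanup
const storeRow = (row: string): void => { table.push(row); };
const countRows = (): number => table.length;

// Racy order: test B's cleanup lands between test A's write and A's COUNT.
function concurrentInterleaving(): number {
  clearTable();        // test A beforeEach
  storeRow("A-row");   // test A store
  clearTable();        // test B beforeEach, overlapping with test A
  return countRows();  // test A follow-up COUNT
}

// Serial order: test A finishes before test B's cleanup runs.
function serialOrder(): number {
  clearTable();        // test A beforeEach
  storeRow("A-row");   // test A store
  const seenByA = countRows();
  clearTable();        // test B beforeEach
  return seenByA;
}

console.log(concurrentInterleaving(), serialOrder()); // 0 1
```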

Fix: rename to `query-cache-knobs-hash.serial.test.ts` so the v0.26.7
serial-tests runner picks it up at `max-concurrency=1`. Tests still
exercise the actual cache logic — no test deleted, no production code
changed. The describe block's `beforeAll` engine + `beforeEach`
`DELETE FROM query_cache` pattern works correctly at serial concurrency.

Local: 12/12 in this file + 52/52 in the serial runner. Production
SemanticQueryCache code is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(heavy): frontmatter_scan_wallclock — opt into --no-embedding so CI runners work

Heavy tests workflow run 26542447602 (commit 483a557) failed on the
first heavy script:

  [fm_wallclock] FAIL: gbrain init exited non-zero
  No embedding provider configured. Set one of:
    OPENAI_API_KEY / ZEROENTROPY_API_KEY / VOYAGE_API_KEY
  Or defer setup: gbrain init --pglite --no-embedding

The v0.37 D9 hard-require landed in init.ts: `gbrain init --pglite` now
refuses to proceed without an embedding provider configured. The
heavy-tests GitHub workflow doesn't pipe any embedding API keys
(deliberate — the heavy tests measure ops shape, not LLM behavior), so
every CI invocation now blocks at step 2 of this script.

The script's whole purpose is measuring `gbrain doctor`'s
frontmatter-scan wallclock — it never embeds, never calls
`gbrain embed`, never queries vectors. The right fix is to opt out of
the provider requirement via the same `--no-embedding` flag init.ts
already exposes for this exact "deferred setup" case.

Verified locally:
  TMP=$(mktemp -d); GBRAIN_HOME="$TMP" \
    bun run src/cli.ts init --pglite --yes --no-embedding
  # exit 0, brain initialized.

No production code change. One-line + comment in the script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(heavy): sync_lock_regression — pass --no-embed so CI runs measure lock contention, not key absence

Heavy tests workflow run 26542545802 (commit 7962d31, after the
previous fm_wallclock fix) failed at the next heavy script in the chain:

  [sync_lock_regression] outcomes: winners=0 losers=0 unknown=4
  [sync_lock_regression] FAIL: expected 1 winner, got 0
  [sync_lock_regression] FAIL: expected 3 lock-busy losers, got 0

Each of the 4 parallel `gbrain sync` invocations failed for the same
reason — none of them ever even got to the lock-acquire step:

    Embedding model "zeroentropyai:zembed-1" requires ZEROENTROPY_API_KEY.
    Re-run with --no-embed to import-only and embed later once the key is set.

The CI runner doesn't pipe any embedding-provider API keys (deliberate —
heavy tests measure ops shape, not LLM behavior), and sync now hard-fails
when its embed step can't reach a configured provider.

This script measures the writer-lock race shape — `gbrain-sync` row in
`gbrain_cycle_locks`, exactly-one-winner semantics, N-1 fail-fast losers
with "Another sync is in progress", zero leaked rows post-run. It never
needed embeddings; the original write predates the hard-require landing.

Fix: pass `--no-embed` to the sync invocation. Same kind of fix as
fm_wallclock (commit 7962d31) but on the sync side rather than init.

No production code touched. One-line change in the bash script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(heavy): sync_lock_regression — register source via psql + use --repo + tolerate doctor warns

Heavy tests run 26542638471 (commit 60145ee, after the --no-embed
fix) failed at the same script but at a downstream step:

  > Source "default" has no local_path. Run: gbrain sources add default --path <path>

Three independent bugs in the script that all surfaced at once after
v0.41's source-registry landed:

1. `gbrain config set sync.repo_path` is the legacy way; sync now
   reads `sources.local_path` first. Replaced with an upsert into the
   sources table via psql:
     INSERT INTO sources (id, name, local_path)
     VALUES ('default', 'default', $BRAIN_DIR)
     ON CONFLICT (id) DO UPDATE SET local_path = EXCLUDED.local_path
   Kept the legacy `config set sync.repo_path` line too as
   belt-and-suspenders for any downstream caller that still reads it.

2. `gbrain sync --dir <path>` is silently ignored; sync's CLI parser
   recognizes `--repo`, not `--dir`. Switched to `--repo`.

3. `bun run src/cli.ts doctor --json` at the top (used to apply
   migrations as a side effect) exits non-zero whenever ANY check
   warns — including the new "no embedding provider configured"
   warning on a fresh CI runner. The script's `set -e` aborted at
   line 53 before reaching any of the sync invocations. Added `|| true`
   since the migration runs regardless of doctor's exit verdict.

Verified locally — `DATABASE_URL=... bash tests/heavy/sync_lock_regression.sh`
output:
  [sync 1] rc= (lock-busy: 'Another sync is in progress')
  [sync 2] rc=0 (winner)
  [sync 3] rc= (lock-busy: 'Another sync is in progress')
  [sync 4] rc= (lock-busy: 'Another sync is in progress')
  outcomes: winners=1 losers=3 unknown=0
  post-run gbrain_cycle_locks(gbrain-sync) row count: 0
  OK — 1 winner, 3 lock-busy losers, no leaked lock rows.

Production code untouched. All three fixes are in the bash script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(skillopt): hands-on tutorial for auto-improving a skill + discoverability

There was no tutorial for skillopt — only a reference guide
(docs/guides/skillopt.md) that opens at --bootstrap-from-routing and
assumes you already understand benchmarks, and an agent-facing SKILL.md.
README had ZERO skillopt mention. The one thing a user must hand-author
(the benchmark JSONL) was taught nowhere with a worked example.

New: docs/tutorials/improving-skills-with-skillopt.md — Diataxis tutorial
(learning-oriented), copy-pasteable end to end:
  1. mental model in two sentences (SKILL.md is the trainable param, the
     agent is frozen)
  2. write your first benchmark from scratch — a complete 15-task rule-judge
     starter you paste and run, with the full check-op table
     (contains/regex/section_present/max_chars/min_citations/tool_called/
     tool_not_called)
  3. --dry-run cost preview (and that it exits 2 by convention, not failure)
  4. real run + reading accepted(0)/no_improvement(1)/aborted(2) with the
     actual stderr output shape
  5. where output lands (best.md, versions/, history.json, rejected.json,
     audit jsonl)
  6. accept/reject — bundled vs user skills, --no-mutate vs
     --allow-mutate-bundled
  7. iterate by sharpening the benchmark

The load-bearing fix the tutorial makes that the reference guide got wrong:
the DEFAULT --split 4:1:5 needs ~50 tasks before it runs (sel = N/10, floor
5). A first-time author writing 10-15 tasks hits `D_sel has N task(s)
(need >=5)` and bounces. The tutorial ships 15 tasks + `--split 1:1:1`
(clean 5/5/5) so the copy-paste path actually works. Verified against the
real loadBenchmark + splitBench: the exact shipped block parses 15 unique
tasks and splits 5/5/5 with sel>=5; the system's own error message confirms
"need ~50 total for 4:1:5".

Discoverability (Diataxis cross-linking):
  - README.md tutorials section: new entry (was zero skillopt mention)
  - docs/tutorials/README.md: added under ## Shipped
  - docs/guides/skillopt.md: "New to this? Start with the tutorial" callout

Every claim devex-verified against source: exit-code map from
skillopt.ts (accepted:0/no_improvement:1/aborted:2/errored:2), stderr
format from skillopt.ts:286-292, check ops from score.ts, output paths
from SKILL.md, split math from benchmark.ts.
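
The split arithmetic can be checked by hand. The sketch below is a hypothetical
re-implementation, not the real splitBench: it assumes proportional sizing with
floor rounding, the remainder assigned to the last bucket, and D_sel as the
middle ratio term (inferred from "sel = N/10" under 4:1:5).

```typescript
// Floor stated in the commit message: D_sel needs at least 5 tasks.
const MIN_SEL = 5;

type Ratio = [number, number, number];

// Hypothetical proportional split; the real splitBench may round differently.
function splitSizes(total: number, ratio: Ratio): Ratio {
  const sum = ratio[0] + ratio[1] + ratio[2];
  const first = Math.floor((total * ratio[0]) / sum);
  const sel = Math.floor((total * ratio[1]) / sum);
  return [first, sel, total - first - sel]; // remainder goes to the last bucket
}

// True when the selection bucket meets the minimum size.
function selIsLargeEnough(total: number, ratio: Ratio): boolean {
  return splitSizes(total, ratio)[1] >= MIN_SEL;
}

console.log(splitSizes(15, [4, 1, 5]), selIsLargeEnough(15, [4, 1, 5])); // sel = 1, too small
console.log(splitSizes(15, [1, 1, 1]), selIsLargeEnough(15, [1, 1, 1])); // 5/5/5, passes
console.log(splitSizes(50, [4, 1, 5]), selIsLargeEnough(50, [4, 1, 5])); // sel = 5, passes
```

Under these assumptions a 15-task starter only clears the D_sel floor with an
even split, and 4:1:5 first clears it at 50 tasks.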

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: regenerate llms-full.txt after skillopt tutorial + README edit

Refreshes the inlined doc bundle so the committed llms-full.txt matches
fresh `bun run build:llms` output (test/build-llms.test.ts drift guard).
Picks up the README tutorials-section edit from c39dbdb. The new tutorial
file itself isn't curated into scripts/llms-config.ts (the bundle curates
a fixed doc set, not every tutorial) — this is purely the README delta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): stop embed-preflight leaking gateway config into facts-backstop shard

CI shard 10 failed 5 `put_page facts backstop` tests with:

  [embed(openai:text-embedding-3-small)] Incorrect API key provided: sk-test

(captured by the diagnostic stderr added in a prior commit). Root cause is
a cross-file module-state leak, not a logic bug:

- `embed-preflight.test.ts` calls `configureGateway({env:{OPENAI_API_KEY:
  'sk-test'}})` to drive credential-validation scenarios. It resets the
  gateway `beforeEach` but never AFTER its last test, so it leaves the
  gateway configured with `sk-test`.
- bun runs every file in a shard inside ONE process. The residual config
  bleeds into the next file. When `facts-backstop-gating.test.ts` lands in
  the same shard, its put_page calls see `isAvailable('embedding') === true`
  (the key is *present*, just invalid), so put_page attempts a real embed
  and 401s before the backstop gating even runs.
- It's intermittent across master merges because shard bin-packing changes
  which files co-locate. (It "resolved" after the v107 merge earlier for
  exactly this reason, then came back.)

R1/R2 test-isolation lint doesn't catch this — it's `configureGateway`
module state, not `process.env` or `mock.module`.

Two fixes, both using the gateway's own `resetGateway()` seam (no
process.env, R-compliant):

1. embed-preflight.test.ts — `afterAll(() => resetGateway())` so the leaker
   cleans up after the whole file. Primary fix; also protects any OTHER
   shard-mate that reads gateway state.
2. facts-backstop-gating.test.ts — `beforeEach(() => resetGateway())` so the
   suite is deterministic regardless of ambient gateway config. Defense in
   depth: isAvailable('embedding') is now reliably false → put_page uses
   noEmbed → the import never embeds → only the backstop gating (the suite's
   actual subject) is exercised.
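
A self-contained sketch of the leak and the cleanup. The function names mirror
the ones in this commit message, but the bodies are assumptions standing in for
the real gateway module; `runShard` is a hypothetical driver that models two
test files sharing one process.

```typescript
// Minimal stand-in for module-level gateway state (not the real implementation).
let gatewayEnv: Record<string, string> = {};

function configureGateway(opts: { env: Record<string, string> }): void {
  gatewayEnv = { ...opts.env };
}

function resetGateway(): void {
  gatewayEnv = {};
}

// "Available" only means a key is present; it says nothing about validity.
function isAvailable(kind: "embedding"): boolean {
  return kind === "embedding" && "OPENAI_API_KEY" in gatewayEnv;
}

// Leaker file followed by victim file inside the same process.
function runShard(withCleanup: boolean): boolean {
  resetGateway();
  configureGateway({ env: { OPENAI_API_KEY: "sk-test" } }); // leaker's last test
  if (withCleanup) resetGateway(); // models afterAll(() => resetGateway())
  return isAvailable("embedding"); // what the victim file observes
}

console.log(runShard(false), runShard(true)); // true false
```

Without the cleanup the victim sees an embedding provider that is present but
invalid, which matches the 401 seen in CI.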

Verified: running leaker+victim in one process (the shard repro) goes
16/16; full shard 10 goes 1208/1208 (was 5 fail in CI). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(skillopt): make benchmark authoring an agent job, not a human chore

The prior tutorial taught a human to hand-write a 15-task benchmark — but
nobody does that. The real workflow is: user says "make skill X better,"
the AGENT authors the benchmark and runs the optimizer. The agent-facing
dispatcher didn't actually cover that.

Gap found: skill-optimizer/SKILL.md documented exactly one authoring path,
`--bootstrap-from-routing`, which (a) requires a pre-existing
routing-eval.jsonl (bootstrap-benchmark.ts:57-63 refuses without it) and
(b) generates tasks from ROUTING fixtures — which test dispatch ("does
this phrasing pick this skill"), not output quality. So an agent told to
improve a skill with no benchmark had no documented way to author a
*quality* benchmark; it'd have to reinvent the JSONL format the human
tutorial teaches.

Two fixes:

1. skills/skill-optimizer/SKILL.md — new "Authoring the benchmark yourself
   (the common case)" section: read the target SKILL.md, generate ~15
   realistic tasks, attach rule judges (contains/max_chars/min_citations/
   section_present/regex/tool_called), write the JSONL, run with
   `--split 1:1:1` (the default 4:1:5 needs ~50 tasks). Decision-tree row
   "New skill, no benchmark" now says "Author one" instead of pointing at
   bootstrap-from-routing; the bootstrap row is reframed as a head-start
   that only applies when routing fixtures exist and notes routing tasks
   test dispatch, not quality.

2. docs/tutorials/improving-skills-with-skillopt.md — new "The easiest
   path: ask your agent" section up top. Tells humans to just tell their
   agent "improve my X skill — write a benchmark first," and frames the
   manual walkthrough as "read this when you want to understand or
   hand-curate what the agent is doing."

Verified: conformance 249/0, resolver 99/0, build-llms drift guard 7/0,
cross-link resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(skillopt): --bootstrap-from-skill starter benchmark generator

Generate a quality benchmark from a skill's SKILL.md directly, no
routing-eval.jsonl required. One LLM call emits JSONL tasks (each with rule
judges) that the agent reviews + strengthens before optimizing.

- runBootstrapFromSkill: JSONL output parsed line-by-line with skip-bad-line
  salvage (a truncated final line drops, the rest survive); a task is kept only
  when >=2 valid rule checks survive; provider errors propagate instead of
  collapsing to bootstrap_empty.
- --bootstrap-tasks N (default 15, cap 50); maxTokens scales with the count.
- Extracted assertBenchmarkAbsent + readSkillBodyOrThrow shared with the routing
  bootstrap; hardened runBootstrap's routing-eval parse to skip malformed lines.
- CLI: --bootstrap-from-skill short-circuit + 6-way mutual exclusion; parseFlags
  exported for unit tests. The benchmark-not-found hint + --help now point here.
- The generator's REVIEW line prints the paste-ready
  `--bootstrap-reviewed --split 1:1:1` next command (the default 4:1:5 split
  refuses a 15-task starter at D_sel >= 5).
- 20 hermetic cases incl. round-trip into loadBenchmark + splitBench(1:1:1).
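
A rough sketch of the skip-bad-line salvage described above. The check-op names
come from the tutorial commit; the task fields (`id`, `prompt`, `checks`) and
the helper name `salvageTasks` are assumptions, since the real JSONL schema is
not shown here.

```typescript
// Check ops named in the tutorial commit message.
const KNOWN_OPS = new Set([
  "contains", "regex", "section_present", "max_chars",
  "min_citations", "tool_called", "tool_not_called",
]);

// Hypothetical task shape; the real benchmark schema may differ.
interface Check { op: string; [key: string]: unknown }
interface Task { id: string; prompt: string; checks: Check[] }

function salvageTasks(jsonl: string, minChecks = 2): Task[] {
  const kept: Task[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: any;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // malformed or truncated line: drop it, keep the rest
    }
    if (parsed === null || typeof parsed !== "object") continue;
    if (typeof parsed.id !== "string" || typeof parsed.prompt !== "string") continue;
    const rawChecks: unknown[] = Array.isArray(parsed.checks) ? parsed.checks : [];
    const checks = rawChecks.filter(
      (c): c is Check =>
        typeof c === "object" && c !== null && KNOWN_OPS.has((c as Check).op),
    );
    // Keep a task only when enough valid rule checks survive.
    if (checks.length >= minChecks) kept.push({ id: parsed.id, prompt: parsed.prompt, checks });
  }
  return kept;
}

const modelOutput = [
  '{"id":"t1","prompt":"Summarize","checks":[{"op":"contains","value":"Summary"},{"op":"max_chars","value":500}]}',
  '{"id":"t2","prompt":"Cite","checks":[{"op":"min_citations","value":1}]}',
  '{"id":"t3","prompt":"Trunc',
].join("\n");
console.log(salvageTasks(modelOutput).map((t) => t.id));
```

In this input only `t1` survives: `t2` has a single valid check and `t3` is a
truncated final line.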

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(skillopt): make --bootstrap-from-skill the primary no-benchmark path

The agent runs --bootstrap-from-skill, strengthens the generated judges (they
are weak drafts), deletes the sentinel, then runs --bootstrap-reviewed
--split 1:1:1. Freehand authoring is demoted to the fallback for the rare skill
the generator can't draft well. Updates the Iron Law, decision tree, and
anti-patterns to cover both bootstrap modes and the 15-task / --split 1:1:1
gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(release): v0.42.1.0 --bootstrap-from-skill

VERSION + package.json -> 0.42.1.0, CHANGELOG entry, CLAUDE.md skillopt
annotation, regenerated llms-full.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: surface --bootstrap-from-skill in README + skillopt reference

- docs/guides/skillopt.md: 30-second pitch leads with --bootstrap-from-skill;
  flag table adds --bootstrap-from-skill + --bootstrap-tasks rows.
- README.md: skillopt tutorial pointer mentions generating a starter benchmark.
- Regenerated llms-full.txt (README is in the bundle).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ci): bump FULL_SIZE_BUDGET 700KB→750KB for legitimate CLAUDE.md growth

The skillopt wave annotations + merged v0.41.34-36 master releases pushed
llms-full.txt to 700,423 bytes — 423 over the 700KB cap — failing the
build-llms size-budget test on CI shard 6. CLAUDE.md is ~540KB (77% of the
bundle) and is the whole point of the one-fetch artifact, so it stays inlined;
the budget tracks its per-release growth. 750KB still fits 200k+ context models.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Jun 8, 2026

Development

Successfully merging this pull request may close these issues.

design: type proliferation — 94 types should be ~14 (DRY/MECE unification proposal)
