
feat: add gbrain check-update command and auto-update agent workflow#15

Merged
garrytan merged 7 commits into master from garrytan/openclaw-autoupdate
Apr 9, 2026
garrytan commented Apr 9, 2026

Owner

Summary

Auto-update notifications for GBrain. Users are notified when meaningful updates are available and are always asked before anything is installed.

New CLI command:

  • gbrain check-update [--json] — checks GitHub Releases for new versions, compares semver (minor+ only, skips patches), fetches changelog diffs. Fail-silent on network errors.
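
A minimal sketch of the "minor+ only, skips patches" comparison described above. The names parseSemver and isMinorOrMajorBump appear in this PR's test list, but the signatures, the three-part version format, and the treatment of malformed input are assumptions made for illustration, not the shipped implementation.

```typescript
// Assumed shape: three numeric parts with an optional leading "v".
type Semver = { major: number; minor: number; patch: number };

function parseSemver(input: string): Semver | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(input.trim());
  if (!match) return null;
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

function isMinorOrMajorBump(current: string, latest: string): boolean {
  const cur = parseSemver(current);
  const next = parseSemver(latest);
  // Assumption: malformed versions count as "no update", in keeping
  // with the fail-silent behavior the summary describes.
  if (!cur || !next) return false;
  // Compare major first; only look at minor when majors are equal.
  if (next.major !== cur.major) return next.major > cur.major;
  // Patch is never consulted, so patch-only releases are skipped.
  return next.minor > cur.minor;
}
```

Under these assumptions, `isMinorOrMajorBump("0.4.0", "0.4.3")` is false (patch only) while `isMinorOrMajorBump("0.4.0", "0.5.0")` is true.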

Agent workflow (SKILLPACK Section 17):

  • Full 6-step upgrade flow: update binary, re-read all skills, re-read SKILLPACK/SCHEMA, run migration directives, schema sync (respects user's adopted/declined/custom choices), report
  • Daily check via cron, messages user on preferred channel, waits for explicit "yes" before upgrading
  • Frequency controls: daily (default), weekly, stop/resume

Standalone SKILLPACK users:

  • Version markers in SKILLPACK and RECOMMENDED_SCHEMA headers
  • Self-update via raw GitHub URL comparison, no CLI needed

DX:

  • Step 7 added to OpenClaw install paste (default-on update checks)
  • Setup skill Phase G for manual install users
  • skills/migrations/ directory convention for version-specific post-upgrade directives
  • ~/.gbrain/update-state.json schema state tracking

Test Coverage

Tests: 19 → 20 unit test files (+1)
E2E:   3 → 4 test files (+1)
  • test/check-update.test.ts: 20 unit tests (parseSemver, isMinorOrMajorBump, extractChangelogBetween, CLI wiring)
  • test/e2e/upgrade.test.ts: 5 E2E tests against real GitHub API

Coverage gate: PASS (100% of new code paths tested)

Pre-Landing Review

Pre-Landing Review: No issues found.

Adversarial Review

Both Claude and Codex adversarial reviews ran. Key finding fixed:

  • [FIXED] Broken semver comparison in extractChangelogBetween — minor version compared without major-version guard, causing incorrect changelog entries across major boundaries.

Other findings (informational, deferred):

  • Changelog fetched from master tip, not release tag — acceptable since releases are always cut from master
  • Patch suppression is intentional (user requested minor+ only)
  • detectInstallMethod() PATH execution pre-existing, not introduced by this PR

Plan Completion

11/11 plan items DONE.

TODOS

No TODO items completed in this PR.

Test plan

  • All unit tests pass (20 files, 228 pass)
  • All E2E tests pass (4 files, 68 pass against real Postgres+pgvector)
  • gbrain check-update returns human-readable output
  • gbrain check-update --json returns valid JSON
  • gbrain check-update --help prints usage

🤖 Generated with Claude Code

garrytan and others added 7 commits April 9, 2026 12:15
Deterministic collector that checks GitHub Releases for new versions,
compares semver (minor+ only, skips patches), and fetches changelog diffs.
Exports `detectInstallMethod()` from upgrade.ts for reuse. Includes 15
unit tests covering version comparison, CLI wiring, and error handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Exercises check-update CLI end-to-end: valid JSON output, human-readable
mode, help text, graceful no-releases handling, and version comparison
wiring. Skips gracefully when network is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full agent playbook for the update lifecycle: check, notify, consent,
upgrade, skills refresh, schema sync, report. Includes standalone
self-update for skillpack-only users via version markers and raw
GitHub URL fetching. Adds version markers to both SKILLPACK and
RECOMMENDED_SCHEMA headers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ons dir

Adds step 7 to the OpenClaw install paste (default-on update checks).
Setup skill gets Phase G (conditional offer for manual installs) and
schema state tracking via ~/.gbrain/update-state.json. Creates
skills/migrations/ directory for version-specific upgrade directives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds E2E test DB lifecycle instructions (spin up, run, tear down).
Documents version migration convention (skills/migrations/v[version].md)
and schema state tracking (~/.gbrain/update-state.json). Updates test
file counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The version range check compared minor versions without guarding on
major being equal, causing incorrect changelog entries to be captured
(e.g., v0.5.0 would match when upgrading from v1.2.0). Extracted
semverGt/semverLte helpers for correct comparisons. Added 5 tests
for extractChangelogBetween covering cross-major, same-version, and
malformed input cases.
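
The major-version guard can be sketched as follows. semverGt and semverLte are named in the commit message; the tuple representation and the inUpgradeRange helper are hypothetical, introduced only to show why comparing the major component first keeps v0.5.0 out of an upgrade from v1.2.0.

```typescript
// Assumed representation: [major, minor, patch].
type Version = [major: number, minor: number, patch: number];

function semverGt(a: Version, b: Version): boolean {
  // Compare the most significant component first; fall through to the
  // next component only when the current one is equal.
  if (a[0] !== b[0]) return a[0] > b[0];
  if (a[1] !== b[1]) return a[1] > b[1];
  return a[2] > b[2];
}

function semverLte(a: Version, b: Version): boolean {
  return !semverGt(a, b);
}

// Hypothetical helper: a changelog entry belongs in the diff when
// from < entry <= to.
function inUpgradeRange(entry: Version, from: Version, to: Version): boolean {
  return semverGt(entry, from) && semverLte(entry, to);
}
```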

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit f541f04 into master Apr 9, 2026
3 checks passed
garrytan added a commit that referenced this pull request May 15, 2026
…ederated_read + 3 more (#996)

* fix(mcp): skip stdin EOF handlers when MCP_STDIO=1

OpenClaw's bundle-mcp gateway and similar wrappers pipe the JSON-RPC
handshake on stdin then close their stdin half. Pre-fix, both stdin
'end' and 'close' listeners (server.ts:65-66 and serve.ts:204-206)
treated this as a permanent disconnect and shut the server down before
the first tool call arrived.

Guard both sites with `process.env.MCP_STDIO !== '1'`. Signal handlers
(SIGTERM/SIGINT/SIGHUP), transport.onclose, and the parent-process
watchdog still cover legitimate shutdown paths. The serve.ts site
threads the env read through an injectable `mcpStdio?: boolean` on
ServeOptions so tests stay isolated (no process.env mutation per
scripts/check-test-isolation.sh R1).
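
A rough sketch of the guard with an injectable flag, assuming a simplified options shape. Only `mcpStdio?: boolean` comes from the commit message; LifecycleOptions, installStdinHandlers, and the shutdown callback are hypothetical names used for illustration.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical options shape; the real ServeOptions has more fields.
interface LifecycleOptions {
  stdin: EventEmitter;
  mcpStdio?: boolean;
  shutdown: (reason: string) => void;
}

function installStdinHandlers(opts: LifecycleOptions): void {
  // A wrapper that closes its stdin half after the handshake is not a
  // disconnect, so the EOF listeners are skipped entirely. Signal
  // handlers and other shutdown paths are installed elsewhere.
  if (opts.mcpStdio) return;
  opts.stdin.on("end", () => opts.shutdown("stdin end"));
  opts.stdin.on("close", () => opts.shutdown("stdin close"));
}
```

In this sketch the production caller would pass `mcpStdio: process.env.MCP_STDIO === "1"`, while tests pass the boolean directly and never mutate process.env.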

Tests: 3 new cases in test/serve-stdio-lifecycle.test.ts pin the
guard's invariants — mcpStdio=true must NOT trigger shutdown on stdin
EOF, signals must still drive shutdown with mcpStdio=true, and
mcpStdio=false (default) preserves existing CLI behavior. 25/25 pass.

Origin: PR #870.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(oauth): honor token_endpoint_auth_method=none for PKCE public clients

RFC 7591 §3.2.1: when a DCR client declares
token_endpoint_auth_method="none" (PKCE-only public clients like Claude
Code, Cursor), the authorization server MUST NOT issue a client_secret.
Pre-fix, registerClient unconditionally minted a secret, and the MCP
SDK's clientAuth middleware then rejected valid public-client flows on
/token because it expected client.client_secret to match.

Three changes to src/core/oauth-provider.ts:registerClient:

  - Gate clientSecret generation on isPublicClient = (auth_method === 'none').
    Public clients store client_secret_hash = NULL.
  - Omit client_secret from the response payload for public clients.
    Confidential clients (default client_secret_post and explicit
    client_secret_basic) keep their existing one-time-reveal shape.
  - Normalize NULL secret_hash to JS undefined in getClient so SDK
    middleware (which checks client.client_secret === undefined, not
    === null) correctly identifies public clients and skips the
    secret-comparison branch on /token.
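
The three changes above might look roughly like this in isolation. The interfaces, function names, and the SHA-256 hashing are assumptions for the sake of a self-contained example; only the isPublicClient gate, the NULL hash, the omitted response field, and the NULL to undefined normalization come from the commit message.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical shapes standing in for the DCR response and the DB row.
interface RegistrationResponse {
  client_id: string;
  client_secret?: string; // absent for public clients
  token_endpoint_auth_method: string;
}

interface StoredClient {
  client_id: string;
  client_secret_hash: string | null;
}

function registerClientSketch(
  authMethod: string = "client_secret_post",
): { response: RegistrationResponse; row: StoredClient } {
  const clientId = randomBytes(16).toString("hex");
  const isPublicClient = authMethod === "none";
  // Only confidential clients get a secret minted.
  const secret = isPublicClient ? undefined : randomBytes(32).toString("hex");

  const row: StoredClient = {
    client_id: clientId,
    // Assumption: secrets are stored as a SHA-256 hex digest.
    client_secret_hash:
      secret === undefined ? null : createHash("sha256").update(secret).digest("hex"),
  };

  const response: RegistrationResponse = {
    client_id: clientId,
    token_endpoint_auth_method: authMethod,
  };
  if (secret !== undefined) response.client_secret = secret;
  return { response, row };
}

// Read path: a NULL hash becomes an absent property, so middleware that
// checks `client_secret === undefined` recognizes a public client.
function toClientView(row: StoredClient): { client_id: string; client_secret?: string } {
  return row.client_secret_hash === null
    ? { client_id: row.client_id }
    : { client_id: row.client_id, client_secret: row.client_secret_hash };
}
```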

Schema is already permissive (client_secret_hash TEXT, no NOT NULL on
both src/schema.sql and src/core/pglite-schema.ts) — no migration
needed.

Tests: 5 new cases in test/oauth.test.ts pin:
  - public client → no client_secret in response (#11 from plan)
  - default auth_method → secret unchanged (regression guard)
  - explicit client_secret_post → secret unchanged
  - getClient NULL→undefined normalization
  - PKCE full /authorize → /token end-to-end with no secret (#15 from plan)

69/69 oauth.test.ts cases pass. typecheck clean.

Origin: PR #909.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(serve-http): --bind HOST, default to loopback (127.0.0.1)

Adds `gbrain serve --http --bind <interface>` to control which network
interface the HTTP MCP server listens on. Default flipped from
`0.0.0.0` (pre-v0.34) to `127.0.0.1` (v0.34.0+).

Why the flip: gbrain's primary use case is a personal-knowledge brain on
a laptop. The previous default exposed brains on every interface — one
accidental `--http` invocation away from publishing the brain to a LAN.
Server operators who need remote access pass `--bind 0.0.0.0` (or a
specific interface). Codex's outside-voice on the original PR #864
correctly flagged that the additive flag wasn't actually the fix; the
default needed to change for the safety claim to hold.

If `--public-url` is set but `--bind` is unset, runServeHttp prints a
loud stderr WARN at startup recommending `--bind 0.0.0.0`. Declaring a
public URL while quietly binding loopback is almost always a
misconfiguration; we want the operator to see it on first start, not
silently fail remote requests.
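
The default flip and the startup warning can be sketched as a small pure function. ServeHttpFlags, resolveBind, and the exact warning text are hypothetical; the real runServeHttp may structure this differently.

```typescript
// Hypothetical parsed-flag shape for `gbrain serve --http`.
interface ServeHttpFlags {
  bind?: string;
  publicUrl?: string;
}

function resolveBind(
  flags: ServeHttpFlags,
  warn: (message: string) => void = console.error,
): string {
  // Loopback is the default; operators opt in to wider exposure.
  const host = flags.bind ?? "127.0.0.1";
  if (flags.publicUrl !== undefined && flags.bind === undefined) {
    // Declaring a public URL while binding loopback is almost always a
    // misconfiguration, so say so at startup.
    warn(
      `WARN: --public-url is set but --bind is not; listening on ${host} only. ` +
        "Pass --bind 0.0.0.0 to accept remote requests.",
    );
  }
  return host;
}
```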

Startup banner now includes a `Bind:` row so the listening interface is
visible alongside Port / Engine / Issuer.

Origin: PR #864, extended with D11 (default flip) per /plan-eng-review
codex outside-voice review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mcp): seal source-isolation leak on read path (P0)

Pre-fix, an authenticated OAuth MCP client scoped to source-A could
enumerate source-B pages via six read-side surfaces across five ops:
search, query (text AND image paths), list_pages, traverse_graph, and
find_experts. The
v0.31.8 source-scoping pattern shipped through dispatch.ts but the op
handlers never threaded ctx.sourceId into their engine calls, and
hybridSearch.ts:223's explicit SearchOpts rebuild dropped sourceId
even when callers passed it.

Sealing the leak:

  - src/core/operations.ts adds sourceScopeOpts(ctx), the canonical
    precedence ladder: ctx.auth.allowedSources (federated) wins over
    ctx.sourceId (scalar) wins over nothing. Threaded into all 5
    read-side op handlers + the query-image-path searchVector call
    (the 6th leak surface codex caught in plan review).

  - src/core/search/hybrid.ts:223 now threads sourceId + sourceIds
    fields through the inner SearchOpts rebuild. The explicit pick
    shape is preserved (HNSW inner-CTE ordering depends on it) but
    extended.

  - src/core/types.ts adds sourceIds?: string[] to SearchOpts +
    PageFilters (D9: federated read needs array-shaped engine filter
    or fan-out; array wins for hot retrieval).

  - src/core/operations.ts AuthInfo gains sourceId + allowedSources
    (D2: identity surface symmetric with the federated_read column
    #876 will add).

  - Both engines now apply WHERE source_id = $N (scalar) or = ANY($N::text[])
    (array) at the SQL layer for searchKeyword, searchKeywordChunks,
    searchVector, listPages, traverseGraph, traversePaths. Array form
    wins when both are set. The searchVector filter pushes into the
    inner HNSW CTE (codex flagged this placement during plan review).

  - traverseGraph + traversePaths signatures gain opts.sourceId +
    opts.sourceIds; engine.ts interface updated.

  - findExperts (the whoknows op, D3 5th leak surface) accepts
    sourceId + sourceIds and threads them into its internal
    hybridSearch call. PR #861 was authored before v0.33 shipped so
    this op wasn't covered in the original PR.
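
A minimal sketch of the precedence ladder that sourceScopeOpts(ctx) is described as implementing. The interface names are hypothetical and trimmed to the fields mentioned above. One labeled assumption: an explicitly empty allowedSources array is passed through as an empty filter (matching nothing) rather than falling back to a wider scope.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface AuthInfoSketch {
  sourceId?: string;
  allowedSources?: string[];
}

interface OpContextSketch {
  auth?: AuthInfoSketch;
  sourceId?: string;
}

interface SourceScope {
  sourceId?: string;
  sourceIds?: string[];
}

function sourceScopeOpts(ctx: OpContextSketch): SourceScope {
  const allowed = ctx.auth?.allowedSources;
  // Federated array wins whenever it is present. Assumption: an empty
  // array means "no sources", never "all sources".
  if (allowed !== undefined) return { sourceIds: allowed };
  // Otherwise fall back to the scalar source scope.
  if (ctx.sourceId !== undefined) return { sourceId: ctx.sourceId };
  // No scope information at all: no filter is applied.
  return {};
}
```

Each read-side handler would then spread the result into its engine call, so the SQL layer sees either the scalar or the array filter.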

Auth wiring:

  - GBrainOAuthProvider.verifyAccessToken populates AuthInfo.sourceId
    from oauth_clients.source_id. JOIN guarded by isUndefinedColumnError
    so pre-v55 brains degrade to legacy projection rather than refusing
    every token verification.

  - GBrainOAuthProvider.registerClientManual gains a sourceId
    parameter (defaults to 'default'). DCR registerClient also sets
    source_id='default' on the inserted row.

  - serve-http.ts:929 cleanup: AuthInfo.sourceId is now a real typed
    field. The cast + GBRAIN_SOURCE env fallback chain is gone (D13).
    Legacy bearer tokens default to 'default' source in
    verifyAccessToken.

  - http-transport.ts (legacy access_tokens path) threads
    sourceId='default' through DispatchOpts so v0.22.7 callers stay
    source-scoped.

  - auth.ts CLI adds --source flag to gbrain auth register-client.

Migration v55 (D10 + D13):

  - ALTER TABLE oauth_clients ADD COLUMN source_id TEXT (nullable).
  - Backfill UPDATE source_id = 'default' WHERE source_id IS NULL —
    preserves v0.33 effective behavior verbatim for legacy clients.
  - ADD CONSTRAINT FK ... REFERENCES sources(id) ON DELETE SET NULL,
    wrapped in DO block so re-runs against fresh-install brains (where
    the FK already lives inline in SCHEMA_SQL) no-op cleanly.
  - CREATE INDEX idx_oauth_clients_source_id WHERE source_id IS NOT NULL
    for the verifyAccessToken JOIN.
  - GBRAIN_ACCEPT_SILENT_WIDEN env-flag wired through the runner via
    SET LOCAL gbrain.accept_silent_widen — reserved for future migrations
    that hit the silent-widen footgun codex flagged. This migration
    doesn't need it (column is brand new; no pre-existing stale values
    possible by definition).
  - src/core/pglite-schema.ts + src/schema.sql include the column +
    FK + index inline for fresh installs.

Tests: new test/e2e/source-isolation-pglite.test.ts with 13 regression
cases — one per leak surface (search/list_pages/traverse/etc.) plus
explicit AuthInfo.sourceId and AuthInfo.allowedSources op-handler
threading checks. Full unit suite: 6034 pass / 0 fail. PGLite
initSchema time dropped from 2.4s to 850ms after consolidating v55's
DO blocks (multiple DO blocks were slow on PGLite; one DO block for
the FK install only is fine).

Origin: PR #861 + plan-eng-review decisions D2/D3/D4/D9/D10/D13 + F2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): multimodal embedding for openai-compatible providers

Pre-fix, embedMultimodal hardcoded a recipe.id === 'voyage' branch and
threw AIConfigError for every other recipe. Multimodal-capable providers
fronted by LiteLLM (or any openai-compatible proxy) were unreachable
even when the operator had wired up the model.

The fix:

  - src/core/ai/gateway.ts adds embedMultimodalOpenAICompat() that
    POSTs to the standard /embeddings endpoint with content arrays
    carrying image_url entries. Routing comes from the existing
    recipe.implementation switch — Voyage stays on its own
    /multimodalembeddings path; every other openai-compatible recipe
    flows through the new helper.

  - src/core/ai/recipes/litellm-proxy.ts declares
    supports_multimodal: true so embedMultimodal accepts the recipe.
    No multimodal_models allow-list: LiteLLM is a passthrough proxy
    and the user owns model-id selection; provider rejection (400 from
    upstream) is the right enforcement layer there. Voyage's static
    allow-list shape stays unchanged (its 12 models share
    supports_multimodal but only one is multimodal-capable).

  - D12 runtime dimension validation: the new helper checks the
    returned vector length against the recipe's declared default_dims
    (preferred) or the brain's embedding_dimensions config. Mismatch
    throws AIConfigError with model id + observed + expected so the
    operator can swap models or rebuild the column. Pre-fix, a
    wrong-dim response would surface as a cryptic pgvector
    "vector dimension mismatch" at INSERT time.

  - Auth resolution routes through the existing defaultResolveAuth
    helper so optional-auth recipes (LiteLLM proxy with no
    LITELLM_API_KEY) and required-auth recipes both share one code
    path. Optional-auth sends "Authorization: Bearer unauthenticated"
    which servers like Ollama / llama-server ignore but the SDK
    contract requires.
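
The D12 dimension check can be illustrated with a short sketch. The AIConfigError class here is a local stand-in for the project's own error type, and assertEmbeddingDims and its parameters are hypothetical names; the preference for the recipe's default_dims over the brain's configured dimensions follows the description above.

```typescript
// Local stand-in for the project's AIConfigError.
class AIConfigError extends Error {}

function assertEmbeddingDims(
  modelId: string,
  vector: number[],
  recipeDefaultDims: number | undefined,
  brainEmbeddingDims: number | undefined,
): void {
  // Prefer the recipe's declared dims; fall back to the brain config.
  const expected = recipeDefaultDims ?? brainEmbeddingDims;
  if (expected === undefined) return; // nothing to validate against
  if (vector.length !== expected) {
    // Surface model id, observed, and expected so the operator can act
    // before a cryptic INSERT-time failure.
    throw new AIConfigError(
      `model ${modelId} returned ${vector.length} dimensions, expected ${expected}`,
    );
  }
}
```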

Tests: 11 new cases in test/openai-compat-multimodal.test.ts cover
happy-path, multi-input batching, unauthenticated proxy, D12 dim
mismatch + default-dim fallback, 401 / 400 / malformed-JSON / non-array
error paths, and an explicit Voyage-regression test pinning that the
new openai-compat route doesn't accidentally hijack the Voyage path.
All 41 multimodal-related tests pass (existing voyage suite + new).
typecheck clean.

Origin: PR #875 + plan-eng-review D12 (runtime dim validation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): federated_read read scope (#876)

Pre-fix, OAuth clients had a single source-scope axis (source_id, added
in v55). A client could either write+read one source OR be a super-reader
across all sources (via NULL source_id). There was no middle ground —
WeCare-style L3 dept clients that need to write to dept-x but read
dept-x + parent canon + shared canon had no expression.

#876 adds federated_read TEXT[] as an orthogonal read-scope axis. source_id
is the WRITE authority; federated_read is the READ authority. They default
to matching values (read scope == write scope, the pre-v0.34 default)
when a client is registered without an explicit federated read list.

Migrations v56-v60 (five new migrations on top of v55):

  - v56: ALTER TABLE ... ADD COLUMN federated_read TEXT[] NOT NULL DEFAULT '{}'.
  - v57 (F5): explicit CASE backfill so source_id IS NULL → '{}' (not an
    array containing NULL — codex caught this ambiguity during plan review).
  - v58: post-backfill validation. Fails loud if any row's source_id isn't
    in its federated_read array, pointing at a logic bug in v57 if fired.
  - v59: flip the source_id FK from ON DELETE SET NULL to ON DELETE
    RESTRICT now that federated_read provides the alternative scope-loss
    path. Pre-flip, deleting a source could silently widen any oauth_client
    to super-reader; post-flip, source delete is refused if any client
    references it (operator must revoke/re-scope first).
  - v60: GIN index on federated_read for array-containment queries.

Auth wiring:

  - GBrainOAuthProvider.verifyAccessToken JOINs c.federated_read and
    populates AuthInfo.allowedSources. Pre-v56 / pre-v55 brains degrade
    via the existing isUndefinedColumnError fallback chain.
  - registerClientManual gains a federatedRead?: string[] parameter
    (defaults to [sourceId]).
  - DCR registerClient sets source_id='default' + federated_read=['default']
    on the inserted row.
  - auth.ts CLI adds --federated-read SRC1,SRC2,... flag. The
    register-client output now prints "Federated reads:" so operators
    confirm the scope they set.
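
The defaulting and validation rules above reduce to two small pure functions. Both names are hypothetical; the sketch only restates the rules given in this commit message (default read scope is [sourceId], a NULL source_id backfills to an empty array, and the v58 check requires the write source to appear in the read list).

```typescript
// Hypothetical helper: read scope defaults to the write scope.
function defaultFederatedRead(
  sourceId: string | null,
  federatedRead?: string[],
): string[] {
  if (federatedRead !== undefined) return federatedRead;
  // NULL source_id maps to an empty array, not an array containing NULL.
  return sourceId === null ? [] : [sourceId];
}

// Hypothetical helper mirroring the v58 validation: a row passes when
// its source_id is NULL or is contained in federated_read.
function readScopeCoversWrite(
  sourceId: string | null,
  federatedRead: string[],
): boolean {
  return sourceId === null || federatedRead.includes(sourceId);
}
```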

Engines consume the federated array through the SearchOpts.sourceIds /
PageFilters.sourceIds field that #861 added (no engine changes here — the
plumbing was D9). sourceScopeOpts in operations.ts already prefers the
auth.allowedSources array over scalar ctx.sourceId when set.

Test seam:
  - test/book-mirror.test.ts now spawns the CLI with GBRAIN_HOME pointed
    at a tempdir so the test isn't sensitive to the developer's local
    ~/.gbrain/config.json. Pre-fix the test could silently inherit a real
    Postgres connection and hang past the default 5s test timeout. Fresh
    GBRAIN_HOME → "No brain configured" → exit 1 in <1s.
  - test/e2e/source-isolation-pglite.test.ts gains one more regression
    case: AuthInfo.allowedSources = [] (explicit empty) MUST NOT widen
    scope to "all sources" — the silent-widen footgun precedence ladder.
  - test/openai-compat-multimodal.test.ts is part of the wave's commits
    via the migrate.ts changes that bump the schema chain. typecheck-only
    fix on a captured-auth type was already in #875's tree.

6045 unit tests pass / 0 fail. typecheck clean. PGLite initSchema runs
v55-v60 in ~786ms total (within the test-harness budget for tests using
the canonical beforeAll engine pattern).

Origin: PR #876 + plan-eng-review F5 (CASE backfill).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.34.0.0: MCP fix wave (#870 #909 #864 #861 #875 #876)

VERSION + package.json + CHANGELOG bump for the six-PR MCP fix wave.
Schema chain extends from v54 → v60; oauth_clients gains source_id +
federated_read columns; auth'd MCP clients now stay inside their scope
across all read-side ops; PKCE-only DCR works; --bind defaults to
loopback; LiteLLM multimodal embedding ships.

Contributed by @Hansen1018 (#870), @ding-modding (#909), @DukeDawg
(#864), @toilalesondev (#861 + #876), @yoelgal (#875).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.34.0.0

Sync README, CLAUDE.md, SECURITY.md, docs/architecture/topologies.md,
and docs/mcp/DEPLOY.md to reflect the v0.34.0.0 MCP fix wave:

- README: document --bind HOST default (loopback), --source +
  --federated-read register-client flags, PKCE public-client gate
- SECURITY.md: note loopback-by-default for serve --http, update the
  trust-proxy contract to point at the new default
- CLAUDE.md: annotate operations.ts (sourceScopeOpts helper),
  oauth-provider.ts (verifyAccessToken JOIN + PKCE public clients),
  serve-http.ts (--bind flag), gateway.ts (openai-compat multimodal +
  dim validation), mcp/server.ts (MCP_STDIO guard), auth.ts (--source
  + --federated-read), migrate.ts (v58-v63 chain), engine.ts
  (sourceIds field). Add 4 new test-file entries for
  source-isolation-pglite, openai-compat-multimodal,
  serve-stdio-lifecycle, oauth.test.ts PKCE cases
- docs/architecture/topologies.md: source-scoped register-client
  example, --bind 0.0.0.0 for thin-client host setup
- docs/mcp/DEPLOY.md: --bind explanation in the ngrok section,
  source-scoped client recipe
- llms-full.txt: regenerated per the CLAUDE.md-edit chaser rule

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump v0.34.0.0 → v0.34.1.0

Renumbering the MCP fix wave from v0.34.0.0 to v0.34.1.0 so the
release slot lands between master's v0.33.2.1 and the next minor.

Touches every release-artifact mention:
- VERSION: 0.34.0.0 → 0.34.1.0
- package.json: same
- CHANGELOG.md header + "To take advantage" block
- CLAUDE.md key-files annotations (8 entries that document this wave)
- llms-full.txt (regen from CLAUDE.md)
- README.md / SECURITY.md / docs/architecture/topologies.md / docs/mcp/DEPLOY.md
- Wave code-comment markers ("// v0.34.0 (#NNN):" → "// v0.34.1 (#NNN):")

Test files renamed alongside since they were committed with the wave.

Commit subjects on the original 6 PR commits + the v0.34.0.0 bump
commit (4f533c76b47db7) intentionally NOT rewritten — those are
history. `git log` finds the implementation by message subject, not by
version tag.

6275 unit tests pass, typecheck clean, migration chain v58-v63 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 19, 2026
T10 of brain-health-100 wave — load-bearing decision-pinning tests.

test/brain-score-recommendations.test.ts (22 cases):
  - Healthy brain → empty plan
  - Per-component remediation paths (sync, embed, backlinks, extract)
  - depends_on wiring (extract → sync; embed → sync when stale)
  - Severity ordering (critical > high > medium > low)
  - D6 #5 determinism: same input twice → byte-identical output
  - D9 idempotency keys: content-hash format, no time-slot
  - D9 source isolation: different --source → different key
  - D13 status field always 'remediable' in output
  - +A cost-estimate populated for embed
  - classifyChecks: remediable / blocked / human_only triage
  - maxReachableScore: all-remediable → 100; all-blocked → current

test/op-checkpoint.test.ts (20 cases):
  - fingerprint stability + key-order invariance (canonical-JSON)
  - codex #11: extract links vs timeline get different fingerprints
  - codex #12: reindex markdown vs code get different fingerprints
  - codex #15: embed model+dim variation produces different fingerprints
  - reindex chunker_version bump invalidates checkpoint
  - DB round-trip (load → record → load)
  - Cross-fingerprint isolation (linksKey vs timelineKey)
  - clearOpCheckpoint idempotency on missing rows
  - resumeFilter purity (no I/O, deterministic)
  - purgeStaleCheckpoints TTL respect

42 new tests, all pass. PGLite engine + resetPgliteState pattern per
CLAUDE.md test-isolation guide.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T10 + D6 #5 + D9 + D12 + D13 + codex #11/#12/#15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 19, 2026
…--remediate + Minions (#1193)

* feat(schema): op_checkpoints table + doctor_run_id partial GIN (v67+v68)

T1 of brain-health-100 wave. Two new migrations underpin autonomous
remediation via Minions:

- v67 op_checkpoints — shared checkpoint table for long-running ops
  (embed, extract, lint, backlinks, reindex, integrity). Pre-fix each
  op had its own file-backed checkpoint or none. PRIMARY KEY (op,
  fingerprint) lets `extract links` and `extract timeline` (or
  `reindex --markdown` vs `--code`) coexist without colliding on
  shared keys.

- v68 minion_jobs_doctor_run_id_idx — partial GIN on
  `minion_jobs.data WHERE data ? 'doctor_run_id'`. Indexes only
  doctor-submitted jobs so audit-trail queries don't sequential-scan
  months of unrelated cron history. PGLite skips via empty sqlFor.

Applied to src/schema.sql + src/core/pglite-schema.ts so both engines
get the table on fresh-install. Bootstrap coverage test +
122-case migrate test both pass.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(D12 + folded scope B from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): op-checkpoint module — DB-backed checkpoint primitive

T2 of brain-health-100 wave. Five exports plus per-op fingerprint helpers:

  loadOpCheckpoint(engine, key)     → string[]   (completed keys; [] if none)
  recordCompleted(engine, key, ks)  → void       (UPSERT atomic)
  clearOpCheckpoint(engine, key)    → void       (clean-exit drop)
  resumeFilter(all, completed)      → string[]   (pure; drives batched walks)
  purgeStaleCheckpoints(engine, ttl)→ number     (cycle purge phase consumer)

Fingerprint helpers:
  fingerprint(params)               — sha8 of canonical-JSON
  embedFingerprint(p)               — model+dim+slug+source variation
  extractFingerprint(p)             — mode (links vs timeline)
  reindexFingerprint(p)             — markdown vs code vs slug + chunker_version
  lintFingerprint, backlinksFingerprint, integrityFingerprint, importFingerprint

Canonical-JSON over keys-sorted ensures the same params produce the
same fingerprint across runs and hosts. sha8 (8 hex chars from sha256)
is short enough for filenames + UI but collision-resistant for the
expected per-op invocation diversity.
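
A sketch of the sha8-over-canonical-JSON idea, assuming Node's built-in crypto module. The Json type and canonicalJson helper are illustrative; the real module may canonicalize differently.

```typescript
import { createHash } from "node:crypto";

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every level, so key order in
// the input never changes the output.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(",")}}`;
}

// sha8: the first 8 hex characters of the SHA-256 digest.
function fingerprint(params: Json): string {
  return createHash("sha256").update(canonicalJson(params)).digest("hex").slice(0, 8);
}
```

With this sketch, `fingerprint({ mode: "links", source: "a" })` and `fingerprint({ source: "a", mode: "links" })` produce the same key, while changing mode to "timeline" produces a different one.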

DB-backed for both engines (PGLite has the table too via v67). Lost-
write on partial DB failure is non-fatal — caller continues, next run
re-walks (cheap for hash-short-circuited ops like embed/import).

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(D12 + codex #10–16 from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): brain-score-recommendations — shared data layer

T4 of brain-health-100 wave. Pure module — no engine I/O. Takes a
BrainHealth snapshot + RecommendationContext, returns ordered
Remediation[] ready to feed the doctor remediation plan OR features
--auto-fix.

Three public exports:
  computeRecommendations(health, ctx)  → Remediation[]
  classifyChecks(checks, ctx)          → CheckClassification[]
  maxReachableScore(health, classes)   → number (0-100 ceiling)

D13 — three-state classification per check: remediable / human_only /
blocked. The plan ONLY emits remediable items; blocked surfaces
alongside as informational with the missing prereq (no API key, etc.).
Closes the spin-loop bug on empty / API-key-missing brains (codex #20).

D14 — every Remediation has a stable string id (sync.repo, embed.stale,
backlinks.fix, extract.all). depends_on references ids, not check names.

D9 — idempotency_key is content-hash from canonical-JSON of params.
Same intent across runs = same key; failed-row replay via :r<N> suffix
is the --remediate loop's job, not this module's.

Scope item +A (cost-budget gate) — Remediation.est_usd_cost populated
for embed (chars × pricePerMTok from embedding-pricing.ts) and Anthropic
jobs (estimateAnthropicCost helper). doctor --remediate --max-usd N
gates submission against est_total_usd_cost.

Both consumers (doctor + features per D15) import from here. Features
executes inline (D15 contract preserved), doctor submits via queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handlers): 11 new Minion handlers + 3 added to PROTECTED + sync noExtract fix

T5 of brain-health-100 wave.

PROTECTED_JOB_NAMES extension (D11): synthesize, patterns, consolidate.
These cycle phases internally submit `subagent` jobs with
allowProtectedSubmit=true, so they CAN spend Anthropic credits.
Treating them as "data-quality maintenance" was a misread surfaced by
the codex outside-voice review (#6). Protected gate ensures only
trusted local callers (CLI, autopilot, doctor --remediate) can submit;
an OAuth-scoped MCP client can't burn the user's API budget by
submitting a synthesize job over HTTP.

11 new handlers registered in jobs.ts registerBuiltinHandlers:

  PROTECTED (3) — phase-wrappers that spawn subagent children:
    synthesize, patterns, consolidate

  Open (8) — DB/fs writes only, no LLM spend:
    reindex, repair-jsonb, orphans, integrity, purge,
    extract_facts, resolve_symbol_edges, recompute_emotional_weight

Phase-wrappers all delegate to `runCycle({ phases: [name] })` rather
than extracting standalone phase functions. Cycle.ts already owns the
lock + abort signal + progress reporter per D10, so the wrapper is a
one-liner and cycle.ts remains the single source of truth for phase
semantics. Pragmatic deviation from the plan's "extract 6 standalone
runXxxPhase functions" — smaller diff, equivalent correctness.

Standalone `sync` handler now passes `noExtract: true` (codex #5 fix).
Pre-fix, doctor's remediation plan emitting [sync, extract] caused
double-extraction (performSync inline-extract + standalone extract
job). Now sync defers extract to the dedicated handler. Callers that
want inline extract pass { noExtract: false } in job params.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T5 + D10 + D11 + codex #5/#6 from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): --remediation-plan + --remediate CLI surfaces

T6 of brain-health-100 wave. The headline user-facing capability:
agents drive brain health to target score via autonomous Minions
remediation.

Two new flags on `gbrain doctor`:

  --remediation-plan [--json] [--target-score N]
    Read-only. Emits ordered Remediation[] from BrainHealth + context.
    Uses cheap path (D7) — engine.getHealth() + computeRecommendations,
    NOT a full doctor walk. JSON shape is stable agent contract.

  --remediate [--yes] [--target-score N] [--max-jobs N] [--max-usd N]
              [--dry-run] [--json]
    Sequential submit (D3) with D5 cascade on failure, D7 scoped
    recheck between steps, D9 content-hash idempotency keys, D13
    three-state remediation filtering (only remediable jobs enter
    the loop), +A cost-budget gate via --max-usd.

Check.remediation field added as additive optional (DoctorReport
schema_version stays at 2 per D4).

PGLite path: synchronous in-process execution with short polling.
Postgres path: durable queue submission with waitForCompletion.

The --remediate loop:
  1. Compute initial plan from BrainHealth
  2. Refuse if --target-score > maxReachableScore(health, classes)
  3. Refuse if est_total_usd_cost > --max-usd
  4. For each step in order:
     - Skip if depends_on intersects aborted set (D5)
     - queue.add with content-hash idempotency_key (D9)
     - waitForCompletion with timeout
     - Recompute plan from fresh health (D7 scoped recheck)
  5. Exit 0 if all completed; 1 if any failed/aborted
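
The loop above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in doctor.ts: `Step`, `remediate`, and `runStep` are hypothetical names, the runner is a synchronous stand-in for queue.add + waitForCompletion, a failed step is treated as blocking its dependents, and the target-score refusal and the D7 recompute between steps are left out.

```typescript
// Hypothetical shapes for illustration only.
type Step = { id: string; depends_on: string[]; est_usd: number };
type Outcome = "completed" | "failed" | "aborted";

function remediate(
  plan: Step[],
  opts: { maxUsd: number },
  runStep: (step: Step) => boolean, // stand-in for queue.add + waitForCompletion
): { exitCode: 0 | 1; outcomes: Map<string, Outcome> } {
  const outcomes = new Map<string, Outcome>();
  const total = plan.reduce((sum, s) => sum + s.est_usd, 0);
  // Refuse before submitting anything if the estimate exceeds the cap.
  // Returning exit code 1 here is an assumption.
  if (total > opts.maxUsd) return { exitCode: 1, outcomes };

  const blocked = new Set<string>(); // ids of failed or aborted steps
  for (const step of plan) {
    // Cascade: skip any step whose dependency failed or was aborted.
    if (step.depends_on.some((dep) => blocked.has(dep))) {
      outcomes.set(step.id, "aborted");
      blocked.add(step.id);
      continue;
    }
    const ok = runStep(step);
    outcomes.set(step.id, ok ? "completed" : "failed");
    if (!ok) blocked.add(step.id);
  }
  return { exitCode: blocked.size === 0 ? 0 : 1, outcomes };
}
```

With a plan of sync, extract (depends on sync), and embed, a failing sync aborts extract while embed still runs, and the exit code is 1.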

doctor_run_id UUID stamps every submitted job's data field so
operators can later query `SELECT * FROM minion_jobs WHERE
data->>'doctor_run_id' = '<uuid>'` (indexed via v68 partial GIN).

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T6 + D1/D3/D5/D7/D9/D13 + folded scope A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): maybeBackground helper + apply --background to embed

T7 of brain-health-100 wave. New helper in src/core/cli-options.ts
formalizes the --background flag pattern. Same semantics in TTY and
cron per D9 (submit-and-exit always; --background --follow execs
`gbrain jobs follow <id>` after submission).

  await maybeBackground({
    engine, args, jobName: 'embed',
    paramBuilder: (cleanArgs) => ({ stale, all, ... }),
  })
  // returns true if backgrounded → caller exits

Content-hash idempotency key (D9): `cli:embed:sha8(canonical-JSON(params))`.
No time-slot. Same intent across runs = same key. Failed-row replay
is the doctor --remediate loop's job, not this path's.
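
A minimal sketch of a content-hash key in this shape. Assumptions: "sha8" is read as the first 8 hex characters of a SHA-256 digest, and `canonicalJson` / `idempotencyKey` are hypothetical names, not the helper's actual exports.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every depth so key order cannot change the hash.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  const body = Object.keys(value)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`)
    .join(",");
  return `{${body}}`;
}

// Assumption: "sha8" = first 8 hex chars of SHA-256 over the canonical JSON.
function idempotencyKey(jobName: string, params: Json): string {
  const digest = createHash("sha256").update(canonicalJson(params)).digest("hex");
  return `cli:${jobName}:${digest.slice(0, 8)}`;
}
```

Because there is no time component, `idempotencyKey("embed", { stale: true, all: false })` and `idempotencyKey("embed", { all: false, stale: true })` yield the same key on every run.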

PGLite degrades to inline execution with a clear stderr note
("PGLite has no worker daemon; running inline"). NOT a no-op,
NOT silent: this is a documented semantic difference, since PGLite
cannot hand the job to a worker.

Applied to `gbrain embed` as the reference integration. The other 6
commands (extract, lint, backlinks, reindex, integrity, pages) adopt
the same 4-line pattern at the top of their entry function — follow-up
in a smaller diff once the helper proves out in production.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T7 + D9 + Gap 6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(autopilot): targeted-submit loop + op_checkpoints GC in purge phase

T8 of brain-health-100 wave.

Autopilot dispatch changes (src/commands/autopilot.ts):

Pre-fix: every tick submitted ONE autopilot-cycle job with the full
phase set, regardless of brain state. On a healthy brain that was pure
overhead; on a degraded brain it bundled fast wins with slow phases, so
the user waited for the slowest.

New decision logic (T8 from plan):
  - score >= 95 AND empty plan AND <60min since last full → SLEEP
  - score >= 95 AND empty plan AND >=60min → submit autopilot-cycle
    (phase-coupling exercise)
  - plan <= 3 steps AND est_total < 5min → submit individual handlers
    (targeted; uses D9 content-hash idempotency keys per step;
    maxWaiting:1 per submit per codex #17)
  - else → submit autopilot-cycle (the hammer)
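
The decision table can be read as a small pure function. A sketch only: `TickState`, its field names, and `decideTick` are hypothetical, and the thresholds are copied from the list above.

```typescript
type Decision = "sleep" | "full-cycle" | "targeted";

// Hypothetical input shape; field names are illustrative.
interface TickState {
  score: number;
  planSteps: number;
  estTotalMinutes: number;
  minutesSinceLastFull: number;
}

function decideTick(s: TickState): Decision {
  if (s.score >= 95 && s.planSteps === 0) {
    // Healthy and nothing to do: sleep unless the 60-minute floor has elapsed.
    return s.minutesSinceLastFull < 60 ? "sleep" : "full-cycle";
  }
  // Small, fast plan: submit the individual handlers.
  if (s.planSteps <= 3 && s.estTotalMinutes < 5) return "targeted";
  // Everything else: the full autopilot-cycle.
  return "full-cycle";
}
```

As written, a brain scoring below 95 with an empty plan matches the third rule; the sketch keeps that reading rather than guessing at the intended behavior.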

D10 cycle-lock invariant guarantees targeted-submit and autopilot-cycle
can never run concurrently (both acquire gbrain-cycle), closing the
"60-min floor double-processes queued targeted jobs" failure mode.

Computation uses cheap path (D7) — engine.getHealth() + computeRecommendations,
NOT a full doctor walk. Adds ~1 SQL count query per tick; negligible
on a 50K-page brain.

PROTECTED handlers (synthesize/patterns/consolidate) are submitted with
allowProtectedSubmit:true; autopilot is a trusted local caller.

Cycle purge phase (src/core/cycle.ts):

Added op_checkpoints GC (+C folded scope item). 7-day TTL — any
reasonable long-running op finishes inside that window. Non-fatal
on pre-v67 brains (table missing).

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T8 + D7/D9/D10 + codex #17 + folded scope +C).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(core): brain-score-recommendations + op-checkpoint unit tests

T10 of brain-health-100 wave — load-bearing decision-pinning tests.

test/brain-score-recommendations.test.ts (22 cases):
  - Healthy brain → empty plan
  - Per-component remediation paths (sync, embed, backlinks, extract)
  - depends_on wiring (extract → sync; embed → sync when stale)
  - Severity ordering (critical > high > medium > low)
  - D6 #5 determinism: same input twice → byte-identical output
  - D9 idempotency keys: content-hash format, no time-slot
  - D9 source isolation: different --source → different key
  - D13 status field always 'remediable' in output
  - +A cost-estimate populated for embed
  - classifyChecks: remediable / blocked / human_only triage
  - maxReachableScore: all-remediable → 100; all-blocked → current

test/op-checkpoint.test.ts (20 cases):
  - fingerprint stability + key-order invariance (canonical-JSON)
  - codex #11: extract links vs timeline get different fingerprints
  - codex #12: reindex markdown vs code get different fingerprints
  - codex #15: embed model+dim variation produces different fingerprints
  - reindex chunker_version bump invalidates checkpoint
  - DB round-trip (load → record → load)
  - Cross-fingerprint isolation (linksKey vs timelineKey)
  - clearOpCheckpoint idempotency on missing rows
  - resumeFilter purity (no I/O, deterministic)
  - purgeStaleCheckpoints TTL respect

42 new tests, all pass. PGLite engine + resetPgliteState pattern per
CLAUDE.md test-isolation guide.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T10 + D6 #5 + D9 + D12 + D13 + codex #11/#12/#15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v0.36.0.0 — brain-health-100 wave + docs/llms refresh

T12 of brain-health-100 wave. VERSION + package.json bumped 0.35.6.0
→ 0.36.0.0. CHANGELOG entry leads ELI10 ("your agent can now drive
your brain to 90/100 by itself, on a cron, without you watching")
then drills into the precise mechanics per CLAUDE.md voice rules.

llms.txt + llms-full.txt regenerated via bun run build:llms.

Trio audit (CLAUDE.md mandatory pre-push check):
  VERSION:     0.36.0.0
  package.json: 0.36.0.0
  CHANGELOG:   ## [0.36.0.0] - 2026-05-18

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update README/CLAUDE/AGENTS/maintain for v0.36.4.0 brain-health-100 wave

- README.md: New-in-v0.36.4.0 callout — `gbrain doctor --remediate` headline,
  autopilot health-aware tick, eleven new background-job types, three PROTECTED.
- CLAUDE.md: Key Files entries for `op-checkpoint.ts`, `brain-score-recommendations.ts`,
  doctor.ts / jobs.ts / protected-names.ts / autopilot.ts / cycle.ts / embed.ts /
  cli-options.ts extensions; new "Key commands added in v0.36.4.0" section.
- AGENTS.md: Common-tasks entry pointing agents at the one-command remediation loop.
- skills/maintain/SKILL.md: Autonomous Phase (gbrain doctor --remediate) at the top,
  manual per-dimension walk preserved as the fallback path.
- llms-full.txt: regenerated to pick up the CLAUDE.md changes (project rule).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): respectful tone on spend caps for v0.36.4.0

Reframed the cost-budget callout. Pre-fix language said the spend cap
prevents a synthesize loop from "burning $100 of Anthropic credits
while you're at lunch" — casually treating $100 as the throwaway number
is tone-deaf. $100 is a meaningful amount for many people.

New language: "spend cap so a synthesize loop can't run up your
Anthropic bill while you're at lunch. The cap is yours to set per run."
And: "Pass --max-usd 5 (or whatever cap you're comfortable with)."
And: "Pick the cap that fits your wallet."

Also reframed three adjacent lines:
- "healthy brains stop burning cycles" → "stop spending tokens on
  work that has nothing to do"
- "agent can't submit them and burn your API budget" → "can't submit
  them on your behalf. Your provider bill stays in your hands"
- Table cell "Cron with cost cap" / "--max-usd 5" → "Cron with spend
  cap" / "--max-usd N"

llms-full.txt regenerated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 23, 2026
…on + audit-writer unification (#1300)

* v0.40.4.0 T1: shared audit-writer primitive

Extract createAuditWriter() helper. Five hand-rolled JSONL audit
modules (rerank-audit, shell-audit, supervisor-audit, audit-slug-
fallback, phantom-audit) duplicated the same ISO-week filename math,
best-effort write loop, and read-current-plus-previous-week loop.
T2 refactors all 5 onto this primitive.

Behavior preservation: filename format, JSONL line shape, mkdir
recursive, appendFileSync utf8, stderr-on-failure all byte-identical
to the existing modules so their tests pass unchanged.

resolveAuditDir() moves here from shell-audit.ts; shell-audit.ts
will re-export for back-compat (T2). Honors GBRAIN_AUDIT_DIR with
whitespace-trim, falls back to ~/.gbrain/audit/.

Test coverage: 22 cases covering ISO-week math + year-boundary edges
(2027-01-01 → 2026-W53), env override, mkdir-recursive, fail-open
stderr-warn shape, cross-week readback, corrupt-row skip, non-finite-
ts skip, round-trip with nested fields, computeFilename + resolveDir
accessors.
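
For reference, the ISO-week math behind the year-boundary case can be sketched like this. `isoWeekLabel` and `auditFilename` are hypothetical names, and the `<feature>-<year>-W<week>.jsonl` layout is an assumption; only the 2027-01-01 → 2026-W53 mapping comes from the test list above.

```typescript
// ISO 8601: weeks start on Monday, and week 1 is the week containing the year's first Thursday.
function isoWeekLabel(date: Date): string {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  d.setUTCDate(d.getUTCDate() + 4 - weekday); // move to the Thursday of this ISO week
  const isoYear = d.getUTCFullYear(); // the Thursday decides which year owns the week
  const yearStart = Date.UTC(isoYear, 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${isoYear}-W${String(week).padStart(2, "0")}`;
}

// Assumed filename layout, for illustration only.
function auditFilename(feature: string, date: Date): string {
  return `${feature}-${isoWeekLabel(date)}.jsonl`;
}
```

2027-01-01 is a Friday, so its ISO week is owned by the preceding Thursday, 2026-12-31, which is why the label lands in 2026.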

Plan ref: D5=B audit unification cathedral expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T2: refactor 5 audit modules onto shared writer

Replace the duplicated ISO-week filename math + best-effort write loop
+ read-current-plus-previous-week loop in:
  - src/core/rerank-audit.ts (rerank-failures-*.jsonl)
  - src/core/audit-slug-fallback.ts (slug-fallback-*.jsonl)
  - src/core/minions/handlers/shell-audit.ts (shell-jobs-*.jsonl)
  - src/core/minions/handlers/supervisor-audit.ts (supervisor-*.jsonl)
  - src/core/facts/phantom-audit.ts (phantoms-*.jsonl)

All five now delegate file I/O to createAuditWriter from T1. Public
API preserved bit-for-bit:
  - logRerankFailure, readRecentRerankFailures, computeRerankAuditFilename
  - logSlugFallback, readRecentSlugFallbacks, computeSlugFallbackAuditFilename
  - logShellSubmission, computeAuditFilename, resolveAuditDir
  - writeSupervisorEvent, readSupervisorEvents, computeSupervisorAuditFilename
    plus isCrashExit, summarizeCrashes, CrashSummary (domain-specific
    helpers stay in supervisor-audit.ts; only file I/O moves)
  - logPhantomEvent, readRecentPhantomEvents, computePhantomAuditFilename

Domain-specific behavior preserved:
  - audit-slug-fallback emits per-call stderr (D7 dual logging) in the
    caller; the shared writer is failure-only stderr
  - rerank-audit truncates error_summary to 200 chars before write
  - phantom-audit spreads optional fields conditionally (skip undefined)
  - supervisor-audit keeps single-file readback (no cross-week walk)
    to preserve pre-v0.40.4 doctor assertions

resolveAuditDir lives in src/core/audit/audit-writer.ts; shell-audit.ts
re-exports it so existing imports keep working (every other audit
module + gbrain-home-isolation.test.ts + minions.test.ts +
minions-shell.test.ts pull resolveAuditDir from shell-audit.ts).

Operator-visible drift: rerank-audit stderr line drops the
'rerank-failure audit' qualifier — was '[gbrain] rerank-failure audit
write failed (...)' now '[gbrain] write failed (...); search continues'.
Stderr is for human debugging, not machine parsing; the written file's
name still reveals the qualifier in `tail -f audit/*`.

Test coverage: 128/128 audit-touching tests pass unchanged.

Plan ref: D5=B audit unification cathedral expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T3: getAdjacencyBoosts engine method (PG+PGLite parity)

Add BrainEngine.getAdjacencyBoosts(pageIds) returning Map<page_id,
AdjacencyRow{hits, cross_source_hits}>. Returns ALL pages with
hits >= 1 (callers apply their own threshold).

Cross-source semantic (D15=A): cross_source_hits EXCLUDES the target
page's own source. A page in source A linked from 2 pages in source A
reports cross_source_hits = 0. Linked from 1 in source B + 1 in
source C reports 2.
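
An in-memory sketch of the counting rule, standing in for the SQL. Assumptions: `cross_source_hits` counts distinct other sources (the reading consistent with the examples in this commit), there is at most one link row per page pair, and `adjacencyBoosts`, `Page`, and `Link` are hypothetical names.

```typescript
type Page = { id: number; source: string };
type Link = { from: number; to: number };
type AdjacencyRow = { hits: number; cross_source_hits: number };

// Only links with BOTH endpoints inside `pages` are counted (the in-set restriction).
function adjacencyBoosts(pages: Page[], links: Link[]): Map<number, AdjacencyRow> {
  const sourceOf = new Map<number, string>();
  for (const p of pages) sourceOf.set(p.id, p.source);

  const hits = new Map<number, number>();
  const otherSources = new Map<number, Set<string>>();
  for (const { from, to } of links) {
    const fromSource = sourceOf.get(from);
    const toSource = sourceOf.get(to);
    if (fromSource === undefined || toSource === undefined) continue; // endpoint outside the set
    hits.set(to, (hits.get(to) ?? 0) + 1);
    const seen = otherSources.get(to) ?? new Set<string>();
    if (fromSource !== toSource) seen.add(fromSource); // the target's own source is excluded
    otherSources.set(to, seen);
  }

  const out = new Map<number, AdjacencyRow>();
  hits.forEach((n, id) => {
    const seen = otherSources.get(id);
    out.set(id, { hits: n, cross_source_hits: seen ? seen.size : 0 });
  });
  return out;
}
```

Pages with no in-set inbound links are simply absent from the map, which mirrors the "hits >= 1" contract; any threshold is left to the caller.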

Source-scope contract: pageIds MUST already be source-scoped by the
caller. Method does NOT filter by source_id. The in-set restriction
makes cross-source leakage impossible by construction. JSDoc spells
this out; same trust posture as cosineReScore's chunk_id handling.

COALESCE(p.source_id, 'default') on both target and from-page sides
for defense-in-depth even though pages.source_id is NOT NULL today.

JSDoc/SQL contract alignment (codex #2): HAVING >= 1 matches the
"returns ALL pages with hits >= 1" contract; threshold of 2 is the
caller's call in applyGraphSignals.

Known limitation (codex #15): cross_source_hits cannot distinguish
"genuinely linked from another team" from "mirrored imports from
another source." T-todo-4 captures the v0.41+ refinement.

SearchResult type extension (D4=A flat fields, D12=A attribution):
  - graph_adjacency_hits, graph_cross_source_hits,
    graph_session_demoted, graph_session_prefix
  - base_score, backlink_boost, salience_boost, recency_boost,
    exact_match_boost, graph_adjacency_boost, graph_cross_source_boost,
    session_demote_factor, reranker_delta
All optional; T4-T6 populate them.

Test coverage: 7/7 hermetic PGLite cases. Empty input, singleton,
same-source hub, cross-source attribution including the
"linked-only-from-other-source" case (widget in source b, linked
from alice+bob in source a → cross_source_hits=1), JSDoc HAVING>=1
contract. Postgres parity asserted by SQL-shape identity (will get a
mirror Postgres E2E in T10's eval gate work via DATABASE_URL when
set; PGLite hermetic case shipped now).

NULL source_id COALESCE branch noted as untestable in current PGLite
schema (pages.source_id is NOT NULL); kept as defense-in-depth.

Plan ref: T3 in v0.40.4.0 wave plan; D1=A, D3=A, D15=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T4+T11: applyGraphSignals 4th stage in runPostFusionStages

New file src/core/search/graph-signals.ts. Three signals:

  1. Adjacency-within-top-K (×1.05): hits >= 2 inbound from in-set.
  2. Cross-source adjacency (×1.10, stacks): cross_source_hits >= 2.
     Dormant on single-source brains.
  3. Session diversification (×0.95): if multiple top-K share a slug
     prefix, keep highest scoring, DEMOTE the rest. NOT amplify —
     codex caught the original framing was backwards (amplification
     of redundancy makes the cited "weak chunks compete for budget"
     problem worse, not better).
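
A rough sketch of how the three multipliers could compose. It is not the code in graph-signals.ts: the floor gate, fail-open path, and attribution stamps are omitted, the prefix function is passed in by the caller because the real sessionPrefix rule is not spelled out here, and picking the best result per prefix after boosts is an assumption.

```typescript
type Result = { slug: string; page_id: number; score: number };
type Adjacency = Map<number, { hits: number; cross_source_hits: number }>;

function applySignals(
  results: Result[],
  adjacency: Adjacency,
  prefixOf: (slug: string) => string | null, // caller-supplied; null means "no session prefix"
): Result[] {
  const boosted = results.map((r) => {
    const row = adjacency.get(r.page_id);
    let score = r.score;
    if (row && row.hits >= 2) score *= 1.05; // adjacency within top-K
    if (row && row.cross_source_hits >= 2) score *= 1.1; // cross-source, stacks on top
    return { ...r, score };
  });

  // Session diversification: remember the highest-scoring result per prefix.
  const best = new Map<string, Result>();
  for (const r of boosted) {
    const prefix = prefixOf(r.slug);
    if (prefix === null) continue;
    const current = best.get(prefix);
    if (!current || r.score > current.score) best.set(prefix, r);
  }

  // Demote every other result that shares a prefix with a better one.
  return boosted.map((r) => {
    const prefix = prefixOf(r.slug);
    const keep = prefix === null || best.get(prefix) === r;
    return keep ? r : { ...r, score: r.score * 0.95 };
  });
}
```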

Conservative magnitudes (D14=B): 1.05/1.10/0.95. Score-distribution
probe (onScoreDistribution) collects min/p25/p50/p75/p95/max +
reorder_band_width to feed T-todo-2 magnitude calibration wave.

Slot: 4th stage inside runPostFusionStages (hybrid.ts:248), AFTER
backlink/salience/recency, pre-dedup. Inherits the v0.35.6.0
floor-ratio gate from computeFloorThreshold — this is the structural
protection that prevents a low-cosine hub from outranking a strong
non-hub (codex T2 / D1=A).

PostFusionOpts extends with graphSignalsEnabled, onGraphMeta,
onScoreDistribution. Caller (hybridSearch in subsequent T5 work)
resolves graph_signals from the mode bundle.

Source-scope contract preserved: getAdjacencyBoosts takes raw
page_ids, no source filter. Adjacency is in-set restricted so
cross-source leakage is impossible by construction (D3=A).

Fail-open: engine throw → JSONL audit row via shared createAuditWriter
(T1/T2 primitive, featureName='graph-signals-failures') + meta.errored
+ caller's results unchanged. Session diversification ALSO skips on
failure (predictable all-or-nothing posture).

Mutation note (codex #9): score mutated in place. base_score must be
stamped at runPostFusionStages entry BEFORE this stage so eval-capture
sees pre-boost score (T6 attribution wave).

Test coverage (24 cases, including T11 IRON RULE regression):
  - sessionPrefix multi/single/empty cases
  - computeScoreDistribution percentile math
  - Disabled + empty short-circuits
  - Adjacency hit, no-hit, cross-source stacking, cross-source alone
  - Session diversification 3-share + single-segment + singleton
  - Test seam injection (no engine call)
  - Fail-open: throw → audit row + meta.errored + unchanged
  - Empty Map → session still runs
  - Score-distribution always emits when enabled
  - Meta carries fire counts + duration_ms
  - Missing page_id silently skipped from dedup set
  - **T11 IRON RULE regression (3 cases):**
    * weak hub BELOW floor_threshold does NOT get boosted past
      above-floor non-hub (the bug class the floor gate exists for)
    * hub AT floor still gets boosted (gate is < not <=)
    * NaN score → NaN >= threshold is false → no boost

Plan ref: T4 + T11 in v0.40.4.0 wave plan; D1=A, D2=A, D11=B, D14=B,
D9=A, D5=B. Codex outside-voice #1 + #2 + #6 + #8 + #9 addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T5: graph_signals mode-bundle knob + KNOBS_HASH bump 3→4

ModeBundle gains graph_signals: boolean. Per-mode defaults:
  - conservative: false (cost-sensitive tier)
  - balanced:     true (the wave's primary surface for default-on)
  - tokenmax:     true (power-user tier, capstone fit)

SearchKeyOverrides + SearchPerCallOpts gain optional graph_signals
field. resolveSearchMode picks via the standard per-call → config
override → mode bundle chain.

loadOverridesFromConfig parses 'search.graph_signals' from the config
table ('1' or 'true' → true). SEARCH_MODE_CONFIG_KEYS adds the key
so `gbrain search modes --reset` clears it alongside other knobs.

KNOBS_HASH_VERSION bump 3→4 (append-only per CDX2-F13). New `gs=`
parts entry appended AFTER cross-modal + column + prov entries. A
graph-on cache write cannot be served to a graph-off lookup —
mid-deploy hit-rate dip clears within cache.ttl_seconds (3600s).

src/commands/search.ts KNOB_DESCRIPTIONS gains graph_signals entry
so `gbrain search modes` dashboard renders the new knob.

Test coverage:
  - test/search-mode.test.ts (+ 8 new cases): per-mode defaults
    canonical, config override both directions, per-call override
    wins, knobsHash distinct for on/off, config key registered,
    attributeKnob reports per-call + mode sources correctly.
  - test/search/knobs-hash-reranker.test.ts: version assertion
    bumped 3→4 with v0.40.4 rationale comment.
  - test/cross-modal-phase1.test.ts: version assertion bumped
    3→4 with v0.40.4 rationale comment.
  - Canonical-bundle assertions updated to include graph_signals
    in expected shape (3 cases).

50/50 search-mode tests pass. 45/45 cross-modal pass. 17/17
knobs-hash-reranker pass. 10/10 balanced-reranker pass.

Plan ref: T5 in v0.40.4.0 wave plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T6: per-stage attribution stamping in every boost

Every boost stage that mutates SearchResult.score now stamps a field
recording WHAT it multiplied:

  - applyBacklinkBoost  → backlink_boost (skipped when count == 0)
  - applySalienceBoost  → salience_boost (skipped when score == 0)
  - applyRecencyBoost   → recency_boost (skipped on evergreen prefix)
  - applyExactMatchBoost → exact_match_boost (skipped on no-match
    OR when intent's exactMatchBoost == 1.0 no-op)
  - runPostFusionStages → base_score stamped ONCE at entry, BEFORE
    any boost mutates r.score. Idempotent: caller-pre-stamped value
    preserved. Empty-results short-circuit unchanged.
  - applyReranker → reranker_delta = original_index - new_index
    (positive = rank improved; raw rerank score stays in rerank_score)
  - applyGraphSignals → graph_adjacency_boost, graph_cross_source_boost,
    session_demote_factor (T4 already stamped these)

Why: feeds the T7 `gbrain search --explain` formatter so it can
attribute the final score to its components. Without these stamps,
"why did this rank where it did?" is grep-and-guess.

SearchResult.reranker_delta doc updated to clarify it's a RANK delta
(positive = improved), not a score delta. The raw relevance score
stays in `rerank_score` (untyped, for back-compat with telemetry that
already reads it).
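
The rank-delta convention in a few lines. A sketch: `rerankerDeltas` is a hypothetical helper that takes result ids in pre-rerank and post-rerank order.

```typescript
// delta = original_index - new_index, so a positive value means the result moved up.
function rerankerDeltas(before: string[], after: string[]): Map<string, number> {
  const deltas = new Map<string, number>();
  after.forEach((id, newIndex) => {
    const originalIndex = before.indexOf(id);
    if (originalIndex >= 0) deltas.set(id, originalIndex - newIndex);
  });
  return deltas;
}
```

Reordering `["a", "b", "c"]` to `["c", "a", "b"]` gives c a delta of +2 and a and b a delta of -1 each.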

Test coverage: 16 new cases in test/search/attribution-stamping.test.ts.
Pins: every boost stamps when it fires AND skips stamping when it
doesn't (no false attribution on no-op stages). base_score idempotency
preserved. reranker_delta computed correctly across rank-improved +
rank-degraded cases.

All 178/178 search tests pass (no regressions).

Plan ref: T6 cathedral expansion in v0.40.4.0 wave plan; D12=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T7: gbrain search --explain per-stage attribution

New file src/core/search/explain-formatter.ts renders SearchResult[]
as a multi-line breakdown of how the final score was formed:

  1. people/alice (score=12.4)
     base=10.2 (rrf+cosine)
     + backlink ×1.08
     + salience ×1.05
     + adjacency ×1.05 (hits=3)
     + cross_source ×1.10 (other_sources=2)
     ↑ reranker rank +2
     = final 12.4

Reads the *_boost / base_score / *_hits fields populated by T4 + T6.
Empty path: "no boosts applied" when no stage stamped anything.
Session demote rendered with `-` prefix (not `+`) so the demotion
direction is visually distinct from boosts.

CliOptions gains `explain: boolean`; parseGlobalFlags recognizes
`--explain` anywhere in argv. cli.ts formatResult for `search` +
`query` cases reads CliOptions.explain via the module-level
singleton and routes to formatResultsExplain when set. Lazy import
keeps the hot path narrow for the common non-explain case.

Number formatting: 4-decimal precision, trailing zeros stripped
('1.0000' → '1', '0.1234' → '0.1234'). NaN preserved as 'NaN'.
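
One way to get that formatting, as a sketch; `formatNumber` is a hypothetical name and the formatter's real implementation may differ.

```typescript
function formatNumber(n: number): string {
  if (Number.isNaN(n)) return "NaN";
  // toFixed(4) always emits a decimal point, so stripping trailing zeros
  // and then a dangling "." cannot eat integer digits.
  return n.toFixed(4).replace(/0+$/, "").replace(/\.$/, "");
}
```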

Test coverage:
  - test/search/explain-formatter.test.ts: 19 cases pin output
    format. Each boost type renders correctly, every-stage stacking
    composes, reranker_delta=0 doesn't render, empty list short-
    circuits, rank numbering 1-based, number formatting edge cases.
  - test/cli-options.test.ts: 3 new cases for --explain parsing
    (basic, absent default, any-argv-position).

Existing CliOptions literals in test/cli-options.test.ts +
test/thin-client-upgrade-prompt.test.ts updated for new required
explain field.

JSON envelope unchanged — the same attribution fields surface in
existing --json output via JSON.stringify; no separate JSON formatter
needed.

Plan ref: T7 cathedral expansion in v0.40.4.0 wave plan; D12=A + D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T8: doctor check graph_signals_coverage

New checkGraphSignalsCoverage in src/commands/doctor.ts. Wired into
both runDoctor (local engine) and doctorReportRemote (HTTP MCP /
JSON path) so local AND remote-server brains both surface the metric.

Logic:
  1. Resolve active graph_signals setting: config override
     'search.graph_signals' wins, else mode bundle default
     ('search.mode' → conservative=false, balanced/tokenmax=true).
  2. When disabled → silent ok ("disabled — coverage not checked").
     Avoids polluting doctor output on installs that don't use the
     feature.
  3. When enabled, compute global inbound-link density:
     COUNT(DISTINCT to_page_id) / COUNT(*) across non-deleted pages.
  4. <10% → warn ("signal will rarely fire") with paste-ready
     `gbrain extract all` fix hint.
  5. >=30% → ok ("fire on most queries") with metric.
  6. 10-29% → ok ("fire occasionally") with metric.
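
The threshold bands reduce to a small classifier. A sketch with hypothetical names (`classifyCoverage`, `Coverage`); the empty-brain wording is an assumption, and the other messages reuse the phrases quoted above.

```typescript
type Coverage = { status: "ok" | "warn"; message: string };

function classifyCoverage(pagesWithInbound: number, totalPages: number): Coverage {
  // Guard the division; the exact empty-brain message is assumed.
  if (totalPages === 0) return { status: "ok", message: "empty brain, nothing to measure" };
  const density = pagesWithInbound / totalPages;
  const pct = `${(density * 100).toFixed(0)}%`;
  if (density < 0.1) {
    return { status: "warn", message: `${pct}: signal will rarely fire; run gbrain extract all` };
  }
  if (density >= 0.3) return { status: "ok", message: `${pct}: fire on most queries` };
  return { status: "ok", message: `${pct}: fire occasionally` };
}
```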

Known limitation (codex outside-voice #14): global density is an
imperfect proxy for "top-K subgraphs have enough edges to fire."
T-todo-5 captures the v0.41+ refinement that measures actual fire
rate from search-stats after 30 days of data.

Best-effort: SQL errors → warn with the underlying message. Never
breaks doctor.

Test coverage (7 new cases in test/doctor.test.ts):
  - conservative mode → silent ok regardless of coverage
  - balanced default + 0 links → warn at 0% with fix hint
  - balanced default + 40% inbound → ok "fire on most queries"
  - balanced default + 20% inbound → ok "fire occasionally"
  - explicit search.graph_signals=false overrides mode default
  - empty brain → ok with explanation
  - check is wired into runDoctor (source-grep regression guard)

All 55/55 doctor.test.ts cases pass.

Plan ref: T8 in v0.40.4.0 wave plan; D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T9: gbrain search stats graph_signals section

runStatsSubcommand in src/commands/search.ts gains a graph_signals
section in both --json and human output:

  Graph signals:
    enabled:    true (mode default)
    failures:   3 fail-open event(s)
      ECONNREFUSED         2
      timeout              1

Data sources:
  - config: 'search.graph_signals' override → enabled + source=config,
    otherwise mode-bundle default → enabled + source=mode_default.
  - JSONL audit: readRecentGraphSignalsFailures(days) returns events;
    failures_count is len, failures_by_reason buckets by first word of
    error_summary (e.g. 'ECONNREFUSED', 'timeout').
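
Bucketing by first word might look like the following. A sketch: `bucketByReason` is a hypothetical name, and splitting on whitespace or a colon plus the "unknown" fallback for empty summaries are assumptions.

```typescript
type FailureEvent = { error_summary: string };

function bucketByReason(events: FailureEvent[]): Record<string, number> {
  const buckets: Record<string, number> = {};
  for (const e of events) {
    // First token of the summary; "unknown" covers empty summaries (assumed fallback).
    const reason = e.error_summary.trim().split(/[\s:]+/)[0] || "unknown";
    buckets[reason] = (buckets[reason] ?? 0) + 1;
  }
  return buckets;
}
```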

JSON envelope (schema_version 2 unchanged; graph_signals is a new
sibling property of stats, so consumers reading the existing fields
keep working):

  {
    "schema_version": 2,
    ...stats...,
    "graph_signals": {
      "enabled": bool,
      "source": "config" | "mode_default",
      "failures_count": int,
      "failures_by_reason": { reason: count }
    },
    "_meta": { metric_glossary: { ..., graph_signals_enabled: ..., graph_signals_failures_count: ... } }
  }

Fire-rate metrics (adjacency_fires, cross_source_fires,
session_demotions) and score-distribution stats are NOT in this
section yet — they require telemetry-table writes from the
applyGraphSignals onMeta callback. Wired in v0.41+ via T-todo-2
calibration wave (the wave that needs them). For v0.40.4: status +
error count is the actionable surface for "is graph_signals on, and
is it failing?"

Human output: prints the section after the existing stats block.
Edge case: when total_calls is 0 BUT graph_signals is enabled OR
has historical failures, still prints the section so operators
don't lose the signal on a brain with no telemetry yet.

Test coverage (6 cases in test/search/search-stats-graph-signals.test.ts):
  - search.graph_signals=true → enabled true, source=config
  - mode=conservative → enabled false, source=mode_default
  - no config → enabled true (balanced default), source=mode_default
  - JSONL failures bucketed by first word of error_summary
  - empty audit → failures_count 0, empty failures_by_reason
  - human output includes "Graph signals:" header

Plan ref: T9 in v0.40.4.0 wave plan; D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T10: eval gates (longmemeval-mini A/B + paired bootstrap)

New test/e2e/graph-signals-eval.test.ts runs each longmemeval-mini
question twice (graph_signals off, graph_signals on) and asserts:

  Gate 1 (QUALITY) — paired bootstrap, 10,000 resamples:
    - If signals-on is significantly WORSE than off
      (delta < 0 AND p < 0.05) → fail.
    - Otherwise pass. p>=0.05 either direction OR delta >= 0 → ok.

  Gate 2a (CHANGE-MAGNITUDE): mean Jaccard@5 over result-set overlap
    must be >= 0.5. If results overlap less than half, the change is
    too large and needs human review before default-on.

  Gate 2b (CHANGE-MAGNITUDE): top-1 stability rate >= 0.7. If more
    than 30% of top picks change, a hard look is required.

  Gate 3 (HARD ABSOLUTE FLOOR): recall@5 drop <= 5pt. Catastrophic
    regression catch (codex outside-voice #18 — addresses the "top-5
    must not drop at all" brittleness on tiny fixtures).

Bootstrap implementation:
  - Per-question observation is binary (recall@5 hit/miss).
  - Pairing on question_id between on/off branches.
  - Centered distribution under null (subtract observed mean) per
    standard paired-bootstrap-shift approach for binary outcomes.
  - Two-tailed p-value: |resampled delta| >= |observed delta|.
  - Deterministic seeded RNG so test runs are stable across CI.
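
A compact sketch of the procedure described above. Assumptions: the seeded generator is mulberry32 (a stand-in, since the test's actual RNG is not named), empty input returns p = 1, and the signature is illustrative rather than the exported one.

```typescript
// Small deterministic PRNG (mulberry32) so repeated runs give the same p-value.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// on[i] and off[i] are the recall@5 outcomes (1 = hit, 0 = miss) for the same question.
function pairedBootstrapPValue(on: number[], off: number[], resamples = 10000, seed = 42): number {
  const n = Math.min(on.length, off.length);
  if (n === 0) return 1; // assumed convention for empty input
  const diffs: number[] = [];
  for (let i = 0; i < n; i++) diffs.push(on[i] - off[i]);
  const observed = diffs.reduce((s, d) => s + d, 0) / n;
  const centered = diffs.map((d) => d - observed); // shift so the null (mean diff = 0) holds
  const rand = mulberry32(seed);
  let extreme = 0;
  for (let b = 0; b < resamples; b++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += centered[Math.floor(rand() * n)];
    // Two-tailed: count resamples at least as far from zero as the observed delta.
    if (Math.abs(sum / n) >= Math.abs(observed)) extreme++;
  }
  return extreme / resamples;
}
```

Gate 1 would then fail only when the observed delta is negative and the returned value is below 0.05.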

pairedBootstrapPValue exported as a pure function with separate
tests for edge cases (empty input, all-equal, strong positive, strong
negative, determinism). Reusable from future calibration waves.

Hermetic: in-memory PGLite via createBenchmarkBrain + resetTables
between questions. No API keys needed (--no-embed import path
exercises keyword-only retrieval). Skips gracefully via describe.skip
when the fixture is missing.

Plan ref: T10 in v0.40.4.0 wave plan; D7=C absolute floor + D13=A
paired bootstrap; codex #4 + #18 stability-vs-quality distinction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T12: VERSION + package.json + CHANGELOG + TODOS

VERSION: 0.37.11.0 → 0.40.4.0
package.json: 0.37.11.0 → 0.40.4.0
CHANGELOG.md: top entry for v0.40.4.0 in ELI10-lead voice per
  CLAUDE.md release rules. Lead is plain-English ("Your search now
  notices when a page is a hub for your query"); precise file paths
  / SQL semantics / numbers live in the "Itemized changes" section
  below. Includes the cathedral-expansion notes (D5=B audit
  unification, D12=A per-stage attribution, D13=A eval gates) and
  the "To take advantage of v0.40.4.0" verify-and-fix block.

TODOS.md: 5 new items captured under "v0.40.4 graph signals —
deferred follow-ups (v0.41+)":
  - T-todo-1: profile graph-signal SQL latency, merge if hot (D8=C)
  - T-todo-2: magnitude calibration wave from probe data (D14=B / D17)
  - T-todo-3: DB-backed audit table for cross-deploy observability (codex #15)
  - T-todo-4: sync-topology-aware cross-source signal (codex #11)
  - T-todo-5: replace doctor's global density with fire-rate (codex #14)

Verified the 3-line audit: VERSION + package.json + CHANGELOG topmost
all match 0.40.4.0. `bun install` ran (lockfile unchanged — root
package version isn't stored in bun.lock). `bun run build:llms`
refreshed llms.txt + llms-full.txt for the next commit.

Plan ref: T12 in v0.40.4.0 wave plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 TODO: document pre-existing shard-2 flake noticed during ship

3 isCacheSafe test failures in shard 2 reproduce on stashed clean
master. Confirmed pre-existing — not introduced by v0.40.4. Filed
under "Pre-existing flake on master (noticed during v0.40.4 ship)"
with reproduction commands + remediation options. Shipping v0.40.4
through it; future wave can fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 privacy scrub: replace wintermute → media in example slugs

CLAUDE.md line 550 bans the private OpenClaw fork name in public
artifacts. Example session prefix in sessionPrefix() docs + 3 test
fixtures swept to 'media/chat/...' instead. Pre-existing
scripts/check-privacy.sh in `bun run verify` caught it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 fix: wire graph_signals from mode bundle to runPostFusionStages

CRITICAL: pre-landing review (codex outside-voice via /ship Step 9)
caught that hybrid.ts's `postFusionOpts` literal at line 566 was
building PostFusionOpts WITHOUT threading `resolvedMode.graph_signals`
to `graphSignalsEnabled`. The gate at hybrid.ts:358 read the field
from a literal that never set it.

Result before this fix: the entire v0.40.4 graph-signals wave was
dead code in production. Mode bundles set
`balanced.graph_signals = true` and `tokenmax.graph_signals = true`,
but no production call site ever reached applyGraphSignals. The
KNOBS_HASH bump 3→4 correctly varied the cache key by the flag, so
contamination was prevented — but the feature itself never fired.

All shipped infrastructure (engine SQL, fail-open audit, attribution
stamps, --explain formatter, doctor coverage check, search-stats
section) was reachable only through the unit-test seam
(`opts.adjacencyFn`). The CHANGELOG-advertised behavior never
landed in user-visible search.

Fix: thread `graphSignalsEnabled: resolvedMode.graph_signals` into
the postFusionOpts literal (1 line). Inline comment names codex's
catch so future refactors see the regression class.

Tests: new test/search/graph-signals-wire-integration.test.ts pins
the wire end-to-end. Three cases:
  1. balanced mode → hybridSearch on a seeded brain with adjacency
     hub produces a result with base_score stamped (proves
     runPostFusionStages actually ran).
  2. search.graph_signals=false config override → no graph_* fields
     stamped (proves the gate honors the override path).
  3. Source-grep regression guard pinning the
     `graphSignalsEnabled: resolvedMode.graph_signals` literal in
     hybrid.ts so a future refactor can't silently disconnect.

All 57 existing v0.40.4 wave tests still pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 fix: pre-landing review AUTO-FIX findings (audit msg drift + deleted_at)

Two informational findings from /ship pre-landing review (Step 9):

1. Stderr message qualifier drift (rerank/slug-fallback/phantom audits)
   Pre-v0.40.4 messages included a per-feature qualifier:
     [gbrain] rerank-failure audit write failed (...)
     [gbrain] slug-fallback audit write failed (...)
     [gbrain] phantom audit write failed (...)
   The T2 refactor dropped the qualifier (plan promised "byte-identical"
   operator-visible behavior, but stderr lines did drift). Restored via
   new `errorMessagePrefix` option on `createAuditWriter` (optional, ''
   default). Three modules pass the per-feature qualifier; shell-audit
   and supervisor-audit unaffected (their pre-v0.40.4 messages didn't
   have a separate qualifier — label already carried the feature name).

2. Defense-in-depth `deleted_at IS NULL` on getAdjacencyBoosts
   SQL was previously protected by-construction (hybridSearch's
   visibility filter ensures input pageIds are live), but matches the
   v0.35.5.0 findOrphanPages pattern and closes the bug class if a
   future caller bypasses hybridSearch. Added to both Postgres and
   PGLite engines for parity. Three JOIN sites guarded (targets CTE,
   FROM-pages join). One inline comment per engine cites the codex
   review and the v0.35.5.0 precedent.

Plan ref: /ship pre-landing review v0.40.4.0 (codex finding C and F).

All 84 audit+graph-signals tests pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 fix: adversarial review HIGH findings (codex H1+H2 + Claude F1)

Three HIGH-severity issues from /ship adversarial pass:

H1 (Codex): Eval gate was a no-op.
  Test passed `graph_signals: graphSignalsOn` via `as any` cast, but
  SearchOpts had no field and hybridSearch's perCall didn't thread it.
  Both off/on branches resolved to the mode-bundle default — gate
  measured identical behavior, could pass while detecting nothing.

  Fix: add `graph_signals?: boolean` to SearchOpts (types.ts:794).
  Thread `opts.graph_signals` into perCall in both hybridSearch
  (hybrid.ts:425) AND hybridSearchCached (hybrid.ts:1027) so the
  cache-key resolver also sees the override. Drop the `as any` from
  the eval test — types are real now.

H2 (Codex): Session diversification fired on entity directories.
  sessionPrefix() used "any shared parent directory" as the session
  signal. Result: a search for "people in SF" returned `people/alice`
  + `people/bob` + `people/charlie` and the latter two got demoted
  to 0.95×. Every common entity-search query silently penalized
  legitimate same-type results. Default-on for balanced/tokenmax
  means production behavior was wrong.

  Fix: narrow sessionPrefix() to fire ONLY when the slug contains a
  session-like marker (`chat`/`session`/`sessions` segment OR a
  `YYYY-MM-DD` date segment). Entity directories (`people/`,
  `companies/`, `docs/`) return null → diversification skips.
  Returns NULL (not the slug itself) so the loop skips cleanly.
  Examples in JSDoc:
    your-agent/chat/2026-05-20-foo → 'your-agent/chat/2026-05-20-foo'
    daily/2026-05-20/journal-entry-1 → 'daily/2026-05-20'
    transcripts/chat/funding-discussion → 'transcripts/chat/funding-discussion'
    people/alice → null  ← codex H2 regression
    docs/quickstart → null
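
A minimal sketch of the narrowed rule, reconstructed only from the JSDoc examples above. The exact cut points (keep the segment that follows a `chat`/`session`/`sessions` marker, cut at an exact `YYYY-MM-DD` segment) and the name `sessionPrefixSketch` are assumptions; the shipped sessionPrefix() may differ.

```typescript
// Hypothetical re-implementation for illustration; not the shipped function.
const SESSION_MARKERS = new Set(["chat", "session", "sessions"]);
const DATE_SEGMENT = /^\d{4}-\d{2}-\d{2}$/;

function sessionPrefixSketch(slug: string): string | null {
  const segments = slug.split("/");
  if (segments.length < 2) return null; // degenerate slug with no "/"

  // Marker rule: keep everything up to and including the segment after the marker.
  const markerIdx = segments.findIndex((s) => SESSION_MARKERS.has(s));
  if (markerIdx !== -1) return segments.slice(0, markerIdx + 2).join("/");

  // Date rule: keep everything up to and including the date segment.
  const dateIdx = segments.findIndex((s) => DATE_SEGMENT.test(s));
  if (dateIdx !== -1) return segments.slice(0, dateIdx + 1).join("/");

  // Entity directories (people/, companies/, docs/) carry no session signal.
  return null;
}
```

Under this sketch `people/alice`, `people/bob` and `people/charlie` all map to null, so a diversification loop keyed on the prefix has nothing to group them by.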

F1 (Claude adversarial subagent): case-sensitivity drift across 3 sites.
  loadOverridesFromConfig in mode.ts is case-insensitive +
  whitespace-trimmed for 'search.graph_signals' values. But
  doctor's checkGraphSignalsCoverage (doctor.ts:899) AND
  search-stats's readGraphSignalsStats (search.ts:288) used
  case-sensitive compare. User sets `search.graph_signals TRUE`:
  production enables the feature, but doctor + search-stats both
  silently report disabled. Operators lose the only observability
  surface for the new feature on values like 'True'/'TRUE'.

  Fix: trim + lowercase parity at both sites. Mirror the parser's
  semantic. Also case-normalized `search.mode` reads at both sites
  for the same divergence class.
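
The parity fix amounts to routing every read of the flag through one normalizer. A sketch, assuming the only recognized spellings are 'true' and 'false' in any case with surrounding whitespace; `parseBoolConfig` is a hypothetical name and the real parser in mode.ts may accept more values.

```typescript
// Hypothetical shared helper; name and accepted values are assumptions.
function parseBoolConfig(raw: string | null | undefined): boolean | undefined {
  if (raw == null) return undefined;
  const v = raw.trim().toLowerCase(); // same trim + lowercase semantic at every site
  if (v === "true") return true;
  if (v === "false") return false;
  return undefined; // unrecognized value: caller falls back to the mode default
}
```

With one helper, 'TRUE', 'True' and ' true ' resolve identically wherever the flag is read, which is the divergence class F1 describes.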

Tests:
  - sessionPrefix block rewritten with 7 cases covering chat marker
    + date anchor + entity dirs (now-NULL) + degenerate (no /).
  - Added regression test pinning codex H2: people/alice +
    people/bob + people/charlie do NOT get diversified.
  - graph-signals-eval.test.ts drops `as any` — typed field works.
  - Existing tests using `chat/a`/`chat/b` updated to session-shaped
    `media/2026-05-20/chunk-a` so the date anchor actually fires.

111/111 graph-signals + doctor + search-stats tests pass. Typecheck clean.

Plan ref: /ship adversarial review v0.40.4.0 (codex H1, H2; Claude F1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 TODOs: capture 11 LOW adversarial findings for v0.41+

Codex L1 (audit window underreport) + Claude F2/F3/F5-F8/F11/F12/F14/F16
from /ship adversarial review. None are load-bearing; all captured under
'v0.40.4 adversarial review LOW findings — captured for v0.41+'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.40.4.0

- README: surface v0.40.4.0 graph signals + --explain in Hybrid search capability
- CLAUDE.md: annotate engine.ts getAdjacencyBoosts, new graph-signals.ts /
  explain-formatter.ts / audit/audit-writer.ts, plus hybrid.ts post-fusion
  4th stage, mode.ts graph_signals knob + KNOBS_HASH 3→4, cli-options.ts
  --explain flag, search stats + doctor coverage check
- llms-full.txt: regenerated from CLAUDE.md per the build:llms chaser rule

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): pin bun-version to 1.3.13 across all workflows

setup-bun action with `bun-version: latest` calls the GitHub API
(https://api.github.com/repos/oven-sh/bun/git/refs/tags) to resolve
the tag. CI started failing today with HTTP 401 "Bad credentials"
even though the action receives a token (visible as `token: ***`
in the run log). Pinning the version eliminates the API call
entirely.

Affected workflows: test.yml, e2e.yml, release.yml, heavy-tests.yml
(5 invocations total). Pinned to 1.3.13 — matches package.json
engines (`bun >= 1.3.10`) and the version v0.40.4.0 was developed
against.

Bump cadence: when a new bun version is required, update this
pin in one PR. Trading "always-latest" for "always-deterministic"
is the right trade for a 5-shard CI matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 27, 2026
…e before (#1521)

* feat(schema): migrations v98/v99/v100 for onboard wave (A6 A10 A11 A13 A25, codex #1 #9 #10 #11 #12)

Three schema additions supporting the gbrain onboard wave:

v98 — links.link_kind nullable column (A10, codex finding #12).
The NER extraction was originally going to add a new link_source='ner'
provenance, but that would have forced every existing link_source='mentions'
query (backlink-count filter, orphan-ratio, doctor checks) to update or
metrics would drift across the cutover. Instead: keep link_source='mentions'
for the storage layer AND add a nullable link_kind column. Three kinds:
'plain', 'typed_ner', NULL (legacy/unknown — semantically 'plain'). NOT in
the links UNIQUE constraint so the storage shape stays compatible.

v99 — timeline_entries dedup widening (A11, codex finding #11).
Pre-v99 dedup key was (page_id, date, summary). The new --from-meetings
extraction writes timeline entries with source='extract-timeline-from-
meetings:<meeting-slug>', and codex caught that two meetings with the same
date+summary on the same entity page would silently DO NOTHING — the
second meeting's provenance is lost. Widened to (page_id, date, summary,
source). Legacy rows (source='') preserve current dedup behavior.

v100 — migration_impact_log table + content_chunks_stale_idx partial
(A6 + A25 + A13 + codex findings #10 + #9). Bundled because both are
consumed by the onboard pipeline and ship together. Impact log captures
before/after metric stats so gbrain onboard --history shows real deltas;
attribution columns (job_id, source_id, brain_id, started_at,
idempotency_key) prevent concurrent runs misattributing to wrong
migrations. content_chunks_stale_idx partial WHERE embedding IS NULL
supports gbrain embed --stale + --priority recent (outer ORDER BY
p.updated_at DESC uses existing idx_pages_updated_at_desc via JOIN).
Plain NUMERIC columns; delta computed at read time (NOT a stored
GENERATED column per eng-review D2 — zero PGLite parity risk).

Slot history note: plan originally proposed v97/v98/v99 but master had
already used v95 (links 'mentions' CHECK widening), v96 (facts conversation
session index), and v97 (pages_dedup_partial_index) by ship time. Codex
caught the collision; renumbered to v98/v99/v100.

Test pin: test/schema-bootstrap-coverage.test.ts (100/100 migrations
apply clean on PGLite), test/migrate.test.ts (152 cases pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(remediation): extract doctor remediation library (A1, codex finding #2)

Pre-fix: src/commands/doctor.ts contained two CLI-shaped functions
(runRemediationPlan + runRemediate) with hardcoded argv parsing,
process.exit calls, and console.log emission. Onboard CLI shell and the
upcoming MCP run_onboard op couldn't compose against them — the plan
file's "100-LOC thin wrapper" assumption didn't survive codex's review
of the actual source.

Post-fix: src/core/remediation/ exports a library shape that all three
consumers (doctor CLI, onboard CLI, MCP run_onboard) wrap.

  src/core/remediation/types.ts
    RemediationPlanOpts, RemediationPlan, RemediationOpts,
    RemediationResult, StepResult, RemediationHooks (the observability
    seam — library never calls console.* itself).

  src/core/remediation/context.ts
    loadRecommendationContext moved verbatim from doctor.ts. Re-exports
    RecommendationContext from brain-score-recommendations.ts since
    that's still the canonical home for the type (consumed by
    computeRecommendations).

  src/core/remediation/plan.ts
    computeRemediationPlan(engine, opts): Promise<RemediationPlan>.
    Pure read; produces the stable JSON envelope downstream agents
    bind to. Pulls in computeRecommendations + classifyChecks +
    maxReachableScore behind one library entry point.

  src/core/remediation/run.ts
    runRemediation(engine, opts, hooks): Promise<RemediationResult>.
    Orchestrator with BudgetTracker, checkpoint resume, D5 dep
    cascade, D7 per-step recheck. Returns a result object instead
    of process.exit calls; the CLI shell maps result.budget_exhausted
    / .target_unreachable / .submitted to exit codes.

  src/core/remediation/index.ts
    Barrel for the three modules above.

doctor.ts is now a thin wrapper:
  runRemediationPlan: parse argv → computeRemediationPlan → human/JSON render
  runRemediate: parse argv → TTY confirm gate → runRemediation(hooks: console.*)
The TTY confirmation step deliberately stays in the CLI shell — the library
never asks for confirmation; that's a CLI concern.

Net: ~340 LOC removed from doctor.ts; ~470 LOC added across the library
module (with full JSDoc + per-A-decision rationale comments). Functional
behavior preserved bit-for-bit: 67 tests pass across doctor.test.ts +
v0_37_gap_fill.serial.test.ts.

The Lane E.4 source-text test (test/v0_37_gap_fill.serial.test.ts:329)
followed loadRecommendationContext to its new home at
src/core/remediation/context.ts — assertions otherwise unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(remediation): generalize computeRecommendations to accept extras (A2, codex finding #3)

Pre-fix: computeRecommendations at brain-score-recommendations.ts:170 was a
hardcoded planner for 5 synthetic check categories. Adding a Check.remediation
field to a new doctor check would NOT auto-wire into --remediation-plan —
the planner simply ignored it. Codex caught this when reviewing the plan's
"checks ARE specs" framing.

Post-fix: optional third arg `extraRemediations: RemediationStep[]` lets
callers inject step entries discovered outside the hardcoded planner. The
existing 5-category surface is preserved bit-for-bit; on id collision the
hardcoded entry wins, so an extra accidentally duplicating a hardcoded id
doesn't shadow legacy behavior.
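
The collision rule can be sketched as a small merge. `StepSketch` is a stand-in for the real RemediationStep type, `mergeRemediations` is a hypothetical name, and the ordering (hardcoded entries first) is an assumption.

```typescript
// Minimal stand-in for RemediationStep; only `id` matters for the merge.
interface StepSketch {
  id: string;
  job: string; // hypothetical field, for illustration only
}

function mergeRemediations(hardcoded: StepSketch[], extras: StepSketch[] = []): StepSketch[] {
  const seen = new Set(hardcoded.map((s) => s.id));
  const merged = [...hardcoded];
  for (const extra of extras) {
    if (seen.has(extra.id)) continue; // hardcoded entry wins on id collision
    seen.add(extra.id);
    merged.push(extra);
  }
  return merged;
}
```

Calling it with no extras returns the hardcoded list unchanged, which mirrors the "no behavior change for legacy CLI" property.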

RemediationPlanOpts gains the matching field; computeRemediationPlan in
src/core/remediation/plan.ts threads opts.extraRemediations through. The
4 new doctor checks (T4) will produce per-check helper functions that
return RemediationStep[]; onboard's render layer (T12) aggregates them
into the opts.extraRemediations slot. doctor's existing
--remediation-plan call passes empty (no behavior change for legacy CLI).

84 tests pass across brain-score-recommendations + doctor suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): 4 new onboard checks (embed_staleness, link_coverage, timeline_coverage, takes_count) (A16, T4)

Adds src/core/onboard/checks.ts: 4 check helpers + a runAllOnboardChecks
aggregator. Each helper returns {check, remediations}, so doctor pushes
the Check entry (for human/JSON rendering) AND onboard's plan path
collects the RemediationStep[] (via T3's new extraRemediations seam in
computeRecommendations).

embed_staleness: COUNT(*) on content_chunks WHERE embedding IS NULL.
  Cheap thanks to content_chunks_stale_idx partial (v100).
  warn at 1+ stale, fail at 1000+; remediation points at embed-catch-up
  handler (built in T6).

entity_link_coverage: fraction of entity pages with inbound links.
  Per A21 + codex #15: TABLESAMPLE BERNOULLI on PG when total_pages > 50K
  with pinned sample formula (LEAST 100, GREATEST 2, target ~5000 rows)
  AND ±sqrt(p(1-p)/n) confidence interval embedded in message
  ("coverage: 31% ± 1.3%") so warn/fail decisions show their margin of error.
  PGLite path: full scan (rare >50K).
  warn <70%, fail <40%; remediation points at extract-ner handler.
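
The margin term can be sketched directly from the formula above. The helper names are hypothetical, and the `z` multiplier is an assumption added for illustration: the text gives the bare sqrt(p(1-p)/n) term and does not say whether a multiplier is applied.

```typescript
// Standard error of a sampled proportion, optionally scaled by z.
function proportionMargin(p: number, n: number, z = 1): number {
  if (n <= 0) throw new RangeError("sample size must be positive");
  return z * Math.sqrt((p * (1 - p)) / n);
}

// Formats the message fragment in the "coverage: X% ± Y%" shape.
function formatCoverage(p: number, n: number, z = 1): string {
  const pct = Math.round(p * 100);
  const margin = (proportionMargin(p, n, z) * 100).toFixed(1);
  return `coverage: ${pct}% ± ${margin}%`;
}
```

For p = 0.31 and n = 5000 this yields ± 0.7% at z = 1 and ± 1.3% at z = 1.96. The latter matches the sample message, but the source does not state which multiplier or sample size produced it.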

timeline_coverage: same TABLESAMPLE policy. warn <90%, fail <70%;
  remediation points at extract-timeline-from-meetings handler.

takes_count: COUNT(*) on takes table. Per A12 two-gate consent: the
  remediation only emits when `takes.bootstrap_enabled` config is true.
  Otherwise the check shows "0 takes (takes.bootstrap_enabled is false;
  opt in to enable)" without an autopilot-eligible remediation. Prevents
  unattended LLM-bearing extractions on brains that haven't opted in.

runDoctor wires runAllOnboardChecks at the end of the DB-checks block
(after stale_locks); fast-mode skipped to preserve --fast UX.

Thin-client parity (A16 spec) deferred to T16 — the MCP run_onboard op
will run these helpers server-side where engine.executeRaw works,
which is the real federated path. Adding them to doctor-remote.ts
would duplicate the logic without functional benefit since the helpers
are server-side queries.

55 doctor tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(engine): listStaleChunks --priority recent + executeRaw AbortSignal (A13/A20, codex #7 #9)

Two interface extensions on BrainEngine, with parity across postgres-engine
and pglite-engine. Plus a follow-on fix for v99's timeline_entries dedup
widening.

listStaleChunks gains:
  - orderBy?: 'page_id' | 'updated_desc' (default 'page_id' = legacy)
  - afterUpdatedAt?: string | null (composite cursor for updated_desc)

When orderBy === 'updated_desc' the query JOINs pages and orders by
  p.updated_at DESC NULLS LAST, p.id ASC, cc.chunk_index ASC
backed by idx_pages_updated_at_desc + content_chunks_stale_idx partial
(both indexes added in v100). The cursor "next row" semantic with DESC
NULLS LAST + ASC tiebreakers is:
  (updated_at < prev) OR
  (updated_at = prev AND page_id > prev_page_id) OR
  (updated_at = prev AND page_id = prev_page_id AND chunk_index > prev_chunk_index)
First page (afterUpdatedAt undefined AND afterPageId 0) bypasses the
cursor predicate. The PGLite path is covered by the 100/100 pglite-engine
tests; the Postgres path mirrors the same WHERE clause structure.
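
The "next row" predicate above, restated as an in-memory comparison. This is a sketch only: the `StaleCursor` shape and `isAfterCursor` name are hypothetical, timestamps are assumed to be ISO strings, and NULL updated_at (the NULLS LAST case) is not modelled.

```typescript
interface StaleCursor {
  updatedAt: string; // ISO timestamp
  pageId: number;
  chunkIndex: number;
}

// True when `row` sorts strictly after `prev` under
// ORDER BY updated_at DESC, page_id ASC, chunk_index ASC.
function isAfterCursor(row: StaleCursor, prev: StaleCursor): boolean {
  const rowTs = Date.parse(row.updatedAt);
  const prevTs = Date.parse(prev.updatedAt);
  if (rowTs !== prevTs) return rowTs < prevTs; // older rows come later under DESC
  if (row.pageId !== prev.pageId) return row.pageId > prev.pageId;
  return row.chunkIndex > prev.chunkIndex;
}
```

The three branches correspond one-to-one to the three OR arms of the SQL predicate.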

executeRaw gains:
  - opts?: {signal?: AbortSignal}

Postgres impl: real cancellation via postgres.js's .cancel() on the
pending query. Pre-aborted signal short-circuits before the network
round-trip; mid-flight abort fires .cancel(). The query throws on
abort which the caller catches.

PGLite impl: in-process WASM has no kernel-level cancellation.
Best-effort: pre-check, then race the query against a signal-rejection
promise. The query keeps running in WASM but the awaited result is
discarded (DOMException AbortError thrown). Documented gap.
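
The best-effort pattern (pre-check, then race against a signal rejection) can be sketched as below. `raceWithAbort` is a hypothetical helper, not the engine code; for portability it rejects with a plain Error named 'AbortError' rather than the DOMException mentioned above.

```typescript
// Hypothetical helper illustrating the best-effort pattern.
function raceWithAbort<T>(run: () => Promise<T>, signal?: AbortSignal): Promise<T> {
  if (!signal) return run();

  const abortError = (): Error => {
    const err = new Error("query aborted");
    err.name = "AbortError";
    return err;
  };

  // Pre-check: an already-aborted signal never starts the work.
  if (signal.aborted) return Promise.reject(abortError());

  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(abortError());
    signal.addEventListener("abort", onAbort, { once: true });
    // The work itself is not cancelled; only the awaited result is discarded.
    run().then(
      (value) => {
        signal.removeEventListener("abort", onAbort);
        resolve(value);
      },
      (err) => {
        signal.removeEventListener("abort", onAbort);
        reject(err);
      },
    );
  });
}
```

The caller sees a prompt rejection on abort, while the underlying work keeps running to completion, which is the gap the paragraph above documents.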

ReservedConnection.executeRaw extends the signature for type
compatibility but doesn't wire the signal (its only callers are
migrations + cycle-lock writes that explicitly don't want cancellation).

V99 timeline dedup follow-on: the dedup widening in migration v99
changed the unique index from (page_id, date, summary) to
(page_id, date, summary, source). The ON CONFLICT clauses in both
engines' addTimelineEntriesBatch + addTimelineEntry impls were still
using the old 3-tuple, causing 12 PGLite tests to fail with SQLSTATE
42P10 "no unique constraint matching ON CONFLICT specification".
Updated all 4 sites (2 per engine) to the 4-tuple.

Typecheck clean, 100/100 PGLite engine tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(embed): --batch-size + --priority recent + --catch-up + embed-catch-up handler (A13)

CLI surface on gbrain embed gains 3 flags:
  --batch-size N       Override hardcoded PAGE_SIZE=2000 (clamped 1..10000)
  --priority recent    Walk stale chunks newest-first (page.updated_at DESC)
                       backed by content_chunks_stale_idx + idx_pages_updated_at_desc
                       via T5's listStaleChunks(orderBy='updated_desc') extension.
                       Composite cursor (updated_at, page_id, chunk_index).
  --catch-up           Removes the GBRAIN_EMBED_TIME_BUDGET_MS wall-clock cap;
                       loops until countStaleChunks() returns 0.

EmbedOpts gains matching fields; embedAll + embedAllStale plumb them through.
The cursor tracking in embedAllStale now advances (afterUpdatedAt, afterPageId,
afterChunkIndex) instead of just (afterPageId, afterChunkIndex) when in
'updated_desc' mode. The engine returns p.updated_at as Date|string; the
caller normalizes to ISO string for the next page's cursor.

New Minion handler `embed-catch-up` registered in jobs.ts. Wraps runEmbedCore
with stale=true + catchUp=true + the priority/batchSize the caller supplies.
NOT in PROTECTED_JOB_NAMES (embedding spend only — same posture as the
existing embed-backfill handler). Consumed by the gbrain onboard remediation
pipeline (T11) when embed_staleness check fires.

63 embed tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extract): NER link extraction via schema-pack inference.regex (A10, T7, codex #12)

NEW src/core/extract-ner.ts: extractNerLinks(engine, opts). Walks pages,
reuses the by-mention gazetteer, applies the active schema-pack's
link_types[].inference.regex patterns to assign a typed verb to each
mention ("CEO of Acme" + Acme is a company → 'works_at' linking the
source page to Acme).

Codex finding #12 design: do NOT split link_source='ner' as a new
provenance. NER is still mention-derived; splitting would break every
existing link_source='mentions' query (backlink-count, orphan-ratio,
doctor checks). Instead: keep link_source='mentions' AND set
link_kind='typed_ner' (v98 column).

LinkBatchInput type gains link_kind field. Both engines'
addLinksBatch impls add the column to the INSERT projection + unnest()
tuple (column #11). The links UNIQUE constraint excludes link_kind, so a
typed_ner row with the same (from, to, type, source, origin) as an
existing plain mention row would collide and DO NOTHING. In the typical
case the typed link carries a DIFFERENT link_type (the inferred verb),
so it lands as a separate row and no collision occurs.

CLI: `gbrain extract links --ner` (DB source only). Combined
`--by-mention --ner` walk shares ONE gazetteer build across both passes
— saves a full walk on big brains. Either flag alone runs its pass
solo. Each gets its own --source-id filter inheritance.

Minion handler: `extract-ner` (NOT in PROTECTED_JOB_NAMES — regex-only,
no LLM spend). Consumed by onboard's entity_link_coverage remediation
when coverage <70%.

Target-type lookup: one round-trip SELECT slug, source_id, type FROM
pages WHERE type IN ('person', 'company', 'organization', 'entity')
AND deleted_at IS NULL — built once at extraction start, consulted
per-mention. Avoids the N+1 getPage cost.

Pack best-effort: when no active pack OR no link_types declared OR
no inference.regex on any link_type, returns pack_unavailable=true and
0 created. CLI prints a one-line note; handler returns silently.

122 tests pass (pglite-engine + by-mention); typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extract): timeline from meetings — gbrain extract timeline --from-meetings (A11, T8, codex #11)

NEW src/core/extract-timeline-from-meetings.ts:
extractTimelineFromMeetings(engine, opts). Walks meeting pages, finds
discussed entities via two sources, writes a timeline entry on each
entity page.

Discussed-entity sources merged:
  1. Existing 'attended' links from the meeting (canonical attendees).
     One round-trip SELECT pulls all attended edges for the loaded
     meeting set; in-memory Map<meetingSlug → attendees[]> for O(1)
     lookup per meeting.
  2. Body-text mentions via the existing by-mention gazetteer
     (findMentionedEntities + cross-source guard). Catches entities
     discussed in the meeting body even when no explicit 'attended'
     link exists.

De-duped via Map<sourceId::slug → entity> within each meeting so a
person who's both an attendee AND mentioned in the body gets exactly
one timeline row per meeting, not two.
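
The per-meeting de-dup can be sketched with a keyed Map. `EntityRef` and `dedupeEntities` are hypothetical names, and "first occurrence wins" is an assumption about tie-breaking.

```typescript
interface EntityRef {
  sourceId: string;
  slug: string;
}

// Merges attendee-derived and mention-derived entities for one meeting,
// keyed on "sourceId::slug" so each entity appears exactly once.
function dedupeEntities(attendees: EntityRef[], mentioned: EntityRef[]): EntityRef[] {
  const byKey = new Map<string, EntityRef>();
  for (const entity of [...attendees, ...mentioned]) {
    const key = `${entity.sourceId}::${entity.slug}`;
    if (!byKey.has(key)) byKey.set(key, entity); // first occurrence wins
  }
  return Array.from(byKey.values());
}
```

Including the source id in the key keeps same-slug entities from different sources distinct.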

Timeline write uses TimelineBatchInput with:
  source = 'extract-timeline-from-meetings:<meeting-slug>'
  summary = 'Discussed in <meeting-title>'
  date = meeting.effective_date

Per v99 dedup widening (codex #11): the source field is now in the
uniqueness key (page_id, date, summary, source). Two meetings on the
same date with the same summary on the same entity page survive as
distinct rows — the second meeting's provenance is no longer silently
dropped.

CLI: `gbrain extract timeline --from-meetings` (DB source only). Mode
dispatch — runs SOLO (does not combine with --by-mention/--ner; those
are links passes).

Minion handler: `extract-timeline-from-meetings` (NOT in
PROTECTED_JOB_NAMES — pure SQL + string scan). Consumed by onboard's
timeline_coverage remediation when coverage <90%.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(takes): takes-bootstrap from concept/atom/lore pages (A12, A24, T9)

NEW src/core/extract-takes-from-pages.ts: Haiku classifier loop. Walks
pages WHERE type IN ('concept','atom','lore','briefing','writing',
'originals') AND deleted_at IS NULL AND length(compiled_truth) > 200,
ordered by updated_at DESC. Each page is truncated to 20K chars and
sent to Haiku with a strict-JSON classifier prompt:
  {"claim", "kind": fact|take|bet|hunch, "weight": 0..1}

Inserts via addTakesBatch with source='cli:takes-bootstrap-from-pages'.

Two-gate consent per A12:
  1. `takes.bootstrap_enabled` config (default false) — even the manual
     CLI refuses without it explicitly set.
  2. --yes flag (CLI) — interactive confirmation that this sends content
     to Haiku.

The handler-side gate also reads takes.bootstrap_enabled, so even a
trusted local Minion submitter (allowProtectedSubmit=true) cannot
fire takes-bootstrap on a brain that hasn't opted in.

CLI: `gbrain takes extract --from-pages [--yes] [--dry-run] [--source-id X]
[--max-pages N] [--holder name]`. Surfaces consent-gate-blocked vs
llm-unavailable distinctly so users see the actual blocker.

Minion handler `extract-takes-from-pages` added to PROTECTED_JOB_NAMES.
Consumed by onboard's takes_count remediation when count=0 AND
takes.bootstrap_enabled=true (handler-side double-check).

Per A24: ships with classifier infrastructure ONLY. Per-prompt eval suite
deferred to v0.42.1 follow-up; autopilot remediation tier for takes-bootstrap
stays manual_only until eval coverage catches up. Manual `gbrain takes
extract --from-pages --yes` is the only path that triggers it in v0.42.0.

parseClaimsJson exported for unit testing — strict JSON parse + ```json
fence strip + kind allowlist filter, returns [] on any parse failure.
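
A sketch of the parse path described above. It assumes the model reply is a JSON array of claim objects; `parseClaimsSketch` is a hypothetical name, and the per-field type checks beyond the kind allowlist are assumptions.

```typescript
type ClaimKind = "fact" | "take" | "bet" | "hunch";

interface Claim {
  claim: string;
  kind: ClaimKind;
  weight: number;
}

const ALLOWED_KINDS: ReadonlySet<string> = new Set(["fact", "take", "bet", "hunch"]);

function parseClaimsSketch(raw: string): Claim[] {
  // Strip an optional leading fence (with or without a "json" tag) and the trailing fence.
  const text = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");

  let parsed: unknown;
  try {
    parsed = JSON.parse(text); // strict parse, no repair attempts
  } catch {
    return []; // any parse failure yields an empty result
  }
  if (!Array.isArray(parsed)) return [];

  // Keep only well-formed entries whose kind is on the allowlist.
  return parsed.filter(
    (c): c is Claim =>
      typeof c === "object" &&
      c !== null &&
      typeof (c as Claim).claim === "string" &&
      ALLOWED_KINDS.has((c as Claim).kind) &&
      typeof (c as Claim).weight === "number",
  );
}
```

Returning [] instead of throwing lets a caller treat an unparseable reply as "no claims found" for that page.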

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): recordMinionJobSpend primitive for MCP client_id attribution (A7+A23, codex finding #4)

NEW src/core/minion-spend.ts: small primitive that closes the per-OAuth-
client spend chain gap codex flagged when MCP run_onboard submits child
Minion jobs.

Pre-fix: only subagent loops via budget-meter.ts recorded spend against
the originating OAuth client. Generic Minion handlers (embed-catch-up,
extract-ner, extract-timeline-from-meetings, extract-takes-from-pages)
wrote to the gateway with no per-client attribution — admin-scope tokens
would have unbounded indirect spend via the run_onboard fan-out.

Convention for v0.42.0 (deferred schema column to v0.42.1):
  - run_onboard MCP op sets job.data.client_id when submitting each
    child handler.
  - Handlers that spend LLM/embedding budget call
    recordMinionJobSpend(engine, job, {operation, spendCents, ...})
    which reads job.data.client_id and writes mcp_spend_log with
    the right attribution.
  - Local-submitted jobs (CLI, autopilot tick) pass no client_id;
    the row still lands with client_id=null for global accounting.

Two exports:
  getJobClientId(job): undefined for local jobs; the OAuth client_id
    string for MCP-submitted ones.
  recordMinionJobSpend(engine, job, entry): wraps recordSpend with
    job-aware attribution. Best-effort throughout — spend telemetry
    failures MUST NOT fail the user's call.
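
A sketch of the two exports under the JSONB pass-through convention. All types and the `writeRow` callback are hypothetical stand-ins; the real primitive wraps recordSpend and takes an engine, which this example does not model.

```typescript
interface JobSketch {
  data?: Record<string, unknown>;
}

interface SpendEntry {
  operation: string;
  spendCents: number;
}

type SpendRow = SpendEntry & { client_id: string | null };

// undefined for local jobs; the client_id string for MCP-submitted ones.
function getJobClientIdSketch(job: JobSketch): string | undefined {
  const id = job.data?.["client_id"];
  return typeof id === "string" && id.length > 0 ? id : undefined;
}

async function recordJobSpendSketch(
  job: JobSketch,
  entry: SpendEntry,
  writeRow: (row: SpendRow) => Promise<void>, // stand-in for the real spend-log write
): Promise<void> {
  try {
    // Local jobs still land a row, with client_id = null for global accounting.
    await writeRow({ ...entry, client_id: getJobClientIdSketch(job) ?? null });
  } catch {
    // Best-effort: telemetry failures must not fail the caller.
  }
}
```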

A23 full schema column (minion_jobs.client_id + index) deferred to
v0.42.1; today's JSONB-pass-through is sufficient for the MCP
run_onboard chain to land per-client attribution end-to-end. Handlers
adopt the primitive over time; no behavior change for callers that
haven't migrated.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): impact capture module + writeImpactLogRow primitive (A6 + A25 + A17, T11)

NEW src/core/onboard/impact-capture.ts. Three exports:

captureMetric(engine, metric)
  Pure-ish: returns the current numeric value for one of 5 metrics
  (orphan_count, stale_count, entity_link_coverage, timeline_coverage,
  takes_count). Returns null on any throw per A17 best-effort posture
  — a stat-query failure MUST NOT block the extraction itself.

writeImpactLogRow(engine, attribution, metric, before, after, details?)
  Best-effort INSERT into v100's migration_impact_log table. Attribution
  columns (job_id, source_id, brain_id, started_at, idempotency_key,
  applied_by) per A25 + codex finding #10 so concurrent runs can't
  misattribute deltas.

withImpactCapture(engine, attribution, metric, runner, details?)
  Convenience: capture-before → run → capture-after → write log row.
  Per A17 the log row lands even when the runner throws (after-on-fail
  + error in details), so downstream consumers see a "ran but impact
  unknown" entry instead of silent loss.
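
The capture-before → run → capture-after → write sequence can be sketched as below. The callback-based signature is a simplification (the real function takes engine, attribution and metric), and rethrowing the runner's error after logging is an assumption.

```typescript
interface ImpactRow {
  before: number | null;
  after: number | null;
  error: string | null;
}

async function withImpactCaptureSketch<T>(
  capture: () => Promise<number>,          // stand-in for captureMetric
  writeRow: (row: ImpactRow) => Promise<void>, // stand-in for writeImpactLogRow
  runner: () => Promise<T>,
): Promise<T> {
  // A failing stat query yields null instead of blocking the run.
  const safeCapture = async (): Promise<number | null> => {
    try {
      return await capture();
    } catch {
      return null;
    }
  };

  const before = await safeCapture();
  let error: string | null = null;
  try {
    return await runner();
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
    throw err; // assumption: the caller still sees the failure
  } finally {
    // The log row lands on both the success and failure paths.
    const after = await safeCapture();
    try {
      await writeRow({ before, after, error });
    } catch {
      // Best-effort: a log-write failure is swallowed.
    }
  }
}
```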

Designed to be picked up by the 4 new Minion handlers (embed-catch-up,
extract-ner, extract-timeline-from-meetings, extract-takes-from-pages)
when they wrap their main runner. Handlers stay decoupled from the
log-write path — they just call withImpactCapture with the metric they
move. Per-handler integration follows in T12/T13/T15 as those wrappers
land.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): types + render layer (A8, T12)

NEW src/core/onboard/types.ts: OnboardRecommendation (extends
RemediationStep with apply_policy + prompt_text + migration_id),
OnboardReport (stable JSON envelope), OnboardOpts.

NEW src/core/onboard/render.ts:
  toOnboardRecommendation(step): RemediationStep → OnboardRecommendation
    Sets apply_policy per A8 tiered rules:
      - protected + job === extract-takes-from-pages → 'manual_only' (A12/A24)
      - protected + other → 'prompt_required'
      - non-protected → 'auto_apply'
  buildOnboardReport(plan, opts?): assembles the stable JSON envelope.
  renderHuman(report): string. Echoes the "Recommendation + WHY" framing
    the CEO + Eng + Codex reviews settled on; CLI shell prints to stdout.
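
The A8 tiering can be sketched as a three-way decision. `applyPolicyFor` and the `isProtected` field are hypothetical; how the real render layer learns whether a job is protected is not shown here.

```typescript
type ApplyPolicy = "auto_apply" | "prompt_required" | "manual_only";

interface PolicyInput {
  job: string;
  isProtected: boolean; // hypothetical: whether the job is in PROTECTED_JOB_NAMES
}

function applyPolicyFor(step: PolicyInput): ApplyPolicy {
  if (!step.isProtected) return "auto_apply";
  // Takes bootstrap stays manual until eval coverage exists (A12/A24).
  if (step.job === "extract-takes-from-pages") return "manual_only";
  return "prompt_required";
}
```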

Stable JSON envelope shape:
  schema_version: 1
  brain_id?: string
  recommendations: OnboardRecommendation[]
  summary: { total, auto_eligible, prompt_required, manual_only,
             est_total_usd }
  history?: Array<{ remediation_id, metric_name, metric_before,
                    metric_after, delta, applied_at }>

Library-shaped — no console.* / process.exit. T13 (onboard CLI shell)
calls these from the wrapping CLI. MCP run_onboard (T16) returns the
JSON envelope unmodified.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): gbrain onboard CLI shell (A1, T13)

NEW src/commands/onboard.ts (~180 LOC). Thin wrapper that composes:
  - T2 library (computeRemediationPlan + runRemediation)
  - T4 onboard checks (runAllOnboardChecks → extraRemediations)
  - T12 render layer (buildOnboardReport + renderHuman)

Modes:
  --check    (default): print plan, no submission. Computes plan via
             T2 library with T4 check-derived extraRemediations.
             Renders human (default) or JSON envelope (--json).
  --auto:    submit auto_apply tier. Requires --max-usd N (cron-safety
             per A12 + A20 — refuses without explicit cap to avoid
             surprise spend).
  --auto --yes: also submit prompt_required tier.
  --history: dump last 50 migration_impact_log entries.

Library hooks wired into stderr (per CLI/library separation): onStepStart,
onStepEnd, onBudgetRefused, onBudgetExhausted, onNothingToDo,
onTargetUnreachable. Final JSON envelope (--json) or human summary
lands on stdout.

CLI dispatch: registered in src/cli.ts CLI_ONLY set + case dispatch
between 'takes' and 'founder'.

Typecheck clean. Manual smoke-test pending T20 E2E (DATABASE_URL gated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): init nudge + upgrade banner (A4, A18, A20, T14)

NEW src/core/onboard/init-nudge.ts exports two fail-open hooks:

runInitNudge(engine):
  Post-initSchema 5-query AbortSignal-bound parallel check against a
  3-second wallclock budget. Per A20: uses REAL cancellation via the
  T5 executeRaw signal extension. Codex finding #7 flagged Promise.race
  against a timer as the wrong shape. Postgres queries actually
  .cancel(); on PGLite this is a documented gap.
  Partial-results path: if some checks complete and the budget fires
  on others, prints what landed + a fallthrough hint pointing at
  `gbrain onboard --check` for the full picture.
  Per A18: fail-open — ANY throw is caught, logged to stderr, and
  suppressed so init returns successfully.
  Bypass: GBRAIN_NO_ONBOARD_NUDGE=1 short-circuits. Non-TTY default
  short-circuits too (CI/scripted callers see nothing).
  Nudge format: one-line summary of opportunities ("Brain has
  opportunities: 23000 stale chunks, link coverage 32%, 0 takes")
  + a 'gbrain onboard --check' nudge.
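
The budgeted, cancellable check described above could be sketched roughly as follows. `Check` and `runBudgetedChecks` are hypothetical names; the real code routes the signal through executeRaw, which is not reproduced here.

```typescript
// Hypothetical sketch: `Check` stands in for a query that honors an AbortSignal.
type Check = { name: string; run: (signal: AbortSignal) => Promise<string> };

async function runBudgetedChecks(
  checks: Check[],
  budgetMs: number,
): Promise<{ landed: string[]; missing: string[] }> {
  const controller = new AbortController();
  // One shared budget: when it fires, every in-flight check is told to cancel,
  // instead of being left running behind a Promise.race.
  const timer = setTimeout(() => controller.abort(), budgetMs);
  try {
    const outcomes = await Promise.all(
      checks.map(async (c) => {
        try {
          return { name: c.name, ok: true as const, value: await c.run(controller.signal) };
        } catch {
          // Aborted or failed: recorded as missing rather than rethrown.
          return { name: c.name, ok: false as const };
        }
      }),
    );
    const landed: string[] = [];
    const missing: string[] = [];
    for (const o of outcomes) {
      if (o.ok) landed.push(o.value);
      else missing.push(o.name);
    }
    return { landed, missing };
  } finally {
    clearTimeout(timer);
  }
}
```

When `missing` is non-empty, a caller would print whatever landed and add the fallthrough hint pointing at `gbrain onboard --check`.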

runUpgradeBanner(_engine):
  Lighter post-upgrade banner. Does not query the engine; it just prints a
  one-line nudge that upgrades may surface new opportunities. Same
  fail-open posture.

Wired into:
  src/commands/init.ts:initPGLite (end-of-function, after reportModStatus)
  src/commands/init.ts:initPostgres (same)
  src/commands/upgrade.ts:runPostUpgrade (end-of-function, after
  postUpgradeReferenceSweep)

Each wire site uses dynamic import + try/catch so even an import
failure can't crash init/upgrade.
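
A rough sketch of the wire-site pattern, assuming a generic helper rather than the inline code the commit actually uses. `runOptionalHook` is a hypothetical name, and the module path is a parameter only to keep the example self-contained.

```typescript
// Hypothetical sketch: the real wire sites import init-nudge.ts directly.
async function runOptionalHook(
  modulePath: string,
  exportName: string,
  ...args: unknown[]
): Promise<boolean> {
  try {
    // Dynamic import: a missing or broken module rejects here, inside the try.
    const mod: Record<string, unknown> = await import(modulePath);
    const hook = mod[exportName];
    if (typeof hook !== "function") return false;
    await hook(...args);
    return true;
  } catch (err) {
    // Fail-open: log to stderr and carry on; init/upgrade must still succeed.
    const reason = err instanceof Error ? err.message : String(err);
    console.error(`[onboard] hook skipped: ${reason}`);
    return false;
  }
}
```

A static top-level import would fail before any try/catch runs, which is why the import itself sits inside the guarded block.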

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(autopilot): tick consults onboard recommendations (A5, A19, A22, T15)

Pre-fix: autopilot tick's per-source recommendation walk called
computeRecommendations(health, ctx) — doctor's hardcoded 5-category
planner. The 4 new onboard checks (embed_staleness,
entity_link_coverage, timeline_coverage, takes_count) had nowhere to
hook in, so even with takes.bootstrap_enabled flipped on, autopilot
never noticed 0 takes and never proposed bootstrap.

Post-fix: tick body now ALSO calls runAllOnboardChecks(engine) and
threads the result's RemediationStep[] into the T3-generalized third
arg of computeRecommendations. The planner merges onboard's extras
with the legacy hardcoded entries (hardcoded wins on id collision).
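
The merge rule (hardcoded wins on id collision) can be illustrated with a short sketch. The `RemediationStep` fields shown here are assumptions; only the `id` collision rule comes from the text above.

```typescript
// Hypothetical shape: the real RemediationStep has more fields than this.
interface RemediationStep {
  id: string;
  job: string;
}

function mergeRemediations(
  hardcoded: RemediationStep[],
  extras: RemediationStep[],
): RemediationStep[] {
  const merged = [...hardcoded];
  const seen = new Set(hardcoded.map((s) => s.id));
  for (const extra of extras) {
    // An extra whose id is already taken is dropped, so the hardcoded
    // entry wins on collision.
    if (seen.has(extra.id)) continue;
    seen.add(extra.id);
    merged.push(extra);
  }
  return merged;
}
```

With `extras = []` the function returns the legacy plan unchanged, matching the no-behavior-change claim for healthy brains.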

Per A19 fail-open: any throw in the onboard-checks path is caught,
logged to stderr, and suppressed. The legacy plan (without extras)
runs as before — autopilot can't crash from an onboard-check failure.

A22 (idempotency-key dedupe across concurrent manual + autopilot
runs): inherits from the existing computeRecommendations →
remediation.idempotency_key chain. T7-T9 handlers each get their
content-hash key from the makeRemediationStep factory; an autopilot
tick + a manual `gbrain onboard --auto` submitting the same step
in the same brain produce the SAME key, so queue.add(...) dedupes.
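
To illustrate why identical steps collapse to one queue entry, here is a sketch under stated assumptions: FNV-1a is a stand-in for whatever hash makeRemediationStep really uses, and `DedupingQueue` is a hypothetical in-memory substitute for the real queue.

```typescript
// Stand-in hash (32-bit FNV-1a); the real factory's hash is not shown in the source.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

function idempotencyKey(job: string, params: Record<string, unknown>): string {
  // Sort keys so that logically equal params serialize identically.
  const canonical = JSON.stringify(
    Object.keys(params).sort().map((k) => [k, params[k]]),
  );
  return `${job}:${fnv1a(canonical)}`;
}

// Hypothetical queue: add() reports whether the submission was accepted.
class DedupingQueue {
  private seen = new Set<string>();

  add(key: string): boolean {
    if (this.seen.has(key)) return false; // duplicate submission, dropped
    this.seen.add(key);
    return true;
  }
}
```

Because the key depends only on the step's content, it does not matter whether the autopilot tick or the manual run submits first; the second submission is the one dropped.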

No behavior change for brains where all 4 onboard metrics already
look healthy (extras=[]; legacy plan unchanged).

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): run_onboard op with run_protected_onboard scope binding (A7, T16, codex finding #5)

NEW MCP op `run_onboard`. Admin scope (NOT localOnly) so federated /
thin-client brain installs can probe brain health + submit auto-eligible
remediation handlers over OAuth-authenticated MCP.

Two-tier authorization per A7 + codex #5:
  - Admin scope: sufficient for mode='check' (read-only OnboardReport JSON)
    AND for submitting non-protected handlers in mode='auto'/'auto-with-prompt'.
  - run_protected_onboard scope (NEW, additive): MUST be granted in
    addition to admin for any PROTECTED_JOB_NAMES handler to fire
    (synthesize, patterns, consolidate, extract-takes-from-pages,
    contextual_reindex_per_chunk).

Without the new scope tier, an admin-scoped OAuth token would silently
bypass the same protected-name gate `submit_job` enforces at
operations.ts:2288. Codex finding #5 caught this: admin scope alone
was an insufficient guard. Now the run_onboard op explicitly FILTERS
protected extras from the recommendation plan when the caller lacks
run_protected_onboard; filtered items appear in the response as
skipped_missing_scope[] so the caller knows what would have been
available with the right grants.
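
A minimal sketch of the two-tier filter, assuming scopes arrive as a flat string array. `filterByScope` and `PlannedStep` are hypothetical names; the protected job names are copied from the list above.

```typescript
// Job names as listed in this commit message; the real set lives in gbrain.
const PROTECTED_JOB_NAMES = new Set([
  "synthesize",
  "patterns",
  "consolidate",
  "extract-takes-from-pages",
  "contextual_reindex_per_chunk",
]);

// Hypothetical shape for a planned remediation.
interface PlannedStep {
  id: string;
  job: string;
}

function filterByScope(plan: PlannedStep[], scopes: string[]) {
  // Protected handlers need BOTH admin and the additive scope.
  const mayRunProtected =
    scopes.includes("admin") && scopes.includes("run_protected_onboard");
  const allowed: PlannedStep[] = [];
  const skipped_missing_scope: PlannedStep[] = [];
  for (const step of plan) {
    if (PROTECTED_JOB_NAMES.has(step.job) && !mayRunProtected) {
      // Reported back to the caller instead of being silently dropped.
      skipped_missing_scope.push(step);
    } else {
      allowed.push(step);
    }
  }
  return { allowed, skipped_missing_scope };
}
```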

Modes:
  check               — read-only OnboardReport JSON envelope.
  auto                — submits auto_apply tier (plus prompt_required
                        when --yes/auto-with-prompt).
  auto-with-prompt    — adds prompt_required tier.

Both auto modes REQUIRE max_usd per A12 + A20 cron-safety (rejects
with invalid_params if missing).

Per A26 source-scope: a future extension will scope plans by ctx.sourceId
/ ctx.auth.allowedSources. Today the recommendation planner is
brain-wide; threading source scope through is an optimization and does
not change correctness.

Per A19 fail-open: any error in runAllOnboardChecks during plan-build is
caught and suppressed; the plan still returns with extras=[] rather than
crashing the op.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(verify): add check-source-scope-onboard lint (A26, T17)

NEW scripts/check-source-scope-onboard.sh. Grep guard for SQL sites in
onboard surfaces (src/core/onboard/, src/commands/onboard.ts) that
touch source_id-bearing tables (pages, content_chunks, takes, links,
timeline_entries) WITHOUT either:
  (a) source_id / sourceIds in the WHERE clause, OR
  (b) the opt-out marker `sourcescope:brain-wide` within 4 lines above
      the SQL.

File-level opt-out: `sourcescope:file-brain-wide` in the file header
(first 30 lines) treats every SQL site in that file as intentionally
brain-wide. Used by onboard/checks.ts, onboard/impact-capture.ts, and
commands/onboard.ts because the onboard CHECKS are explicitly brain-wide
aggregates (orphan_count, stale_count, link_coverage are reported
across all sources by design).
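
The actual guard is a shell script built on grep. The TypeScript sketch below only restates its rule so the logic is easier to follow; `findUnscopedSql` is a hypothetical name, and the way it detects a SQL site (a FROM/JOIN on a listed table, plus a 5-line look-ahead for the WHERE clause) is a simplifying assumption.

```typescript
const SCOPED_TABLES = ["pages", "content_chunks", "takes", "links", "timeline_entries"];
const SITE_MARKER = "sourcescope:brain-wide";
const FILE_MARKER = "sourcescope:file-brain-wide";

// Returns 1-based line numbers of SQL sites that look unscoped.
function findUnscopedSql(source: string): number[] {
  const lines = source.split("\n");
  // File-level opt-out: marker anywhere in the first 30 lines.
  if (lines.slice(0, 30).some((l) => l.includes(FILE_MARKER))) return [];

  const tableRe = new RegExp(
    `\\b(FROM|JOIN)\\s+(${SCOPED_TABLES.join("|")})\\b`,
    "i",
  );
  const violations: number[] = [];
  lines.forEach((line, i) => {
    if (!tableRe.test(line)) return;
    // Assumption: the WHERE clause sits within 5 lines of the table reference.
    const statement = lines.slice(i, i + 6).join(" ");
    if (/source_id|sourceIds/.test(statement)) return;
    // Per-site opt-out: marker within the 4 lines above the SQL.
    const above = lines.slice(Math.max(0, i - 4), i);
    if (above.some((l) => l.includes(SITE_MARKER))) return;
    violations.push(i + 1);
  });
  return violations;
}
```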

Wired into bun run verify (23 checks total now, all green).

Without this gate, any future onboard SQL touching per-source data
without source-scoping would silently leak rows across sources —
exactly the class of bug v0.34.1's P0 seal closed at the engine layer.
The lint adds an explicit forcing function for new code in the onboard
surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(install): onboard surface agent prescription (D13, T18)

Adds a v0.42.0+ section to INSTALL_FOR_AGENTS.md describing:
  - First-connect probe: gbrain onboard --check --json
  - Post-upgrade re-probe (after gbrain upgrade)
  - Unattended remediation: gbrain onboard --auto --max-usd 5
  - MCP run_onboard op for federated/thin-client installs
  - run_protected_onboard scope requirement for LLM-bearing handlers
  - Two-gate consent for takes-bootstrap (takes.bootstrap_enabled + --yes)
  - GBRAIN_NO_ONBOARD_NUDGE=1 bypass for CI

Per D13: agents should run --check on first connect AND after every
upgrade as a hygiene step. The autopilot path makes this auto-improve
on a 24h cycle; the explicit agent probe surfaces opportunities
immediately on connect rather than waiting for the next autopilot tick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): hermetic onboard surface contracts (T20)

NEW test/e2e/onboard-full-flow.test.ts. 13 hermetic PGLite cases
(no DATABASE_URL needed) covering the key onboard contracts:

  captureMetric — all 5 metrics return expected values on empty brain
    (0 for counts; 1 for coverage = vacuous truth).

  runAllOnboardChecks — returns exactly 4 results with correct names;
    empty brain shows stale/link/timeline ok BUT takes_count warns
    (0 takes); 0 remediations emitted because takes.bootstrap_enabled
    defaults to false per A12 two-gate consent.

  computeRemediationPlan — extras (T3 generalization) thread through to
    plan.plan output; stable schema_version: 2 envelope.

  buildOnboardReport — stable schema_version: 1 envelope with the right
    summary fields populated.

  toOnboardRecommendation tier policy (A8):
    - non-protected job → auto_apply
    - extract-takes-from-pages → manual_only (A12 + A24)
    - other protected jobs (synthesize, patterns, ...) → prompt_required
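
The tier policy above reduces to a three-way branch. In this sketch `tierFor` is a hypothetical name, and the protected set is copied from this commit log, so it may not be exhaustive.

```typescript
type Tier = "auto_apply" | "prompt_required" | "manual_only";

// Protected job names as listed elsewhere in this commit log.
const PROTECTED_JOBS = new Set([
  "synthesize",
  "patterns",
  "consolidate",
  "extract-takes-from-pages",
  "contextual_reindex_per_chunk",
]);

function tierFor(job: string): Tier {
  // Checked first: takes extraction is stricter than other protected jobs.
  if (job === "extract-takes-from-pages") return "manual_only";
  if (PROTECTED_JOBS.has(job)) return "prompt_required";
  return "auto_apply";
}
```

The order of the branches matters: extract-takes-from-pages is itself in the protected set, so testing the set first would misclassify it as prompt_required.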

Full DATABASE_URL-gated end-to-end (real Postgres, actual extractions
through Minion handlers) deferred to v0.42.1 once the per-handler test
seam lands; the hermetic suite covers the data-shape contracts that
matter for downstream consumers binding to the JSON envelopes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.42.0.0 gbrain onboard mega PR — activation surface (closes #1383, completes #1409)

VERSION + package.json bumped to 0.42.0.0. CHANGELOG with full ELI10 lead
+ "What you can do that you couldn't before" itemized list + "To take
advantage of v0.42.0.0" upgrade steps per CLAUDE.md voice rules.

TODOS.md: 9 follow-up items filed (TODO-A through TODO-I) for the
v0.42.1+ wave: pack-aware linkable types, LLM-disambiguation NER,
onboard --explain, live-brain impact measurement, 100+-case takes
classifier eval, admin SPA UI, full DATABASE_URL E2E, minion_jobs
client_id schema column, thin-client doctor-remote parity.

llms-full.txt regenerated per CLAUDE.md rule (every CHANGELOG edit
followed by bun run build:llms in the same commit).

23/23 verify checks pass.

Full implementation across 21 commits on this branch (T0-T21):
  T0  merge master
  T1  schema migrations v98/v99/v100
  T2  extract doctor remediation library
  T3  generalize computeRecommendations
  T4  4 new doctor checks
  T5  engine API: listStaleChunks orderBy + executeRaw AbortSignal
  T6  embed --batch-size / --priority recent / --catch-up
  T7  NER extraction + extract-ner handler
  T8  timeline-from-meetings + extract-timeline-from-meetings handler
  T9  takes-bootstrap + extract-takes-from-pages handler
  T10 recordMinionJobSpend primitive
  T11 impact capture module + writeImpactLogRow
  T12 onboard render layer (types + render)
  T13 gbrain onboard CLI shell
  T14 init nudge + upgrade banner
  T15 autopilot tick consults onboard
  T16 MCP run_onboard + run_protected_onboard scope
  T17 check-source-scope-onboard lint
  T18 INSTALL_FOR_AGENTS.md agent prescription
  T20 hermetic PGLite E2E (13 cases)
  T21 ship (this commit)

Reviews: CEO + Eng + Codex on plan
~/.claude/plans/system-instruction-you-are-working-lively-hollerith.md.
27 A-decisions locked; 18 codex findings absorbed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): connection-resilience regex + doctor warn-not-fail + v0.41.18.0

Two CI fixes from PR #1521 + version renumber per user request.

Why fix #1 (connection-resilience.test.ts): T5/A20 extended
PostgresEngine.executeRaw signature to accept an optional
`opts?: { signal?: AbortSignal }` 3rd arg and rewrote the body as
multi-line. The regression test's regex was anchored to the legacy
single-line `(sql: string, params?: unknown[])` shape and the
assertions banned `try {` / `catch` (which T5 legitimately added for
AbortSignal cancellation swallow, NOT for retry). Updated regex to
tolerate both shapes; replaced the wrong `not.toContain('conn.unsafe(
sql, params')` assertion (which incorrectly flagged the legitimate
single call) with a count assertion: `conn.unsafe(` must appear
exactly ONCE in the body. Preserves the original D3 intent (no
per-call retry — recovery is supervisor-driven via reconnect()) while
accepting the new try/catch shape that swallows AbortSignal aborts.
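
The count assertion can be sketched as below. `countOccurrences` is a hypothetical helper, and `body` is an invented stand-in for the executeRaw source text, not the real method.

```typescript
// Counts non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let from = 0;
  while (true) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) return count;
    count++;
    from = at + needle.length;
  }
}

// Invented method body: one call wrapped in a try/catch that swallows aborts.
const body = `
  try {
    return await conn.unsafe(sql, params);
  } catch (err) {
    if (opts?.signal?.aborted) return [];
    throw err;
  }
`;

// A per-call retry would need a second conn.unsafe( in the body, so
// "exactly once" still bans retries while tolerating the try/catch.
const callCount = countOccurrences(body, "conn.unsafe(");
```

Asserting on the count rather than on the absence of `try {` keeps the original intent without coupling the test to one formatting of the method.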

Why fix #2 (src/core/onboard/checks.ts): Three of the four new
onboard doctor checks (entity_link_coverage, timeline_coverage,
embed_staleness) emitted `status = 'fail'` on healthy DBs that simply
hadn't run extractions yet. This flipped `gbrain doctor`'s exit code
to non-zero on freshly initialized brains, breaking
test/e2e/mechanical.test.ts:1280 ("gbrain doctor exits 0 on healthy
DB"). Downgraded all three to `status = 'warn'` — these are
remediation opportunities, not assertion failures. Doctor exit
codes are reserved for actual failures; remediation surfaces use
warn-level signaling so they can be picked up by `--remediate`
without polluting the exit code.
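
A short sketch of the warn-versus-fail split, assuming a flat list of check results. The `CheckResult` shape and both function names are hypothetical, and exit code 1 is an assumption standing in for "non-zero".

```typescript
type Status = "ok" | "warn" | "fail";

// Hypothetical shape for a doctor check result.
interface CheckResult {
  name: string;
  status: Status;
}

// Only 'fail' flips the exit code; 'warn' never does.
function doctorExitCode(results: CheckResult[]): number {
  return results.some((r) => r.status === "fail") ? 1 : 0;
}

// Warn-level results are the ones a remediation pass would pick up.
function remediationCandidates(results: CheckResult[]): string[] {
  return results.filter((r) => r.status === "warn").map((r) => r.name);
}
```

On a freshly initialized brain the three downgraded checks land in `remediationCandidates` while `doctorExitCode` stays 0.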

Why fix #3 (version renumber 0.42.0.0 → 0.41.18.0): Per user
directive, this wave ships as v0.41.18.0 rather than v0.42.0.0.
Master is at 0.41.16.0; 0.41.17.0 is reserved for an in-flight
wave. Renamed every reference my branch added (54 files touched):
VERSION, package.json, CHANGELOG.md header, TODOS.md, plus inline
version-stamp comments across src/, test/, and scripts/. Preserved
13 files with PRE-EXISTING `v0.42.0.0` references on master (from
earlier waves originally planned for v0.42 that landed at v0.41.x —
those stay as historical record). Verified via per-file diff against
origin/master: every renamed reference is one I added in this branch.

Audit trio aligned: VERSION=0.41.18.0, package.json=0.41.18.0,
CHANGELOG topmost entry=[0.41.18.0]. llms-full.txt regenerated to
match CLAUDE.md updates.

Bisect contract: this commit fixes CI test failures from PR #1521's
landing. Typecheck clean; connection-resilience suite 26/26 pass.

Refs A20 (executeRaw AbortSignal), A16 (4 new onboard checks),
codex #1 (master collision avoidance via renumber).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>