feat: add gbrain check-update command and auto-update agent workflow #15
Merged
Deterministic collector that checks GitHub Releases for new versions, compares semver (minor+ only, skips patches), and fetches changelog diffs. Exports `detectInstallMethod()` from upgrade.ts for reuse. Includes 15 unit tests covering version comparison, CLI wiring, and error handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
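As a rough illustration of the "minor+ only, skips patches" rule, here is a minimal TypeScript sketch. `parseVersion` and `isMinorOrMajorUpdate` are hypothetical names, not the actual check-update exports, and the sketch assumes plain `vX.Y.Z` release tags.

```typescript
// Hypothetical helpers; the real check-update code may be structured differently.
type Version = [number, number, number];

function parseVersion(tag: string): Version | null {
  // Accepts "v1.2.3" or "1.2.3"; anything else is treated as unparseable.
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(tag.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// True only when `latest` bumps the major or minor of `current`; patch-only releases return false.
function isMinorOrMajorUpdate(current: string, latest: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(latest);
  if (!a || !b) return false; // malformed tags never trigger a notification
  if (b[0] !== a[0]) return b[0] > a[0];
  return b[1] > a[1];
}
```

A patch-only release such as 0.4.2 to 0.4.9 returns false, so no notification fires. Malformed tags also return false rather than throwing, which keeps the collector deterministic.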
Exercises check-update CLI end-to-end: valid JSON output, human-readable mode, help text, graceful no-releases handling, and version comparison wiring. Skips gracefully when network is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full agent playbook for the update lifecycle: check, notify, consent, upgrade, skills refresh, schema sync, report. Includes standalone self-update for skillpack-only users via version markers and raw GitHub URL fetching. Adds version markers to both SKILLPACK and RECOMMENDED_SCHEMA headers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
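Here is a sketch of how a skillpack-only install might decide whether to self-update by comparing version markers. The marker syntax (`<!-- version: X.Y.Z -->`), the function names, and the choice to treat a missing local marker as outdated are all assumptions. Fetching the remote copy from the raw GitHub URL is left to the caller.

```typescript
// Hypothetical marker format: the PR does not spell out the exact header syntax.
const MARKER = /<!--\s*version:\s*v?(\d+)\.(\d+)\.(\d+)\s*-->/;

function readVersionMarker(text: string): [number, number, number] | null {
  const m = MARKER.exec(text);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Self-update only when the remote marker is strictly newer than the local one.
function needsSelfUpdate(localText: string, remoteText: string): boolean {
  const local = readVersionMarker(localText);
  const remote = readVersionMarker(remoteText);
  if (!remote) return false; // never overwrite from an unreadable remote file
  if (!local) return true; // assumption: no local marker means a pre-marker install
  for (let i = 0; i < 3; i++) {
    if (remote[i] !== local[i]) return remote[i] > local[i];
  }
  return false; // identical versions
}
```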
…ons dir Adds step 7 to the OpenClaw install paste (default-on update checks). Setup skill gets Phase G (conditional offer for manual installs) and schema state tracking via ~/.gbrain/update-state.json. Creates skills/migrations/ directory for version-specific upgrade directives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds E2E test DB lifecycle instructions (spin up, run, tear down). Documents version migration convention (skills/migrations/v[version].md) and schema state tracking (~/.gbrain/update-state.json). Updates test file counts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
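One way to apply the `skills/migrations/v[version].md` convention is to select every directive newer than the installed version and no newer than the target. This is a hypothetical helper that assumes three-part filenames like `v0.5.0.md`. The real agent workflow may pick files differently.

```typescript
// Hypothetical: assumes files are named exactly "v<major>.<minor>.<patch>.md".
type Version = [number, number, number];

function parseMigrationName(file: string): Version | null {
  const m = /^v(\d+)\.(\d+)\.(\d+)\.md$/.exec(file);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Returns migration files with from < version <= to, oldest first; non-matching names are ignored.
function migrationsToApply(files: string[], from: Version, to: Version): string[] {
  return files
    .map((f) => ({ f, v: parseMigrationName(f) }))
    .filter((x): x is { f: string; v: Version } => x.v !== null)
    .filter((x) => compare(x.v, from) > 0 && compare(x.v, to) <= 0)
    .sort((x, y) => compare(x.v, y.v))
    .map((x) => x.f);
}
```

Sorting oldest-first means directives apply in release order, so a later migration can assume an earlier one has already run.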
The version range check compared minor versions without guarding on major being equal, causing incorrect changelog entries to be captured (e.g., v0.5.0 would match when upgrading from v1.2.0). Extracted semverGt/semverLte helpers for correct comparisons. Added 5 tests for extractChangelogBetween covering cross-major, same-version, and malformed input cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
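The fix described above comes down to comparing whole version tuples instead of the minor component alone. Below is a sketch of `semverGt` / `semverLte` and a simplified `extractChangelogBetween`. It assumes `## [X.Y.Z]` section headers (the format the later CHANGELOG uses) and is not the code from the PR.

```typescript
// Sketch assuming Keep-a-Changelog-style headers such as "## [0.36.0.0] - 2026-05-18".
function parse(v: string): number[] | null {
  const m = /^v?(\d+(?:\.\d+)*)$/.exec(v.trim());
  return m ? m[1].split(".").map(Number) : null;
}

// Lexicographic comparison, padding missing components with 0.
function cmp(a: number[], b: number[]): number {
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const d = (a[i] ?? 0) - (b[i] ?? 0);
    if (d !== 0) return d;
  }
  return 0;
}

// Major is compared before minor, so 0.5.0 can never count as newer than 1.2.0.
const semverGt = (a: number[], b: number[]): boolean => cmp(a, b) > 0;
const semverLte = (a: number[], b: number[]): boolean => cmp(a, b) <= 0;

// Returns the changelog sections for versions in (from, to]; empty string on malformed input.
function extractChangelogBetween(changelog: string, from: string, to: string): string {
  const lo = parse(from);
  const hi = parse(to);
  if (!lo || !hi) return "";
  const out: string[] = [];
  let keep = false;
  for (const line of changelog.split("\n")) {
    const header = /^## \[([^\]]+)\]/.exec(line);
    if (header) {
      const v = parse(header[1]);
      keep = v !== null && semverGt(v, lo) && semverLte(v, hi);
    }
    if (keep) out.push(line);
  }
  return out.join("\n").trim();
}
```

For an upgrade from 1.2.0 to 2.0.0, a `## [0.5.0]` section is no longer captured, because the major component is compared first.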
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request on May 15, 2026
…ederated_read + 3 more (#996) * fix(mcp): skip stdin EOF handlers when MCP_STDIO=1 OpenClaw's bundle-mcp gateway and similar wrappers pipe the JSON-RPC handshake on stdin then close their stdin half. Pre-fix, both stdin 'end' and 'close' listeners (server.ts:65-66 and serve.ts:204-206) treated this as a permanent disconnect and shut the server down before the first tool call arrived. Guard both sites with `process.env.MCP_STDIO !== '1'`. Signal handlers (SIGTERM/SIGINT/SIGHUP), transport.onclose, and the parent-process watchdog still cover legitimate shutdown paths. The serve.ts site threads the env read through an injectable `mcpStdio?: boolean` on ServeOptions so tests stay isolated (no process.env mutation per scripts/check-test-isolation.sh R1). Tests: 3 new cases in test/serve-stdio-lifecycle.test.ts pin the guard's invariants — mcpStdio=true must NOT trigger shutdown on stdin EOF, signals must still drive shutdown with mcpStdio=true, and mcpStdio=false (default) preserves existing CLI behavior. 25/25 pass. Origin: PR #870. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(oauth): honor token_endpoint_auth_method=none for PKCE public clients RFC 7591 §3.2.1: when a DCR client declares token_endpoint_auth_method="none" (PKCE-only public clients like Claude Code, Cursor), the authorization server MUST NOT issue a client_secret. Pre-fix, registerClient unconditionally minted a secret, and the MCP SDK's clientAuth middleware then rejected valid public-client flows on /token because it expected client.client_secret to match. Three changes to src/core/oauth-provider.ts:registerClient: - Gate clientSecret generation on isPublicClient = (auth_method === 'none'). Public clients store client_secret_hash = NULL. - Omit client_secret from the response payload for public clients. Confidential clients (default client_secret_post and explicit client_secret_basic) keep their existing one-time-reveal shape. 
- Normalize NULL secret_hash to JS undefined in getClient so SDK middleware (which checks client.client_secret === undefined, not === null) correctly identifies public clients and skips the secret-comparison branch on /token. Schema is already permissive (client_secret_hash TEXT, with no NOT NULL constraint in either src/schema.sql or src/core/pglite-schema.ts) — no migration needed. Tests: 5 new cases in test/oauth.test.ts pin: - public client → no client_secret in response (#11 from plan) - default auth_method → secret unchanged (regression guard) - explicit client_secret_post → secret unchanged - getClient NULL→undefined normalization - PKCE full /authorize → /token end-to-end with no secret (#15 from plan) 69/69 oauth.test.ts cases pass. typecheck clean. Origin: PR #909. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(serve-http): --bind HOST, default to loopback (127.0.0.1) Adds `gbrain serve --http --bind <interface>` to control which network interface the HTTP MCP server listens on. Default flipped from `0.0.0.0` (pre-v0.34) to `127.0.0.1` (v0.34.0+). Why the flip: gbrain's primary use case is a personal-knowledge brain on a laptop. The previous default exposed brains on every interface — one accidental `--http` invocation away from publishing the brain to a LAN. Server operators who need remote access pass `--bind 0.0.0.0` (or a specific interface). Codex's outside-voice review of the original PR #864 correctly flagged that the additive flag wasn't actually the fix; the default needed to change for the safety claim to hold. If `--public-url` is set but `--bind` is unset, runServeHttp prints a loud stderr WARN at startup recommending `--bind 0.0.0.0`. Declaring a public URL while quietly binding loopback is almost always a misconfiguration; we want the operator to see it on first start, not silently fail remote requests. Startup banner now includes a `Bind:` row so the listening interface is visible alongside Port / Engine / Issuer.
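The `--bind` default and the startup warning for the #864 change can be summarized as a small pure function. `resolveBind` and `BindOptions` are hypothetical names. The real logic lives in runServeHttp and may be structured differently.

```typescript
// Hypothetical option names mirroring the --bind / --public-url flags; not gbrain's real ServeOptions.
interface BindOptions {
  bind?: string; // value of --bind, if the operator passed it
  publicUrl?: string; // value of --public-url, if set
}

interface BindDecision {
  host: string;
  warning: string | null;
}

const LOOPBACK = "127.0.0.1";

function resolveBind(opts: BindOptions): BindDecision {
  // Loopback unless the operator explicitly opts in to wider exposure.
  const host = opts.bind ?? LOOPBACK;
  // A public URL with an implicit loopback bind is almost always a misconfiguration.
  const warning =
    opts.publicUrl && opts.bind === undefined
      ? `WARN: --public-url is set but the server is bound to ${LOOPBACK}; pass --bind 0.0.0.0 to accept remote requests.`
      : null;
  return { host, warning };
}
```

The warning fires only when `--bind` is absent. An operator who deliberately passes `--bind 127.0.0.1` alongside a public URL (for example, behind a local reverse proxy) is not nagged.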
Origin: PR #864, extended with D11 (default flip) per /plan-eng-review codex outside-voice review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mcp): seal source-isolation leak on read path (P0) Pre-fix, an authenticated OAuth MCP client scoped to source-A could enumerate source-B pages via six read-side surfaces across five ops: search, query (both the text and image paths), list_pages, traverse_graph, and find_experts. The v0.31.8 source-scoping pattern shipped through dispatch.ts, but the op handlers never threaded ctx.sourceId into their engine calls, and the explicit SearchOpts rebuild at hybrid.ts:223 dropped sourceId even when callers passed it. Sealing the leak: - src/core/operations.ts adds sourceScopeOpts(ctx), the canonical precedence ladder: ctx.auth.allowedSources (federated) wins over ctx.sourceId (scalar), which wins over nothing. Threaded into all 5 read-side op handlers + the query-image-path searchVector call (the 6th leak surface codex caught in plan review). - src/core/search/hybrid.ts:223 now threads sourceId + sourceIds fields through the inner SearchOpts rebuild. The explicit pick shape is preserved (HNSW inner-CTE ordering depends on it) but extended. - src/core/types.ts adds sourceIds?: string[] to SearchOpts + PageFilters (D9: federated read needs array-shaped engine filter or fan-out; array wins for hot retrieval). - src/core/operations.ts AuthInfo gains sourceId + allowedSources (D2: identity surface symmetric with the federated_read column #876 will add). - Both engines now apply WHERE source_id = $N (scalar) or = ANY($N::text[]) (array) at the SQL layer for searchKeyword, searchKeywordChunks, searchVector, listPages, traverseGraph, traversePaths. Array form wins when both are set. The searchVector filter pushes into the inner HNSW CTE (codex flagged this placement during plan review). - traverseGraph + traversePaths signatures gain opts.sourceId + opts.sourceIds; engine.ts interface updated.
- findExperts (the whoknows op, D3 5th leak surface) accepts sourceId + sourceIds and threads them into its internal hybridSearch call. PR #861 was authored before v0.33 shipped so this op wasn't covered in the original PR. Auth wiring: - GBrainOAuthProvider.verifyAccessToken populates AuthInfo.sourceId from oauth_clients.source_id. JOIN guarded by isUndefinedColumnError so pre-v55 brains degrade to legacy projection rather than refusing every token verification. - GBrainOAuthProvider.registerClientManual gains a sourceId parameter (defaults to 'default'). DCR registerClient also sets source_id='default' on the inserted row. - serve-http.ts:929 cleanup: AuthInfo.sourceId is now a real typed field. The cast + GBRAIN_SOURCE env fallback chain is gone (D13). Legacy bearer tokens default to 'default' source in verifyAccessToken. - http-transport.ts (legacy access_tokens path) threads sourceId='default' through DispatchOpts so v0.22.7 callers stay source-scoped. - auth.ts CLI adds --source flag to gbrain auth register-client. Migration v55 (D10 + D13): - ALTER TABLE oauth_clients ADD COLUMN source_id TEXT (nullable). - Backfill UPDATE source_id = 'default' WHERE source_id IS NULL — preserves v0.33 effective behavior verbatim for legacy clients. - ADD CONSTRAINT FK ... REFERENCES sources(id) ON DELETE SET NULL, wrapped in DO block so re-runs against fresh-install brains (where the FK already lives inline in SCHEMA_SQL) no-op cleanly. - CREATE INDEX idx_oauth_clients_source_id WHERE source_id IS NOT NULL for the verifyAccessToken JOIN. - GBRAIN_ACCEPT_SILENT_WIDEN env-flag wired through the runner via SET LOCAL gbrain.accept_silent_widen — reserved for future migrations that hit the silent-widen footgun codex flagged. This migration doesn't need it (column is brand new; no pre-existing stale values possible by definition). - src/core/pglite-schema.ts + src/schema.sql include the column + FK + index inline for fresh installs. 
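The precedence ladder behind sourceScopeOpts, plus the scalar-vs-array SQL filter, might look roughly like this. The interfaces are simplified stand-ins for the real AuthInfo and SearchOpts, and `sourceFilterSql` is a hypothetical helper. An explicitly empty `allowedSources` is kept as `[]`, so it matches no rows instead of widening to all sources. That is the invariant the #876 regression case pins.

```typescript
// Simplified stand-ins; the real AuthInfo / SearchOpts in src/core carry more fields.
interface AuthInfo {
  sourceId?: string;
  allowedSources?: string[];
}
interface OpContext {
  sourceId?: string;
  auth?: AuthInfo;
}
interface ScopeOpts {
  sourceId?: string;
  sourceIds?: string[];
}

// Precedence: federated allowedSources (even when empty) > scalar ctx.sourceId > unscoped.
function sourceScopeOpts(ctx: OpContext): ScopeOpts {
  const federated = ctx.auth?.allowedSources;
  if (federated !== undefined) return { sourceIds: federated }; // [] stays [] and matches nothing
  if (ctx.sourceId !== undefined) return { sourceId: ctx.sourceId };
  return {};
}

// Hypothetical WHERE-fragment builder; the array form wins when both fields are set.
function sourceFilterSql(opts: ScopeOpts, paramIndex: number): { sql: string; params: unknown[] } {
  if (opts.sourceIds !== undefined) {
    return { sql: `source_id = ANY($${paramIndex}::text[])`, params: [opts.sourceIds] };
  }
  if (opts.sourceId !== undefined) {
    return { sql: `source_id = $${paramIndex}`, params: [opts.sourceId] };
  }
  return { sql: "TRUE", params: [] };
}
```

Because `source_id = ANY('{}'::text[])` is false for every row, the empty-array case fails closed.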
Tests: new test/e2e/source-isolation-pglite.test.ts with 13 regression cases — one per leak surface (search/list_pages/traverse/etc.) plus explicit AuthInfo.sourceId and AuthInfo.allowedSources op-handler threading checks. Full unit suite: 6034 pass / 0 fail. PGLite initSchema time dropped from 2.4s to 850ms after consolidating v55's DO blocks (multiple DO blocks were slow on PGLite; one DO block for the FK install only is fine). Origin: PR #861 + plan-eng-review decisions D2/D3/D4/D9/D10/D13 + F2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(gateway): multimodal embedding for openai-compatible providers Pre-fix, embedMultimodal hardcoded a recipe.id === 'voyage' branch and threw AIConfigError for every other recipe. Multimodal-capable providers fronted by LiteLLM (or any openai-compatible proxy) were unreachable even when the operator had wired up the model. The fix: - src/core/ai/gateway.ts adds embedMultimodalOpenAICompat() that POSTs to the standard /embeddings endpoint with content arrays carrying image_url entries. Routing comes from the existing recipe.implementation switch — Voyage stays on its own /multimodalembeddings path; every other openai-compatible recipe flows through the new helper. - src/core/ai/recipes/litellm-proxy.ts declares supports_multimodal: true so embedMultimodal accepts the recipe. No multimodal_models allow-list: LiteLLM is a passthrough proxy and the user owns model-id selection; provider rejection (400 from upstream) is the right enforcement layer there. Voyage's static allow-list shape stays unchanged (its 12 models share supports_multimodal but only one is multimodal-capable). - D12 runtime dimension validation: the new helper checks the returned vector length against the recipe's declared default_dims (preferred) or the brain's embedding_dimensions config. Mismatch throws AIConfigError with model id + observed + expected so the operator can swap models or rebuild the column. 
Pre-fix, a wrong-dim response would surface as a cryptic pgvector "vector dimension mismatch" at INSERT time. - Auth resolution routes through the existing defaultResolveAuth helper so optional-auth recipes (LiteLLM proxy with no LITELLM_API_KEY) and required-auth recipes share one code path. Optional-auth recipes send "Authorization: Bearer unauthenticated", which servers like Ollama / llama-server ignore but the SDK contract requires. Tests: 11 new cases in test/openai-compat-multimodal.test.ts cover happy-path, multi-input batching, unauthenticated proxy, D12 dim mismatch + default-dim fallback, 401 / 400 / malformed-JSON / non-array error paths, and an explicit Voyage-regression test pinning that the new openai-compat route doesn't accidentally hijack the Voyage path. All 41 multimodal-related tests pass (existing voyage suite + new). typecheck clean. Origin: PR #875 + plan-eng-review D12 (runtime dim validation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(oauth): federated_read read scope (#876) Pre-fix, OAuth clients had a single source-scope axis (source_id, added in v55). A client could either write+read one source OR be a super-reader across all sources (via NULL source_id). There was no middle ground — WeCare-style L3 dept clients that need to write to dept-x but read dept-x + parent canon + shared canon had no way to express that scope. #876 adds federated_read TEXT[] as an orthogonal read-scope axis. source_id is the WRITE authority; federated_read is the READ authority. They default to matching values (read scope == write scope, the pre-v0.34 default) when a client is registered without an explicit federated read list. Migrations v56-v60 (five new migrations on top of v55): - v56: ALTER TABLE ... ADD COLUMN federated_read TEXT[] NOT NULL DEFAULT '{}'. - v57 (F5): explicit CASE backfill so source_id IS NULL → '{}' (not an array containing NULL — codex caught this ambiguity during plan review). - v58: post-backfill validation.
Fails loud if any row's source_id isn't in its federated_read array, pointing at a logic bug in v57 if fired. - v59: flip the source_id FK from ON DELETE SET NULL to ON DELETE RESTRICT now that federated_read provides the alternative scope-loss path. Pre-flip, deleting a source could silently widen any oauth_client to super-reader; post-flip, source delete is refused if any client references it (operator must revoke/re-scope first). - v60: GIN index on federated_read for array-containment queries. Auth wiring: - GBrainOAuthProvider.verifyAccessToken JOINs c.federated_read and populates AuthInfo.allowedSources. Pre-v56 / pre-v55 brains degrade via the existing isUndefinedColumnError fallback chain. - registerClientManual gains a federatedRead?: string[] parameter (defaults to [sourceId]). - DCR registerClient sets source_id='default' + federated_read=['default'] on the inserted row. - auth.ts CLI adds --federated-read SRC1,SRC2,... flag. The register-client output now prints "Federated reads:" so operators confirm the scope they set. Engines consume the federated array through the SearchOpts.sourceIds / PageFilters.sourceIds field that #861 added (no engine changes here — the plumbing was D9). sourceScopeOpts in operations.ts already prefers the auth.allowedSources array over scalar ctx.sourceId when set. Test seam: - test/book-mirror.test.ts now spawns the CLI with GBRAIN_HOME pointed at a tempdir so the test isn't sensitive to the developer's local ~/.gbrain/config.json. Pre-fix the test could silently inherit a real Postgres connection and hang past the default 5s test timeout. Fresh GBRAIN_HOME → "No brain configured" → exit 1 in <1s. - test/e2e/source-isolation-pglite.test.ts gains one more regression case: AuthInfo.allowedSources = [] (explicit empty) MUST NOT widen scope to "all sources" — the silent-widen footgun precedence ladder. - test/openai-compat-multimodal.test.ts is part of the wave's commits via the migrate.ts changes that bump the schema chain. 
typecheck-only fix on a captured-auth type was already in #875's tree. 6045 unit tests pass / 0 fail. typecheck clean. PGLite initSchema runs v55-v60 in ~786ms total (within the test-harness budget for tests using the canonical beforeAll engine pattern). Origin: PR #876 + plan-eng-review F5 (CASE backfill). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.34.0.0: MCP fix wave (#870 #909 #864 #861 #875 #876) VERSION + package.json + CHANGELOG bump for the six-PR MCP fix wave. Schema chain extends from v54 → v60; oauth_clients gains source_id + federated_read columns; auth'd MCP clients now stay inside their scope across all read-side ops; PKCE-only DCR works; --bind defaults to loopback; LiteLLM multimodal embedding ships. Contributed by @Hansen1018 (#870), @ding-modding (#909), @DukeDawg (#864), @toilalesondev (#861 + #876), @yoelgal (#875). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.34.0.0 Sync README, CLAUDE.md, SECURITY.md, docs/architecture/topologies.md, and docs/mcp/DEPLOY.md to reflect the v0.34.0.0 MCP fix wave: - README: document --bind HOST default (loopback), --source + --federated-read register-client flags, PKCE public-client gate - SECURITY.md: note loopback-by-default for serve --http, update the trust-proxy contract to point at the new default - CLAUDE.md: annotate operations.ts (sourceScopeOpts helper), oauth-provider.ts (verifyAccessToken JOIN + PKCE public clients), serve-http.ts (--bind flag), gateway.ts (openai-compat multimodal + dim validation), mcp/server.ts (MCP_STDIO guard), auth.ts (--source + --federated-read), migrate.ts (v58-v63 chain), engine.ts (sourceIds field). 
Add 4 new test-file entries for source-isolation-pglite, openai-compat-multimodal, serve-stdio-lifecycle, oauth.test.ts PKCE cases - docs/architecture/topologies.md: source-scoped register-client example, --bind 0.0.0.0 for thin-client host setup - docs/mcp/DEPLOY.md: --bind explanation in the ngrok section, source-scoped client recipe - llms-full.txt: regenerated per the CLAUDE.md-edit chaser rule Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump v0.34.0.0 → v0.34.1.0 Renumbering the MCP fix wave from v0.34.0.0 to v0.34.1.0 so the release slot lands between master's v0.33.2.1 and the next minor. Touches every release-artifact mention: - VERSION: 0.34.0.0 → 0.34.1.0 - package.json: same - CHANGELOG.md header + "To take advantage" block - CLAUDE.md key-files annotations (8 entries that document this wave) - llms-full.txt (regen from CLAUDE.md) - README.md / SECURITY.md / docs/architecture/topologies.md / docs/mcp/DEPLOY.md - Wave code-comment markers ("// v0.34.0 (#NNN):" → "// v0.34.1 (#NNN):") Test files renamed alongside since they were committed with the wave. Commit subjects on the original 6 PR commits + the v0.34.0.0 bump commit (4f533c7 → 6b47db7) intentionally NOT rewritten — those are history. `git log` finds the implementation by message subject, not by version tag. 6275 unit tests pass, typecheck clean, migration chain v58-v63 unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
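To make the #876 migration semantics concrete, here is an in-memory TypeScript model of the v57 CASE backfill and the v58 post-backfill validation. The real migrations are SQL. The row shape and function names here are hypothetical.

```typescript
// Hypothetical row shape; models the oauth_clients columns the migrations touch.
interface OAuthClientRow {
  clientId: string;
  sourceId: string | null;
  federatedRead: string[];
}

// v57-style backfill: NULL source_id becomes an empty array, never [null].
function backfillFederatedRead(row: OAuthClientRow): OAuthClientRow {
  return { ...row, federatedRead: row.sourceId === null ? [] : [row.sourceId] };
}

// v58-style check: every scoped row must be able to read its own write source.
function validateFederatedRead(rows: OAuthClientRow[]): void {
  const bad = rows.filter((r) => r.sourceId !== null && !r.federatedRead.includes(r.sourceId));
  if (bad.length > 0) {
    // Failing loud here points at a logic bug in the backfill, not at operator data.
    throw new Error(`federated_read backfill bug for clients: ${bad.map((r) => r.clientId).join(", ")}`);
  }
}
```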
garrytan added a commit that referenced this pull request on May 19, 2026
T10 of brain-health-100 wave — load-bearing decision-pinning tests.
test/brain-score-recommendations.test.ts (22 cases):
- Healthy brain → empty plan
- Per-component remediation paths (sync, embed, backlinks, extract)
- depends_on wiring (extract → sync; embed → sync when stale)
- Severity ordering (critical > high > medium > low)
- D6 #5 determinism: same input twice → byte-identical output
- D9 idempotency keys: content-hash format, no time-slot
- D9 source isolation: different --source → different key
- D13 status field always 'remediable' in output
- +A cost-estimate populated for embed
- classifyChecks: remediable / blocked / human_only triage
- maxReachableScore: all-remediable → 100; all-blocked → current
test/op-checkpoint.test.ts (20 cases):
- fingerprint stability + key-order invariance (canonical-JSON)
- codex #11: extract links vs timeline get different fingerprints
- codex #12: reindex markdown vs code get different fingerprints
- codex #15: embed model+dim variation produces different fingerprints
- reindex chunker_version bump invalidates checkpoint
- DB round-trip (load → record → load)
- Cross-fingerprint isolation (linksKey vs timelineKey)
- clearOpCheckpoint idempotency on missing rows
- resumeFilter purity (no I/O, deterministic)
- purgeStaleCheckpoints TTL respect
42 new tests, all pass. PGLite engine + resetPgliteState pattern per CLAUDE.md test-isolation guide.
Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (T10 + D6 #5 + D9 + D12 + D13 + codex #11/#12/#15).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
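The fingerprint properties these tests pin (key-order invariance, distinct fingerprints for distinct modes, a pure resumeFilter) can be illustrated with a short sketch. It follows the "sha8 of canonical-JSON" description in the op-checkpoint commit, but the function bodies are assumptions, not the module's source.

```typescript
import { createHash } from "node:crypto";

// Recursively sorts object keys so logically equal params serialize identically.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // mirror JSON.stringify, which drops undefined properties
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// sha8: 8 hex chars taken from the sha256 of the canonical JSON.
function fingerprint(params: Record<string, unknown>): string {
  return createHash("sha256").update(canonicalJson(params)).digest("hex").slice(0, 8);
}

// Pure: keeps only items not yet recorded as completed, preserving input order.
function resumeFilter(all: string[], completed: string[]): string[] {
  const done = new Set(completed);
  return all.filter((k) => !done.has(k));
}
```

Because `{ mode: "links" }` and `{ mode: "timeline" }` serialize differently, they hash to different keys. This is what lets the two extract modes keep separate checkpoints.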
Merged
garrytan added a commit that referenced this pull request on May 19, 2026
…--remediate + Minions (#1193) * feat(schema): op_checkpoints table + doctor_run_id partial GIN (v67+v68) T1 of brain-health-100 wave. Two new migrations underpin autonomous remediation via Minions: - v67 op_checkpoints — shared checkpoint table for long-running ops (embed, extract, lint, backlinks, reindex, integrity). Pre-fix each op had its own file-backed checkpoint or none. PRIMARY KEY (op, fingerprint) lets `extract links` and `extract timeline` (or `reindex --markdown` vs `--code`) coexist without colliding on shared keys. - v68 minion_jobs_doctor_run_id_idx — partial GIN on `minion_jobs.data WHERE data ? 'doctor_run_id'`. Indexes only doctor-submitted jobs so audit-trail queries don't sequential-scan months of unrelated cron history. PGLite skips via empty sqlFor. Applied to src/schema.sql + src/core/pglite-schema.ts so both engines get the table on fresh-install. Bootstrap coverage test + 122-case migrate test both pass. Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (D12 + folded scope B from outside-voice review). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(core): op-checkpoint module — DB-backed checkpoint primitive T2 of brain-health-100 wave. 
Six exports plus per-op fingerprint helpers:
  loadOpCheckpoint(engine, key) → string[] (completed keys; [] if none)
  recordCompleted(engine, key, ks) → void (atomic UPSERT)
  clearOpCheckpoint(engine, key) → void (clean-exit drop)
  resumeFilter(all, completed) → string[] (pure; drives batched walks)
  purgeStaleCheckpoints(engine, ttl) → number (consumed by the cycle purge phase)
Fingerprint helpers:
  fingerprint(params): sha8 of canonical JSON
  embedFingerprint(p): model + dim + slug + source variation
  extractFingerprint(p): mode (links vs timeline)
  reindexFingerprint(p): markdown vs code vs slug + chunker_version
  lintFingerprint, backlinksFingerprint, integrityFingerprint, importFingerprint
Canonical JSON with sorted keys ensures the same params produce the same fingerprint across runs and hosts. sha8 (8 hex chars from sha256) is short enough for filenames and UI, yet collision-resistant enough for the expected per-op invocation diversity. DB-backed for both engines (PGLite has the table too via v67). A lost write on partial DB failure is non-fatal: the caller continues, and the next run re-walks (cheap for hash-short-circuited ops like embed/import). Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (D12 + codex #10–16 from outside-voice review). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(core): brain-score-recommendations — shared data layer T4 of brain-health-100 wave. Pure module — no engine I/O. Takes a BrainHealth snapshot + RecommendationContext and returns an ordered Remediation[] ready to feed the doctor remediation plan OR features --auto-fix. Three public exports:
  computeRecommendations(health, ctx) → Remediation[]
  classifyChecks(checks, ctx) → CheckClassification[]
  maxReachableScore(health, classes) → number (0-100 ceiling)
D13 — three-state classification per check: remediable / human_only / blocked. The plan ONLY emits remediable items; blocked items surface alongside as informational, with the missing prerequisite (no API key, etc.).
Closes the spin-loop bug on empty / API-key-missing brains (codex #20). D14 — every Remediation has a stable string id (sync.repo, embed.stale, backlinks.fix, extract.all). depends_on references ids, not check names. D9 — idempotency_key is content-hash from canonical-JSON of params. Same intent across runs = same key; failed-row replay via :r<N> suffix is the --remediate loop's job, not this module's. Scope item +A (cost-budget gate) — Remediation.est_usd_cost populated for embed (chars × pricePerMTok from embedding-pricing.ts) and Anthropic jobs (estimateAnthropicCost helper). doctor --remediate --max-usd N gates submission against est_total_usd_cost. Both consumers (doctor + features per D15) import from here. Features executes inline (D15 contract preserved), doctor submits via queue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(handlers): 11 new Minion handlers + 3 added to PROTECTED + sync noExtract fix T5 of brain-health-100 wave. PROTECTED_JOB_NAMES extension (D11): synthesize, patterns, consolidate. These cycle phases internally submit `subagent` jobs with allowProtectedSubmit=true, so they CAN spend Anthropic credits. Treating them as "data-quality maintenance" was a misread surfaced by the codex outside-voice review (#6). Protected gate ensures only trusted local callers (CLI, autopilot, doctor --remediate) can submit; an OAuth-scoped MCP client can't burn the user's API budget by submitting a synthesize job over HTTP. 11 new handlers registered in jobs.ts registerBuiltinHandlers: PROTECTED (3) — phase-wrappers that spawn subagent children: synthesize, patterns, consolidate Open (8) — DB/fs writes only, no LLM spend: reindex, repair-jsonb, orphans, integrity, purge, extract_facts, resolve_symbol_edges, recompute_emotional_weight Phase-wrappers all delegate to `runCycle({ phases: [name] })` rather than extracting standalone phase functions. 
Cycle.ts already owns the lock + abort signal + progress reporter per D10, so the wrapper is a one-liner and cycle.ts remains the single source of truth for phase semantics. Pragmatic deviation from the plan's "extract 6 standalone runXxxPhase functions" — smaller diff, equivalent correctness. Standalone `sync` handler now passes `noExtract: true` (codex #5 fix). Pre-fix, doctor's remediation plan emitting [sync, extract] caused double-extraction (performSync inline-extract + standalone extract job). Now sync defers extract to the dedicated handler. Callers that want inline extract pass { noExtract: false } in job params. Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (T5 + D10 + D11 + codex #5/#6 from outside-voice review). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(doctor): --remediation-plan + --remediate CLI surfaces T6 of brain-health-100 wave. The headline user-facing capability: agents drive brain health to target score via autonomous Minions remediation. Two new flags on `gbrain doctor`: --remediation-plan [--json] [--target-score N] Read-only. Emits ordered Remediation[] from BrainHealth + context. Uses cheap path (D7) — engine.getHealth() + computeRecommendations, NOT a full doctor walk. JSON shape is stable agent contract. --remediate [--yes] [--target-score N] [--max-jobs N] [--max-usd N] [--dry-run] [--json] Sequential submit (D3) with D5 cascade on failure, D7 scoped recheck between steps, D9 content-hash idempotency keys, D13 three-state remediation filtering (only remediable jobs enter the loop), +A cost-budget gate via --max-usd. Check.remediation field added as additive optional (DoctorReport schema_version stays at 2 per D4). PGLite path: synchronous in-process execution with short polling. Postgres path: durable queue submission with waitForCompletion. The --remediate loop: 1. Compute initial plan from BrainHealth 2. Refuse if --target-score > maxReachableScore(health, classes) 3. 
Refuse if est_total_usd_cost > --max-usd 4. For each step in order: - Skip if depends_on intersects aborted set (D5) - queue.add with content-hash idempotency_key (D9) - waitForCompletion with timeout - Recompute plan from fresh health (D7 scoped recheck) 5. Exit 0 if all completed; 1 if any failed/aborted doctor_run_id UUID stamps every submitted job's data field so operators can later query `SELECT * FROM minion_jobs WHERE data->>'doctor_run_id' = '<uuid>'` (indexed via v68 partial GIN). Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (T6 + D1/D3/D5/D7/D9/D13 + folded scope A). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): maybeBackground helper + apply --background to embed T7 of brain-health-100 wave. New helper in src/core/cli-options.ts formalizes the --background flag pattern. Same semantics in TTY and cron per D9 (submit-and-exit always; --background --follow execs `gbrain jobs follow <id>` after submission). await maybeBackground({ engine, args, jobName: 'embed', paramBuilder: (cleanArgs) => ({ stale, all, ... }), }) // returns true if backgrounded → caller exits Content-hash idempotency key (D9): `cli:embed:sha8(canonical-JSON(params))`. No time-slot. Same intent across runs = same key. Failed-row replay is the doctor --remediate loop's job, not this path's. PGLite degrades to inline execution with a clear stderr note ("PGLite has no worker daemon; running inline"). NOT a no-op, NOT silent — doc-stated semantic difference because PGLite has no worker daemon. Applied to `gbrain embed` as the reference integration. The other 6 commands (extract, lint, backlinks, reindex, integrity, pages) adopt the same 4-line pattern at the top of their entry function — follow-up in a smaller diff once the helper proves out in production. Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (T7 + D9 + Gap 6). 
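The T6 --remediate loop's D5 cascade can be sketched as a sequential runner that skips any step whose dependencies were aborted. This is a hypothetical simplification. `runStep` stands in for queue.add plus waitForCompletion, and the scoped recheck, idempotency keys, and --max-usd gate are omitted.

```typescript
// Hypothetical types; the real Remediation shape and queue API live in gbrain's source.
interface Step {
  id: string; // stable id such as "sync.repo" or "extract.all"
  dependsOn: string[]; // ids that must not have been aborted
}

type Outcome = "completed" | "failed" | "skipped";

// Runs steps strictly in order; a failure aborts every step that depends on it, directly or transitively.
async function runRemediation(
  steps: Step[],
  runStep: (step: Step) => Promise<boolean>, // stand-in for queue.add + waitForCompletion
): Promise<{ outcomes: Map<string, Outcome>; exitCode: number }> {
  const outcomes = new Map<string, Outcome>();
  const aborted = new Set<string>();
  for (const step of steps) {
    if (step.dependsOn.some((dep) => aborted.has(dep))) {
      outcomes.set(step.id, "skipped");
      aborted.add(step.id); // cascade: dependents of a skipped step are skipped too
      continue;
    }
    const ok = await runStep(step);
    outcomes.set(step.id, ok ? "completed" : "failed");
    if (!ok) aborted.add(step.id);
  }
  // Exit 0 only if every step completed; any failure or skip yields 1.
  const exitCode = Array.from(outcomes.values()).every((o) => o === "completed") ? 0 : 1;
  return { outcomes, exitCode };
}
```

Independent steps such as backlinks still run after an unrelated failure, so one broken prerequisite does not stall the whole plan.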
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(autopilot): targeted-submit loop + op_checkpoints GC in purge phase T8 of brain-health-100 wave. Autopilot dispatch changes (src/commands/autopilot.ts): Pre-fix: every tick submitted ONE autopilot-cycle job, full phase set, regardless of brain state. On a healthy brain that was pure overhead; on a degraded brain it bundled fast wins with slow phases, so the user waited for the slowest. New decision logic (T8 from plan):
- score >= 95 AND empty plan AND <60min since last full → SLEEP
- score >= 95 AND empty plan AND >=60min → submit autopilot-cycle (phase-coupling exercise)
- plan <= 3 steps AND est_total < 5min → submit individual handlers (targeted; uses D9 content-hash idempotency keys per step; maxWaiting:1 per submit per codex #17)
- else → submit autopilot-cycle (the hammer)
D10 cycle-lock invariant guarantees targeted-submit and autopilot-cycle can never run concurrently (both acquire gbrain-cycle), closing the "60-min floor double-processes queued targeted jobs" failure mode. The computation uses the cheap path (D7) — engine.getHealth() + computeRecommendations, NOT a full doctor walk. Adds ~1 SQL count query per tick; negligible on a 50K-page brain. PROTECTED handlers (synthesize/patterns/consolidate) are submitted with allowProtectedSubmit:true; autopilot is a trusted local caller. Cycle purge phase (src/core/cycle.ts): Added op_checkpoints GC (+C folded scope item). 7-day TTL — any reasonable long-running op finishes inside that window. Non-fatal on pre-v67 brains (table missing). Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (T8 + D7/D9/D10 + codex #17 + folded scope +C). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(core): brain-score-recommendations + op-checkpoint unit tests T10 of brain-health-100 wave — load-bearing decision-pinning tests.
test/brain-score-recommendations.test.ts (22 cases):
- Healthy brain → empty plan
- Per-component remediation paths (sync, embed, backlinks, extract)
- depends_on wiring (extract → sync; embed → sync when stale)
- Severity ordering (critical > high > medium > low)
- D6 #5 determinism: same input twice → byte-identical output
- D9 idempotency keys: content-hash format, no time-slot
- D9 source isolation: different --source → different key
- D13 status field always 'remediable' in output
- +A cost-estimate populated for embed
- classifyChecks: remediable / blocked / human_only triage
- maxReachableScore: all-remediable → 100; all-blocked → current

test/op-checkpoint.test.ts (20 cases):
- fingerprint stability + key-order invariance (canonical-JSON)
- codex #11: extract links vs timeline get different fingerprints
- codex #12: reindex markdown vs code get different fingerprints
- codex #15: embed model+dim variation produces different fingerprints
- reindex chunker_version bump invalidates checkpoint
- DB round-trip (load → record → load)
- Cross-fingerprint isolation (linksKey vs timelineKey)
- clearOpCheckpoint idempotency on missing rows
- resumeFilter purity (no I/O, deterministic)
- purgeStaleCheckpoints TTL respect

42 new tests, all passing. Uses the PGLite engine + resetPgliteState pattern per the CLAUDE.md test-isolation guide.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (T10 + D6 #5 + D9 + D12 + D13 + codex #11/#12/#15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v0.36.0.0 — brain-health-100 wave + docs/llms refresh

T12 of brain-health-100 wave.

VERSION + package.json bumped 0.35.6.0 → 0.36.0.0.

The CHANGELOG entry leads with an ELI10 summary ("your agent can now drive your brain to 90/100 by itself, on a cron, without you watching"), then drills into the precise mechanics per the CLAUDE.md voice rules.

llms.txt + llms-full.txt regenerated via bun run build:llms.
Trio audit (CLAUDE.md mandatory pre-push check):

    VERSION:      0.36.0.0
    package.json: 0.36.0.0
    CHANGELOG:    ## [0.36.0.0] - 2026-05-18

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update README/CLAUDE/AGENTS/maintain for v0.36.4.0 brain-health-100 wave

- README.md: New-in-v0.36.4.0 callout covering the `gbrain doctor --remediate` headline, the autopilot health-aware tick, and eleven new background-job types (three of them PROTECTED).
- CLAUDE.md: Key Files entries for `op-checkpoint.ts` and `brain-score-recommendations.ts`, plus the doctor.ts / jobs.ts / protected-names.ts / autopilot.ts / cycle.ts / embed.ts / cli-options.ts extensions; new "Key commands added in v0.36.4.0" section.
- AGENTS.md: Common-tasks entry pointing agents at the one-command remediation loop.
- skills/maintain/SKILL.md: Autonomous Phase (gbrain doctor --remediate) at the top; the manual per-dimension walk is preserved as the fallback path.
- llms-full.txt: regenerated to pick up the CLAUDE.md changes (project rule).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): respectful tone on spend caps for v0.36.4.0

Reframed the cost-budget callout. The pre-fix language said the spend cap prevents a synthesize loop from "burning $100 of Anthropic credits while you're at lunch". Casually treating $100 as a throwaway number is tone-deaf; $100 is a meaningful amount for many people.

New language: "spend cap so a synthesize loop can't run up your Anthropic bill while you're at lunch. The cap is yours to set per run." And: "Pass --max-usd 5 (or whatever cap you're comfortable with)." And: "Pick the cap that fits your wallet."

Also reframed three adjacent lines:
- "healthy brains stop burning cycles" → "stop spending tokens on work that has nothing to do"
- "agent can't submit them and burn your API budget" → "can't submit them on your behalf. Your provider bill stays in your hands"
- Table cell "Cron with cost cap" / "--max-usd 5" → "Cron with spend cap" / "--max-usd N"

llms-full.txt regenerated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
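Looking back at the T8 autopilot dispatch rules in this PR, they amount to a small pure function over the tick's health snapshot. This is a minimal sketch with hypothetical names (`TickState`, `decideTick`) rather than autopilot.ts's real types; it only picks the dispatch mode and leaves submission, idempotency keys, and the gbrain-cycle lock out.

```typescript
// Hypothetical inputs; these names are illustrative, not autopilot.ts types.
interface TickState {
  score: number;                     // brain health score, 0-100
  planSteps: number;                 // steps in the remediation plan
  estTotalMinutes: number;           // estimated runtime of the whole plan
  minutesSinceLastFullCycle: number;
}

type Dispatch = "sleep" | "autopilot-cycle" | "targeted";

function decideTick(s: TickState): Dispatch {
  if (s.score >= 95 && s.planSteps === 0) {
    // Healthy and nothing to do: sleep, but still run a full cycle
    // at least every 60 minutes to exercise phase coupling.
    return s.minutesSinceLastFullCycle < 60 ? "sleep" : "autopilot-cycle";
  }
  if (s.planSteps <= 3 && s.estTotalMinutes < 5) {
    // Small, fast plan: submit the individual handlers.
    return "targeted";
  }
  // Everything else gets the full cycle (the hammer).
  return "autopilot-cycle";
}

console.log(decideTick({ score: 82, planSteps: 2, estTotalMinutes: 3, minutesSinceLastFullCycle: 15 }));
```

Read literally, the rules route an empty plan on a brain scoring below 95 to the targeted branch, which then has nothing to submit; the sketch keeps that literal reading.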
garrytan added a commit that referenced this pull request on May 23, 2026
…on + audit-writer unification (#1300)

* v0.40.4.0 T1: shared audit-writer primitive

Extract a createAuditWriter() helper. Five hand-rolled JSONL audit modules (rerank-audit, shell-audit, supervisor-audit, audit-slug-fallback, phantom-audit) duplicated the same ISO-week filename math, best-effort write loop, and read-current-plus-previous-week loop. T2 refactors all 5 onto this primitive.

Behavior preservation: filename format, JSONL line shape, recursive mkdir, appendFileSync utf8, and stderr-on-failure are all byte-identical to the existing modules, so their tests pass unchanged.

resolveAuditDir() moves here from shell-audit.ts; shell-audit.ts will re-export it for back-compat (T2). It honors GBRAIN_AUDIT_DIR (whitespace-trimmed) and falls back to ~/.gbrain/audit/.

Test coverage: 22 cases covering ISO-week math + year-boundary edges (2027-01-01 → 2026-W53), env override, mkdir-recursive, fail-open stderr-warn shape, cross-week readback, corrupt-row skip, non-finite-ts skip, round-trip with nested fields, and the computeFilename + resolveDir accessors.

Plan ref: D5=B audit unification cathedral expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T2: refactor 5 audit modules onto shared writer

Replace the duplicated ISO-week filename math + best-effort write loop + read-current-plus-previous-week loop in:
- src/core/rerank-audit.ts (rerank-failures-*.jsonl)
- src/core/audit-slug-fallback.ts (slug-fallback-*.jsonl)
- src/core/minions/handlers/shell-audit.ts (shell-jobs-*.jsonl)
- src/core/minions/handlers/supervisor-audit.ts (supervisor-*.jsonl)
- src/core/facts/phantom-audit.ts (phantoms-*.jsonl)

All five now delegate file I/O to createAuditWriter from T1.
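The ISO-week filename math that T1 centralizes can be sketched as below. Assumptions: filenames look like `<feature>-<isoYear>-W<week>.jsonl` (the source only shows patterns such as `rerank-failures-*.jsonl`), and dates are bucketed in UTC. The helpers are hypothetical, not createAuditWriter's API. The key step is that an ISO week belongs to the year of its Thursday, which is why 2027-01-01 lands in 2026-W53.

```typescript
const DAY_MS = 86_400_000;

// ISO-8601 week of a date, computed in UTC. Returns [isoYear, week].
function isoWeek(date: Date): [number, number] {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = d.getUTCDay() === 0 ? 7 : d.getUTCDay(); // Monday=1 ... Sunday=7
  // An ISO week belongs to the year that contains its Thursday.
  d.setUTCDate(d.getUTCDate() + 4 - weekday);
  const isoYear = d.getUTCFullYear();
  const dayOfYear = (d.getTime() - Date.UTC(isoYear, 0, 1)) / DAY_MS; // 0-based
  return [isoYear, Math.ceil((dayOfYear + 1) / 7)];
}

// Hypothetical filename shape (assumption): <feature>-<isoYear>-W<ww>.jsonl
function auditFilename(feature: string, date: Date): string {
  const [year, week] = isoWeek(date);
  return `${feature}-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}

// The read path also checks last week's file: same math, 7 days earlier.
function previousWeekFilename(feature: string, date: Date): string {
  return auditFilename(feature, new Date(date.getTime() - 7 * DAY_MS));
}

console.log(auditFilename("rerank-failures", new Date(Date.UTC(2027, 0, 1))));
```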
Public API preserved bit-for-bit:
- logRerankFailure, readRecentRerankFailures, computeRerankAuditFilename
- logSlugFallback, readRecentSlugFallbacks, computeSlugFallbackAuditFilename
- logShellSubmission, computeAuditFilename, resolveAuditDir
- writeSupervisorEvent, readSupervisorEvents, computeSupervisorAuditFilename, plus isCrashExit, summarizeCrashes, CrashSummary (domain-specific helpers stay in supervisor-audit.ts; only file I/O moves)
- logPhantomEvent, readRecentPhantomEvents, computePhantomAuditFilename

Domain-specific behavior preserved:
- audit-slug-fallback emits per-call stderr (D7 dual logging) in the caller; the shared writer only writes to stderr on failure
- rerank-audit truncates error_summary to 200 chars before writing
- phantom-audit spreads optional fields conditionally (skipping undefined)
- supervisor-audit keeps single-file readback (no cross-week walk) to preserve pre-v0.40.4 doctor assertions

resolveAuditDir lives in src/core/audit/audit-writer.ts; shell-audit.ts re-exports it so existing imports keep working (every other audit module, plus gbrain-home-isolation.test.ts, minions.test.ts, and minions-shell.test.ts, pulls resolveAuditDir from shell-audit.ts).

Operator-visible drift: the rerank-audit stderr line drops the 'rerank-failure audit' qualifier. It was '[gbrain] rerank-failure audit write failed (...)' and is now '[gbrain] write failed (...); search continues'. Stderr is for human debugging, not machine parsing, and the file being written still identifies the feature in `tail -f audit/*`.

Test coverage: 128/128 audit-touching tests pass unchanged.

Plan ref: D5=B audit unification cathedral expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T3: getAdjacencyBoosts engine method (PG+PGLite parity)

Add BrainEngine.getAdjacencyBoosts(pageIds), returning Map<page_id, AdjacencyRow{hits, cross_source_hits}>. It returns ALL pages with hits >= 1 (callers apply their own threshold).
Cross-source semantic (D15=A): cross_source_hits EXCLUDES the target page's own source. A page in source A linked from 2 pages in source A reports cross_source_hits = 0; one linked from 1 page in source B + 1 in source C reports 2.

Source-scope contract: pageIds MUST already be source-scoped by the caller. The method does NOT filter by source_id; the in-set restriction makes cross-source leakage impossible by construction. The JSDoc spells this out; it is the same trust posture as cosineReScore's chunk_id handling.

COALESCE(p.source_id, 'default') is applied on both the target and from-page sides for defense-in-depth, even though pages.source_id is NOT NULL today.

JSDoc/SQL contract alignment (codex #2): HAVING >= 1 matches the "returns ALL pages with hits >= 1" contract; the threshold of 2 is the caller's call in applyGraphSignals.

Known limitation (codex #15): cross_source_hits cannot distinguish "genuinely linked from another team" from "mirrored imports from another source." T-todo-4 captures the v0.41+ refinement.

SearchResult type extension (D4=A flat fields, D12=A attribution):
- graph_adjacency_hits, graph_cross_source_hits, graph_session_demoted, graph_session_prefix
- base_score, backlink_boost, salience_boost, recency_boost, exact_match_boost, graph_adjacency_boost, graph_cross_source_boost, session_demote_factor, reranker_delta

All optional; T4-T6 populate them.

Test coverage: 7/7 hermetic PGLite cases: empty input, singleton, same-source hub, cross-source attribution including the "linked-only-from-other-source" case (widget in source b, linked from alice+bob in source a → cross_source_hits=1), and the JSDoc HAVING>=1 contract. Postgres parity is asserted by SQL-shape identity (a mirror Postgres E2E will come with T10's eval gate work via DATABASE_URL when set; the PGLite hermetic case ships now). The NULL source_id COALESCE branch is untestable in the current PGLite schema (pages.source_id is NOT NULL) and is kept as defense-in-depth.

Plan ref: T3 in v0.40.4.0 wave plan; D1=A, D3=A, D15=A.
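An in-memory model of the getAdjacencyBoosts contract helps pin its semantics. This sketch is hypothetical (the real method is SQL with a HAVING clause): it treats `hits` as the number of distinct in-set pages linking to a target, ignores self-links, and, reading the widget test case above, treats `cross_source_hits` as the number of distinct *other* sources among those linking pages.

```typescript
interface PageRef { id: number; sourceId: string | null; }
interface Link { fromId: number; toId: number; }
interface AdjacencyRow { hits: number; cross_source_hits: number; }

// Hypothetical in-memory model of the contract; the real method is SQL.
function adjacencyBoosts(pages: PageRef[], links: Link[]): Map<number, AdjacencyRow> {
  // pages is the caller's already source-scoped set; NULL source → 'default' (COALESCE).
  const sourceOf = new Map(pages.map((p): [number, string] => [p.id, p.sourceId ?? "default"]));

  // Distinct in-set linkers per target. Out-of-set endpoints and self-links are ignored.
  const linkers = new Map<number, Set<number>>();
  for (const { fromId, toId } of links) {
    if (fromId === toId || !sourceOf.has(fromId) || !sourceOf.has(toId)) continue;
    const set = linkers.get(toId) ?? new Set<number>();
    set.add(fromId);
    linkers.set(toId, set);
  }

  // Only targets with hits >= 1 appear; callers apply their own threshold.
  const out = new Map<number, AdjacencyRow>();
  for (const [toId, froms] of linkers) {
    const own = sourceOf.get(toId);
    const otherSources = new Set<string>();
    for (const f of froms) {
      const s = sourceOf.get(f);
      if (s !== undefined && s !== own) otherSources.add(s);
    }
    out.set(toId, { hits: froms.size, cross_source_hits: otherSources.size });
  }
  return out;
}
```

Because both ends of every counted link must be in the input set, a page outside the caller's scope can never contribute a hit, which is the "impossible by construction" argument above.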
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T4+T11: applyGraphSignals 4th stage in runPostFusionStages

New file src/core/search/graph-signals.ts. Three signals:
1. Adjacency-within-top-K (×1.05): hits >= 2 inbound from the in-set.
2. Cross-source adjacency (×1.10, stacks): cross_source_hits >= 2. Dormant on single-source brains.
3. Session diversification (×0.95): if multiple top-K results share a slug prefix, keep the highest scoring one and DEMOTE the rest. NOT amplify: codex caught that the original framing was backwards (amplifying redundancy makes the cited "weak chunks compete for budget" problem worse, not better).

Conservative magnitudes (D14=B): 1.05/1.10/0.95. A score-distribution probe (onScoreDistribution) collects min/p25/p50/p75/p95/max + reorder_band_width to feed the T-todo-2 magnitude calibration wave.

Slot: 4th stage inside runPostFusionStages (hybrid.ts:248), AFTER backlink/salience/recency and before dedup. It inherits the v0.35.6.0 floor-ratio gate from computeFloorThreshold, the structural protection that prevents a low-cosine hub from outranking a strong non-hub (codex T2 / D1=A).

PostFusionOpts is extended with graphSignalsEnabled, onGraphMeta, and onScoreDistribution. The caller (hybridSearch, in subsequent T5 work) resolves graph_signals from the mode bundle.

Source-scope contract preserved: getAdjacencyBoosts takes raw page_ids with no source filter. Adjacency is restricted to the in-set, so cross-source leakage is impossible by construction (D3=A).

Fail-open: an engine throw → a JSONL audit row via the shared createAuditWriter (T1/T2 primitive, featureName='graph-signals-failures') + meta.errored, and the caller's results are left unchanged. Session diversification ALSO skips on failure (a predictable all-or-nothing posture).

Mutation note (codex #9): score is mutated in place. base_score must be stamped at runPostFusionStages entry, BEFORE this stage, so eval-capture sees the pre-boost score (T6 attribution wave).
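The boost half of applyGraphSignals, including the floor gate that T11 pins, can be sketched as follows. It is a simplified illustration with hypothetical names, not graph-signals.ts: it assumes adjacency rows and a precomputed floor threshold are passed in, and it leaves out session diversification, metadata, and the fail-open audit path. Writing the gate as `!(score >= floor)` makes NaN scores skip the boost, matching the third IRON RULE case.

```typescript
interface Hit {
  page_id: number;
  score: number;
  graph_adjacency_boost?: number;
  graph_cross_source_boost?: number;
}
interface AdjacencyRow { hits: number; cross_source_hits: number; }

const ADJACENCY_BOOST = 1.05;
const CROSS_SOURCE_BOOST = 1.1;

// Hypothetical simplification of the boost part of applyGraphSignals.
// Mutates scores in place and stamps the multiplier each signal applied.
function applyAdjacencyBoosts(results: Hit[], adjacency: Map<number, AdjacencyRow>, floorThreshold: number): void {
  for (const r of results) {
    // Floor gate: below-floor (and NaN) scores are never boosted; a score exactly at the floor is.
    if (!(r.score >= floorThreshold)) continue;
    const row = adjacency.get(r.page_id);
    if (row === undefined) continue;
    if (row.hits >= 2) {
      r.score *= ADJACENCY_BOOST;
      r.graph_adjacency_boost = ADJACENCY_BOOST;
    }
    if (row.cross_source_hits >= 2) {
      r.score *= CROSS_SOURCE_BOOST; // stacks on top of the adjacency boost
      r.graph_cross_source_boost = CROSS_SOURCE_BOOST;
    }
  }
}
```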
Test coverage (24 cases, including the T11 IRON RULE regression):
- sessionPrefix multi/single/empty cases
- computeScoreDistribution percentile math
- Disabled + empty short-circuits
- Adjacency hit, no-hit, cross-source stacking, cross-source alone
- Session diversification 3-share + single-segment + singleton
- Test seam injection (no engine call)
- Fail-open: throw → audit row + meta.errored + unchanged results
- Empty Map → session diversification still runs
- Score distribution always emits when enabled
- Meta carries fire counts + duration_ms
- Missing page_id silently skipped from dedup set
- **T11 IRON RULE regression (3 cases):**
  * weak hub BELOW floor_threshold does NOT get boosted past an above-floor non-hub (the bug class the floor gate exists for)
  * hub AT the floor still gets boosted (the gate is < not <=)
  * NaN score → NaN >= threshold is false → no boost

Plan ref: T4 + T11 in v0.40.4.0 wave plan; D1=A, D2=A, D11=B, D14=B, D9=A, D5=B. Codex outside-voice #1 + #2 + #6 + #8 + #9 addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T5: graph_signals mode-bundle knob + KNOBS_HASH bump 3→4

ModeBundle gains graph_signals: boolean. Per-mode defaults:
- conservative: false (cost-sensitive tier)
- balanced: true (the wave's primary surface for default-on)
- tokenmax: true (power-user tier, capstone fit)

SearchKeyOverrides + SearchPerCallOpts gain an optional graph_signals field. resolveSearchMode picks the value via the standard per-call → config override → mode bundle chain. loadOverridesFromConfig parses 'search.graph_signals' from the config table ('1' or 'true' → true). SEARCH_MODE_CONFIG_KEYS adds the key so `gbrain search modes --reset` clears it alongside the other knobs.

KNOBS_HASH_VERSION bump 3→4 (append-only per CDX2-F13). A new `gs=` parts entry is appended AFTER the cross-modal + column + prov entries, so a graph-on cache write cannot be served to a graph-off lookup. The mid-deploy hit-rate dip clears within cache.ttl_seconds (3600s).
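The per-call → config → mode-bundle precedence for graph_signals can be sketched as follows. The function names are hypothetical. Parsing trims and lowercases the stored value because this PR describes loadOverridesFromConfig as case-insensitive and whitespace-trimmed; treating any value other than '1' or 'true' as false is an assumption.

```typescript
type SearchMode = "conservative" | "balanced" | "tokenmax";

// Per-mode defaults from the T5 list.
const GRAPH_SIGNALS_DEFAULT: Record<SearchMode, boolean> = {
  conservative: false,
  balanced: true,
  tokenmax: true,
};

// Hypothetical parser: undefined = no config row; otherwise trim + lowercase,
// and '1'/'true' mean on (anything else is treated as off, an assumption).
function parseBoolConfig(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined;
  const v = raw.trim().toLowerCase();
  return v === "1" || v === "true";
}

// Precedence: per-call override → config override → mode bundle default.
function resolveGraphSignals(mode: SearchMode, configValue?: string, perCall?: boolean): boolean {
  if (perCall !== undefined) return perCall;
  return parseBoolConfig(configValue) ?? GRAPH_SIGNALS_DEFAULT[mode];
}
```

Using `??` rather than `||` matters here: an explicit `false` from config must win over a `true` mode default.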
src/commands/search.ts KNOB_DESCRIPTIONS gains a graph_signals entry so the `gbrain search modes` dashboard renders the new knob.

Test coverage:
- test/search-mode.test.ts (+8 new cases): per-mode defaults canonical, config override in both directions, per-call override wins, knobsHash distinct for on/off, config key registered, attributeKnob reports per-call + mode sources correctly.
- test/search/knobs-hash-reranker.test.ts: version assertion bumped 3→4 with a v0.40.4 rationale comment.
- test/cross-modal-phase1.test.ts: version assertion bumped 3→4 with a v0.40.4 rationale comment.
- Canonical-bundle assertions updated to include graph_signals in the expected shape (3 cases).

50/50 search-mode tests pass. 45/45 cross-modal pass. 17/17 knobs-hash-reranker pass. 10/10 balanced-reranker pass.

Plan ref: T5 in v0.40.4.0 wave plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T6: per-stage attribution stamping in every boost

Every boost stage that mutates SearchResult.score now stamps a field recording WHAT it multiplied:
- applyBacklinkBoost → backlink_boost (skipped when count == 0)
- applySalienceBoost → salience_boost (skipped when score == 0)
- applyRecencyBoost → recency_boost (skipped on evergreen prefix)
- applyExactMatchBoost → exact_match_boost (skipped on no-match, OR when the intent's exactMatchBoost == 1.0 no-op)
- runPostFusionStages → base_score stamped ONCE at entry, BEFORE any boost mutates r.score. Idempotent: a caller-pre-stamped value is preserved. Empty-results short-circuit unchanged.
- applyReranker → reranker_delta = original_index - new_index (positive = rank improved; the raw rerank score stays in rerank_score)
- applyGraphSignals → graph_adjacency_boost, graph_cross_source_boost, session_demote_factor (T4 already stamped these)

Why: this feeds the T7 `gbrain search --explain` formatter so it can attribute the final score to its components. Without these stamps, answering "why did this rank where it did?" is grep-and-guess.
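The two trickiest T6 stamps, base_score idempotency and the rank-based reranker_delta, can be sketched with hypothetical helpers. Matching results across the rerank by slug is an assumption made for this illustration; the real stages may track identity differently.

```typescript
interface Scored {
  slug: string;
  score: number;
  base_score?: number;
  reranker_delta?: number;
}

// Hypothetical helper: stamp the pre-boost score once; a value the caller
// already stamped is kept (idempotent).
function stampBaseScore(results: Scored[]): void {
  for (const r of results) {
    if (r.base_score === undefined) r.base_score = r.score;
  }
}

// Hypothetical helper: reranker_delta = original_index - new_index, so a
// positive value means the result moved up. Matching by slug is an assumption.
function stampRerankerDelta(before: Scored[], after: Scored[]): void {
  const originalIndex = new Map(before.map((r, i): [string, number] => [r.slug, i]));
  after.forEach((r, newIndex) => {
    const orig = originalIndex.get(r.slug);
    if (orig !== undefined) r.reranker_delta = orig - newIndex;
  });
}
```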
SearchResult.reranker_delta doc updated to clarify that it is a RANK delta (positive = improved), not a score delta. The raw relevance score stays in `rerank_score` (untyped, for back-compat with telemetry that already reads it).

Test coverage: 16 new cases in test/search/attribution-stamping.test.ts. They pin that every boost stamps when it fires AND skips stamping when it doesn't (no false attribution on no-op stages), that base_score idempotency is preserved, and that reranker_delta is computed correctly across rank-improved + rank-degraded cases. All 178/178 search tests pass (no regressions).

Plan ref: T6 cathedral expansion in v0.40.4.0 wave plan; D12=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T7: gbrain search --explain per-stage attribution

New file src/core/search/explain-formatter.ts renders SearchResult[] as a multi-line breakdown of how the final score was formed:

    1. people/alice (score=12.4)
       base=10.2 (rrf+cosine)
       + backlink ×1.08
       + salience ×1.05
       + adjacency ×1.05 (hits=3)
       + cross_source ×1.10 (other_sources=2)
       ↑ reranker rank +2
       = final 12.4

It reads the `*_boost` / `base_score` / `*_hits` fields populated by T4 + T6. Empty path: "no boosts applied" when no stage stamped anything. Session demotion is rendered with a `-` prefix (not `+`) so the demotion direction is visually distinct from boosts.

CliOptions gains `explain: boolean`; parseGlobalFlags recognizes `--explain` anywhere in argv. cli.ts formatResult for the `search` + `query` cases reads CliOptions.explain via the module-level singleton and routes to formatResultsExplain when set. A lazy import keeps the hot path narrow for the common non-explain case.

Number formatting: 4-decimal precision with trailing zeros stripped ('1.0000' → '1', '0.1234' → '0.1234'). NaN is preserved as 'NaN'.

Test coverage:
- test/search/explain-formatter.test.ts: 19 cases pin output format.
  Each boost type renders correctly, every-stage stacking composes, reranker_delta=0 doesn't render, empty list short-circuits, rank numbering is 1-based, number formatting edge cases.
- test/cli-options.test.ts: 3 new cases for --explain parsing (basic, absent default, any-argv-position).

Existing CliOptions literals in test/cli-options.test.ts + test/thin-client-upgrade-prompt.test.ts updated for the new required explain field.

JSON envelope unchanged: the same attribution fields surface in the existing --json output via JSON.stringify, so no separate JSON formatter is needed.

Plan ref: T7 cathedral expansion in v0.40.4.0 wave plan; D12=A + D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T8: doctor check graph_signals_coverage

New checkGraphSignalsCoverage in src/commands/doctor.ts. Wired into both runDoctor (local engine) and doctorReportRemote (HTTP MCP / JSON path), so local AND remote-server brains both surface the metric.

Logic:
1. Resolve the active graph_signals setting: the config override 'search.graph_signals' wins, else the mode bundle default ('search.mode' → conservative=false, balanced/tokenmax=true).
2. When disabled → silent ok ("disabled — coverage not checked"). Avoids polluting doctor output on installs that don't use the feature.
3. When enabled, compute global inbound-link density: COUNT(DISTINCT to_page_id) / COUNT(*) across non-deleted pages.
4. <10% → warn ("signal will rarely fire") with a paste-ready `gbrain extract all` fix hint.
5. >=30% → ok ("fire on most queries") with the metric.
6. 10-29% → ok ("fire occasionally") with the metric.

Known limitation (codex outside-voice #14): global density is an imperfect proxy for "top-K subgraphs have enough edges to fire." T-todo-5 captures the v0.41+ refinement that measures the actual fire rate from search-stats after 30 days of data.

Best-effort: SQL errors → warn with the underlying message. It never breaks doctor.
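Once the density query has run, the T8 threshold logic reduces to a small classifier. This is a hypothetical sketch: the function name, argument shape, and message wording are illustrative, and the two counts stand in for COUNT(DISTINCT to_page_id) and the non-deleted-page COUNT(*).

```typescript
type CheckStatus = "ok" | "warn";

interface CoverageResult {
  status: CheckStatus;
  message: string;
}

// Hypothetical classifier; message wording is illustrative.
// linkedTargets ≈ COUNT(DISTINCT to_page_id); livePages ≈ COUNT(*) of non-deleted pages.
function classifyGraphCoverage(enabled: boolean, linkedTargets: number, livePages: number): CoverageResult {
  if (!enabled) return { status: "ok", message: "disabled (coverage not checked)" };
  if (livePages === 0) return { status: "ok", message: "empty brain, nothing to measure yet" };
  const pct = (100 * linkedTargets) / livePages;
  const metric = `${pct.toFixed(1)}% of pages have inbound links`;
  if (pct < 10) {
    return { status: "warn", message: `${metric}; signal will rarely fire. Fix: gbrain extract all` };
  }
  if (pct >= 30) return { status: "ok", message: `${metric}; signals fire on most queries` };
  return { status: "ok", message: `${metric}; signals fire occasionally` };
}
```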
Test coverage (7 new cases in test/doctor.test.ts):
- conservative mode → silent ok regardless of coverage
- balanced default + 0 links → warn at 0% with fix hint
- balanced default + 40% inbound → ok "fire on most queries"
- balanced default + 20% inbound → ok "fire occasionally"
- explicit search.graph_signals=false overrides mode default
- empty brain → ok with explanation
- check is wired into runDoctor (source-grep regression guard)

All 55/55 doctor.test.ts cases pass.

Plan ref: T8 in v0.40.4.0 wave plan; D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T9: gbrain search stats graph_signals section

runStatsSubcommand in src/commands/search.ts gains a graph_signals section in both --json and human output:

    Graph signals:
      enabled:  true (mode default)
      failures: 3 fail-open event(s)
        ECONNREFUSED 2
        timeout      1

Data sources:
- config: a 'search.graph_signals' override → enabled + source=config; otherwise the mode-bundle default → enabled + source=mode_default.
- JSONL audit: readRecentGraphSignalsFailures(days) returns events; failures_count is the number of events, and failures_by_reason buckets them by the first word of error_summary (e.g. 'ECONNREFUSED', 'timeout').

JSON envelope (schema_version 2 unchanged; graph_signals is a new sibling property of stats, so consumers reading the existing fields keep working):

    {
      "schema_version": 2,
      ...stats...,
      "graph_signals": {
        "enabled": bool,
        "source": "config" | "mode_default",
        "failures_count": int,
        "failures_by_reason": { reason: count }
      },
      "_meta": {
        metric_glossary: { ..., graph_signals_enabled: ..., graph_signals_failures_count: ... }
      }
    }

Fire-rate metrics (adjacency_fires, cross_source_fires, session_demotions) and score-distribution stats are NOT in this section yet: they require telemetry-table writes from the applyGraphSignals onMeta callback. They will be wired in v0.41+ via the T-todo-2 calibration wave (the wave that needs them).
For v0.40.4, status + error count is the actionable surface for "is graph_signals on, and is it failing?"

Human output: prints the section after the existing stats block. Edge case: when total_calls is 0 BUT graph_signals is enabled OR has historical failures, the section is still printed, so operators don't lose the signal on a brain with no telemetry yet.

Test coverage (6 cases in test/search/search-stats-graph-signals.test.ts):
- search.graph_signals=true → enabled true, source=config
- mode=conservative → enabled false, source=mode_default
- no config → enabled true (balanced default), source=mode_default
- JSONL failures bucketed by first word of error_summary
- empty audit → failures_count 0, empty failures_by_reason
- human output includes "Graph signals:" header

Plan ref: T9 in v0.40.4.0 wave plan; D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T10: eval gates (longmemeval-mini A/B + paired bootstrap)

New test/e2e/graph-signals-eval.test.ts runs each longmemeval-mini question twice (graph_signals off, graph_signals on) and asserts:

Gate 1 (QUALITY): paired bootstrap, 10,000 resamples:
- If signals-on is significantly WORSE than off (delta < 0 AND p < 0.05) → fail.
- Otherwise pass: p >= 0.05 in either direction OR delta >= 0 → ok.

Gate 2a (CHANGE-MAGNITUDE): mean Jaccard@5 over result-set overlap must be >= 0.5. If results overlap less than half, the change is too large and needs human review before default-on.

Gate 2b (CHANGE-MAGNITUDE): top-1 stability rate >= 0.7. If more than 30% of top picks change, a hard look is required.

Gate 3 (HARD ABSOLUTE FLOOR): recall@5 drop <= 5pt. Catches catastrophic regressions (codex outside-voice #18; addresses the brittleness of "top-5 must not drop at all" on tiny fixtures).

Bootstrap implementation:
- Per-question observation is binary (recall@5 hit/miss).
- Observations are paired on question_id between the on/off branches.
- Centered distribution under the null (subtract the observed mean), per the standard paired-bootstrap shift approach for binary outcomes.
- Two-tailed p-value: the fraction of resamples where |resampled delta| >= |observed delta|.
- Deterministic seeded RNG so test runs are stable across CI.

pairedBootstrapPValue is exported as a pure function with separate tests for edge cases (empty input, all-equal, strong positive, strong negative, determinism), so future calibration waves can reuse it.

Hermetic: in-memory PGLite via createBenchmarkBrain + resetTables between questions. No API keys needed (the --no-embed import path exercises keyword-only retrieval). Skips gracefully via describe.skip when the fixture is missing.

Plan ref: T10 in v0.40.4.0 wave plan; D7=C absolute floor + D13=A paired bootstrap; codex #4 + #18 stability-vs-quality distinction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 T12: VERSION + package.json + CHANGELOG + TODOS

    VERSION:      0.37.11.0 → 0.40.4.0
    package.json: 0.37.11.0 → 0.40.4.0

CHANGELOG.md: top entry for v0.40.4.0 in ELI10-lead voice per the CLAUDE.md release rules. The lead is plain English ("Your search now notices when a page is a hub for your query"); precise file paths / SQL semantics / numbers live in the "Itemized changes" section below. Includes the cathedral-expansion notes (D5=B audit unification, D12=A per-stage attribution, D13=A eval gates) and the "To take advantage of v0.40.4.0" verify-and-fix block.

TODOS.md: 5 new items captured under "v0.40.4 graph signals — deferred follow-ups (v0.41+)":
- T-todo-1: profile graph-signal SQL latency, merge if hot (D8=C)
- T-todo-2: magnitude calibration wave from probe data (D14=B / D17)
- T-todo-3: DB-backed audit table for cross-deploy observability (codex #15)
- T-todo-4: sync-topology-aware cross-source signal (codex #11)
- T-todo-5: replace doctor's global density with fire-rate (codex #14)

Verified the 3-line audit: VERSION + package.json + CHANGELOG topmost entry all match 0.40.4.0.
`bun install` ran (lockfile unchanged, since the root package version isn't stored in bun.lock). `bun run build:llms` refreshed llms.txt + llms-full.txt for the next commit.

Plan ref: T12 in v0.40.4.0 wave plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 TODO: document pre-existing shard-2 flake noticed during ship

3 isCacheSafe test failures in shard 2 reproduce on a stashed clean master. Confirmed pre-existing, not introduced by v0.40.4. Filed under "Pre-existing flake on master (noticed during v0.40.4 ship)" with reproduction commands + remediation options. Shipping v0.40.4 through it; a future wave can fix it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 privacy scrub: replace wintermute → media in example slugs

CLAUDE.md line 550 bans the private OpenClaw fork name in public artifacts. The example session prefix in the sessionPrefix() docs + 3 test fixtures were swept to 'media/chat/...' instead. The pre-existing scripts/check-privacy.sh in `bun run verify` caught it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 fix: wire graph_signals from mode bundle to runPostFusionStages

CRITICAL: pre-landing review (codex outside-voice via /ship Step 9) caught that the `postFusionOpts` literal at hybrid.ts line 566 was building PostFusionOpts WITHOUT threading `resolvedMode.graph_signals` to `graphSignalsEnabled`. The gate at hybrid.ts:358 read the field from a literal that never set it.

Result before this fix: the entire v0.40.4 graph-signals wave was dead code in production. Mode bundles set `balanced.graph_signals = true` and `tokenmax.graph_signals = true`, but no production call site ever reached applyGraphSignals. The KNOBS_HASH bump 3→4 correctly varied the cache key by the flag, so contamination was prevented, but the feature itself never fired.
All shipped infrastructure (engine SQL, fail-open audit, attribution stamps, --explain formatter, doctor coverage check, search-stats section) was reachable only through the unit-test seam (`opts.adjacencyFn`). The CHANGELOG-advertised behavior never landed in user-visible search.

Fix: thread `graphSignalsEnabled: resolvedMode.graph_signals` into the postFusionOpts literal (1 line). An inline comment names codex's catch so future refactors see the regression class.

Tests: new test/search/graph-signals-wire-integration.test.ts pins the wire end-to-end. Three cases:
1. balanced mode → hybridSearch on a seeded brain with an adjacency hub produces a result with base_score stamped (proves runPostFusionStages actually ran).
2. search.graph_signals=false config override → no graph_* fields stamped (proves the gate honors the override path).
3. Source-grep regression guard pinning the `graphSignalsEnabled: resolvedMode.graph_signals` literal in hybrid.ts, so a future refactor can't silently disconnect it.

All 57 existing v0.40.4 wave tests still pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 fix: pre-landing review AUTO-FIX findings (audit msg drift + deleted_at)

Two informational findings from the /ship pre-landing review (Step 9):

1. Stderr message qualifier drift (rerank/slug-fallback/phantom audits)

   Pre-v0.40.4 messages included a per-feature qualifier:

       [gbrain] rerank-failure audit write failed (...)
       [gbrain] slug-fallback audit write failed (...)
       [gbrain] phantom audit write failed (...)

   The T2 refactor dropped the qualifier (the plan promised "byte-identical" operator-visible behavior, but the stderr lines did drift). Restored via a new `errorMessagePrefix` option on `createAuditWriter` (optional, '' by default). Three modules pass the per-feature qualifier; shell-audit and supervisor-audit are unaffected (their pre-v0.40.4 messages had no separate qualifier, because the label already carried the feature name).

2. Defense-in-depth `deleted_at IS NULL` on getAdjacencyBoosts

   The SQL was previously protected by construction (hybridSearch's visibility filter ensures input pageIds are live), but the guard matches the v0.35.5.0 findOrphanPages pattern and closes the bug class if a future caller bypasses hybridSearch. Added to both the Postgres and PGLite engines for parity. Three JOIN sites guarded (targets CTE, FROM-pages join). One inline comment per engine cites the codex review and the v0.35.5.0 precedent.

Plan ref: /ship pre-landing review v0.40.4.0 (codex findings C and F).

All 84 audit+graph-signals tests pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 fix: adversarial review HIGH findings (codex H1+H2 + Claude F1)

Three HIGH-severity issues from the /ship adversarial pass:

H1 (Codex): the eval gate was a no-op. The test passed `graph_signals: graphSignalsOn` via an `as any` cast, but SearchOpts had no such field and hybridSearch's perCall didn't thread it. Both the off and on branches resolved to the mode-bundle default, so the gate measured identical behavior and could pass while detecting nothing.

Fix: add `graph_signals?: boolean` to SearchOpts (types.ts:794). Thread `opts.graph_signals` into perCall in both hybridSearch (hybrid.ts:425) AND hybridSearchCached (hybrid.ts:1027) so the cache-key resolver also sees the override. Drop the `as any` from the eval test; the types are real now.

H2 (Codex): session diversification fired on entity directories. sessionPrefix() used "any shared parent directory" as the session signal. Result: a search for "people in SF" returned `people/alice` + `people/bob` + `people/charlie`, and the latter two got demoted to 0.95×. Every common entity-search query silently penalized legitimate same-type results, and since graph signals are default-on for balanced/tokenmax, production behavior was wrong.

Fix: narrow sessionPrefix() to fire ONLY when the slug contains a session-like marker (a `chat`/`session`/`sessions` segment OR a `YYYY-MM-DD` date segment).
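The narrowed sessionPrefix() and the diversification step can be reconstructed as a sketch consistent with the JSDoc examples listed for this fix. The exact rule here (a marker segment keeps the segment after it; an exact date segment keeps everything through the date; the first match wins) is inferred from those examples rather than taken from graph-signals.ts, and `diversifySessions` is a hypothetical name.

```typescript
const SESSION_MARKERS = new Set(["chat", "session", "sessions"]);
const DATE_SEGMENT = /^\d{4}-\d{2}-\d{2}$/;

// Reconstruction inferred from the JSDoc examples; not the real graph-signals.ts code.
// Returns a session prefix, or null when the slug doesn't look session-shaped.
function sessionPrefix(slug: string): string | null {
  const parts = slug.split("/");
  if (parts.length < 2) return null; // degenerate: no "/"
  for (let i = 0; i < parts.length; i++) {
    // Marker segment: keep the marker and the segment after it.
    if (SESSION_MARKERS.has(parts[i])) return parts.slice(0, i + 2).join("/");
    // Exact YYYY-MM-DD segment: keep everything through the date.
    if (DATE_SEGMENT.test(parts[i])) return parts.slice(0, i + 1).join("/");
  }
  return null; // entity directories such as people/ or docs/
}

interface Ranked {
  slug: string;
  score: number;
  session_demote_factor?: number;
}

// Hypothetical: among results sharing a session prefix, keep the top scorer
// and demote the rest by the factor (0.95 in this PR).
function diversifySessions(results: Ranked[], factor = 0.95): void {
  const best = new Map<string, Ranked>();
  for (const r of results) {
    const p = sessionPrefix(r.slug);
    if (p === null) continue;
    const current = best.get(p);
    if (current === undefined || r.score > current.score) best.set(p, r);
  }
  for (const r of results) {
    const p = sessionPrefix(r.slug);
    if (p !== null && best.get(p) !== r) {
      r.score *= factor;
      r.session_demote_factor = factor;
    }
  }
}
```

Returning null for entity directories means `people/alice` and `people/bob` never enter the grouping map at all, which is the H2 regression the tests pin.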
Entity directories (`people/`, `companies/`, `docs/`) return null → diversification skips. Returning NULL (rather than the slug itself) lets the loop skip cleanly. Examples in the JSDoc:

    your-agent/chat/2026-05-20-foo → 'your-agent/chat/2026-05-20-foo'
    daily/2026-05-20/journal-entry-1 → 'daily/2026-05-20'
    transcripts/chat/funding-discussion → 'transcripts/chat/funding-discussion'
    people/alice → null   ← codex H2 regression
    docs/quickstart → null

F1 (Claude adversarial subagent): case-sensitivity drift across 3 sites. loadOverridesFromConfig in mode.ts is case-insensitive and whitespace-trimmed for 'search.graph_signals' values, but doctor's checkGraphSignalsCoverage (doctor.ts:899) AND search-stats's readGraphSignalsStats (search.ts:288) used a case-sensitive compare. If a user sets `search.graph_signals TRUE`, production enables the feature, but doctor and search-stats both silently report it as disabled. Operators lose the only observability surface for the new feature on values like 'True'/'TRUE'.

Fix: trim + lowercase parity at both sites, mirroring the parser's semantics. `search.mode` reads are also case-normalized at both sites for the same divergence class.

Tests:
- sessionPrefix block rewritten with 7 cases covering chat marker + date anchor + entity dirs (now NULL) + degenerate (no /).
- Added a regression test pinning codex H2: people/alice + people/bob + people/charlie do NOT get diversified.
- graph-signals-eval.test.ts drops `as any`; the typed field works.
- Existing tests using `chat/a`/`chat/b` updated to the session-shaped `media/2026-05-20/chunk-a` so the date anchor actually fires.

111/111 graph-signals + doctor + search-stats tests pass. Typecheck clean.

Plan ref: /ship adversarial review v0.40.4.0 (codex H1, H2; Claude F1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.4.0 TODOs: capture 11 LOW adversarial findings for v0.41+

Codex L1 (audit window underreport) + Claude F2/F3/F5-F8/F11/F12/F14/F16 from /ship adversarial review.
None are load-bearing; all are captured under 'v0.40.4 adversarial review LOW findings — captured for v0.41+'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.40.4.0

- README: surface v0.40.4.0 graph signals + --explain in the Hybrid search capability
- CLAUDE.md: annotate engine.ts getAdjacencyBoosts, new graph-signals.ts / explain-formatter.ts / audit/audit-writer.ts, plus hybrid.ts post-fusion 4th stage, mode.ts graph_signals knob + KNOBS_HASH 3→4, cli-options.ts --explain flag, search stats + doctor coverage check
- llms-full.txt: regenerated from CLAUDE.md per the build:llms chaser rule

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): pin bun-version to 1.3.13 across all workflows

The setup-bun action with `bun-version: latest` calls the GitHub API (https://api.github.com/repos/oven-sh/bun/git/refs/tags) to resolve the tag. CI started failing today with HTTP 401 "Bad credentials" even though the action receives a token (visible as `token: ***` in the run log). Pinning the version eliminates the API call entirely.

Affected workflows: test.yml, e2e.yml, release.yml, heavy-tests.yml (5 invocations total). Pinned to 1.3.13, which satisfies package.json engines (`bun >= 1.3.10`) and is the version v0.40.4.0 was developed against.

Bump cadence: when a new bun version is required, update this pin in one PR. Trading "always-latest" for "always-deterministic" is the right trade for a 5-shard CI matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
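The H2 fix narrows sessionPrefix() to session-shaped slugs, but the shipped implementation is not shown in this PR. Below is a hedged TypeScript sketch that reproduces the JSDoc examples above. Its rules are assumptions: a date anchor keeps the path through the first exact `YYYY-MM-DD` segment, and a `chat`/`session`/`sessions` marker keeps the path through the segment after the marker. `diversifyBySession` is a hypothetical helper that applies the 0.95× demotion to every hit after the first one in a session.

```typescript
// Hypothetical sketch of the v0.40.4.0 sessionPrefix() rule; not the shipped code.
const SESSION_MARKERS = new Set(["chat", "session", "sessions"]);
const DATE_SEGMENT = /^\d{4}-\d{2}-\d{2}$/;

/** Returns a session key for session-shaped slugs, or null for everything else. */
function sessionPrefix(slug: string): string | null {
  const segments = slug.split("/");
  if (segments.length < 2) return null; // degenerate: no '/'

  // Date anchor: keep the path through the first exact YYYY-MM-DD segment.
  const dateIdx = segments.findIndex((s) => DATE_SEGMENT.test(s));
  if (dateIdx >= 0) return segments.slice(0, dateIdx + 1).join("/");

  // Chat marker: keep the path through the segment that follows the marker.
  const markerIdx = segments.findIndex((s) => SESSION_MARKERS.has(s));
  if (markerIdx >= 0 && markerIdx + 1 < segments.length) {
    return segments.slice(0, markerIdx + 2).join("/");
  }

  // Entity directories (people/, companies/, docs/) are not sessions.
  return null;
}

interface Hit {
  slug: string;
  score: number;
}

/** Demotes every hit after the first one that shares a session prefix. */
function diversifyBySession(hits: Hit[], factor = 0.95): Hit[] {
  const seen = new Set<string>();
  return hits.map((hit) => {
    const prefix = sessionPrefix(hit.slug);
    if (prefix === null) return { ...hit }; // no session: never demoted
    if (seen.has(prefix)) return { ...hit, score: hit.score * factor };
    seen.add(prefix);
    return { ...hit };
  });
}
```

With this rule, `people/alice`, `people/bob` and `people/charlie` keep their scores. `media/2026-05-20/chunk-a` and `media/2026-05-20/chunk-b` share the prefix `media/2026-05-20`, so the second one is demoted.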
garrytan added a commit that referenced this pull request May 27, 2026
…e before (#1521)

* feat(schema): migrations v98/v99/v100 for onboard wave (A6 A10 A11 A13 A25, codex #1 #9 #10 #11 #12)

Three schema additions supporting the gbrain onboard wave:

v98: links.link_kind nullable column (A10, codex finding #12). The NER extraction was originally going to add a new link_source='ner' provenance. That would have forced every existing link_source='mentions' query (backlink-count filter, orphan-ratio, doctor checks) to be updated, or metrics would drift across the cutover. Instead: keep link_source='mentions' for the storage layer AND add a nullable link_kind column. Three kinds: 'plain', 'typed_ner', and NULL (legacy/unknown, semantically 'plain'). link_kind is NOT in the links UNIQUE constraint, so the storage shape stays compatible.

v99: timeline_entries dedup widening (A11, codex finding #11). The pre-v99 dedup key was (page_id, date, summary). The new --from-meetings extraction writes timeline entries with source='extract-timeline-from-meetings:<meeting-slug>'. Codex caught that two meetings with the same date+summary on the same entity page would silently DO NOTHING, losing the second meeting's provenance. Widened to (page_id, date, summary, source). Legacy rows (source='') preserve current dedup behavior.

v100: migration_impact_log table + content_chunks_stale_idx partial index (A6 + A25 + A13 + codex findings #10 + #9). Bundled because both are consumed by the onboard pipeline and ship together. The impact log captures before/after metric stats so gbrain onboard --history shows real deltas. Its attribution columns (job_id, source_id, brain_id, started_at, idempotency_key) prevent concurrent runs from attributing deltas to the wrong migrations. content_chunks_stale_idx (partial, WHERE embedding IS NULL) supports gbrain embed --stale + --priority recent. The outer ORDER BY p.updated_at DESC uses the existing idx_pages_updated_at_desc via JOIN. Plain NUMERIC columns, with the delta computed at read time. It is NOT a stored GENERATED column, per eng-review D2 (zero PGLite parity risk).
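The v99 change is easiest to see as a change of dedup key. The sketch below models it in TypeScript rather than the migration's SQL. `TimelineRow`, `preV99Key`, `v99Key` and `insertDoNothing` are hypothetical names that mimic the unique index plus `ON CONFLICT DO NOTHING`, and they show how the second meeting's row was lost before v99.

```typescript
// Hypothetical model of the timeline_entries unique key before and after v99.
interface TimelineRow {
  page_id: number;
  date: string;
  summary: string;
  source: string; // '' for legacy rows
}

type KeyFn = (row: TimelineRow) => string;

// JSON-encode the tuple so values containing separators cannot collide.
const preV99Key: KeyFn = (r) => JSON.stringify([r.page_id, r.date, r.summary]);
const v99Key: KeyFn = (r) => JSON.stringify([r.page_id, r.date, r.summary, r.source]);

/** Mimics INSERT ... ON CONFLICT DO NOTHING: the first row per key wins. */
function insertDoNothing(rows: TimelineRow[], key: KeyFn): TimelineRow[] {
  const kept = new Map<string, TimelineRow>();
  for (const row of rows) {
    const k = key(row);
    if (!kept.has(k)) kept.set(k, row);
  }
  return [...kept.values()];
}

// Two meetings on the same day, same entity page, same summary text.
const rows: TimelineRow[] = [
  { page_id: 7, date: "2026-05-20", summary: "Discussed in Weekly sync", source: "extract-timeline-from-meetings:meetings/a" },
  { page_id: 7, date: "2026-05-20", summary: "Discussed in Weekly sync", source: "extract-timeline-from-meetings:meetings/b" },
];
```

Under `preV99Key`, `insertDoNothing(rows, preV99Key)` keeps one row and the second meeting's provenance is dropped. Under `v99Key` both rows survive. Legacy rows with `source: ''` still dedup exactly as before.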
Slot history note: the plan originally proposed v97/v98/v99, but master had already used v95 (links 'mentions' CHECK widening), v96 (facts conversation session index), and v97 (pages_dedup_partial_index) by ship time. Codex caught the collision; renumbered to v98/v99/v100.

Test pin: test/schema-bootstrap-coverage.test.ts (100/100 migrations apply clean on PGLite), test/migrate.test.ts (152 cases pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(remediation): extract doctor remediation library (A1, codex finding #2)

Pre-fix: src/commands/doctor.ts contained two CLI-shaped functions (runRemediationPlan + runRemediate) with hardcoded argv parsing, process.exit calls, and console.log emission. The onboard CLI shell and the upcoming MCP run_onboard op couldn't compose against them. The plan file's "100-LOC thin wrapper" assumption didn't survive codex's review of the actual source.

Post-fix: src/core/remediation/ exports a library shape that all three consumers (doctor CLI, onboard CLI, MCP run_onboard) wrap.

src/core/remediation/types.ts: RemediationPlanOpts, RemediationPlan, RemediationOpts, RemediationResult, StepResult, RemediationHooks (the observability seam; the library never calls console.* itself).

src/core/remediation/context.ts: loadRecommendationContext moved verbatim from doctor.ts. Re-exports RecommendationContext from brain-score-recommendations.ts, since that's still the canonical home for the type (consumed by computeRecommendations).

src/core/remediation/plan.ts: computeRemediationPlan(engine, opts): Promise<RemediationPlan>. Pure read; produces the stable JSON envelope downstream agents bind to. Pulls in computeRecommendations + classifyChecks + maxReachableScore behind one library entry point.

src/core/remediation/run.ts: runRemediation(engine, opts, hooks): Promise<RemediationResult>. Orchestrator with BudgetTracker, checkpoint resume, D5 dep cascade, D7 per-step recheck.
Returns a result object instead of calling process.exit; the CLI shell maps result.budget_exhausted / .target_unreachable / .submitted to exit codes.

src/core/remediation/index.ts: barrel for the three modules above.

doctor.ts is now a thin wrapper:
- runRemediationPlan: parse argv → computeRemediationPlan → human/JSON render
- runRemediate: parse argv → TTY confirm gate → runRemediation(hooks: console.*)

The TTY confirmation step deliberately stays in the CLI shell. The library never asks for confirmation; that's a CLI concern.

Net: ~340 LOC removed from doctor.ts; ~470 LOC added across the library module (with full JSDoc + per-A-decision rationale comments). Functional behavior preserved bit-for-bit: 67 tests pass across doctor.test.ts + v0_37_gap_fill.serial.test.ts. The Lane E.4 source-text test (test/v0_37_gap_fill.serial.test.ts:329) followed loadRecommendationContext to its new home at src/core/remediation/context.ts; assertions are otherwise unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(remediation): generalize computeRecommendations to accept extras (A2, codex finding #3)

Pre-fix: computeRecommendations at brain-score-recommendations.ts:170 was a hardcoded planner for 5 synthetic check categories. Adding a Check.remediation field to a new doctor check would NOT auto-wire into --remediation-plan; the planner simply ignored it. Codex caught this when reviewing the plan's "checks ARE specs" framing.

Post-fix: an optional third arg `extraRemediations: RemediationStep[]` lets callers inject step entries discovered outside the hardcoded planner. The existing 5-category surface is preserved bit-for-bit. On id collision the hardcoded entry wins, so an extra that accidentally duplicates a hardcoded id doesn't shadow legacy behavior. RemediationPlanOpts gains the matching field; computeRemediationPlan in src/core/remediation/plan.ts threads opts.extraRemediations through.
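A minimal sketch of the merge rule A2 describes: extras are appended, and a hardcoded step wins on id collision. The `RemediationStep` shape here is cut down to two fields, and `mergeRemediations` is a hypothetical helper name. The real type in brain-score-recommendations.ts carries more fields.

```typescript
// Pared-down, hypothetical step shape; the real RemediationStep carries more fields.
interface RemediationStep {
  id: string;
  job: string;
}

/**
 * Hardcoded steps keep their order and always win on id collision.
 * Extras are appended only when their id is new (the first extra wins among extras).
 * Defaulting extras to [] keeps legacy callers' output unchanged.
 */
function mergeRemediations(hardcoded: RemediationStep[], extras: RemediationStep[] = []): RemediationStep[] {
  const seen = new Set(hardcoded.map((s) => s.id));
  const merged = [...hardcoded];
  for (const step of extras) {
    if (seen.has(step.id)) continue; // never shadow legacy behavior
    seen.add(step.id);
    merged.push(step);
  }
  return merged;
}
```

With `hardcoded = [{ id: "embed", job: "embed-backfill" }]` and extras containing a colliding `embed` step plus a new `ner` step, the merged plan keeps `embed-backfill` and appends only `ner`.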
The 4 new doctor checks (T4) will produce per-check helper functions that return RemediationStep[]; onboard's render layer (T12) aggregates them into the opts.extraRemediations slot. doctor's existing --remediation-plan call passes an empty list (no behavior change for the legacy CLI).

84 tests pass across brain-score-recommendations + doctor suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): 4 new onboard checks (embed_staleness, link_coverage, timeline_coverage, takes_count) (A16, T4)

Adds src/core/onboard/checks.ts: 4 check helpers + a runAllOnboardChecks aggregator. Each helper returns {check, remediations}. doctor pushes the Check entry (for human/JSON rendering), AND onboard's plan path collects the RemediationStep[] (via T3's new extraRemediations seam in computeRecommendations).

- embed_staleness: COUNT(*) on content_chunks WHERE embedding IS NULL. Cheap thanks to the content_chunks_stale_idx partial index (v100). Warn at 1+ stale, fail at 1000+; remediation points at the embed-catch-up handler (built in T6).
- entity_link_coverage: fraction of entity pages with inbound links. Per A21 + codex #15, it uses TABLESAMPLE BERNOULLI on PG when total_pages > 50K, with a pinned sample formula (LEAST 100, GREATEST 2, target ~5000 rows). It also embeds a ±sqrt(p(1-p)/n) confidence interval in the message ("coverage: 31% ± 1.3%"), so warn/fail decisions show their margin of error. PGLite path: full scan (PGLite brains rarely exceed 50K pages). Warn <70%, fail <40%; remediation points at the extract-ner handler.
- timeline_coverage: same TABLESAMPLE policy. Warn <90%, fail <70%; remediation points at the extract-timeline-from-meetings handler.
- takes_count: COUNT(*) on the takes table. Per A12 two-gate consent, the remediation only emits when the `takes.bootstrap_enabled` config is true. Otherwise the check shows "0 takes (takes.bootstrap_enabled is false; opt in to enable)" without an autopilot-eligible remediation. This prevents unattended LLM-bearing extractions on brains that haven't opted in.
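The sampling arithmetic behind entity_link_coverage can be checked on its own. This sketch makes two assumptions. First, the pinned formula means percent = clamp(5000 × 100 / total, 2, 100). Second, the ± figure is one standard error, sqrt(p(1 − p) / n), as the commit states. `samplePercent`, `marginOfError`, `formatCoverage` and `coverageStatus` are hypothetical names. In SQL, the returned percentage would be the argument to `TABLESAMPLE BERNOULLI (pct)`.

```typescript
// Hypothetical helpers mirroring the T4 entity_link_coverage sampling policy.
const SAMPLE_THRESHOLD = 50_000; // sample only above 50K pages (Postgres path)
const TARGET_ROWS = 5_000;

/** Percentage for TABLESAMPLE BERNOULLI, or null when a full scan is used. */
function samplePercent(totalPages: number): number | null {
  if (totalPages <= SAMPLE_THRESHOLD) return null;
  const pct = (TARGET_ROWS * 100) / totalPages;
  return Math.min(100, Math.max(2, pct)); // LEAST(100, GREATEST(2, ...))
}

/** One-standard-error margin for a sampled proportion p over n rows. */
function marginOfError(p: number, n: number): number {
  return Math.sqrt((p * (1 - p)) / n);
}

/** Renders the message format quoted in the commit, e.g. "coverage: 31% ± 1.3%". */
function formatCoverage(linked: number, sampled: number): string {
  const p = linked / sampled;
  const pct = Math.round(p * 100);
  const moe = (marginOfError(p, sampled) * 100).toFixed(1);
  return `coverage: ${pct}% ± ${moe}%`;
}

/** Link-coverage thresholds from T4: warn below 70%, fail below 40%. */
function coverageStatus(p: number): "ok" | "warn" | "fail" {
  if (p < 0.4) return "fail";
  if (p < 0.7) return "warn";
  return "ok";
}
```

At 100K pages this samples 5% (about 5,000 rows). At 1M pages the 2% floor applies, which samples about 20,000 rows. The floor trades a little extra I/O for a tighter margin on very large brains.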
runDoctor wires runAllOnboardChecks at the end of the DB-checks block (after stale_locks); fast mode skips it to preserve the --fast UX. Thin-client parity (A16 spec) is deferred to T16. The MCP run_onboard op will run these helpers server-side, where engine.executeRaw works, and that is the real federated path. Adding them to doctor-remote.ts would duplicate the logic without functional benefit, since the helpers are server-side queries.

55 doctor tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(engine): listStaleChunks --priority recent + executeRaw AbortSignal (A13/A20, codex #7 #9)

Two interface extensions on BrainEngine, with parity across postgres-engine and pglite-engine. Plus a follow-on fix for v99's timeline_entries dedup widening.

listStaleChunks gains:
- orderBy?: 'page_id' | 'updated_desc' (default 'page_id' = legacy)
- afterUpdatedAt?: string | null (composite cursor for updated_desc)

When orderBy === 'updated_desc', the query JOINs pages and orders by p.updated_at DESC NULLS LAST, p.id ASC, cc.chunk_index ASC. The ordering is backed by idx_pages_updated_at_desc + the content_chunks_stale_idx partial index (both indexes added in v100). The cursor "next row" semantic with DESC NULLS LAST + ASC tiebreakers is:

  (updated_at < prev)
  OR (updated_at = prev AND page_id > prev_page_id)
  OR (updated_at = prev AND page_id = prev_page_id AND chunk_index > prev_chunk_index)

The first page (afterUpdatedAt undefined AND afterPageId 0) bypasses the cursor predicate. Both engines parity-tested via 100/100 pglite-engine tests; the Postgres path mirrors the same WHERE clause structure.

executeRaw gains:
- opts?: {signal?: AbortSignal}

Postgres impl: real cancellation via postgres.js's .cancel() on the pending query. A pre-aborted signal short-circuits before the network round-trip; a mid-flight abort fires .cancel(). The query throws on abort, which the caller catches.

PGLite impl: in-process WASM has no kernel-level cancellation.
Best-effort: pre-check, then race the query against a signal-rejection promise. The query keeps running in WASM, but the awaited result is discarded (a DOMException AbortError is thrown). Documented gap.

ReservedConnection.executeRaw extends the signature for type compatibility but doesn't wire the signal. Its only callers are migrations + cycle-lock writes, which explicitly don't want cancellation.

v99 timeline dedup follow-on: the dedup widening in migration v99 changed the unique index from (page_id, date, summary) to (page_id, date, summary, source). The ON CONFLICT clauses in both engines' addTimelineEntriesBatch + addTimelineEntry impls were still using the old 3-tuple, causing 12 PGLite tests to fail with SQLSTATE 42P10 "no unique constraint matching ON CONFLICT specification". Updated all 4 sites (2 per engine) to the 4-tuple.

Typecheck clean, 100/100 PGLite engine tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(embed): --batch-size + --priority recent + --catch-up + embed-catch-up handler (A13)

The CLI surface on gbrain embed gains 3 flags:
- --batch-size N: override the hardcoded PAGE_SIZE=2000 (clamped 1..10000).
- --priority recent: walk stale chunks newest-first (page.updated_at DESC). Backed by content_chunks_stale_idx + idx_pages_updated_at_desc via T5's listStaleChunks(orderBy='updated_desc') extension. Composite cursor (updated_at, page_id, chunk_index).
- --catch-up: removes the GBRAIN_EMBED_TIME_BUDGET_MS wall-clock cap; loops until countStaleChunks() returns 0.

EmbedOpts gains matching fields; embedAll + embedAllStale plumb them through. In 'updated_desc' mode, the cursor tracking in embedAllStale now advances (afterUpdatedAt, afterPageId, afterChunkIndex) instead of just (afterPageId, afterChunkIndex). The engine returns p.updated_at as Date|string; the caller normalizes it to an ISO string for the next page's cursor.

New Minion handler `embed-catch-up` registered in jobs.ts.
It wraps runEmbedCore with stale=true + catchUp=true + the priority/batchSize the caller supplies. NOT in PROTECTED_JOB_NAMES (embedding spend only, the same posture as the existing embed-backfill handler). Consumed by the gbrain onboard remediation pipeline (T11) when the embed_staleness check fires.

63 embed tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extract): NER link extraction via schema-pack inference.regex (A10, T7, codex #12)

NEW src/core/extract-ner.ts: extractNerLinks(engine, opts). Walks pages, reuses the by-mention gazetteer, and applies the active schema-pack's link_types[].inference.regex patterns to assign a typed verb to each mention. Example: "CEO of Acme", where Acme is a company, yields 'works_at' linking the source page to Acme.

Codex finding #12 design: do NOT split out link_source='ner' as a new provenance. NER is still mention-derived, and splitting would break every existing link_source='mentions' query (backlink-count, orphan-ratio, doctor checks). Instead: keep link_source='mentions' AND set link_kind='typed_ner' (v98 column).

The LinkBatchInput type gains a link_kind field. Both engines' addLinksBatch impls add the column to the INSERT projection + unnest() tuple (column #11). The links UNIQUE constraint excludes link_kind. So if an existing plain mention row and a typed_ner row share the same (from, to, type, source, origin), they collide and DO NOTHING. In the typical case they don't collide, because the typed link goes in as a separate row with a DIFFERENT link_type (the inferred verb).

CLI: `gbrain extract links --ner` (DB source only). A combined `--by-mention --ner` walk shares ONE gazetteer build across both passes, which saves a full walk on big brains. Either flag alone runs its pass solo. Each pass inherits its own --source-id filter.

Minion handler: `extract-ner` (NOT in PROTECTED_JOB_NAMES: regex-only, no LLM spend). Consumed by onboard's entity_link_coverage remediation when coverage <70%.
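The NER commit says each mention gets a verb from the pack's inference.regex, but the pack schema isn't shown. The `LinkTypeRule` shape below is therefore hypothetical: a verb, a target entity type, and a regex tested against the text just before the mention. The same goes for `inferVerb`, the 40-character window, and the sample rules. They exist only to illustrate the "CEO of Acme" → 'works_at' example.

```typescript
// Hypothetical rule shape; the real schema-pack link_types[].inference may differ.
interface LinkTypeRule {
  verb: string; // inferred link_type, e.g. 'works_at'
  targetType: string; // entity type the mention must resolve to
  regex: RegExp; // tested against the text immediately before the mention
}

interface Mention {
  slug: string; // resolved target page slug
  type: string; // target page type from the one-shot type lookup
  index: number; // offset of the mention in the page body
}

/** Returns the first matching verb, or null so the mention stays a plain link. */
function inferVerb(body: string, mention: Mention, rules: LinkTypeRule[], window = 40): string | null {
  const before = body.slice(Math.max(0, mention.index - window), mention.index);
  for (const rule of rules) {
    if (rule.targetType === mention.type && rule.regex.test(before)) return rule.verb;
  }
  return null;
}

// Illustrative rules only; regexes carry no /g flag, so test() is stateless.
const rules: LinkTypeRule[] = [
  { verb: "works_at", targetType: "company", regex: /\b(?:CEO|CTO|founder|engineer) (?:of|at)\s*$/i },
  { verb: "invested_in", targetType: "company", regex: /\binvested in\s*$/i },
];
```

In "Jane is the CEO of Acme.", the text before the Acme mention ends with "CEO of ", so a company-typed mention gets `works_at`. A person-typed mention, or a sentence like "We met Acme yesterday.", falls through to null, and the mention stays a plain link.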
Target-type lookup: one round-trip SELECT slug, source_id, type FROM pages WHERE type IN ('person', 'company', 'organization', 'entity') AND deleted_at IS NULL. It is built once at extraction start and consulted per mention, which avoids the N+1 getPage cost.

Pack best-effort: it returns pack_unavailable=true and 0 created when there is no active pack, OR no link_types are declared, OR no link_type has an inference.regex. The CLI prints a one-line note; the handler returns silently.

122 tests pass (pglite-engine + by-mention); typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extract): timeline from meetings — gbrain extract timeline --from-meetings (A11, T8, codex #11)

NEW src/core/extract-timeline-from-meetings.ts: extractTimelineFromMeetings(engine, opts). Walks meeting pages, finds discussed entities via two sources, and writes a timeline entry on each entity page.

Discussed-entity sources merged:
1. Existing 'attended' links from the meeting (canonical attendees). One round-trip SELECT pulls all attended edges for the loaded meeting set; an in-memory Map<meetingSlug → attendees[]> gives O(1) lookup per meeting.
2. Body-text mentions via the existing by-mention gazetteer (findMentionedEntities + cross-source guard). This catches entities discussed in the meeting body even when no explicit 'attended' link exists.

Entities are de-duplicated via Map<sourceId::slug → entity> within each meeting. A person who's both an attendee AND mentioned in the body gets exactly one timeline row per meeting, not two.

The timeline write uses TimelineBatchInput with:
  source  = 'extract-timeline-from-meetings:<meeting-slug>'
  summary = 'Discussed in <meeting-title>'
  date    = meeting.effective_date

Per the v99 dedup widening (codex #11), the source field is now in the uniqueness key (page_id, date, summary, source). Two meetings on the same date with the same summary on the same entity page survive as distinct rows; the second meeting's provenance is no longer silently dropped.
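The per-meeting merge described above can be sketched as follows: attendees plus body mentions, de-duplicated on `sourceId::slug`, then one row per entity. `Entity`, `Meeting`, `TimelineInput` and `timelineRowsForMeeting` are hypothetical names. The source, summary and date formats are the ones the commit lists.

```typescript
// Hypothetical shapes; the real TimelineBatchInput may carry different fields.
interface Entity {
  sourceId: string;
  slug: string;
}

interface Meeting {
  slug: string;
  title: string;
  effectiveDate: string;
}

interface TimelineInput {
  entitySlug: string;
  entitySourceId: string;
  date: string;
  summary: string;
  source: string;
}

/** One timeline row per distinct entity, whether it attended, was mentioned, or both. */
function timelineRowsForMeeting(meeting: Meeting, attendees: Entity[], mentioned: Entity[]): TimelineInput[] {
  const byKey = new Map<string, Entity>();
  for (const e of [...attendees, ...mentioned]) {
    const key = `${e.sourceId}::${e.slug}`; // same slug in another source stays distinct
    if (!byKey.has(key)) byKey.set(key, e);
  }
  return [...byKey.values()].map((e) => ({
    entitySlug: e.slug,
    entitySourceId: e.sourceId,
    date: meeting.effectiveDate,
    summary: `Discussed in ${meeting.title}`,
    source: `extract-timeline-from-meetings:${meeting.slug}`,
  }));
}
```

Because the meeting slug is baked into `source`, two meetings on the same day that both produce "Discussed in Weekly sync" on one entity page no longer collide under the v99 key.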
CLI: `gbrain extract timeline --from-meetings` (DB source only). Mode dispatch: runs SOLO (does not combine with --by-mention/--ner; those are links passes).

Minion handler: `extract-timeline-from-meetings` (NOT in PROTECTED_JOB_NAMES: pure SQL + string scan). Consumed by onboard's timeline_coverage remediation when coverage <90%.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(takes): takes-bootstrap from concept/atom/lore pages (A12, A24, T9)

NEW src/core/extract-takes-from-pages.ts: Haiku classifier loop. Walks pages WHERE type IN ('concept', 'atom', 'lore', 'briefing', 'writing', 'originals') AND deleted_at IS NULL AND length(compiled_truth) > 200, ordered by updated_at DESC. Each page is truncated to 20K chars and sent to Haiku with a strict-JSON classifier prompt:

  {"claim", "kind": fact|take|bet|hunch, "weight": 0..1}

Inserts via addTakesBatch with source='cli:takes-bootstrap-from-pages'.

Two-gate consent per A12:
1. `takes.bootstrap_enabled` config (default false): even the manual CLI refuses to run unless it is explicitly set.
2. --yes flag (CLI): explicit confirmation that this sends page content to Haiku.

The handler-side gate also reads takes.bootstrap_enabled, so even a trusted local Minion submitter (allowProtectedSubmit=true) cannot fire takes-bootstrap on a brain that hasn't opted in.

CLI: `gbrain takes extract --from-pages [--yes] [--dry-run] [--source-id X] [--max-pages N] [--holder name]`. It reports consent-gate-blocked and llm-unavailable as distinct outcomes, so users see the actual blocker.

Minion handler `extract-takes-from-pages` added to PROTECTED_JOB_NAMES. Consumed by onboard's takes_count remediation when count=0 AND takes.bootstrap_enabled=true (handler-side double-check).

Per A24: ships with classifier infrastructure ONLY. The per-prompt eval suite is deferred to a v0.42.1 follow-up; the autopilot remediation tier for takes-bootstrap stays manual_only until eval coverage catches up.
Manual `gbrain takes extract --from-pages --yes` is the only path that triggers it in v0.42.0. parseClaimsJson is exported for unit testing: strict JSON parse + ```json fence strip + kind allowlist filter, returning [] on any parse failure.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): recordMinionJobSpend primitive for MCP client_id attribution (A7+A23, codex finding #4)

NEW src/core/minion-spend.ts: a small primitive that closes the per-OAuth-client spend chain gap codex flagged when MCP run_onboard submits child Minion jobs.

Pre-fix: only subagent loops via budget-meter.ts recorded spend against the originating OAuth client. Generic Minion handlers (embed-catch-up, extract-ner, extract-timeline-from-meetings, extract-takes-from-pages) wrote to the gateway with no per-client attribution. Admin-scope tokens would therefore have had unbounded indirect spend via the run_onboard fan-out.

Convention for v0.42.0 (schema column deferred to v0.42.1):
- The run_onboard MCP op sets job.data.client_id when submitting each child handler.
- Handlers that spend LLM/embedding budget call recordMinionJobSpend(engine, job, {operation, spendCents, ...}), which reads job.data.client_id and writes mcp_spend_log with the right attribution.
- Locally submitted jobs (CLI, autopilot tick) pass no client_id; the row still lands with client_id=null for global accounting.

Two exports:
- getJobClientId(job): undefined for local jobs; the OAuth client_id string for MCP-submitted ones.
- recordMinionJobSpend(engine, job, entry): wraps recordSpend with job-aware attribution.

Best-effort throughout: spend telemetry failures MUST NOT fail the user's call.

The A23 full schema column (minion_jobs.client_id + index) is deferred to v0.42.1. Today's JSONB pass-through is sufficient for the MCP run_onboard chain to land per-client attribution end-to-end. Handlers adopt the primitive over time; there is no behavior change for callers that haven't migrated.

Typecheck clean.
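The recordMinionJobSpend convention has three parts: read `job.data.client_id`, write a spend row, and never throw. It can be sketched with an injected writer. `MinionJob`, `SpendRow`, `SpendWriter` and `recordJobSpend` are hypothetical stand-ins. The real primitive takes an engine, wraps recordSpend, and writes mcp_spend_log.

```typescript
// Hypothetical stand-ins; the real primitive writes mcp_spend_log through the engine.
interface MinionJob {
  id: number;
  name: string;
  data: Record<string, unknown>;
}

interface SpendEntry {
  operation: string;
  spendCents: number;
}

interface SpendRow extends SpendEntry {
  clientId: string | null; // null for locally submitted jobs (global accounting)
  jobId: number;
}

type SpendWriter = (row: SpendRow) => Promise<void>;

/** undefined for locally submitted jobs; the OAuth client id for MCP-submitted ones. */
function getJobClientId(job: MinionJob): string | undefined {
  const id = job.data.client_id;
  return typeof id === "string" && id.length > 0 ? id : undefined;
}

/** Best-effort: a telemetry failure is logged and swallowed, never rethrown. */
async function recordJobSpend(write: SpendWriter, job: MinionJob, entry: SpendEntry): Promise<boolean> {
  try {
    await write({ ...entry, clientId: getJobClientId(job) ?? null, jobId: job.id });
    return true;
  } catch (err) {
    console.error(`[spend] failed to record spend for job ${job.id}:`, err);
    return false;
  }
}
```

Returning a boolean instead of throwing keeps the handler's own success path independent of the spend log. This matches the commit's "MUST NOT fail the user's call" rule.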
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): impact capture module + writeImpactLogRow primitive (A6 + A25 + A17, T11)

NEW src/core/onboard/impact-capture.ts. Three exports:

captureMetric(engine, metric)
  Pure-ish: returns the current numeric value for one of 5 metrics (orphan_count, stale_count, entity_link_coverage, timeline_coverage, takes_count). Returns null on any throw, per the A17 best-effort posture: a stat-query failure MUST NOT block the extraction itself.

writeImpactLogRow(engine, attribution, metric, before, after, details?)
  Best-effort INSERT into v100's migration_impact_log table. Attribution columns (job_id, source_id, brain_id, started_at, idempotency_key, applied_by) per A25 + codex finding #10, so concurrent runs can't misattribute deltas.

withImpactCapture(engine, attribution, metric, runner, details?)
  Convenience: capture-before → run → capture-after → write log row. Per A17, the log row lands even when the runner throws (after-on-fail + error in details), so downstream consumers see a "ran but impact unknown" entry instead of silent loss.

Designed to be picked up by the 4 new Minion handlers (embed-catch-up, extract-ner, extract-timeline-from-meetings, extract-takes-from-pages) when they wrap their main runner. Handlers stay decoupled from the log-write path; they just call withImpactCapture with the metric they move. Per-handler integration follows in T12/T13/T15 as those wrappers land.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): types + render layer (A8, T12)

NEW src/core/onboard/types.ts: OnboardRecommendation (extends RemediationStep with apply_policy + prompt_text + migration_id), OnboardReport (stable JSON envelope), OnboardOpts.
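The T11 impact-capture flow (capture-before, run, capture-after, write row) is sketched below with injected functions instead of an engine. All names and signatures here are hypothetical. Two behaviors come from the commit: a failed stat query becomes null, and a row is still written when the runner throws. Rethrowing the runner's error after writing the row is an assumption.

```typescript
// Hypothetical sketch; the real withImpactCapture takes an engine and attribution columns.
interface ImpactRow {
  metric: string;
  before: number | null;
  after: number | null;
  details: Record<string, unknown>;
}

type Capture = () => Promise<number | null>;
type WriteRow = (row: ImpactRow) => Promise<void>;

/** A stat-query failure yields null instead of blocking the extraction (A17). */
async function safeCapture(capture: Capture): Promise<number | null> {
  try {
    return await capture();
  } catch {
    return null;
  }
}

async function withImpactCaptureSketch<T>(
  metric: string,
  capture: Capture,
  write: WriteRow,
  runner: () => Promise<T>,
): Promise<T> {
  const before = await safeCapture(capture);
  try {
    const result = await runner();
    const after = await safeCapture(capture);
    // Log writes are best-effort too: a failed INSERT never fails the run.
    await write({ metric, before, after, details: {} }).catch(() => undefined);
    return result;
  } catch (err) {
    // Still record a "ran but impact unknown" row, then surface the failure (assumed).
    const after = await safeCapture(capture);
    await write({ metric, before, after, details: { error: String(err) } }).catch(() => undefined);
    throw err;
  }
}
```

For example, with a `stale_count` capture that reads 10 before a run and 4 after it, the success path logs `before: 10, after: 4`. A runner that throws still produces a row carrying the error text in `details`.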
NEW src/core/onboard/render.ts:

toOnboardRecommendation(step): RemediationStep → OnboardRecommendation
  Sets apply_policy per the A8 tiered rules:
  - protected + job === extract-takes-from-pages → 'manual_only' (A12/A24)
  - protected + other → 'prompt_required'
  - non-protected → 'auto_apply'

buildOnboardReport(plan, opts?): assembles the stable JSON envelope.

renderHuman(report): string. Echoes the "Recommendation + WHY" framing the CEO + Eng + Codex reviews settled on; the CLI shell prints it to stdout.

Stable JSON envelope shape:
  schema_version: 1
  brain_id?: string
  recommendations: OnboardRecommendation[]
  summary: { total, auto_eligible, prompt_required, manual_only, est_total_usd }
  history?: Array<{ remediation_id, metric_name, metric_before, metric_after, delta, applied_at }>

Library-shaped: no console.* / process.exit. T13 (onboard CLI shell) calls these from the wrapping CLI. MCP run_onboard (T16) returns the JSON envelope unmodified.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): gbrain onboard CLI shell (A1, T13)

NEW src/commands/onboard.ts (~180 LOC). Thin wrapper that composes:
- T2 library (computeRemediationPlan + runRemediation)
- T4 onboard checks (runAllOnboardChecks → extraRemediations)
- T12 render layer (buildOnboardReport + renderHuman)

Three modes:
- --check (default): print the plan, no submission. Computes the plan via the T2 library with T4 check-derived extraRemediations. Renders human output (default) or the JSON envelope (--json).
- --auto: submit the auto_apply tier. Requires --max-usd N (cron-safety per A12 + A20: refuses without an explicit cap to avoid surprise spend). --auto --yes also submits the prompt_required tier.
- --history: dump the last 50 migration_impact_log entries.

Library hooks are wired to stderr (per CLI/library separation): onStepStart, onStepEnd, onBudgetRefused, onBudgetExhausted, onNothingToDo, onTargetUnreachable. The final JSON envelope (--json) or human summary lands on stdout.
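The A8 tier rules in render.ts and the T13 submission modes fit together as a small pure mapping. `applyPolicy`, `tiersToSubmit` and `OnboardFlags` are hypothetical names. The protected set lists only the job names this PR mentions, and the real PROTECTED_JOB_NAMES may contain more.

```typescript
type ApplyPolicy = "auto_apply" | "prompt_required" | "manual_only";

// Job names this PR describes as protected; the real set may differ.
const PROTECTED_JOBS = new Set([
  "synthesize",
  "patterns",
  "consolidate",
  "extract-takes-from-pages",
  "contextual_reindex_per_chunk",
]);

/** A8 tiering: non-protected → auto, takes-bootstrap → manual, other protected → prompt. */
function applyPolicy(job: string): ApplyPolicy {
  if (!PROTECTED_JOBS.has(job)) return "auto_apply";
  if (job === "extract-takes-from-pages") return "manual_only"; // A12/A24
  return "prompt_required";
}

interface OnboardFlags {
  auto?: boolean;
  yes?: boolean;
  maxUsd?: number;
}

/** Which tiers a run may submit; manual_only is never submitted automatically. */
function tiersToSubmit(flags: OnboardFlags): ApplyPolicy[] {
  if (!flags.auto) return []; // --check (default): plan only
  if (flags.maxUsd === undefined) throw new Error("--auto requires --max-usd N");
  return flags.yes ? ["auto_apply", "prompt_required"] : ["auto_apply"];
}
```

Keeping the policy a pure function of the job name makes the tier for any step predictable from the plan alone. That is the property the hermetic E2E cases pin.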
CLI dispatch: registered in the src/cli.ts CLI_ONLY set + case dispatch between 'takes' and 'founder'.

Typecheck clean. Manual smoke-test pending T20 E2E (DATABASE_URL gated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboard): init nudge + upgrade banner (A4, A18, A20, T14)

NEW src/core/onboard/init-nudge.ts exports two fail-open hooks:

runInitNudge(engine):
  Post-initSchema, 5-query, AbortSignal-bound parallel check against a 3-second wall-clock budget. Per A20, it uses REAL cancellation via the T5 executeRaw signal extension; Promise.race against a timer was the wrong shape codex flagged in #7. Postgres queries actually .cancel(); PGLite is a documented gap.
  Partial-results path: if some checks complete and the budget fires on others, it prints what landed + a fallthrough hint pointing at `gbrain onboard --check` for the full picture.
  Per A18, fail-open: ANY throw is caught, logged to stderr, and suppressed, so init returns successfully.
  Bypass: GBRAIN_NO_ONBOARD_NUDGE=1 short-circuits. Non-TTY sessions short-circuit by default too (CI/scripted callers see nothing).
  Nudge format: a one-line summary of opportunities ("Brain has opportunities: 23000 stale chunks, link coverage 32%, 0 takes") + a 'gbrain onboard --check' nudge.

runUpgradeBanner(_engine):
  Lighter post-upgrade banner. Doesn't query the engine; just prints a one-line nudge that upgrades may surface new opportunities. Same fail-open posture.

Wired into:
  src/commands/init.ts:initPGLite (end of function, after reportModStatus)
  src/commands/init.ts:initPostgres (same)
  src/commands/upgrade.ts:runPostUpgrade (end of function, after postUpgradeReferenceSweep)

Each wire site uses dynamic import + try/catch, so even an import failure can't crash init/upgrade.

Typecheck clean.
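The init nudge's budget relies on real cancellation rather than racing a timer against the queries. Below is a hedged sketch of that pattern. Each check receives an AbortSignal, and a timer aborts the shared controller when the budget expires. Whatever settled in time is reported, split into done, timed out and failed. `runChecksWithBudget` and `shouldNudge` are hypothetical names. The pattern only works if each check honors the signal, as the Postgres path does via .cancel(); a check that ignores it would still hold up the batch.

```typescript
type Check<T> = (signal: AbortSignal) => Promise<T>;
type Outcome<T> = { ok: true; value: T } | { ok: false; aborted: boolean };

interface BudgetResult<T> {
  done: Record<string, T>;
  timedOut: string[];
  failed: string[];
}

async function runChecksWithBudget<T>(checks: Record<string, Check<T>>, budgetMs: number): Promise<BudgetResult<T>> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  const names = Object.keys(checks);
  try {
    const outcomes = await Promise.all(
      names.map((name) =>
        Promise.resolve()
          .then(() => checks[name](controller.signal))
          .then(
            (value): Outcome<T> => ({ ok: true, value }),
            // Record whether the budget had already fired when this check gave up.
            (): Outcome<T> => ({ ok: false, aborted: controller.signal.aborted }),
          ),
      ),
    );
    const result: BudgetResult<T> = { done: {}, timedOut: [], failed: [] };
    outcomes.forEach((o, i) => {
      if (o.ok) result.done[names[i]] = o.value;
      else if (o.aborted) result.timedOut.push(names[i]);
      else result.failed.push(names[i]);
    });
    return result;
  } finally {
    clearTimeout(timer);
  }
}

/** GBRAIN_NO_ONBOARD_NUDGE=1 and non-TTY sessions both skip the nudge. */
function shouldNudge(env: Record<string, string | undefined>, isTTY: boolean): boolean {
  return env.GBRAIN_NO_ONBOARD_NUDGE !== "1" && isTTY;
}
```

With a 3000 ms budget, a slow check that honors its signal is reported under `timedOut` instead of delaying init. The caller can print the partial results plus the `gbrain onboard --check` hint.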
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(autopilot): tick consults onboard recommendations (A5, A19, A22, T15)

Pre-fix: the autopilot tick's per-source recommendation walk called computeRecommendations(health, ctx), which is doctor's hardcoded 5-category planner. The 4 new onboard checks (embed_staleness, entity_link_coverage, timeline_coverage, takes_count) had nowhere to hook in. Even with takes.bootstrap_enabled flipped on, autopilot never noticed 0 takes and never proposed bootstrap.

Post-fix: the tick body now ALSO calls runAllOnboardChecks(engine) and threads the result's RemediationStep[] into the T3-generalized third arg of computeRecommendations. The planner merges onboard's extras with the legacy hardcoded entries (hardcoded wins on id collision).

Per A19 fail-open: any throw in the onboard-checks path is caught, logged to stderr, and suppressed. The legacy plan (without extras) runs as before, so autopilot can't crash from an onboard-check failure.

A22 (idempotency-key dedupe across concurrent manual + autopilot runs): inherits from the existing computeRecommendations → remediation.idempotency_key chain. T7-T9 handlers each get their content-hash key from the makeRemediationStep factory. An autopilot tick and a manual `gbrain onboard --auto` submitting the same step in the same brain produce the SAME key, so queue.add(...) dedupes.

No behavior change for brains where all 4 onboard metrics already look healthy (extras=[]; legacy plan unchanged).

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): run_onboard op with run_protected_onboard scope binding (A7, T16, codex finding #5)

NEW MCP op `run_onboard`. Admin scope (NOT localOnly), so federated / thin-client brain installs can probe brain health + submit auto-eligible remediation handlers over OAuth-authenticated MCP.
Two-tier authorization per A7 + codex #5:
- Admin scope: sufficient for mode='check' (read-only OnboardReport JSON) AND for submitting non-protected handlers in mode='auto'/'auto-with-prompt'.
- run_protected_onboard scope (NEW, additive): MUST be granted in addition to admin for any PROTECTED_JOB_NAMES handler to fire (synthesize, patterns, consolidate, extract-takes-from-pages, contextual_reindex_per_chunk).

Without the new scope tier, an admin-scoped OAuth token would silently bypass the protected-name gate that `submit_job` enforces at operations.ts:2288. Codex finding #5 caught this: admin scope alone was an insufficient guard. Now the run_onboard op explicitly FILTERS protected extras from the recommendation plan when the caller lacks run_protected_onboard. Filtered items appear in the response as skipped_missing_scope[], so the caller knows what would have been available with the right grants.

Modes:
- check: read-only OnboardReport JSON envelope.
- auto: submits the auto_apply tier (plus prompt_required when --yes/auto-with-prompt).
- auto-with-prompt: adds the prompt_required tier.

Both auto modes REQUIRE max_usd per A12 + A20 cron-safety (rejects with invalid_params if missing).

Per A26 source-scope: a future extension will scope plans by ctx.sourceId / ctx.auth.allowedSources. Today the recommendation planner is brain-wide; threading source scope through doesn't change correctness, it is only an optimization.

Per A19 fail-open: any error in runAllOnboardChecks during plan-build is caught and suppressed; the plan still returns with extras=[] rather than crashing the op.

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(verify): add check-source-scope-onboard lint (A26, T17)

NEW scripts/check-source-scope-onboard.sh.
Grep guard for SQL sites in onboard surfaces (src/core/onboard/, src/commands/onboard.ts) that touch source_id-bearing tables (pages, content_chunks, takes, links, timeline_entries) WITHOUT either:
(a) source_id / sourceIds in the WHERE clause, OR
(b) the opt-out marker `sourcescope:brain-wide` within 4 lines above the SQL.

File-level opt-out: `sourcescope:file-brain-wide` in the file header (first 30 lines) treats every SQL site in that file as intentionally brain-wide. Used by onboard/checks.ts, onboard/impact-capture.ts, and commands/onboard.ts, because the onboard CHECKS are explicitly brain-wide aggregates (orphan_count, stale_count, link_coverage are reported across all sources by design).

Wired into bun run verify (23 checks total now, all green).

Without this gate, any future onboard SQL touching per-source data without source-scoping would silently leak rows across sources. That is exactly the class of bug v0.34.1's P0 seal closed at the engine layer. The lint adds an explicit forcing function for new code in the onboard surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(install): onboard surface agent prescription (D13, T18)

Adds a v0.42.0+ section to INSTALL_FOR_AGENTS.md describing:
- First-connect probe: gbrain onboard --check --json
- Post-upgrade re-probe (after gbrain upgrade)
- Unattended remediation: gbrain onboard --auto --max-usd 5
- MCP run_onboard op for federated/thin-client installs
- run_protected_onboard scope requirement for LLM-bearing handlers
- Two-gate consent for takes-bootstrap (takes.bootstrap_enabled + --yes)
- GBRAIN_NO_ONBOARD_NUDGE=1 bypass for CI

Per D13: agents should run --check on first connect AND after every upgrade as a hygiene step. The autopilot path makes this auto-improve on a 24h cycle; the explicit agent probe surfaces opportunities immediately on connect rather than waiting for the next autopilot tick.
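The check-source-scope-onboard guard (T17) is a shell script whose source isn't shown here. Below is a hedged TypeScript approximation of the rules the commit states, not a port of the script. Two heuristics are assumptions. A "SQL site" is any line with FROM/JOIN/INTO/UPDATE followed by one of the five tables. "source_id in the WHERE clause" is approximated as a source_id/sourceIds reference within the next few lines. `findUnscopedSql` is a hypothetical name.

```typescript
// Hypothetical approximation of the T17 lint rules; heuristics are assumptions.
const SCOPED_TABLE_SQL = /\b(?:FROM|JOIN|INTO|UPDATE)\s+(?:pages|content_chunks|takes|links|timeline_entries)\b/i;
const SOURCE_SCOPE_REF = /\bsource_id\b|\bsourceIds\b/;
const LINE_MARKER = "sourcescope:brain-wide";
const FILE_MARKER = "sourcescope:file-brain-wide";

/** Returns 1-based line numbers of SQL sites that lack source scoping or an opt-out. */
function findUnscopedSql(fileText: string, statementSpan = 6): number[] {
  const lines = fileText.split("\n");
  // File-level opt-out must appear in the header (first 30 lines).
  if (lines.slice(0, 30).some((l) => l.includes(FILE_MARKER))) return [];

  const violations: number[] = [];
  lines.forEach((line, i) => {
    if (!SCOPED_TABLE_SQL.test(line)) return;
    // Line-level opt-out: marker within the 4 lines directly above the SQL.
    const above = lines.slice(Math.max(0, i - 4), i);
    if (above.some((l) => l.includes(LINE_MARKER))) return;
    // Approximation of "source_id in the WHERE clause": look a few lines ahead.
    const statement = lines.slice(i, i + statementSpan).join("\n");
    if (SOURCE_SCOPE_REF.test(statement)) return;
    violations.push(i + 1);
  });
  return violations;
}
```

A marker five lines above the query no longer counts, which matches the commit's "within 4 lines above" wording. A `sourcescope:file-brain-wide` header silences the whole file.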
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): hermetic onboard surface contracts (T20)

NEW test/e2e/onboard-full-flow.test.ts. 13 hermetic PGLite cases (no DATABASE_URL needed) cover the key onboard contracts:
- captureMetric: all 5 metrics return expected values on an empty brain (0 for counts; 1 for coverage, by vacuous truth).
- runAllOnboardChecks: returns exactly 4 results with the correct names. An empty brain shows stale/link/timeline ok BUT takes_count warns (0 takes). 0 remediations are emitted because takes.bootstrap_enabled defaults to false per A12 two-gate consent.
- computeRemediationPlan: extras (T3 generalization) thread through to the plan.plan output; stable schema_version: 2 envelope.
- buildOnboardReport: stable schema_version: 1 envelope with the right summary fields populated.
- toOnboardRecommendation tier policy (A8):
  - non-protected job → auto_apply
  - extract-takes-from-pages → manual_only (A12 + A24)
  - other protected jobs (synthesize, patterns, ...) → prompt_required

Full DATABASE_URL-gated end-to-end coverage (real Postgres, actual extractions through Minion handlers) is deferred to v0.42.1, once the per-handler test seam lands. The hermetic suite covers the data-shape contracts that matter for downstream consumers binding to the JSON envelopes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.42.0.0 gbrain onboard mega PR: activation surface (closes #1383, completes #1409)

VERSION + package.json bumped to 0.42.0.0. The CHANGELOG entry has a full ELI10 lead, a "What you can do that you couldn't before" itemized list, and "To take advantage of v0.42.0.0" upgrade steps, per CLAUDE.md voice rules. TODOS.md: 9 follow-up items filed (TODO-A through TODO-I) for the v0.42.1+ wave: pack-aware linkable types, LLM-disambiguation NER, onboard --explain, live-brain impact measurement, 100+-case takes classifier eval, admin SPA UI, full DATABASE_URL E2E, minion_jobs client_id schema column, thin-client doctor-remote parity.
llms-full.txt regenerated per the CLAUDE.md rule (every CHANGELOG edit is followed by bun run build:llms in the same commit). 23/23 verify checks pass.

Full implementation across 21 commits on this branch (T0-T21):
- T0 merge master
- T1 schema migrations v98/v99/v100
- T2 extract doctor remediation library
- T3 generalize computeRecommendations
- T4 4 new doctor checks
- T5 engine API: listStaleChunks orderBy + executeRaw AbortSignal
- T6 embed --batch-size / --priority recent / --catch-up
- T7 NER extraction + extract-ner handler
- T8 timeline-from-meetings + extract-timeline-from-meetings handler
- T9 takes-bootstrap + extract-takes-from-pages handler
- T10 recordMinionJobSpend primitive
- T11 impact capture module + writeImpactLogRow
- T12 onboard render layer (types + render)
- T13 gbrain onboard CLI shell
- T14 init nudge + upgrade banner
- T15 autopilot tick consults onboard
- T16 MCP run_onboard + run_protected_onboard scope
- T17 check-source-scope-onboard lint
- T18 INSTALL_FOR_AGENTS.md agent prescription
- T20 hermetic PGLite E2E (13 cases)
- T21 ship (this commit)

Reviews: CEO + Eng + Codex on plan ~/.claude/plans/system-instruction-you-are-working-lively-hollerith.md. 27 A-decisions locked; 18 codex findings absorbed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): connection-resilience regex + doctor warn-not-fail + v0.41.18.0

Two CI fixes from PR #1521 + version renumber per user request.

Why fix #1 (connection-resilience.test.ts): T5/A20 extended the PostgresEngine.executeRaw signature to accept an optional `opts?: { signal?: AbortSignal }` 3rd arg and rewrote the body as multi-line. The regression test's regex was anchored to the legacy single-line `(sql: string, params?: unknown[])` shape. Its assertions also banned `try {` / `catch`, which T5 legitimately added to swallow AbortSignal cancellation, NOT for retry.
Updated the regex to tolerate both shapes. The wrong `not.toContain('conn.unsafe( sql, params')` assertion incorrectly flagged the legitimate single call, so it was replaced with a count assertion: `conn.unsafe(` must appear exactly ONCE in the body. This preserves the original D3 intent (no per-call retry; recovery is supervisor-driven via reconnect()) while accepting the new try/catch shape that swallows AbortSignal aborts.

Why fix #2 (src/core/onboard/checks.ts): Three of the four new onboard doctor checks (entity_link_coverage, timeline_coverage, embed_staleness) emitted `status = 'fail'` on healthy DBs that simply hadn't run extractions yet. This flipped `gbrain doctor`'s exit code to non-zero on freshly initialized brains, which broke test/e2e/mechanical.test.ts:1280 ("gbrain doctor exits 0 on healthy DB"). All three were downgraded to `status = 'warn'`, since these are remediation opportunities, not assertion failures. Doctor exit codes are reserved for actual failures. Remediation surfaces use warn-level signaling so that `--remediate` can pick them up without polluting the exit code.

Why fix #3 (version renumber 0.42.0.0 → 0.41.18.0): Per user directive, this wave ships as v0.41.18.0 rather than v0.42.0.0. Master is at 0.41.16.0, and 0.41.17.0 is reserved for an in-flight wave. Renamed every reference my branch added (54 files touched): VERSION, package.json, CHANGELOG.md header, TODOS.md, plus inline version-stamp comments across src/, test/, and scripts/. Preserved 13 files with PRE-EXISTING `v0.42.0.0` references on master. Those came from earlier waves originally planned for v0.42 that landed at v0.41.x, and they stay as historical record. Verified via per-file diff against origin/master that every renamed reference is one I added in this branch.

Audit trio aligned: VERSION=0.41.18.0, package.json=0.41.18.0, CHANGELOG topmost entry=[0.41.18.0]. llms-full.txt regenerated to match CLAUDE.md updates.

Bisect contract: this commit fixes CI test failures from PR #1521's landing.
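The two rules these CI fixes rely on are small enough to state as code. This is a hypothetical sketch; the real doctor command and regression test are not shown here, and all names are illustrative. `doctorExitCode` models fix #2's policy that only `fail` affects the exit code; the value 1 is an assumption for "non-zero". `countOccurrences` models fix #1's "exactly one `conn.unsafe(`" assertion.

```typescript
// Hypothetical sketches of the rules behind fix #1 and fix #2; names are illustrative.
export type CheckStatus = "ok" | "warn" | "fail";

export interface CheckResult {
  name: string;
  status: CheckStatus;
}

// Fix #2: only 'fail' makes doctor exit non-zero (1 is assumed here);
// 'warn' marks a remediation opportunity and leaves the exit code at 0.
export function doctorExitCode(results: CheckResult[]): number {
  return results.some((r) => r.status === "fail") ? 1 : 0;
}

// Fix #1: count non-overlapping occurrences of `needle`, so a test can assert
// that a method body calls `conn.unsafe(` exactly once (i.e. no retry loop).
export function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let at = haystack.indexOf(needle);
  while (at !== -1) {
    count++;
    at = haystack.indexOf(needle, at + needle.length);
  }
  return count;
}
```

A count assertion is more robust than the original negative substring match. It does not care how the single call is formatted, and it fails loudly if a second call (for example, a retry) is ever added.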
Typecheck clean; connection-resilience suite 26/26 pass. Refs A20 (executeRaw AbortSignal), A16 (4 new onboard checks), codex #1 (master collision avoidance via renumber).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
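The run_onboard gating described in this commit series combines three rules: the two-tier scopes (A7 + codex #5), the tier policy (A8) exercised by the hermetic E2E suite, and the max_usd requirement for auto modes. Together they can be sketched as a single planning function. This is a hypothetical model, not the real op. The scope strings, `Plan`, `planForCaller`, and the `yes` option are assumptions. A19 fail-open handling and A26 source scoping are deliberately left out.

```typescript
// Hypothetical model of run_onboard gating; names and scope strings are illustrative.
export type Tier = "auto_apply" | "prompt_required" | "manual_only";
export type Mode = "check" | "auto" | "auto-with-prompt";

const PROTECTED_JOB_NAMES = new Set([
  "synthesize",
  "patterns",
  "consolidate",
  "extract-takes-from-pages",
  "contextual_reindex_per_chunk",
]);

// Tier policy (A8): protected jobs need a prompt; takes extraction is manual only.
export function tierFor(jobName: string): Tier {
  if (jobName === "extract-takes-from-pages") return "manual_only";
  return PROTECTED_JOB_NAMES.has(jobName) ? "prompt_required" : "auto_apply";
}

export interface Plan {
  submit: string[];
  skipped_missing_scope: string[];
}

export function planForCaller(
  mode: Mode,
  jobs: string[],
  scopes: ReadonlySet<string>,
  opts: { maxUsd?: number; yes?: boolean } = {},
): Plan {
  if (!scopes.has("admin")) throw new Error("insufficient_scope: admin required");
  const plan: Plan = { submit: [], skipped_missing_scope: [] };
  if (mode === "check") return plan; // read-only report; nothing is submitted

  // Cron safety: both auto modes require an explicit spend cap.
  if (opts.maxUsd === undefined) throw new Error("invalid_params: max_usd is required");
  const allowPrompted = mode === "auto-with-prompt" || opts.yes === true;

  for (const job of jobs) {
    // Admin alone must not reach protected handlers: filter and report instead.
    if (PROTECTED_JOB_NAMES.has(job) && !scopes.has("run_protected_onboard")) {
      plan.skipped_missing_scope.push(job);
      continue;
    }
    const tier = tierFor(job);
    if (tier === "auto_apply" || (tier === "prompt_required" && allowPrompted)) {
      plan.submit.push(job);
    }
  }
  return plan;
}
```

For example, with only the admin scope, `planForCaller("auto", ["extract-ner", "synthesize"], new Set(["admin"]), { maxUsd: 5 })` submits `extract-ner` and reports `synthesize` under `skipped_missing_scope`. The caller sees what it is missing instead of having the job silently dropped.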
Summary
Auto-update notifications for GBrain. Users are notified when meaningful updates are available and are always asked before anything is installed.
New CLI command:
- `gbrain check-update [--json]`: checks GitHub Releases for new versions, compares semver (minor and major bumps only; patch releases are skipped), and fetches changelog diffs. Fails silently on network errors.

Agent workflow (SKILLPACK Section 17):
Standalone SKILLPACK users:
DX:
- `skills/migrations/` directory convention for version-specific post-upgrade directives
- `~/.gbrain/update-state.json` schema state tracking

Test Coverage
- `test/check-update.test.ts`: 20 unit tests (parseSemver, isMinorOrMajorBump, extractChangelogBetween, CLI wiring)
- `test/e2e/upgrade.test.ts`: 5 E2E tests against the real GitHub API

Coverage gate: PASS (100% of new code paths tested)
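The version logic behind `gbrain check-update` has two parts: notify only on minor or major bumps, and collect changelog sections strictly after the current version up to the latest. Below is a hypothetical sketch of that logic. The function names mirror the helpers listed in the test coverage and the `semverGt`/`semverLte` helpers named in this PR's fix commit, but the signatures are assumptions. The parser reads only the first three numeric components. It also assumes CHANGELOG sections start with `## [x.y.z]` headers.

```typescript
// Hypothetical sketch of check-update's version logic; signatures are assumed.
export type SemVer = { major: number; minor: number; patch: number };

export function parseSemver(v: string): SemVer | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  return m ? { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) } : null;
}

// Full three-part comparison: negative, zero, or positive.
function compare(a: SemVer, b: SemVer): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}
export const semverGt = (a: SemVer, b: SemVer): boolean => compare(a, b) > 0;
export const semverLte = (a: SemVer, b: SemVer): boolean => compare(a, b) <= 0;

// Notify only when major or minor moves forward; patch-only releases are skipped.
export function isMinorOrMajorBump(current: string, latest: string): boolean {
  const c = parseSemver(current);
  const l = parseSemver(latest);
  if (!c || !l) return false;
  return l.major > c.major || (l.major === c.major && l.minor > c.minor);
}

// Keep "## [x.y.z]" sections with current < version <= latest. Comparing whole
// versions (not the minor field alone) is the major-version guard.
export function extractChangelogBetween(changelog: string, current: string, latest: string): string {
  const from = parseSemver(current);
  const to = parseSemver(latest);
  if (!from || !to) return "";
  const out: string[] = [];
  let keep = false;
  for (const line of changelog.split("\n")) {
    const header = /^## \[?v?(\d+\.\d+\.\d+)/.exec(line);
    if (header) {
      const v = parseSemver(header[1])!; // header[1] always parses
      keep = semverGt(v, from) && semverLte(v, to);
    }
    if (keep) out.push(line);
  }
  return out.join("\n").trim();
}
```

With whole-version comparison, a `## [0.5.0]` section can no longer leak into a v1.2.0 → v1.3.0 window. A same-version or malformed input yields an empty string.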
Pre-Landing Review
Pre-Landing Review: No issues found.
Adversarial Review
Both Claude and Codex adversarial reviews ran. Key finding fixed:
- `extractChangelogBetween`: the minor version was compared without a major-version guard, so incorrect changelog entries were captured across major-version boundaries.

Other findings (informational, deferred):
- `detectInstallMethod()` PATH execution: pre-existing, not introduced by this PR

Plan Completion
11/11 plan items DONE.
TODOS
No TODO items completed in this PR.
Test plan
- `gbrain check-update` returns human-readable output
- `gbrain check-update --json` returns valid JSON
- `gbrain check-update --help` prints usage

🤖 Generated with Claude Code