Skip to content

fix(serve): admin register-client supports authorization_code + PKCE clients#1077

Closed
lukejduncan wants to merge 1 commit into
garrytan:masterfrom
lukejduncan:fix/admin-public-client
Closed

fix(serve): admin register-client supports authorization_code + PKCE clients#1077
lukejduncan wants to merge 1 commit into
garrytan:masterfrom
lukejduncan:fix/admin-public-client

Conversation

@lukejduncan

Copy link
Copy Markdown
Contributor

Summary

The admin POST /admin/api/register-client endpoint hardcoded grant_types: ['client_credentials'] and redirect_uris: [], so clients registered through the dashboard cannot be used with any browser-based OAuth client (claude.ai's Custom Connector, Cursor's authorization_code flow, etc).

Two issues, one endpoint:

  1. No way to register a client with authorization_code grant or any redirect URI. Browser-based clients need both.
  2. Even after fixing (1) to register a confidential client with authorization_code, the MCP SDK's clientAuth middleware compares the request's client_secret against client.client_secret from the provider's getClient() — but GBrainClientsStore.getClient (src/core/oauth-provider.ts:150) returns the SHA-256 hash in that field. The comparison hash !== plaintext always fails with invalid_client / Invalid client_secret. (exchangeClientCredentials hashes the input first, which is why the client_credentials grant works.)

Fix

Add three optional fields to the request body:

  • grantTypes — pass-through to registerClientManual
  • redirectUris — pass-through to registerClientManual
  • tokenEndpointAuthMethod — when set to 'none', NULL out client_secret_hash so the SDK's if (client.client_secret) guard short-circuits past the broken comparison. PKCE carries the full integrity of the flow — the standard OAuth 2.1 pattern for public browser-based clients.

Defaults preserve old behavior: omitting all three yields the client_credentials-only / no-redirect-URI client the endpoint produced before.

Test plan

End-to-end verified against claude.ai's Custom Connector:

  • Register via curl -X POST /admin/api/register-client -d '{\"name\":\"claude-ai\",\"grantTypes\":[\"authorization_code\",\"refresh_token\"],\"redirectUris\":[\"https://claude.ai/api/mcp/auth_callback\"],\"tokenEndpointAuthMethod\":\"none\"}' → returns clientId only (no clientSecret)
  • GET /authorize → 302 to claude.ai/api/mcp/auth_callback?code=...
  • POST /token (no client_secret, PKCE code_verifier) → access_token
  • POST /mcp initialize with the token → valid MCP server info
  • claude.ai Custom Connector consent + tool calls work end-to-end from web and iOS

Without tokenEndpointAuthMethod: 'none', the same flow fails at the token exchange with invalid_client regardless of whether the supplied secret is correct.

Note

Fixing the SDK comparison properly (so manually-registered confidential clients also work with authorization_code) would be a separate, larger change — probably either custom token-handler middleware that hashes before comparing, or storing plaintext for compatibility. The PKCE-only path is the right answer for the browser-OAuth use case anyway, so this patch unblocks it without touching that broader question.

🤖 Generated with Claude Code

…clients

The admin /admin/api/register-client endpoint hardcoded
`grant_types: ['client_credentials']` and `redirect_uris: []`, making
clients registered through the dashboard unusable with any
browser-based OAuth client (claude.ai's Custom Connector,
Cursor's authorization_code flow, etc).

Two issues, one endpoint:

1. No way to register a client with `authorization_code` grant or any
   redirect URI. Browser-based clients need both.

2. Even if you patch (1) in to register a confidential client with
   authorization_code grant, the SDK's clientAuth middleware compares
   the request's client_secret against `client.client_secret` from the
   provider's getClient() — but GBrainClientsStore returns the SHA-256
   *hash* in that field (security-correct on its own, but the SDK
   expects plaintext). The comparison `hash !== plaintext` always
   fails, returning `invalid_client / Invalid client_secret` on every
   token exchange. gbrain's exchangeClientCredentials hashes the input
   before comparing, which is why client_credentials grant works.

This change adds three optional fields to the request body:
- `grantTypes`: pass-through to registerClientManual
- `redirectUris`: pass-through to registerClientManual
- `tokenEndpointAuthMethod`: when set to `'none'`, NULL out
  client_secret_hash so the SDK's `if (client.client_secret)` guard
  short-circuits past the broken comparison. PKCE then carries the
  full integrity of the flow — the standard OAuth 2.1 pattern for
  public clients.

Defaults preserve old behavior: omitting all three flags yields the
client_credentials-only / no-redirect-URI client the endpoint produced
before.

End-to-end verified with claude.ai's Custom Connector:
- /authorize → 302 to claude.ai/api/mcp/auth_callback?code=...
- POST /token (no client_secret, PKCE code_verifier) → access_token
- POST /mcp initialize → valid MCP server info

Without `tokenEndpointAuthMethod: 'none'`, the same flow fails at the
token exchange with `invalid_client` even though the secret is correct.
garrytan added a commit that referenced this pull request May 18, 2026
…ients

The admin dashboard's /admin/api/register-client endpoint hardcoded
client_credentials and ignored grantTypes, redirectUris, and
tokenEndpointAuthMethod. Result: you couldn't register a browser-based
PKCE client (claude.ai Custom Connector, Cursor, etc.) through the
dashboard — only confidential machine-to-machine clients worked.

Pass grantTypes / redirectUris through to registerClientManual. When
tokenEndpointAuthMethod === 'none', NULL out client_secret_hash so the
SDK's clientAuth middleware skips the hash-vs-plaintext compare that
would otherwise reject the no-secret PKCE flow.

Cherry-picked from PR #1077.

Co-Authored-By: lukejduncan <lukejduncan@users.noreply.github.com>
garrytan added a commit that referenced this pull request May 19, 2026
After the fix-wave shipped, an audit found 11 commits with no new test
file. Some were inherently structural (build pipelines, shell content)
or had existing test coverage that worked either way; others had real
regression risk with no guard. This commit closes the gaps that matter.

New regression tests for:

- OAuth `verifyAccessToken` throws `InvalidTokenError` (not bare Error)
  on both expired and unknown token paths. Pre-fix, the SDK's
  `requireBearerAuth` middleware fell through to 500 instead of 401 →
  client token-refresh logic never fired (#935).

- `loadConfig` translates legacy `{provider, model}` config shape to
  the canonical `embedding_model: <provider>:<model>`. 3 cases: pure
  legacy → migrated; canonical wins over legacy when both present;
  canonical-only is untouched. Pre-fix, Voyage/Cohere/Mistral users
  silently fell through to OpenAI (#1086).

- `configDir` rejects relative paths; rejects `..` segments via both
  separators (regression guard for the Windows path acceptance fix
  #1019 / cherry-pick #1083).

- `resolveBootstrapToken` (new exported helper extracted from
  `runServeHttp`). 9 cases: unset env generates fresh, valid env
  accepted, hyphens/underscores accepted, < 32 chars rejected, special
  chars rejected, whitespace trimmed, empty string rejected, 32-char
  boundary accepted, 31-char one-short rejected. Security-critical
  validation surface (#1024).

- GET /mcp returns 405 with `Allow: POST, DELETE` (E2E case in
  `serve-http-oauth.test.ts`). Pre-fix, claude.ai and other probing
  MCP clients saw 404 and gave up (#1076).

- apply-migrations `process.exit(0)` on list / dry-run / up-to-date
  paths. Source-shape assertion locks the rule in; shell scripts
  gating on `$?` work (#1062).

- Autopilot wrapper sources `~/.zshenv` BEFORE `~/.zshrc`. zshenv is
  the canonical place for env vars in non-interactive zsh; without
  this ordering, LaunchAgent subprocesses never inherit secrets
  exported in zshrc (#966).

- `test/fix-wave-structural.test.ts` consolidates source-shape
  regression guards for fixes whose behavior is hard to runtime-test
  without heavy mocking: query cache drain (#1125), admin embed
  manifest + handler (#1090), admin register-client PKCE branch
  (#1077), PGLite v0.11.0 phase A in-process routing (#1100), query
  `--no-expand` negation (#1124). 9 source-grep assertions.

Refactored `runServeHttp` to extract `resolveBootstrapToken` as a pure
helper. The boot path now consumes the helper's tagged-union result
({kind:'ok'|'error'}); side effects (`process.exit`, `console.error`)
moved to the caller. Unit-testable without spinning up Express.

Test counts: oauth 71 (was 69), config 20 (was 14), apply-migrations
19 (was 18), autopilot-install 5 (was 4), serve-http-bootstrap-token
9 (new file), fix-wave-structural 9 (new file). Net: +28 cases across
6 files; +1 new exported function with full coverage.

Remaining audit gaps (deferred):
- e82dda0 admin embed E2E (post-deploy curl smoke covers this)
- d93fa81 apply-migrations PGLite chain E2E (already smoke-tested
  manually in the original commit; subprocess test would be flaky in
  CI without DATABASE_URL gating)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 19, 2026
* fix(sync): accept .tf / .tfvars / .hcl in CODE_EXTENSIONS

Terraform repos were invisible to `gbrain sync --strategy code` because
the three HCL-family extensions never reached the file walker. Silent
data loss — the user thinks the sync covered the repo but the IaC layer
was dropped on the floor.

detectCodeLanguage() returns null for these extensions, so the chunker
falls back to recursive (no tree-sitter grammar for HCL) — the same
path toml/yaml take.

Closes #878.

Co-Authored-By: johnybradshaw <johnybradshaw@users.noreply.github.com>

* fix(upgrade): run `bun update gbrain` from Bun's global install root

`gbrain upgrade --strategy bun` was failing on canonical
`bun install -g github:garrytan/gbrain` installs because `execSync('bun
update gbrain')` ran in the user's shell cwd. Bun's update operates on
whatever package.json it finds via cwd-walk, so a user not standing in
the global root got "No package.json, so nothing to update".

resolveBunGlobalRoot() returns the right directory:
1. `$BUN_INSTALL/install/global` when set (operator override).
2. `~/.bun/install/global` (Bun's documented default).
3. Walk up from realpath(argv[1]) looking for `node_modules/gbrain` —
   handles non-standard installs without trusting argv naming.

execFileSync replaces execSync (no shell), with cwd pinned. Error path
prints the exact `cd && bun update` recovery command instead of a vague
hint.

Closes #1029. Cherry-picked from PR #1032.

Co-Authored-By: mvanhorn <mvanhorn@users.noreply.github.com>

* fix(config): redact sensitive values in `config set` output (closes #892)

`gbrain config set openai_api_key sk-...` was echoing the full key to
stderr via `console.log('Set %s = %s', key, value)`. Shell scrollback and
tmux scroll buffers commonly retain stderr for hours; a screen-share or
shoulder-glance during set leaked the secret.

The `show` path already redacted but used a naive `.includes('key')`
substring check that would mask 'monkey' or 'parsekey' (no false-negative
but ugly).

Single source of truth: `isSensitiveConfigKey()` uses a word-boundary
regex (`(^|[._-])(key|secret|token|password|pwd|passwd|auth)([._-]|$)/i`)
so 'openai_api_key' matches but 'monkey' doesn't. `redactConfigValue()`
composes the postgresql:// URL redactor + sensitive-key check, used by
both `show` and `set`. Helpers exported for unit tests.

Closes #892. Cherry-pick of @sharziki's PR #918 (config.ts hunk only —
the extract.ts walker change in that PR is unrelated and tracked in #202).

Co-Authored-By: sharziki <sharziki@users.noreply.github.com>

* fix(oauth): throw InvalidTokenError so bearerAuth returns 401, not 500

`verifyAccessToken` was throwing bare `Error` on expired or invalid
tokens. The MCP SDK's `requireBearerAuth` middleware catches
`InvalidTokenError` and returns 401 with WWW-Authenticate; bare Error
falls through to 500. Result: legitimate clients with stale tokens hit
500-not-401, so token-refresh logic (which keys off 401) never fires.

Two call sites in verifyAccessToken: token-expired path and
invalid-token path. Both now throw InvalidTokenError. Existing tests
continue to pass because they assert on the throw, not the message class.

Closes #935. Cherry-picked from PR #1012.

Co-Authored-By: Aashiqe10 <Aashiqe10@users.noreply.github.com>

* fix(serve): return 405 on GET /mcp instead of 404

MCP Streamable HTTP spec says GET /mcp opens an optional SSE backchannel
for server-initiated messages. gbrain's transport is stateless and
doesn't push server-initiated messages, so per spec we MUST return 405
with Allow: POST, DELETE — not 404. Probing clients (claude.ai, etc.)
distinguish "endpoint exists, no SSE channel" from "endpoint missing"
on this status code; 404 makes them give up.

Cherry-picked from PR #1076.

Co-Authored-By: lukejduncan <lukejduncan@users.noreply.github.com>

* fix(doctor): resolve whoknows fixture from module location, not cwd

`gbrain doctor` warned about a missing whoknows fixture for every install
that wasn't standing in the gbrain source repo at run time — which is
everyone. The check used `process.cwd()` to locate the fixture, so any
real user (running doctor against `~/.gbrain`) saw a spurious warning.

`resolveWhoknowsFixturePath()` walks up from `import.meta.url` looking
for the source-repo signature (`src/cli.ts` + `skills/RESOLVER.md`),
respects `GBRAIN_WHOKNOWS_FIXTURE_PATH` env override (absolute or
cwd-relative), and returns null with an actionable warning when the
fixture can't be located.

Closes #969. Cherry-picked from PR #1034.

Co-Authored-By: mvanhorn <mvanhorn@users.noreply.github.com>

* fix(frontmatter): centralize --fix backups under ~/.gbrain/backups/

`gbrain frontmatter validate --fix` and `gbrain frontmatter generate
--fix` wrote `<file>.bak` siblings into the source tree. Users running
gbrain over a brain repo found .bak files scattered through people/,
companies/, etc. that broke gitignore expectations and showed up in
`git status` after every fix pass.

Backups now land under `~/.gbrain/backups/frontmatter/<run-id>/<rel>.bak`
with an iso-week-sorted run-id so a multi-fix session keeps the same
parent directory. Backup directory + per-file structure mirrored from
the original file's relative path. The .bak safety contract is intact
for both git and non-git brain repos.

Also adds `--include-catch-all` opt-in to `frontmatter generate` so the
default catch-all rule (`type: note`) is no longer applied to arbitrary
workspace documents that happen to live under a brain root.

Closes #902. Cherry-picked from PR #903.

Co-Authored-By: 100yenadmin <100yenadmin@users.noreply.github.com>

* fix(config): use path.isAbsolute() for GBRAIN_HOME on Windows

The GBRAIN_HOME validator rejected every valid Windows path (`C:\\Users\\...`,
`D:\\gbrain`, etc.) because it used `trimmed.startsWith('/')` to check for
absoluteness — only POSIX absolute paths pass that. `path.isAbsolute()` is
the cross-platform check.

Same fix for the `..` traversal check: split on both `/` and `\` so
Windows path separators don't sneak `..` through.

Closes #1019. Cherry-picked from PR #1083.

Co-Authored-By: sharziki <sharziki@users.noreply.github.com>

* fix(ai): warn only for the configured embedding provider, not all recipes

Gateway construction was warning on stderr for every recipe with an
embedding touchpoint missing max_batch_tokens — including providers the
brain isn't using. Users on Voyage saw noise about OpenAI / Google /
DashScope / etc. recipes that never get loaded.

Filter the warning to recipes whose provider id is referenced by
`embedding_model` or `embedding_multimodal_model` in the active config.
The structural protection against forgetting max_batch_tokens stays in
place for the recipes that actually run; the noise for unrelated recipes
goes away.

Cherry-picked from PR #1117.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(sync): skip git pull when repo has no origin remote

`gbrain sync` ran `git pull` unconditionally and printed scary stderr
on every cycle for brains that have no `origin` remote (local-only
workflows, single-machine setups, brains initialized via `gbrain init
--pglite` against an arbitrary directory). The pull failed harmlessly
but the noise was confusing and made operators think sync was broken.

`hasOriginRemote()` probes `git remote get-url origin` with stdio
ignored; on failure (`no such remote`), skip the pull, print a single
informational line, and proceed with the local working tree.

Cherry-picked from PR #1119.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(query): drain cache writes before CLI exit

The query cache write was fired with `void promise.catch(...)` — true
fire-and-forget. On a fast CLI invocation (`gbrain query <q>` exits in
~50ms), the process terminates before the cache write commits. Result:
the cache effectively never warms from CLI use; every query is a miss.

`awaitPendingSearchCacheWrites()` tracks each in-flight cache write in a
module-level Set. The CLI dispatcher awaits the set after `query`
finishes formatting output but before the process exits. MCP server path
unchanged (long-lived process, fire-and-forget remains correct).

Cherry-picked from PR #1125.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(backlinks): dedupe (source, target) pairs within a single source page

A source page that mentions the same entity N times produced N
duplicate "Referenced in" lines on the target. `extractEntityRefs`
returns one EntityRef per occurrence, and the per-ref `hasBacklink`
check reads a snapshot of `target.content` that's frozen at outer
scope — so every iteration sees "no backlink yet" and appends another
gap. The cumulative effect on a long meeting note with multiple
mentions of the same person was visible in PRs landing 3-5 identical
Timeline entries.

Track seen target slugs per source page; cap gaps at one pair.

Cherry-picked from PR #967 with a current-master regression test
covering both markdown-link and Obsidian-wikilink formats in the same
source page.

Co-Authored-By: p3ob7o <p3ob7o@users.noreply.github.com>

* fix(dream): audit backlinks without mutating pages during cycle

The dream/autopilot maintenance cycle ran the backlinks phase in 'fix'
mode, which writes "Referenced in" timeline bullets into entity pages
every sync. The graph extractor + auto-link path is the canonical link
store during sync/dream/autopilot — the legacy filesystem fixer wrote
markdown that fought with both the user's manual edits and the graph
layer's own timeline.

Cycle now runs backlinks in 'check' mode (audit-only); the materializer
remains available via `gbrain check-backlinks fix` for users who really
want markdown backlinks committed to disk.

Cherry-picked from PR #1027.

Co-Authored-By: sliday <sliday@users.noreply.github.com>

* fix(autopilot --install): source ~/.zshenv before zshrc/bashrc

zshenv is the canonical place for env vars in zsh on macOS — zshrc is
sourced only for interactive shells, so vars exported in zshrc don't
reach a non-interactive subprocess like the autopilot wrapper. Users
who exported GBRAIN_DATABASE_URL, OPENAI_API_KEY, or ANTHROPIC_API_KEY
in zshrc and assumed autopilot would inherit them hit silent missing-
secret failures on the LaunchAgent.

Source ~/.zshenv first (always reaches non-interactive shells per zsh
docs), then fall back to ~/.zshrc / ~/.bashrc for users on other
profile conventions.

Cherry-picked from PR #966.

Co-Authored-By: p3ob7o <p3ob7o@users.noreply.github.com>

* fix(apply-migrations): return exit 0 on list/dry-run/up-to-date

`gbrain apply-migrations list`, `gbrain apply-migrations --dry-run`, and
the "All migrations up to date" path were returning from the async
function but never calling `process.exit(0)`. The CLI dispatcher in
cli.ts treated the implicit fall-through as exit 1 when the parent
process inspected status via shell scripts, breaking automation that
gates on `apply-migrations list && do-something`.

Three call sites: list, dry-run, and the no-op path. All three now
exit(0) explicitly.

Cherry-picked from PR #1062.

Co-Authored-By: nezovskii <nezovskii@users.noreply.github.com>

* fix(sync): scope auto-embed to source on incremental syncs

`gbrain sync --source-id X` triggered auto-embed for the affected slugs
but `runEmbed` ran with no `--source` flag, so it fell back to the
default source. For non-default-source syncs the page row lives at
(sourceId, slug) — the embed code saw "Page not found" for the right
slug under the wrong source, swallowed the error as best-effort, and
the sync result reported `embedded: 0` for the wrong reason.

`buildAutoEmbedArgs(slugs, sourceId)` is the new helper: when sourceId
is set, prepends `--source X`. Exported for the regression test.

Pairs with the upcoming source-id write-path audit (P1 #8). Cherry-picked
from PR #1120.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(query): honor source_id with no-expand for cross-source search

Two related corrections:

1. `gbrain query --no-expand` parsed `--no-expand` as the literal key
   `no_expand` instead of negating the boolean `expand` param. Result:
   the flag was silently ignored and expansion always ran. Now any
   `--no-<key>` where `<key>` is a boolean param flips it false.

2. The `query` op's source-id resolution treated `ctx.sourceId` as
   authoritative, so an explicit per-call `source_id` was overridden by
   the federated read scope. Now per-call `source_id` wins;
   `source_id=__all__` is an explicit opt-out for local cross-source
   search.

Cherry-picked from PR #1124.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(doctor): child-table orphan detection (closes #1063)

The autopilot orphans phase detects orphan PAGES (no inbound links via
page-graph) but never scans FK-child tables. After a bulk delete or a
pre-FK-migration code path, orphan rows can persist indefinitely in
content_chunks, page_versions, tags, takes, raw_data, timeline_entries,
or links — all declared ON DELETE CASCADE, so any orphan row is
unexpected.

`childTableOrphansCheck` enumerates 10 FK columns across 8 tables:
- 8 NOT NULL columns (cascade): any value not in pages.id is an orphan.
- 2 nullable SET NULL columns (links.origin_page_id, files.page_id):
  NULL is valid; only NOT-NULL-but-missing-in-pages counts.

Surfaces paste-ready cleanup SQL when orphans are found.

Cherry-picked from PR #1064.

Co-Authored-By: vincedk-alt <vincedk-alt@users.noreply.github.com>

* fix(autopilot,cycle): stop respawn-storm from steady-state 'partial' cycles

Two compounding bugs under KeepAlive=true:

1. Autopilot tripped its circuit breaker on cycle.status === 'partial',
   not just 'failed'. 'partial' means at least one phase warned/failed
   while others ran — a soft signal, not fatal. On every cycle that
   warned, autopilot logged a failure and the supervisor respawned the
   worker.

2. The orphans phase emitted 'warn' when `count > 20` orphan pages.
   That threshold was tuned for small dev brains; on any corpus past a
   few hundred pages it fires every cycle in steady state. Together
   with bug 1, this produced visible respawn storms.

Fix:
- Autopilot trips only on cycle.status === 'failed'.
- Orphans phase warns by ratio: orphans / total_pages > 0.5 (the real
  "your graph fell apart" signal), not by absolute count.

Cherry-picked from PR #1113.

Co-Authored-By: sergeclaesen <sergeclaesen@users.noreply.github.com>

* fix(ai): reject partial embedding responses before indexing

`embedSubBatch` only validated the FIRST embedding's dimension and never
asserted the response length matched the input length. If a provider
returned fewer embeddings than requested (rate-limit truncation,
malformed response, etc.), the gateway silently indexed an offset-shifted
result — every page after the missing index got the embedding of a
different page's chunk.

Two new guards:
1. `result.embeddings.length === texts.length` — fail loud if any count
   mismatch, with a paste-ready retry hint.
2. Validate dim on EVERY embedding, not just the first.

Cherry-picked from PR #926.

Co-Authored-By: 100yenadmin <100yenadmin@users.noreply.github.com>

* fix(serve): admin register-client supports auth_code + PKCE public clients

The admin dashboard's /admin/api/register-client endpoint hardcoded
client_credentials and ignored grantTypes, redirectUris, and
tokenEndpointAuthMethod. Result: you couldn't register a browser-based
PKCE client (claude.ai Custom Connector, Cursor, etc.) through the
dashboard — only confidential machine-to-machine clients worked.

Pass grantTypes / redirectUris through to registerClientManual. When
tokenEndpointAuthMethod === 'none', NULL out client_secret_hash so the
SDK's clientAuth middleware skips the hash-vs-plaintext compare that
would otherwise reject the no-secret PKCE flow.

Cherry-picked from PR #1077.

Co-Authored-By: lukejduncan <lukejduncan@users.noreply.github.com>

* fix(extract-facts): treat slugs:[] as no-op, not unscoped full-walk

`runExtractFacts` checked `opts.slugs && opts.slugs.length > 0` to
decide between scoped and full-brain walk. Both `undefined` (caller
omits → full walk intended) AND `[]` (sync no-op → zero work intended)
fall through to the same `else` branch and triggered
`engine.getAllSlugs()`.

On a multi-thousand-page brain, the unintended full walk exceeded
the autopilot-cycle ~600s timeout and dead-lettered the job — visible
in production as `[cycle.extract_facts] start` followed by silence
until `Autopilot stopping (cycle-failure-cap)`.

Use presence (`opts.slugs !== undefined`), not truthiness, to
distinguish the two modes. Empty array is a real incremental no-op.

Closes #1096. Three regression cases in test/extract-facts-phase.test.ts:
slugs=[] no-op, slugs=undefined still walks, slugs=['a'] walks just one.

Co-Authored-By: navin-moorthy <navin-moorthy@users.noreply.github.com>

* fix(serve): embed admin/dist into binary; serve from manifest (closes #1090)

Pre-fix, /admin returned 404 on every globally-installed binary because
serve-http.ts:780 resolved admin/dist via process.cwd(). The admin SPA
files are checked into git but `bun build --compile` does NOT embed
arbitrary directories — only assets imported via `with { type: 'file' }`
ESM imports land in the compiled binary.

Wire:

- scripts/build-admin-embedded.ts walks admin/dist/, emits
  src/admin-embedded.ts with one `with { type: 'file' }` import per
  file + a manifest map (request path → resolved path + mime).
  Auto-invoked by `bun run build:admin`.

- src/admin-embedded.ts is the auto-generated module. Bun resolves
  every file: import to a path that works at runtime inside the
  compiled binary (same pattern as src/core/chunkers/code.ts WASM
  imports).

- serve-http.ts switches to two-tier resolution: cwd-relative
  admin/dist for dev (Vite hot-rebuild), embedded manifest otherwise.
  Embedded path reads bytes lazily and caches per-asset for the
  lifetime of the process.

- scripts/check-admin-embedded.sh CI gate re-runs the generator and
  fails on drift (mirrors check-wasm-embedded.sh). PRs that rebuild
  admin/dist but forget to regenerate the embedded module fail loud.

- package.json wires build:admin-embedded + check:admin-embedded.

Closes #1090.

* test(source-id): lock in routing regression coverage (closes #891 #978 #1078)

Audit of every page write path (sync, embed, extract, dream, autopilot,
wikilinks, tags, chunks) confirmed that sourceId already threads
correctly through importFromContent → engine.putPage → SQL INSERT
since v0.18.0. The original bug reports from #891, #978, #1078 were
real at the time and got swept by the multi-source refactor; today's
master is correct.

This commit locks in that correctness with six PGLite regression cases
(no Postgres fixture needed; runs in CI everywhere):

1. importFromContent({sourceId:"work"}) lands at source_id=work, not
   the silent 'default' fallback.
2. Two sources hold the same slug independently.
3. Omitting sourceId falls through to 'default' (legacy contract).
4. Chunks land under the requested source.
5. Tags land under the requested source.
6. FK integrity smoke (originally #1078).

The earlier issue reports stay closed by the existing threading; this
suite ensures any future refactor of the write path can't silently
re-introduce the wrong-source-default bug. The 90-minute write-path
audit budget from the plan resolves here.

* fix(apply-migrations): unblock PGLite chain (closes #1100)

`gbrain apply-migrations --yes` was wedging on the v0.11.0 (Minions)
schema phase for PGLite installs. Two compounding bugs:

1. `apply-migrations` pre-flight schema-version warning connects to
   PGLite to read config.version, then disconnects. The brief lock
   hold races with downstream subprocess spawns that try to re-acquire
   it; the 30s lock timeout fires before the parent fully releases.
   Pre-flight is a *warning*; on PGLite it adds no information the
   orchestrators don't already handle. Skip the probe for PGLite.

2. v0.11.0 phase A spawned `gbrain init --migrate-only` as an execSync
   subprocess to apply schema migrations. PGLite is single-writer;
   the subprocess inherits HOME and tries to lock the same DB. On
   Postgres this works (concurrent connections OK); on PGLite it
   deadlocks. Route in-process for PGLite — create + connect +
   initSchema + disconnect directly, skipping the subprocess hop.
   Postgres keeps the legacy execSync path.

Verified: fresh PGLite install now walks the full migration chain
through v0.32.2 (Facts SoR) and lands "All migrations up to date" on
re-run.

Closes #1100.

* fix(serve): bootstrap token env override + suppress flag (closes #1024)

`gbrain serve --http` regenerated the admin bootstrap token on every
restart and printed it to stderr. In supervisor-managed production
deployments (LaunchAgent, systemd, k8s) every restart leaks the value
into log aggregators and rotates the access for any agent that paste-
copied it.

Two new knobs:

- **GBRAIN_ADMIN_BOOTSTRAP_TOKEN** env var: when set, used as the
  bootstrap secret instead of a fresh per-process token. Validated:
  must match `^[A-Za-z0-9_-]{32,}$` (32-char minimum), else refuse to
  start with a paste-ready generator hint. Failing closed beats
  silently accepting a weak token.

- **--suppress-bootstrap-token** CLI flag: suppresses the printed
  token line entirely. Operator takes responsibility for tracking the
  value out-of-band.

Startup banner now reflects the chosen source:
- `Admin Token: suppressed` when the flag is set.
- `Admin Token: from $GBRAIN_ADMIN_BOOTSTRAP_TOKEN` when env-sourced.
- Full token print only when both are absent (default behavior, dev
  installs).

Closes #1024.

Co-Authored-By: billy-armstrong <billy-armstrong@users.noreply.github.com>

* fix(config): migrate legacy 'provider' + 'model' to 'embedding_model'

Pre-v0.32 docs and some community templates used a config shape:

  { "provider": "voyage", "model": "voyage-4-large" }

The canonical shape (since the v0.31.12 gateway seam) is:

  { "embedding_model": "voyage:voyage-4-large" }

Users on the legacy shape hit silent fallthrough to the hardcoded
OpenAI default; sync + embed errored out with "OpenAI embedding
requires OPENAI_API_KEY" regardless of their actual provider config.

loadConfig() now translates the legacy keys at parse time:
- emits a one-line stderr nudge with the paste-ready canonical key
- preserves the rest of the config unchanged
- skipped when `embedding_model` is already set (forward-compat)

Closes #1086.

Co-Authored-By: jeunessima <jeunessima@users.noreply.github.com>

* chore(test): quarantine upgrade tests (process.env mutation)

PR #1032's cherry-picked tests use the static-snapshot + try/finally
pattern for env vars instead of the project's withEnv() helper. The
test-isolation lint catches process.env mutations outside withEnv to
prevent cross-test leakage in parallel runs.

Renaming to *.serial.test.ts (the quarantine convention) is the
documented out: runs sequentially, no cross-file race. A future cleanup
PR can migrate the tests to withEnv() and drop the quarantine.

* fix(test): update brain-writer .bak assertion for centralized backup path

The v0.36.x frontmatter backup change (bd60cdfcloses #902) moved
.bak files from sibling-of-source to ~/.gbrain/backups/frontmatter/...
The old test still asserted on the sibling path, so CI failed even
though the production behavior was correct.

Updated assertion contract: backup lands under the injected backupRoot
(test-isolated), the returned backupPath ends in .bak and exists, and
no sibling .bak is created next to the source file. The pre-fix
sibling-path is now a negative assertion.

* chore: bump version and changelog (v0.36.1.0)

v0.36.1.0 — community fix wave (28 atomic fixes + 22 PRs closed as
already-shipped + 14 issues triaged).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(fix-wave): close test gaps surfaced by post-ship audit

After the fix-wave shipped, an audit found 11 commits with no new test
file. Some were inherently structural (build pipelines, shell content)
or had existing test coverage that worked either way; others had real
regression risk with no guard. This commit closes the gaps that matter.

New regression tests for:

- OAuth `verifyAccessToken` throws `InvalidTokenError` (not bare Error)
  on both expired and unknown token paths. Pre-fix, the SDK's
  `requireBearerAuth` middleware fell through to 500 instead of 401 →
  client token-refresh logic never fired (#935).

- `loadConfig` translates legacy `{provider, model}` config shape to
  the canonical `embedding_model: <provider>:<model>`. 3 cases: pure
  legacy → migrated; canonical wins over legacy when both present;
  canonical-only is untouched. Pre-fix, Voyage/Cohere/Mistral users
  silently fell through to OpenAI (#1086).

- `configDir` rejects relative paths; rejects `..` segments via both
  separators (regression guard for the Windows path acceptance fix
  #1019 / cherry-pick #1083).

- `resolveBootstrapToken` (new exported helper extracted from
  `runServeHttp`). 9 cases: unset env generates fresh, valid env
  accepted, hyphens/underscores accepted, < 32 chars rejected, special
  chars rejected, whitespace trimmed, empty string rejected, 32-char
  boundary accepted, 31-char one-short rejected. Security-critical
  validation surface (#1024).

- GET /mcp returns 405 with `Allow: POST, DELETE` (E2E case in
  `serve-http-oauth.test.ts`). Pre-fix, claude.ai and other probing
  MCP clients saw 404 and gave up (#1076).

- apply-migrations `process.exit(0)` on list / dry-run / up-to-date
  paths. Source-shape assertion locks the rule in; shell scripts
  gating on `$?` work (#1062).

- Autopilot wrapper sources `~/.zshenv` BEFORE `~/.zshrc`. zshenv is
  the canonical place for env vars in non-interactive zsh; without
  this ordering, LaunchAgent subprocesses never inherit secrets
  exported in zshrc (#966).

- `test/fix-wave-structural.test.ts` consolidates source-shape
  regression guards for fixes whose behavior is hard to runtime-test
  without heavy mocking: query cache drain (#1125), admin embed
  manifest + handler (#1090), admin register-client PKCE branch
  (#1077), PGLite v0.11.0 phase A in-process routing (#1100), query
  `--no-expand` negation (#1124). 9 source-grep assertions.

Refactored `runServeHttp` to extract `resolveBootstrapToken` as a pure
helper. The boot path now consumes the helper's tagged-union result
({kind:'ok'|'error'}); side effects (`process.exit`, `console.error`)
moved to the caller. Unit-testable without spinning up Express.

Test counts: oauth 71 (was 69), config 20 (was 14), apply-migrations
19 (was 18), autopilot-install 5 (was 4), serve-http-bootstrap-token
9 (new file), fix-wave-structural 9 (new file). Net: +28 cases across
6 files; +1 new exported function with full coverage.

Remaining audit gaps (deferred):
- e82dda0 admin embed E2E (post-deploy curl smoke covers this)
- d93fa81 apply-migrations PGLite chain E2E (already smoke-tested
  manually in the original commit; subprocess test would be flaky in
  CI without DATABASE_URL gating)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: close the two deferred E2E gaps from the post-ship audit

Both gaps now have real behavior coverage. No DATABASE_URL needed (PGLite
engine), so they run in standard unit CI alongside the rest of the suite.
Serial quarantine because both spawn subprocesses + bind ports / write
tmpdirs.

test/admin-embed-spawn.serial.test.ts (4 cases, ~6s wall-clock):
  - Spawns `gbrain serve --http` from a fresh tmpdir so `process.cwd()/
    admin/dist` does not exist — this forces the embedded-manifest
    branch (the one under test). Pre-fix, this exact setup hit 404.
  - GET /admin/ → 200 + SPA shell HTML (title + #root div), content-type
    text/html.
  - GET /admin/index.html → same body via explicit path.
  - GET /admin/agents → SPA fallback returns index.html for deep links.
  - GET /admin/api/stats → NOT 200 (regression guard: SPA fallback must
    not swallow /admin/api/* routes and silently return HTML to a JSON
    client). Closes #1090.

test/apply-migrations-pglite-spawn.serial.test.ts (3 cases, ~25s):
  - Seeds a fresh PGLite config in a tmpdir, runs `gbrain init
    --migrate-only` + `gbrain apply-migrations --yes --non-interactive`.
    Pre-fix this hit "GBrain: Timed out waiting for PGLite lock" because
    apply-migrations' pre-flight probe + v0.11.0's phase A subprocess
    both wanted the single-writer lock.
  - Asserts exit 0, no "Timed out" string, no "Phase A failed" string,
    brain.pglite file written.
  - Re-run case: idempotent — "All migrations up to date" exits 0
    (also locks in the #1062 exit-code fix end-to-end).
  - --list path exits 0 (third leg of the #1062 contract).
  Closes #1100.

Pinned bootstrap token via GBRAIN_ADMIN_BOOTSTRAP_TOKEN env so the
admin test doesn't have to scrape stderr; the startup banner format
is allowed to drift, the /health probe is the readiness contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): consolidate PGLite spawn test to one end-to-end pass

CI failed on test/apply-migrations-pglite-spawn.serial.test.ts (Ubuntu,
bun 1.3.14). The previous shape ran 3 tests × ~3 spawns each. Each
`bun run /abs/src/cli.ts` from a tmpdir cwd pays a full parse/transpile
cost (no near-cwd .bun cache); on Ubuntu CI that compounds past the
runner's per-test budget.

Consolidated to ONE test that exercises the full lifecycle in one
brain: init --migrate-only → apply-migrations --yes → re-run → --list.

Four spawns instead of eight. Local wall-clock: 32s → 11.5s. All four
assertion buckets preserved: no PGLite lock timeout, no Phase A
failure, brain.pglite written, idempotent re-run "All migrations up
to date" exits 0 (#1062 end-to-end), --list exits 0.

Per-test timeout 480_000ms as insurance against the runner's
--timeout=60000 default (bun's API spec: per-test wins).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(diag): dump apply-migrations output when CI exit != 0

The PGLite spawn test passes locally on macOS/bun 1.3.13 in ~11s
end-to-end but fails on Ubuntu/bun 1.3.14 in 4.92s with apply.exitCode
= 1 — fast enough that something is failing early, not timing out.
The runCli helper captured stdout+stderr but never printed them, so
the CI log only showed the bare assertion failure.

This commit prints the captured streams from BOTH init and apply
when the exit code mismatches expectation. After the next CI run we
can read the actual error message and diagnose the Ubuntu-specific
failure mode (likely BUN_INSTALL / HOME / PGLite WASM env quirk).
No behavior change; pure diagnostic output gate on failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): shim `gbrain` on PATH for PGLite spawn test

Root cause of the Ubuntu CI failure: the v0.11.0 orchestrator's phase B
runs `execSync('gbrain jobs smoke')`. PGLite phase A now routes
in-process (the #1100 fix), but phase B and several follow-up phases
still shell out to the `gbrain` binary on PATH. Locally the binary
resolves via `bun link`; on CI Ubuntu it does not exist on PATH, so
execSync exits 127 → orchestrator returns 'failed' → apply-migrations
exits 1. Test failed at 4.92s with exitCode=1, well before any timeout.

Verified locally by removing ~/.bun/bin/gbrain to simulate CI:
  pre-shim:  apply.exitCode=1 (same as CI)
  post-shim: apply.exitCode=0 in 8.4s

The shim writes a tiny `gbrain` executable to a tmpdir that just
`exec`s `bun run <repo>/src/cli.ts "$@"`. Prepended to PATH for the
spawned subprocesses. Mirrors the production contract (gbrain on
PATH) without depending on `bun link` having run in the CI image.

Diagnostic dump from the previous commit stays — useful insurance for
the next time something silently fails inside a spawned binary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: johnybradshaw <johnybradshaw@users.noreply.github.com>
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-authored-by: sharziki <sharziki@users.noreply.github.com>
Co-authored-by: Aashiqe10 <Aashiqe10@users.noreply.github.com>
Co-authored-by: lukejduncan <lukejduncan@users.noreply.github.com>
Co-authored-by: 100yenadmin <100yenadmin@users.noreply.github.com>
Co-authored-by: hnshah <hnshah@users.noreply.github.com>
Co-authored-by: p3ob7o <p3ob7o@users.noreply.github.com>
Co-authored-by: sliday <sliday@users.noreply.github.com>
Co-authored-by: nezovskii <nezovskii@users.noreply.github.com>
Co-authored-by: vincedk-alt <vincedk-alt@users.noreply.github.com>
Co-authored-by: sergeclaesen <sergeclaesen@users.noreply.github.com>
Co-authored-by: navin-moorthy <navin-moorthy@users.noreply.github.com>
Co-authored-by: billy-armstrong <billy-armstrong@users.noreply.github.com>
Co-authored-by: jeunessima <jeunessima@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 23, 2026
…anges

Two CI failures from the post-merge state, both pre-existing tests
pinning implementation detail that v0.39.3.0's fixes legitimately
changed. Repair the tests, not the production behavior.

1) test/commands/capture.test.ts — 2 cases ('uses first non-empty
   line as the title' + 'caps title at 80 chars') were checking the
   YAML literal `title: "..."` (double-quoted). v0.39.3.0 Phase 2a
   (BUG-1 frontmatter merge) replaced the hand-rolled `JSON.stringify`-
   quoting with `matter.stringify()`, which follows YAML defaults:
   simple strings emit unquoted (`title: Real first line`), special-
   char strings get single-quoted. The semantic ("title equals X")
   is correct; the literal-quoting check was incidental. Parse the
   YAML and assert on the value via gray-matter.

2) test/fix-wave-structural.test.ts — the v0.36.1.x #1077 PKCE-
   public-clients regex pinned the exact destructure `const { name,
   scopes, tokenTtl, ... } = req.body`. v0.39.3.0 Phase 4b (WARN-9
   admin scopes normalization) moved `scopes` to a separate read
   line so the route can accept BOTH `scopes` (admin SPA) AND `scope`
   (OAuth wire singular) via `?? `. Relax the destructure regex to
   accept either layout AND add a NEW regex pinning the
   `scopes ?? scope` fallback so the actual v0.39.3.0 contract is
   load-bearing. PKCE-fix assertions (tokenEndpointAuthMethod ===
   'none' + client_secret_hash = NULL + token_endpoint_auth_method =
   'none') unchanged.

Both fixes are tests-only — no production code change.

Verify gate clean post-fix; 30/30 cases pass in the two affected
test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 23, 2026
…x wave from PR #1299) (#1308)

* docs: land v0.38.0.0 production smoke-test report (PR #1299 + editor's note)

PR #1299 from garrytan-agents shipped a 308-line production smoke-test
report against v0.38.0.0 on Supabase+PgBouncer. Bringing the report
verbatim into this worktree alongside the actual fixes (v0.38.3.0 wave).

Privacy scrub passed per CLAUDE.md placeholder rule — no real people,
companies, funds, or deals named.

Editor's Note prepended to flag two re-diagnosed findings:
- BUG-2 actual crash line is :1594-1597 (req.body === undefined → JSON.stringify
  returns literal undefined → Buffer.from throws), not the originally
  reported :1508.
- WARN-5 root cause is `capture` missing from CLI_ONLY_SELF_HELP set;
  the detailed HELP constant exists but is unreachable.

Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(capture): BUG-1 — merge frontmatter instead of double-wrap on --file

Pre-fix: `gbrain capture --file foo.md` on a file that already has YAML
frontmatter stamped a second outer `---` block whose `title` field was
the file's own opening `---` delimiter. Users got pages with two
frontmatter blocks: outer `title: '---'`, inner the real metadata. The
parser then treated the second block as a body-side horizontal rule.

Root cause: capture.ts:136-152 `buildContent` always prepended its own
frontmatter without inspecting whether the input already had one. The
title-from-first-line heuristic at :141 picked up the `---` delimiter
when present.

Fix: new pure helper `mergeCaptureFrontmatter(rawBody, opts)` parses
existing frontmatter via gray-matter (the lib markdown.ts already uses)
and merges capture's auto-fields with user-wins precedence on user-
declared keys. Single output frontmatter block in all cases.

Precedence:
- `type`:         opts.type (CLI flag) > userFm.type > 'note'
- `title`:        userFm.title > derived-from-body
- `captured_via`: userFm.captured_via > opts.source > 'capture-cli'
- `captured_at`:  userFm.captured_at > now() (user can pre-stamp)
- All other user keys (description, tags, slug, ...) pass through verbatim

Files without frontmatter keep the original behavior (stamp fresh,
wrap under derived `# heading` if body lacks markdown structure).

CQ2 boil-the-lake test coverage in test/capture-build-content.test.ts
(19 cases, 47 assertions): 13 specified cases including CJK title, CRLF
line endings, UTF-8 BOM, empty frontmatter `---\n---\n`, malformed
YAML, no-trailing-newline-before-body, user description/tags/slug
passthrough; plus 3 deriveTitle helper cases, 2 --type CLI flag
precedence cases, and the BUG-1 regression guard with the exact
reported input shape.

Decisions deferred to later wave phases:
- CV3 source_kind taxonomy + CV15 canonical source resolver: Phase 3c
- CV8 dedup hash from rawBody: Phase 3c (receipt) + Phase 3d (DB)
- CV9 trim boundary split: Phase 3c
- CV10 binary-byte guard: Phase 3c

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 2a)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(serve-http): BUG-2 — /ingest 500-HTML on missing body becomes 400 JSON

Pre-fix: a POST /ingest with NO body (no Content-Length, no body bytes
— common shape for misconfigured webhooks and the smoke-test repro)
leaves `req.body === undefined`. The body-coercion block's `else`
branch then called `Buffer.from(JSON.stringify(undefined), 'utf8')` —
and `JSON.stringify(undefined) === undefined` (the literal, not the
string), so `Buffer.from(undefined, 'utf8')` threw TypeError. The
unhandled throw reached express's default error handler, which
served an HTML 500 page. Clients expecting JSON envelopes broke.

Two changes:

1. Explicit `req.body == null` null-guard at the top of the handler
   (BEFORE the body coercion). Catches both `null` and `undefined`
   request bodies and returns the same 400 `empty_body` envelope as
   the empty-Buffer guard at :1600. Closes the TypeError class
   structurally.

2. Outer try/catch wrapping the entire handler body with a
   `!res.headersSent` guard (codex F#16) so any unexpected throw —
   downstream of the inner queue.add try/catch, in a logging
   side-effect, or in the SSE broadcast — returns a JSON 500
   envelope instead of leaking the HTML error page. Mirrors the F14
   pattern around the MCP handler's transport.handleRequest at
   serve-http.ts:1508-1520.

E2E regression test in test/e2e/serve-http-ingest-webhook.test.ts:
'BUG-2: POST with no body (undefined req.body) → 400 JSON envelope
(not 500 HTML)' uses `fetch(URL, {method:'POST'})` with no body field
to provoke the exact pre-fix shape. Asserts status !== 500, status
=== 400, content-type is application/json, body.error === 'empty_body'.

The existing 'empty body → 400' test at line 200 already covered the
empty-string-body case (Express raw() produces an empty Buffer; the
:1600 guard fires correctly). Both cases now reach the same envelope
via two structurally distinct paths.

Codex F#15 correction absorbed: the misleading 'body \"\"' → empty_body
test case was dropped from the plan. An empty JSON string body has
bytes and is not the same as a missing body.

Codex F#14 correction absorbed: header in the smoke-test verification
command is `Authorization:` not `Auth:` (updated in the plan; tests
already use the correct shape).

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 2b)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(put_page): WARN-8 — provenance write-through with CV6 trust gate + CV12 COALESCE

Migration v81 added 4 nullable provenance columns to `pages`
(source_kind, source_uri, ingested_via, ingested_at). Pre-fix, put_page
wrote them into the FILE's frontmatter (operations.ts:637-644 write-
through path) but never to the DB columns. `gbrain call get_page | jq
.source_kind` returned null even when the JSON receipt claimed
`capture-cli`. This commit closes the loop on the write side.

Surface changes:

- src/core/types.ts: PageInput gains 4 optional provenance fields
  (source_kind, source_uri, ingested_via, ingested_at) with docstring
  explaining the trust model.
- src/core/import-file.ts: importFromContent opts accepts 3 (source_kind,
  source_uri, ingested_via); threads them to tx.putPage. ingested_at is
  NOT a caller-controllable param — engine stamps it.
- src/core/operations.ts put_page op: accepts 3 client params on the
  wire schema (uniform across transports). CV6 trust gate: when
  ctx.remote !== false, IGNORES client params and server-stamps
  source_kind='mcp:put_page'/source_uri=null/ingested_via='mcp:put_page'.
  Only ctx.remote === false (capture CLI, autopilot, dream cycle)
  honors client values.
- src/core/postgres-engine.ts + pglite-engine.ts putPage: INSERT/UPDATE
  SQL extended with 4 columns. ingested_at stamped to now() only when
  ANY provenance field is being written this call (null otherwise).
  ON CONFLICT clause uses COALESCE-preserve UPDATE (CV12) so a later
  put_page without provenance does NOT erase the original first-write
  audit trail — first-write-wins for routine edits.

CV6 closes the spoofing surface codex caught: a write-scope OAuth
token can no longer poison the audit trail with arbitrary labels
('source_kind: capture-cli' from an MCP agent). Anything that isn't
strictly `ctx.remote === false` falls through to server-stamped
'mcp:put_page', matching the v0.26.9 F7b fail-closed discipline.

CV12 closes the audit-trail-erasure trap: routine put_page edits (no
provenance args) preserve the original ingestion's source_kind /
ingested_at via `COALESCE(EXCLUDED.x, pages.x)`. Explicit re-ingestion
that passes new provenance overwrites (latest-ingestion-wins).

Tests in test/put-page-provenance.test.ts (11 cases, 29 assertions):
- Trusted local caller: client params populate DB columns; partial
  provenance still triggers ingested_at stamp; omitting all leaves
  all 4 null.
- CV6 spoofing guard: remote caller's source_kind='capture-cli' claim
  becomes 'mcp:put_page'; ctx.remote === undefined treated as remote
  (v0.26.9 F7b: fail-closed on anything not strictly false).
- CV12 COALESCE-preserve: second write without provenance preserves
  first-write audit AND timestamp; explicit re-ingestion with new
  provenance overwrites; remote second write after local first
  records the most-recent honest source (mcp:put_page).
- T2 subagent regression: namespace check fires when provenance params
  are present; subagent within wiki/agents/<id>/ succeeds with server-
  stamped provenance per CV6; missing subagentId still fail-closes.

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 3a)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(read-path): CV5 — expose put_page provenance via getPage / listPages

Phase 3a wrote the 4 provenance columns into the DB via put_page. This
phase makes them visible to the read side so the smoke-test verification
command `gbrain call get_page <slug> | jq .source_kind` actually
returns the value the write side just stored.

Surface changes:

- src/core/types.ts Page: 4 optional fields (source_kind, source_uri,
  ingested_via, ingested_at). Three-state read pattern matching v0.26.5's
  deleted_at convention — undefined when the SELECT projection didn't
  include the column (older callers), null when historical pre-v0.38
  row, populated when the v0.38+ ingestion stamped it.
- src/core/utils.ts rowToPage: maps the 4 columns to Page fields via
  the same conditional-spread pattern as deleted_at / effective_date.
  Older SELECTs without the projection continue to compile.
- src/core/postgres-engine.ts + pglite-engine.ts getPage: projection
  extended to include the 4 columns. listPages uses `SELECT p.*` so
  the columns flow through automatically — no listPages SQL change.
  Both engines' putPage RETURNING clause already projects the 4
  columns from Phase 3a, so the round-trip (write→get) is symmetric.

get_page op + JSON renderer require no changes: `gbrain call get_page`
goes through `runCall` (src/commands/call.ts:52) which is
`console.log(JSON.stringify(result, null, 2))`. Since `result` is the
Page object from get_page (which is the engine's getPage return,
which is rowToPage), the new fields surface automatically.

The markdown renderer (`gbrain get`) goes through serializeMarkdown
which strips structured fields; that's the right behavior for the
markdown view. Structured access stays on `gbrain call get_page` and
`list_pages` (JSON output).

Engine-parity test (T3) extended in test/e2e/engine-parity.test.ts
with 2 new cases:
- 'provenance columns: putPage writes + getPage returns identical
   shape on both engines' — seeds capture-cli provenance, asserts
   source_kind/source_uri/ingested_via match across engines and
   ingested_at is a Date on both.
- 'provenance COALESCE-preserve UPDATE: parity on both engines (CV12)'
   — first write stamps capture-cli, second write WITHOUT provenance
   preserves the first-write source_kind / ingested_via on BOTH
   engines (CV12 first-write-wins is engine-uniform).

Gated on DATABASE_URL — runs PGLite half always; Postgres half skips
without it (existing engine-parity pattern). When the test fires, a
drift between the two engines now fails loudly instead of waiting for
a user's `gbrain migrate --to supabase` to surface the bug.

Phase 3a test suite (test/put-page-provenance.test.ts) re-verified
green (11 pass) after the read-path additions — no regression.

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 3b)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(capture): WARN-1/3/7 + CV3/CV7/CV8/CV9/CV10/CV15/A2 — capture write-side overhaul

Lands the seven decisions resolved in the plan-eng-review for the
capture CLI's write-side. Each addresses a specific smoke-test finding
or codex outside-voice concern. Atomic single commit because all
changes are in capture.ts and tightly coupled (CV9's trim split feeds
CV8's hash input which the tests depend on).

Decisions resolved:

CV3 — source_kind taxonomy: capture ALWAYS stamps source_kind='capture-cli'
in the receipt JSON AND threads it through to put_page's provenance
params. Pre-fix `parsed.source ?? 'capture-cli'` conflated the DB source
FK (where to write) with the ingestion-channel taxonomy (what kind of
ingestion this was). --source now ONLY maps to source_id; source_kind
is closed taxonomy per migration v81's documented set.

CV7 — thin-client --source rejection: when isThinClient(cfg) AND
parsed.source is set, exit 1 BEFORE any network call with a clear error
pointing at the right fix: `gbrain auth register-client <name>
--source <id>` on the server. Mirrors CV6 trust posture (server-side
OAuth client registration owns source scope; per-call override would
reopen the spoofing surface CV6 just closed).

CV8 (CLI side) — receipt content_hash now comes from normalized rawBody,
NOT the assembled fullContent (which contained a timestamp-bearing
frontmatter). Two captures of identical text now produce identical
content_hash, restoring the daemon's 24h LRU dedup at the CLI receipt
layer. The DB hash (importFromContent) gets the same treatment in
Phase 3d.

CV9 — trim boundary split: introduced `normalizeForHash(s)` pure
helper (trim + BOM strip + LF + NFKC). Hash input gets aggressive
normalization for dedup correctness; the STORED body (passed to
buildContent → mergeCaptureFrontmatter) preserves user bytes (CRLF,
BOM, whitespace) for round-trip fidelity. CQ2's CRLF/BOM tests
continue to pass.

CV10 — binary file guard: read --file via `readFileSync(path)` with NO
encoding (Buffer-side), then sniff first 8KB for NUL bytes via
`detectBinaryNullByte(buf)`. Mirror the same sniff for --stdin (which
also now reads Buffer-side via `readStdinBuffer()`). Real UTF-8 text
(including CJK, emoji, BOM) never contains NUL bytes; binary formats
(executables, archives, most image formats) do. Reject with friendly
error before UTF-8 decode mangles the bytes. Deterministic test
fixtures use `Buffer.from([...])` with explicit byte arrays instead
of /dev/urandom (which a 256-byte sample often had no NUL in).

CV15 — canonical source resolver: route through
`resolveSourceWithTier(engine, parsed.source, cwd)` from
src/core/source-resolver.ts (the v0.37.7.0 6-tier chain every other
CLI op uses). Honors flag → env (GBRAIN_SOURCE) → dotfile
(.gbrain-source) → local_path → brain_default → seed_default. Closes
the WARN-3-adjacent UX divergence where capture silently used
`parsed.source ?? 'default'`, ignoring env / dotfile / local_path /
brain_default tiers. Bonus: resolveSourceWithTier's assertSourceExists
throws a friendly error if the source is missing, BEFORE put_page is
called — making capture-level pre-flight redundant for the common
case (A2 fallback below handles TOCTOU + thin-client edge cases).

A2 — friendly FK error rewrite: new `maybeRewriteSourceFkError(err,
sourceId)` helper detects the Postgres FK-violation patterns
('pages_source_id_fk' OR 'foreign key constraint ... source') and
returns a paste-ready hint: `source '<id>' is not registered.
Register it first: gbrain sources add <id> --path <path>`. Applied
in BOTH the local-engine catch block AND the thin-client (callRemoteTool)
catch block per T1 — so the same friendly error surfaces regardless of
install type. Defense-in-depth alongside CV15's upstream check (covers
TOCTOU race + thin-client implicit source from dotfile pointing at a
server-deleted source).

WARN-1 receipt-side dedup, WARN-3 friendly source error, WARN-7 binary
guard, and the source_kind taxonomy fix all become user-visible via
this commit. The DB-side dedup (WARN-1's daemon side) lands in Phase 3d.

HELP text updated:
- Documents the 6-tier source resolution chain
- Notes thin-client --source restriction
- Notes binary content rejection
- Notes dedup behavior (whitespace + line endings normalized)
- Notes source_kind != source_id distinction

Tests in test/capture-runcapture.test.ts (26 cases, 30 assertions):
- CV10 detectBinaryNullByte: 10 cases incl. ASCII, CJK, emoji, BOM,
  start/mid NUL, PNG magic, 8KB cap boundary, empty
- CV9 normalizeForHash: 6 cases incl. whitespace, BOM, CRLF, NFKC,
  no-op on clean, whitespace-only → empty
- CV8 hash stability: 4 cases proving identical input → identical
  hash regardless of whitespace / line endings / timing
- A2 maybeRewriteSourceFkError: 6 cases incl. raw PG message, wrapped
  OperationError, postgres.js-wrapped, unrelated errors, missing
  sourceId, non-Error throws

Phase 3a + Phase 2a tests re-verified green (30 pass across both
files) — no regression in the surfaces this commit didn't directly
touch.

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 3c)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(import-file): CV8 — exclude timestamp frontmatter from DB content_hash

Pre-fix: every gbrain capture invocation produced a fresh DB
content_hash because parsed.frontmatter included captured_at (and now
ingested_at) which change per call. Two consequences:
- existing.content_hash === hash short-circuit never fired for
  identical body captures → every capture re-chunked and re-embedded
  unchanged content, burning embedding API spend
- daemon-side 24h LRU dedup (separate consumer keyed on the same hash)
  silently never matched, defeating its design intent

Phase 3c shipped the CLI-side fix (receipt hash from normalized
rawBody). This commit shipps the DB-side fix.

Approach: strip timestamp-bearing frontmatter keys before hashing,
NOT strip all frontmatter. Whitelist: ['captured_at', 'ingested_at'].
Future timestamp keys add to the list — small, stable surface.

Why not strip ALL frontmatter? Sync would regress: a user editing a
markdown file to add a tag changes the frontmatter without changing
the body. The pre-fix hash captures that change; tag reconciliation
fires. If we stripped all frontmatter, the hash wouldn't change, the
short-circuit would fire, and the tag-add would silently no-op.

The narrow whitelist preserves frontmatter-change-detection for real
edits (tags, type, slug, description, ...) while ignoring the
ephemeral timestamp keys that capture-cli and provenance-write-through
stamp per-call.

Tests in test/import-file.test.ts (4 new cases in 'CV8 DB
content_hash stability' describe block):
- captured_at differences → IDENTICAL hash → second capture status
  'skipped' (short-circuit fires); putPage NOT called the second time
- body change → DIFFERENT hash → second capture status 'imported'
  (real edits still flow through)
- tag change → DIFFERENT hash → re-import fires (REGRESSION GUARD:
  proves frontmatter-change detection survives the strip)
- ingested_at differences → IDENTICAL hash (provenance-only refresh
  doesn't invalidate the chunk cache)

Combined with Phase 3c's CLI-side hash fix, WARN-1 (dedup not actually
deduplicating) is now fully resolved: both the user-visible receipt
hash AND the DB / daemon hash stabilize across identical-content
captures.

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 3d)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): WARN-5 + WARN-6 — capture help discoverable; main --help lists BRAIN

WARN-5: `gbrain capture --help` printed only the generic short-circuit
fallback ('Usage: gbrain capture\\n\\ngbrain capture - run gbrain --help
for the full command list.'). Root cause: `capture` was missing from
CLI_ONLY_SELF_HELP at src/cli.ts:34-53. The detailed HELP constant at
src/commands/capture.ts:90+ existed but was unreachable because the
dispatcher's printCliOnlyHelp short-circuit at :101 fired first.

WARN-6: `gbrain --help` did not mention capture / brainstorm / lsd
anywhere. New v0.37/v0.38 commands were implemented and dispatched but
absent from the hardcoded printHelp text.

Two fixes in cli.ts:

1. Add 'capture' to CLI_ONLY_SELF_HELP. This skips the generic
   short-circuit, allowing the dispatch flow to reach runCapture which
   has its own --help branch printing the detailed HELP constant.

2. Add a pre-engine-bind '--help' short-circuit for capture in
   handleCliOnly (mirrors the existing sync + reinit-pglite pattern).
   Without this, `gbrain capture --help` on a fresh tmpdir with no
   config would hit the engine bind at :1077 and exit with 'Cannot
   connect to database' before runCapture's --help branch fires.

3. Add BRAIN section to printHelp text between TOOLS and SOURCES.
   Documents capture / brainstorm / lsd with their key flags, matching
   the tone of the existing grouped sections.

Tests in test/cli-help-discoverability.test.ts (6 cases, 31 assertions):
- WARN-5: capture --help contains every documented flag (--slug,
  --type, --file, --stdin, --source, --quiet, --json)
- WARN-5: output is NOT the generic short-circuit fallback (presence
  of 'Examples:' + length > 10 lines + does not match the bare-
  short-circuit regex)
- WARN-5: -h short flag works too
- WARN-6: main --help mentions all 3 commands as command-line entries
- WARN-6: BRAIN section heading is present and the 3 commands appear
  textually after it
- regression: existing top-level commands (init, doctor, get, search,
  query, import, export, files, embed) still listed (snapshot guard
  against accidental deletion of other groups during the BRAIN insertion)

Tests use spawnSync subprocess execution so the real dispatcher flow
is exercised end-to-end (no mocking of cli.ts internals).

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 4a)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(serve-http): WARN-9 — admin register-client scopes normalization

Pre-fix at serve-http.ts:1091: the /admin/api/register-client route
destructured `scopes` from req.body and passed `scopes || 'read'` to
`oauthProvider.registerClientManual(name, grants, scopes, uris)`.
Three bugs in that single line:

1. Field-name mismatch: OAuth wire format uses `scope` (singular).
   The destructure looked for `scopes` (plural). A request body sending
   `{"scope": "read write"}` had `scopes === undefined`, fell through to
   the `'read'` default, and silently created a read-only client. This
   is the exact behavior the smoke test reported.

2. Array shapes crashed: registerClientManual's parseScopeString calls
   `.split(' ')` on its argument. Arrays don't have `.split`, so
   `{"scopes": ["read", "write"]}` threw TypeError mid-request,
   surfacing as a 500.

3. Empty-array truthy: `[]` is truthy in JS so `scopes || 'read'`
   returned the empty array, also crashing on split.

Codex outside-voice (CV12) also flagged: validation depth is
insufficient. `["read write"]` (a single-element array where the
element contains a space) silently passes the type check but produces
an unknown scope `"read write"` that registers as garbage. Same for
non-string elements (null, numbers), empty strings, and duplicates.

Fix: new `normalizeScopesInput(raw: unknown)` helper in src/core/scope.ts
handles all four valid input shapes and rejects everything else with
a typed error. The admin route accepts BOTH `scopes` (admin SPA) AND
`scope` (OAuth wire), normalizes, and surfaces a 400 invalid_scopes
on validation failure.

Validation matrix:
- undefined / null / missing → 'read' default
- string → split on /\s+/, dedupe, validate each element against
  ALLOWED_SCOPES allowlist, re-join sorted
- string[] → reject non-string elements, reject empty strings, reject
  internal-whitespace (catches ['read write'] bug shape), dedupe,
  validate each element, re-join sorted
- everything else (number, boolean, plain object) → Error
- empty array / whitespace-only string after split → Error
- unknown scope name → InvalidScopeError ('Unknown scope "X". Allowed: ...')

Output is sorted for determinism so two registrations with the same
scope set produce identical DB rows.

Tests in test/scope-normalize.test.ts (28 cases, 30 assertions):
- Happy paths: 12 cases incl. all 4 shapes, dedupe in both directions,
  whitespace tolerance (tab/newline), every hierarchy scope
- Rejection paths: 14 cases incl. number/object/boolean inputs, empty
  array, empty string, non-string array elements, empty-string element,
  whitespace-in-element (the codex bug), unknown scope name in both
  string and array forms, mix of known+unknown
- Determinism: 2 cases proving sorted output is order-independent

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 4b)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(facts-absorb): WARN-4 — suppress 'No database connection' noise with diagnostic

Pre-fix: every gbrain capture invocation logged
'[facts:absorb] failed to log gateway_error for inbox/...: No database
connection: connect() has not been called. Fix: Run gbrain init...'.
Non-fatal — the page write itself succeeded — but loud per-capture
noise that the smoke test rightly flagged as a user-visible bug.

Root cause: the facts subsystem grabs a separate engine handle that
isn't connected on the CLI capture path. The actual write succeeds via
the connected engine that capture uses; the facts:absorb log is a
courtesy that the doctor health check reads. Codex flagged this as
symptom-treatment vs root-cause-fix (CV13). Plan picked the middle
path: suppress the per-capture noise NOW, instrument first-occurrence
with a stack trace so the v0.38.4 fix knows where to look.

Two changes:

CQ1 — typed access via instanceof + .problem field on GBrainError
(NOT string-match on .message). The class GBrainError already has
structured (problem, cause_description, fix) fields per
src/core/types.ts:1104. Matching the structured field is impossible
to silently break: if someone edits the error wording in db.ts
thinking it's cosmetic, the typed access still routes correctly.
String-match on .message would be the same fragility class we just
closed elsewhere in this wave (capture's FK error rewrite uses both
patterns deliberately because raw PG messages don't go through
GBrainError).

CV13 — first-occurrence diagnostic: module-scoped
_hasLoggedDisconnectedFactsAbsorb flag fires ONE stderr warn with
the full stack trace the first time this class occurs in a process.
Subsequent occurrences are silent. The next user reporting the
warning gives us the call site without an extra round-trip.

Other failure classes (GBrainError with a different .problem field,
plain Error, anything else) keep the loud per-call warn so real
subsystem errors (PgBouncer crash, schema drift, etc.) still surface.
The suppression is narrowly scoped to the known-broken-but-non-fatal
WARN-4 class only.

Test seam: `_resetFactsAbsorbDisconnectedFlagForTests()` exported so
each test can assert from a clean slate.

Tests in test/facts-absorb-log.test.ts (4 new cases in 'WARN-4
disconnected-engine suppression' describe block):
- First occurrence prints ONE warn with 'WARN-4' + 'First-occurrence
  trace' substring; subsequent 3 calls are silent
- GBrainError with a DIFFERENT problem field still warns loudly (the
  suppression is narrow, not class-wide)
- Plain Error (non-GBrainError) still warns loudly
- The engine.logIngest call STILL fires for the suppressed case (the
  suppression is on the WARN output, not the attempt — preserves the
  doctor health check's read path if/when the wiring is fixed in v0.38.4)

v0.38.4 follow-up TODO filed per the plan: trace why the facts pipeline
opens a separate engine handle on the CLI capture path, and either
share the connected engine OR no-op the absorb-log when called from a
CLI context. The diagnostic stack trace this commit prints is the
input that fix needs.

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 4c)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brainstorm): WARN-10 + CV11 — surface SQLSTATE 57014 as typed StructuredAgentError

Pre-fix: brainstorm + lsd silently produced no output on PgBouncer
transaction-mode environments. Postgres statement_timeout fired
canceling listPrefixSampledPages or hybrid search; the unhandled
error reached main()'s catch-all and surfaced as a generic
'gbrain: unknown error' line — after the user had already waited
through the 10-second cost-preview window. Zero ideas, no diagnostic,
no hint about what to do.

Three changes per the resolved decisions:

CV11 + T4 — orchestrator-level entry-point wrap (NOT per-call
whack-a-mole). The public runBrainstorm becomes a thin wrapper that
delegates to runBrainstormImpl inside try/catch; the catch runs
classifyBrainstormError on the thrown value. Adding a new internal
SQL call to runBrainstormImpl is automatically covered — codex F#20's
'scope too narrow' concern resolves structurally.

A3 — reuse StructuredAgentError (the v0.19.0 envelope every new
agent-facing surface uses) with code='brainstorm_timeout'. No new
BrainstormError class; matches CLAUDE.md's stated convention. Future
typed errors in brainstorm follow the same pattern.

T4 + codex F#19 — classifier matches SQLSTATE 57014 specifically (the
spec-defined query_canceled code) via postgres.js .code, alternate
.sqlState, or message-substring fallback. Hint wording reads 'query
canceled' (generic) covering all three PG cancel sub-causes:
statement_timeout (often PgBouncer transaction-mode), lock_timeout,
user-cancel. Honest under each.

CV11 CLI formatter — runBrainstormCli (used by both gbrain brainstorm
AND gbrain lsd) catches StructuredAgentError before main()'s catch-all
sees it. Prints in the cli.ts:188-191 OperationError-block shape:
'Error [<code>]: <message>' then '  Hint: <hint>'. JSON mode emits
the structured envelope (matches serializeError shape). Non-typed
errors fall through to the dispatcher's existing catch — natural
shape preserved (codex F#20 — no broad swallowing).

Files:
- src/core/brainstorm/error-classify.ts (new): isQueryCanceledError +
  classifyBrainstormError pure helpers. Module isolated from the
  orchestrator so the classifier can be unit-tested without spinning
  up the full brainstorm pipeline.
- src/core/brainstorm/orchestrator.ts: imports the helpers; public
  runBrainstorm becomes a try/catch wrapper around the unchanged
  runBrainstormImpl. ~30 LOC change with zero body edits below.
- src/commands/brainstorm.ts: catches StructuredAgentError before the
  generic main() handler. Imports StructuredAgentError from
  '../core/errors.ts'.

Tests in test/brainstorm-timeout.test.ts (14 cases, 43 assertions):
- isQueryCanceledError: 8 cases covering postgres.js {code}, alternate
  {sqlState}, message-substring fallbacks for all 3 cancel sub-causes,
  case-insensitive SQLSTATE match, negative cases (different codes,
  non-DB errors, null/undefined/non-object inputs)
- classifyBrainstormError: 5 cases pinning the StructuredAgentError
  envelope (class, code, message), hint covers all 3 PG sub-causes
  (codex F#19 honesty contract), non-57014 errors pass through with
  SAME REFERENCE (codex F#20 — no clone, no swallow), null/undefined
  pass through, classified Error.message channel is descriptive
- Source-shape regression guards: orchestrator.ts imports the helpers
  AND wraps runBrainstormImpl in try/catch at the public entry point
  (NOT per-call); commands/brainstorm.ts has the CLI formatter
  recognizing StructuredAgentError with 'Error [' + 'Hint:' shape

WARN-10 root cause (PgBouncer-friendly SQL shape for listPrefixSampledPages)
deferred per the plan's out-of-scope rationale — this commit adds the
diagnostic surfacing so users know what hit them instead of silent
no-output. TODOS.md follow-up filed in Phase 6.

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38.3.0: capture smoke-test wave — VERSION + CHANGELOG + docs + llms regen

VERSION 0.38.2.0 → 0.38.3.0; package.json mirror.

CHANGELOG: ELI10-lead-first entry per CLAUDE.md voice rules. "Things
you can now do" section covers all 12 user-visible fixes (BUG-1
frontmatter merge, BUG-2 /ingest empty body, WARN-1 dedup, WARN-3
friendly source error, WARN-5 capture --help, WARN-6 main --help,
WARN-7 binary guard, WARN-8 provenance write+read, WARN-9 admin
scopes, WARN-10 brainstorm timeout surfacing, WARN-2 type-overwrite
documentation, WARN-4 facts:absorb suppression). "What you'd see in
a concrete example" block shows real terminal output. "Things to
watch" covers CV7 thin-client --source rejection, CV12 COALESCE-
preserve UPDATE semantics, CV3 closed source_kind taxonomy, brainstorm
diagnostic-only deferral, facts:absorb first-occurrence diagnostic.
"To take advantage" verification block with paste-ready commands.
"For contributors" credits @garrytan-agents for PR #1299 and links
the plan + decision trace.

TODOS.md: 7 new v0.39 follow-ups filed at the top (SQL-shape rewrite
of listPrefixSampledPages for PgBouncer, magic-byte allowlist,
facts:absorb root-cause trace, --source-kind override flag, ingest_capture
Minion handler architecture migration, provenance-history table, ingest
webhook provenance pass-through).

CLAUDE.md: single consolidated v0.38.3.0 entry under "Key files" naming
every touched file + every decision (D1, A1-A3, CQ1-CQ2, T1-T4, CV3,
CV5-CV15) with their resolution. Future maintainers see the full
surface from one paragraph instead of grepping the diff.

llms.txt + llms-full.txt: regenerated via `bun run build:llms` per
CLAUDE.md's mandatory chaser ('every CLAUDE.md edit needs a
build:llms chaser or test/build-llms.test.ts fails in CI'). 7 cases
pass post-regen.

Verify gate passes: typecheck clean + all 8 shell pre-checks (privacy,
jsonb, progress, wasm, admin-scope-drift, cli-executable, system-of-
record, eval-glossary-fresh, synthetic-corpus-privacy, skill-brain-
first, fuzz-purity).

Trio audit:
  VERSION:      0.38.3.0
  package.json: 0.38.3.0
  CHANGELOG:    ## [0.38.3.0] - 2026-05-22

Plan: ~/.claude/plans/system-instruction-you-are-working-async-popcorn.md (Phase 6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): update legacy tests for v0.39.3.0 implementation shape changes

Two CI failures from the post-merge state, both pre-existing tests
pinning implementation detail that v0.39.3.0's fixes legitimately
changed. Repair the tests, not the production behavior.

1) test/commands/capture.test.ts — 2 cases ('uses first non-empty
   line as the title' + 'caps title at 80 chars') were checking the
   YAML literal `title: "..."` (double-quoted). v0.39.3.0 Phase 2a
   (BUG-1 frontmatter merge) replaced the hand-rolled `JSON.stringify`-
   quoting with `matter.stringify()`, which follows YAML defaults:
   simple strings emit unquoted (`title: Real first line`), special-
   char strings get single-quoted. The semantic ("title equals X")
   is correct; the literal-quoting check was incidental. Parse the
   YAML and assert on the value via gray-matter.

2) test/fix-wave-structural.test.ts — the v0.36.1.x #1077 PKCE-
   public-clients regex pinned the exact destructure `const { name,
   scopes, tokenTtl, ... } = req.body`. v0.39.3.0 Phase 4b (WARN-9
   admin scopes normalization) moved `scopes` to a separate read
   line so the route can accept BOTH `scopes` (admin SPA) AND `scope`
   (OAuth wire singular) via `?? `. Relax the destructure regex to
   accept either layout AND add a NEW regex pinning the
   `scopes ?? scope` fallback so the actual v0.39.3.0 contract is
   load-bearing. PKCE-fix assertions (tokenEndpointAuthMethod ===
   'none' + client_secret_hash = NULL + token_endpoint_auth_method =
   'none') unchanged.

Both fixes are tests-only — no production code change.

Verify gate clean post-fix; 30/30 cases pass in the two affected
test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rename v0.38.3.0 → v0.39.3.0 inline references across the wave

The merge resolution bumped VERSION + package.json + CHANGELOG header
to v0.39.3.0 (per user direction; higher than master's existing
v0.39.0.0 + v0.39.1.0 commit subjects). But the wave's source code
comments + test file headers + the smoke-test report's Editor's Note +
the CLAUDE.md extension entry all still carried the v0.38.3.0 internal
label.

Sed-pass across 25 files (24 in-tree + the smoke-test report Editor's
Note):
- 13 src/ files: capture.ts, cli.ts, serve-http.ts, operations.ts,
  import-file.ts, types.ts, utils.ts, scope.ts, postgres-engine.ts,
  pglite-engine.ts, facts/absorb-log.ts, brainstorm/orchestrator.ts,
  brainstorm/error-classify.ts
- 10 test files: capture-build-content, capture-runcapture,
  put-page-provenance, scope-normalize, cli-help-discoverability,
  brainstorm-timeout, facts-absorb-log, import-file, e2e/engine-parity,
  e2e/serve-http-ingest-webhook
- docs/v0.38-smoke-test-report.md (Editor's Note only — the filename
  + the report body's references to v0.38.0.0 stay since they identify
  the historical subject)
- CLAUDE.md (extension entry tag)

Comments-only change; no production behavior shift. Existing tests
continue to pass (175 cases across 10 wave-specific test files).

llms.txt + llms-full.txt regenerated to keep the CLAUDE.md update in
sync (per CLAUDE.md's mandatory build:llms chaser).

Trio audit re-confirmed:
  VERSION:      0.39.3.0
  package.json: 0.39.3.0
  CHANGELOG:    ## [0.39.3.0] - 2026-05-22

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan

Copy link
Copy Markdown
Owner

Thank you for this work @lukejduncan — closing as already in master at src/commands/serve-http.ts:1084-1130. The admin endpoint already accepts grantTypes, redirectUris, and tokenEndpointAuthMethod: 'none' with NULL-out of client_secret_hash for public clients.

The v0.41.3.0 wave (#1403) actually fixes a follow-on issue codex caught: the pre-fix flow did INSERT (confidential) → UPDATE (NULL secret_hash), which left a stranded confidential row if the UPDATE failed. v0.41.3.0 makes it atomic via a new registerClientManual(..., tokenEndpointAuthMethod) parameter. Your PR enabled the public-client path; v0.41.3.0 closes the atomicity gap on top of it.

@garrytan garrytan closed this May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants