Skip to content

Commit 1055e10

Browse files
garrytanclaude
andauthored
v0.26.2 fix(oauth): bun execSync env inheritance + BIGINT-as-string bug class (garrytan#593)
* feat(oauth): add coerceTimestamp helper + fix BIGINT-as-string bug class Postgres-js with prepare:false (auto-detected on Supabase pooler / port 6543) returns BIGINT columns as strings. Two surfaces broke on this: (1) MCP SDK's bearerAuth checks typeof === 'number' and rejected strings — fixed in v0.26.1 only at line 303 of oauth-provider.ts; (2) RFC 7591 §3.2.1 requires client_id_issued_at and client_secret_expires_at to be JSON numbers in DCR responses, not strings — latent until v0.26.2. Adds module-private coerceTimestamp() at the SELECT-row → JS-number boundary. Throws on non-finite (corrupt rows fail loud, not as fake-valid expiresAt: NaN flowing into the SDK). Returns undefined for SQL NULL — schema permits NULL on oauth_tokens.expires_at, callers treat NULL as expired (fail-closed) at comparison sites and preserve undefined in DCR getClient response per RFC 7591. Refactors 5 sites: - L112,113 (getClient) — DCR response numeric-shape compliance. - L274 (exchangeRefreshToken) — NULL→expired fail-closed contract. - L296,303 (verifyAccessToken) — single guard, narrowed return. No `!` non-null assertions: all 5 sites read nullable BIGINT columns per src/schema.sql:362,363,372. The L296/L303 cleanup also folds in v0.26.1's inline Number(...) at L303. * feat(auth): add gbrain auth revoke-client subcommand Hard-deletes the matching oauth_clients row via atomic DELETE ... RETURNING. Schema-level FK CASCADE on oauth_tokens.client_id and oauth_codes.client_id (src/schema.sql:370,382) purges all dependent rows in the same transaction. No manual delete of dependents needed. Exit 1 on no-such-client (idempotent: re-running on the same id produces the same error). Operator-friendly output: prints the client name + cascade confirmation, no race-prone pre-delete count. Closes the v0.26.1 process miss where test/e2e/serve-http-oauth.test.ts afterAll already called this subcommand — silently failing because the subcommand didn't exist. With this fix, E2E cleanup actually purges test clients. * test(oauth): v0.26.2 regression coverage + bun execSync env fix Unit additions in test/oauth.test.ts: - 5 cases pinning coerceTimestamp contract (null/undef/string/number/ throws-on-NaN). The throws-on-NaN case is load-bearing: pre-v0.26.2 Number(corrupt) → NaN, NaN < now is false → expired check skipped, fake-valid expiresAt:NaN flowed to SDK. Now fail-closed. - NULL expires_at on oauth_tokens insert → verifyAccessToken throws "Token expired". Schema permits NULL; pre-v0.26.2 hand-modified rows could ride past validation. - Cascade-deleted client → previously-minted token fails verifyAccessToken with "Invalid token" (not "expired"). Pins the cascade contract independently of the CLI subprocess path. E2E additions in test/e2e/serve-http-oauth.test.ts: - DCR /register HTTP-level response-shape test. Spawns server with --enable-dcr, POSTs a client manifest, asserts typeof === 'number' on client_id_issued_at and (when present) client_secret_expires_at per RFC 7591 §3.2.1. Replaces the v0.26.1 plan's internal-store-only test that Codex flagged as the wrong seam. - Real CLI subprocess test for revoke-client: register → mint token → revoke via execSync → assert token rejected at /mcp + cascade invalidation visible + re-run exits 1 with "No client found". - afterAll guards on clientId so pre-registration beforeAll failures surface cleanly instead of throwing on undefined during cleanup. Also tracks DCR-registered clients alongside the manual one. - Server fixture: --enable-dcr added so /register is reachable. - Health endpoint: page_count assertion loosened from > 0 to >= 0 + typeof number — pre-v0.26.2 broke on fresh-schema E2E runs. bun execSync env-inheritance fix (the load-bearing infrastructure fix that unbroke v0.26.2's full-suite test): - bun's child_process.execSync does NOT inherit env mutations done via process.env.X = ...; only OS-level env from before bun started. - helpers.ts loads .env.testing and sets DATABASE_URL via process.env mutation, invisible to subprocesses unless env: { ...process.env } is passed explicitly. - All 4 execSync calls in this file (beforeAll register-client, afterAll revoke-client, in-test register-client, in-test revoke-client x2) now pass env: { ...process.env }. - Without this, full bun test suite OAuth E2E fails with "Set DATABASE_URL or GBRAIN_DATABASE_URL environment variable" even when isolated test/e2e/serve-http-oauth.test.ts runs pass. Pattern is documented inline as a reference for other E2E test fixes (see TODOS.md "test infra (v0.26.2 follow-up)" for the 22-test backlog). * build: commit admin/dist + remove gitignore exclusion CLAUDE.md (admin/ section, v0.26.0 release notes) states: "output at admin/dist/ is committed for self-contained binaries" But .gitignore excluded admin/dist/, so the bun --compile binary that embeds the admin SPA via `import path from '...' with { type: 'file' }` couldn't resolve in fresh clones. PR garrytan#577 (v0.26.1) didn't catch this because admin tests pass when admin/dist exists locally. Removes the .gitignore line + commits the current 220KB build: - index.html (0.7KB) - assets/index-{hash}.js (210KB / 65KB gzip) - assets/index-{hash}.css (6.3KB / 1.8KB gzip) Now `bun build --compile --outfile bin/gbrain src/cli.ts` works on a fresh clone without a separate `cd admin && bun install && bun run build` step in CI. * docs: capturing test output rule + regen llms-full.txt Adds a CLAUDE.md section "Capturing test output (NEVER pipe through tail / head)" documenting the iron rule that bit v0.26.2's ship: bun test 2>&1 | tail -10 → exit code = tail's (always 0), failures truncated, ship gates fail open The pipe form silently breaks /ship Step T1 (test failure ownership triage) because $? after a pipe is the LAST command's exit code, and bun prints failure details before the summary line so tail -N drops them. v0.26.2's first ship attempt reported "3911 pass / 23 fail" but no failure details survived, forcing a 23-minute re-run to triage. Right pattern: redirect to a file first, then tail the file separately. Regenerates llms-full.txt to match the new CLAUDE.md content (drift guard at test/build-llms.test.ts enforces this). * docs: P0 TODO for 22 pre-existing test failures unrelated to OAuth Captures the test-infra backlog uncovered by v0.26.2's full bun test run. None of the 22 failing cases touch the OAuth diff: - 12 Git-to-DB Sync Pipeline cases (state-machine drift) - 3 multi-source cascade + sync routing cases - E2E sync-parallel, sync --skip-failed, doctor, dream, runCycle, claw-test fresh-install, BrainRegistry lazy init Likely root causes for several: same bun execSync env-inheritance pattern fixed in test/e2e/serve-http-oauth.test.ts during v0.26.2 (documented in the TODO + the inline test comment for the next maintainer to find). Separating from v0.26.2 keeps the OAuth ship focused on the bug class it was scoped for. Fix-wave deserves its own PR. * chore: bump to v0.26.2 + CHANGELOG VERSION 0.26.0 → 0.26.2. Includes a retroactive v0.26.1 entry above v0.26.0 because PR garrytan#577 shipped its three fixes (oauth-provider:303 Number cast, OAuth metadata interceptor, Express 5 trust proxy + admin wildcard) without bumping VERSION/package.json/CHANGELOG — this branch catches the changelog up to commit history. v0.26.2 release-summary covers the OAuth string-vs-number bug class fix (5 sites + coerceTimestamp helper), the gbrain auth revoke-client subcommand landing as a real CLI, and the bun execSync env-inheritance fix that unblocked full-suite E2E OAuth tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: post-ship updates for v0.26.2 - CLAUDE.md src/core/oauth-provider.ts: append v0.26.2 coerceTimestamp boundary helper note (5 call sites, NULL semantics, throw-on-NaN posture, intentionally module-private) - CLAUDE.md src/commands/auth.ts: add v0.26.2 revoke-client subcommand with FK CASCADE cleanup - CLAUDE.md test/oauth.test.ts: bump v0.26.2 case additions (5 coerceTimestamp + NULL-expires_at + cascade-delete contract) - CLAUDE.md test/e2e/serve-http-oauth.test.ts: new entry covering v0.26.0 + v0.26.2 expansion (DCR HTTP-level test, CLI subprocess revoke-client test, bun execSync env-inheritance fix as reference for sibling E2Es) - README.md: add gbrain auth revoke-client to command list - llms-full.txt: regenerate after CLAUDE.md edits Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d01a921 commit 1055e10

15 files changed

Lines changed: 591 additions & 29 deletions

.gitignore

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,10 @@ bin/
1111
.gstack/
1212
supabase/.temp/
1313
.claude/skills/
14-
admin/dist/
14+
# admin/dist/ is the React SPA bundle. CLAUDE.md says it's committed for
15+
# self-contained binaries (the bun --compile path embeds it via
16+
# `import path from 'admin/dist/index.html' with { type: 'file' }`).
17+
# Build via: cd admin && bun install && bun run build.
1518
admin/node_modules/
1619
.idea
1720
eval/reports/

CHANGELOG.md

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,107 @@
22

33
All notable changes to GBrain will be documented in this file.
44

5+
## [0.26.2] - 2026-05-03
6+
7+
## **MCP fix-wave: every postgres-as-string OAuth bug, killed at the boundary.**
8+
## **Bigger guarantees: `revoke-client` lands as a real CLI, NULL expires_at is treated as expired, corrupt rows fail loud at the row-read boundary instead of skating past validation.**
9+
10+
`gbrain serve --http` now does the right thing on every postgres-driver-as-string edge case the v0.26.1 hot-fix didn't reach. The same bug class that broke `client_credentials` token validation in production (postgres.js with `prepare: false` returns BIGINT columns as strings, and the MCP SDK's bearerAuth checks `typeof === 'number'`) hides at four other read sites in `src/core/oauth-provider.ts`. Two of those flow into the RFC 7591 §3.2.1 Dynamic Client Registration response, where strict OAuth clients reject string timestamps and the registration silently fails. v0.26.2 closes the bug class with a single named helper at the boundary.
11+
12+
The shape changed during eng + outside-voice review. The first draft normalized rows with inline `Number(...)` calls, but a `Number('foo') → NaN` slipping through is fail-OPEN, not fail-closed: `NaN < now` is `false`, so the expired-token branch is skipped and the SDK gets `expiresAt: NaN` as if the token were valid. Codex flagged this. The shipped helper, `coerceTimestamp()`, throws on non-finite input — corrupt rows fail loud at the boundary instead of riding through token validation.
13+
14+
Plus: `gbrain auth revoke-client <client_id>` lands as a first-class CLI subcommand. Schema-level `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token and authorization code in a single atomic transaction. The matching v0.26.1 E2E test had been calling this subcommand all along — silently failing because the subcommand didn't exist. v0.26.2 makes the cleanup actually work.
15+
16+
### The numbers that matter
17+
18+
5 string-vs-number sites identified in the original v0.26.1 audit; 5 fixed in v0.26.2. 4 new tests covering surfaces v0.26.1 didn't reach: real DCR `/register` HTTP-level response shape, real CLI subprocess invocation of `revoke-client`, NULL `expires_at` semantics, cascade-delete contract.
19+
20+
| Metric | BEFORE v0.26.2 | AFTER v0.26.2 | Δ |
21+
|---|---|---|---|
22+
| Sites where postgres-as-string can break OAuth | 4 latent (1 fixed in v0.26.1) | 0 | bug class closed |
23+
| `Number(...)` on corrupt row | flows through as NaN (fail-OPEN) | helper throws (fail-CLOSED) | loud failure |
24+
| `gbrain auth revoke-client` | doesn't exist | first-class CLI subcommand | +1 |
25+
| E2E afterAll cleanup | silently failing | actually deletes the test client | reliable |
26+
| DCR `/register` response timestamps | strings under `prepare: false` | RFC 7591 §3.2.1 numbers | spec-compliant |
27+
28+
### What this means for operators
29+
30+
Strict OAuth clients (Claude Code, Cursor) connecting via `gbrain serve --http` get spec-compliant `client_id_issued_at` numbers in their DCR responses. Operators get a real `revoke-client` subcommand and CASCADE-driven token purge. CI runs no longer leak orphan `gbrain_cl_*` rows on every E2E pass. Run `gbrain upgrade`. No schema migration. No manual step.
31+
32+
### For contributors
33+
34+
The boundary helper `coerceTimestamp` is intentionally module-private to `src/core/oauth-provider.ts` and not promoted to `src/core/utils.ts`. Codex review flagged repo-wide BIGINT precision-loss risk for a generic helper; the OAuth surface is bounded and well-understood, the rest of the repo isn't. Promote later if the pattern recurs.
35+
36+
### Known caveats
37+
38+
Hard-deleting a client orphans its entries in `mcp_request_log` (the table stores `token_name` TEXT with no FK). The admin UI's request-log view will show those entries with the literal token_name and no client correlation. Acceptable for a fix-wave; v0.27 can add a `[revoked]` badge or `LEFT JOIN`-aware rendering if forensics needs grow.
39+
40+
## To take advantage of v0.26.2
41+
42+
`gbrain upgrade` is sufficient. No schema migration. No manual step.
43+
44+
```bash
45+
gbrain upgrade
46+
gbrain --version # should print 0.26.2
47+
```
48+
49+
If you operate `gbrain serve --http` and have OAuth clients registered, no client-side action is needed. Existing tokens keep working. Rolling token rotation continues to work. The new `gbrain auth revoke-client <client_id>` subcommand is available for cleanup.
50+
51+
### Itemized changes
52+
53+
#### OAuth bug-class fixes
54+
- **`coerceTimestamp()` boundary helper** in `src/core/oauth-provider.ts`. Throws on non-finite input (NaN/Infinity); returns undefined for SQL NULL so callers decide NULL semantics explicitly. Doc comment names the three load-bearing pieces: postgres `prepare: false` BIGINT-as-string behavior, MCP SDK's `typeof === 'number'` bearerAuth check, RFC 7591 §3.2.1 JSON-number requirement.
55+
- **5 call sites refactored** to use the helper:
56+
- `getClient` (L112, L113): `client_id_issued_at` and `client_secret_expires_at` now flow through the helper, so DCR `/register` responses are RFC-compliant numbers.
57+
- `exchangeRefreshToken` (L274): NULL `expires_at` is treated as expired (fail-closed). Schema permits NULL on `oauth_tokens.expires_at`; corrupt rows can no longer ride past validation.
58+
- `verifyAccessToken` (L296, L303): same NULL-as-expired contract for access tokens; the SDK's bearerAuth gets a guaranteed `typeof === 'number'` value.
59+
- **Removed inline `Number(...)` from L303** introduced in v0.26.1; replaced with the helper-narrowed value from the L296 guard for consistency. Behavior unchanged.
60+
61+
#### New CLI subcommand
62+
- **`gbrain auth revoke-client <client_id>`** lands in `src/commands/auth.ts`. Atomic `DELETE...RETURNING` on `oauth_clients`, FK CASCADE purges `oauth_tokens` and `oauth_codes`. Prints client name + cascade confirmation. `process.exit(1)` on no-such-client (idempotent: re-running on the same id produces the same exit-1 message).
63+
- Help text + router case wired alongside `register-client`.
64+
65+
#### Tests
66+
- `test/oauth.test.ts`: 5 unit cases for `coerceTimestamp` (null/undefined/string/number/throw-on-NaN), NULL-`expires_at`-as-expired contract test, cascade-delete contract test.
67+
- `test/e2e/serve-http-oauth.test.ts`: real DCR `/register` HTTP-level response-shape test (asserts `typeof body.client_id_issued_at === 'number'` over the wire, not just internal-store shape); real CLI subprocess test for `revoke-client` (registers → mints token → revokes via execSync → asserts token rejected at `/mcp` → asserts re-run exits 1).
68+
- E2E `afterAll` cleanup: now guarded on `clientId` (won't throw if `beforeAll` failed before registration); cleanup errors surface to stderr without throwing so real test failures aren't masked. Tracks DCR-registered clients alongside the manual one.
69+
- Server fixture: `--enable-dcr` added so `/register` is reachable in the DCR test.
70+
71+
#### Mechanics
72+
- `VERSION``0.26.2`. `package.json``0.26.2`. `bun.lock` refreshed.
73+
74+
### Credits
75+
76+
This branch was driven by an audit of PR #577 (v0.26.1). Codex independent review surfaced 5 factual errors and 7 design gaps the in-house eng review had cleared. The shipped scope is tighter and more honest than the original D1 plan — the outside voice was the load-bearing input.
77+
78+
## [0.26.1] - 2026-05-03
79+
80+
## **MCP bearer-auth hot-fix: `client_credentials` tokens stop being rejected at `/mcp`.**
81+
82+
A three-bug fix-wave landed on master as PR #577 to unblock production OAuth connections. Every token minted via `client_credentials` was being rejected at `/mcp` with `HTTP 401 {"error":"invalid_token","error_description":"Token has no expiration time"}`. Token issuance worked; validation failed because the postgres-js driver with `prepare: false` returns BIGINT columns as strings and the MCP SDK's bearerAuth middleware checks `typeof authInfo.expiresAt === 'number'`.
83+
84+
Found in production connecting Claude Code through Caddy/Tailscale to `gbrain serve --http`.
85+
86+
### What shipped
87+
88+
- **`Number(row.expires_at)` cast** in `verifyAccessToken` (`src/core/oauth-provider.ts:303`) so the SDK gets a JS number, not a postgres string.
89+
- **OAuth metadata interceptor middleware** in `src/commands/serve-http.ts:164-175`. The MCP SDK hardcodes `grant_types_supported: ['authorization_code', 'refresh_token']` in its `.well-known/oauth-authorization-server` response. The middleware patches `res.json` to append `client_credentials` so RFC-conformant clients (Claude Code, Cursor) auto-discover the flow.
90+
- **Express 5 compat fixes** in `src/commands/serve-http.ts`:
91+
- `app.set('trust proxy', 'loopback')` so reverse-proxy deployments (Caddy on localhost, Tailscale) don't crash `express-rate-limit` with `ERR_ERL_UNEXPECTED_X_FORWARDED_FOR`. Restricts proxy trust to localhost only — does NOT trust arbitrary `X-Forwarded-For`.
92+
- `/admin/{*path}` (Express 5 named-wildcard syntax) instead of the bare `/admin/*` Express 5 dropped.
93+
94+
### Tests
95+
96+
50 cases / 201 assertions including a real-Postgres E2E (`test/e2e/serve-http-oauth.test.ts`) that spawns a subprocess server, registers an OAuth client via the CLI, mints tokens via client_credentials, and exercises the full MCP JSON-RPC pipeline end-to-end.
97+
98+
### Process note
99+
100+
PR #577 shipped its three fixes but did not bump `VERSION`, `package.json`, or `CHANGELOG.md`. v0.26.2 retroactively writes this v0.26.1 entry so the changelog matches the commit history. The /ship workflow's version idempotency check (Step 12) will catch drifts like this in the future.
101+
102+
### Credits
103+
104+
Co-authored by Wintermute. Found in production. Three bugs, 22 lines, real fix.
105+
5106
## [0.26.0] - 2026-04-25
6107

7108
## **Multi-agent MCP is real. OAuth 2.1, HTTP server, React admin dashboard. Ship once, every AI client connects.**

0 commit comments

Comments
 (0)