Skip to content

v0.26.1 fix(oauth): client_credentials tokens rejected by MCP bearer auth#577

Merged
garrytan merged 3 commits intogarrytan:masterfrom
garrytan-agents:fix/oauth-cc-validation
May 3, 2026
Merged

v0.26.1 fix(oauth): client_credentials tokens rejected by MCP bearer auth#577
garrytan merged 3 commits intogarrytan:masterfrom
garrytan-agents:fix/oauth-cc-validation

Conversation

@garrytan-agents
Copy link
Copy Markdown
Contributor

@garrytan-agents garrytan-agents commented May 3, 2026

v0.26.1 fix: client_credentials OAuth tokens rejected by MCP bearer auth

The bug

Every token minted via client_credentials grant is rejected at /mcp with:

HTTP 401 {"error":"invalid_token","error_description":"Token has no expiration time"}

Token issuance works (returns expires_in: 3600), but the MCP SDK bearer auth middleware checks typeof authInfo.expiresAt !== 'number' and gets 'string' because the postgres driver with prepare: false returns all integer columns as strings.

The fix (3 bugs, 22 lines)

  1. Number(row.expires_at) in verifyAccessToken — cast string to number before returning to SDK
  2. OAuth metadata middleware — SDK hardcodes grant_types_supported: ['authorization_code', 'refresh_token']. Middleware intercepts the .well-known response and appends 'client_credentials' so RFC-conformant clients (Claude Code, Cursor) auto-discover the flow.
  3. Express 5 compattrust proxy: 'loopback' (reverse proxy crashes express-rate-limit without it), /admin/{*path} (Express 5 named wildcard syntax)

Reproduction

# Mint a token (works)
curl -X POST .../token -d "grant_type=client_credentials&client_id=...&client_secret=...&scope=read+write"
# Use it (fails before fix, works after)
curl -X POST -H "Authorization: Bearer TOKEN" -H "Accept: application/json, text/event-stream" -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' .../mcp

Context

Found in production connecting Claude Code to gbrain serve --http behind a Caddy reverse proxy.


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

Wintermute added 3 commits May 3, 2026 07:14
Three bugs found in production when connecting Claude Code via Tailscale:

1. Token validation fails with 'Token has no expiration time'
   - Root cause: postgres driver with prepare:false returns expires_at as
     string, but MCP SDK's bearerAuth middleware checks typeof === 'number'
   - Fix: Number(row.expires_at) in verifyAccessToken

2. OAuth metadata missing client_credentials grant type
   - Root cause: MCP SDK hardcodes ['authorization_code', 'refresh_token']
     in mcpAuthRouter's .well-known endpoint
   - Fix: middleware intercepts metadata response and appends
     'client_credentials' before it reaches the client
   - Claude Code's native OAuth auto-discovery now finds the CC flow

3. Express 5 compatibility fixes
   - trust proxy: 'loopback' for reverse proxy deployments (Caddy/Tailscale)
     without this, express-rate-limit throws ERR_ERL_UNEXPECTED_X_FORWARDED_FOR
   - /admin/* wildcard → /admin/{*path} (Express 5 named param syntax)
Unit test (oauth.test.ts):
- expiresAt is always a number, not string — SDK bearerAuth compat

Integration tests (serve-http-oauth.test.ts, 7 cases):
- client_credentials token accepted at /mcp (the actual regression)
- token expires_in matches server TTL
- OAuth metadata includes client_credentials grant type
- token endpoint discoverable from metadata
- admin dashboard serves SPA (Express 5 wildcard fix)
- X-Forwarded-For doesn't crash rate limiter (trust proxy fix)
- read-only token cannot call write operations (scope enforcement)

42 tests, 0 failures, 172 assertions.
Spins up a real gbrain serve --http against real Postgres, registers an
OAuth client, mints tokens via client_credentials, and exercises the full
MCP JSON-RPC pipeline end-to-end.

E2E cases (test/e2e/serve-http-oauth.test.ts):
- mint token via client_credentials grant
- minted token accepted at /mcp — tools/list returns tools
- minted token works for tools/call — search executes
- expired/invalid token rejected at /mcp
- missing Authorization header returns 401
- OAuth metadata includes all three grant types
- OAuth metadata issuer matches public URL
- admin dashboard serves SPA (Express 5 wildcard fix)
- admin sub-routes serve SPA fallback
- X-Forwarded-For doesn't crash rate limiter
- read-only token rejected for write operations
- write-scoped token can call read operations
- health endpoint works without auth
- multiple tokens work independently
- wrong client_secret rejected at token endpoint

Unit test addition (test/oauth.test.ts):
- expiresAt is always typeof number (SDK bearerAuth compat)

Total: 50 tests, 0 failures, 201 assertions.
@garrytan garrytan merged commit d01a921 into garrytan:master May 3, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 3, 2026
…ug class (#593)

* feat(oauth): add coerceTimestamp helper + fix BIGINT-as-string bug class

Postgres-js with prepare:false (auto-detected on Supabase pooler / port
6543) returns BIGINT columns as strings. Two surfaces broke on this:
(1) MCP SDK's bearerAuth checks typeof === 'number' and rejected
strings — fixed in v0.26.1 only at line 303 of oauth-provider.ts;
(2) RFC 7591 §3.2.1 requires client_id_issued_at and
client_secret_expires_at to be JSON numbers in DCR responses, not
strings — latent until v0.26.2.

Adds module-private coerceTimestamp() at the SELECT-row → JS-number
boundary. Throws on non-finite (corrupt rows fail loud, not as
fake-valid expiresAt: NaN flowing into the SDK). Returns undefined for
SQL NULL — schema permits NULL on oauth_tokens.expires_at, callers
treat NULL as expired (fail-closed) at comparison sites and preserve
undefined in DCR getClient response per RFC 7591.

Refactors 5 sites:
- L112,113 (getClient) — DCR response numeric-shape compliance.
- L274 (exchangeRefreshToken) — NULL→expired fail-closed contract.
- L296,303 (verifyAccessToken) — single guard, narrowed return.

No `!` non-null assertions: all 5 sites read nullable BIGINT columns
per src/schema.sql:362,363,372. The L296/L303 cleanup also folds in
v0.26.1's inline Number(...) at L303.

* feat(auth): add gbrain auth revoke-client subcommand

Hard-deletes the matching oauth_clients row via atomic
DELETE ... RETURNING. Schema-level FK CASCADE on oauth_tokens.client_id
and oauth_codes.client_id (src/schema.sql:370,382) purges all dependent
rows in the same transaction. No manual delete of dependents needed.

Exit 1 on no-such-client (idempotent: re-running on the same id
produces the same error). Operator-friendly output: prints the client
name + cascade confirmation, no race-prone pre-delete count.

Closes the v0.26.1 process miss where test/e2e/serve-http-oauth.test.ts
afterAll already called this subcommand — silently failing because the
subcommand didn't exist. With this fix, E2E cleanup actually purges
test clients.

* test(oauth): v0.26.2 regression coverage + bun execSync env fix

Unit additions in test/oauth.test.ts:
- 5 cases pinning coerceTimestamp contract (null/undef/string/number/
  throws-on-NaN). The throws-on-NaN case is load-bearing: pre-v0.26.2
  Number(corrupt) → NaN, NaN < now is false → expired check skipped,
  fake-valid expiresAt:NaN flowed to SDK. Now fail-closed.
- NULL expires_at on oauth_tokens insert → verifyAccessToken throws
  "Token expired". Schema permits NULL; pre-v0.26.2 hand-modified rows
  could ride past validation.
- Cascade-deleted client → previously-minted token fails
  verifyAccessToken with "Invalid token" (not "expired"). Pins the
  cascade contract independently of the CLI subprocess path.

E2E additions in test/e2e/serve-http-oauth.test.ts:
- DCR /register HTTP-level response-shape test. Spawns server with
  --enable-dcr, POSTs a client manifest, asserts typeof === 'number'
  on client_id_issued_at and (when present) client_secret_expires_at
  per RFC 7591 §3.2.1. Replaces the v0.26.1 plan's internal-store-only
  test that Codex flagged as the wrong seam.
- Real CLI subprocess test for revoke-client: register → mint token →
  revoke via execSync → assert token rejected at /mcp + cascade
  invalidation visible + re-run exits 1 with "No client found".
- afterAll guards on clientId so pre-registration beforeAll failures
  surface cleanly instead of throwing on undefined during cleanup.
  Also tracks DCR-registered clients alongside the manual one.
- Server fixture: --enable-dcr added so /register is reachable.
- Health endpoint: page_count assertion loosened from > 0 to >= 0
  + typeof number — pre-v0.26.2 broke on fresh-schema E2E runs.

bun execSync env-inheritance fix (the load-bearing infrastructure
fix that unbroke v0.26.2's full-suite test):
- bun's child_process.execSync does NOT inherit env mutations done
  via process.env.X = ...; only OS-level env from before bun started.
- helpers.ts loads .env.testing and sets DATABASE_URL via process.env
  mutation, invisible to subprocesses unless env: { ...process.env }
  is passed explicitly.
- All 4 execSync calls in this file (beforeAll register-client,
  afterAll revoke-client, in-test register-client, in-test
  revoke-client x2) now pass env: { ...process.env }.
- Without this, full bun test suite OAuth E2E fails with "Set
  DATABASE_URL or GBRAIN_DATABASE_URL environment variable" even when
  isolated test/e2e/serve-http-oauth.test.ts runs pass. Pattern is
  documented inline as a reference for other E2E test fixes (see
  TODOS.md "test infra (v0.26.2 follow-up)" for the 22-test backlog).

* build: commit admin/dist + remove gitignore exclusion

CLAUDE.md (admin/ section, v0.26.0 release notes) states:
"output at admin/dist/ is committed for self-contained binaries"

But .gitignore excluded admin/dist/, so the bun --compile binary that
embeds the admin SPA via `import path from '...' with { type: 'file' }`
couldn't resolve in fresh clones. PR #577 (v0.26.1) didn't catch this
because admin tests pass when admin/dist exists locally.

Removes the .gitignore line + commits the current 220KB build:
- index.html (0.7KB)
- assets/index-{hash}.js (210KB / 65KB gzip)
- assets/index-{hash}.css (6.3KB / 1.8KB gzip)

Now `bun build --compile --outfile bin/gbrain src/cli.ts` works on a
fresh clone without a separate `cd admin && bun install && bun run
build` step in CI.

* docs: capturing test output rule + regen llms-full.txt

Adds a CLAUDE.md section "Capturing test output (NEVER pipe through
tail / head)" documenting the iron rule that bit v0.26.2's ship:

  bun test 2>&1 | tail -10  →  exit code = tail's (always 0),
                                failures truncated, ship gates fail open

The pipe form silently breaks /ship Step T1 (test failure ownership
triage) because $? after a pipe is the LAST command's exit code, and
bun prints failure details before the summary line so tail -N drops
them. v0.26.2's first ship attempt reported "3911 pass / 23 fail" but
no failure details survived, forcing a 23-minute re-run to triage.

Right pattern: redirect to a file first, then tail the file separately.

Regenerates llms-full.txt to match the new CLAUDE.md content (drift
guard at test/build-llms.test.ts enforces this).

* docs: P0 TODO for 22 pre-existing test failures unrelated to OAuth

Captures the test-infra backlog uncovered by v0.26.2's full bun test
run. None of the 22 failing cases touch the OAuth diff:

- 12 Git-to-DB Sync Pipeline cases (state-machine drift)
- 3 multi-source cascade + sync routing cases
- E2E sync-parallel, sync --skip-failed, doctor, dream, runCycle,
  claw-test fresh-install, BrainRegistry lazy init

Likely root causes for several: same bun execSync env-inheritance
pattern fixed in test/e2e/serve-http-oauth.test.ts during v0.26.2
(documented in the TODO + the inline test comment for the next
maintainer to find).

Separating from v0.26.2 keeps the OAuth ship focused on the bug
class it was scoped for. Fix-wave deserves its own PR.

* chore: bump to v0.26.2 + CHANGELOG

VERSION 0.26.0 → 0.26.2. Includes a retroactive v0.26.1 entry above
v0.26.0 because PR #577 shipped its three fixes (oauth-provider:303
Number cast, OAuth metadata interceptor, Express 5 trust proxy +
admin wildcard) without bumping VERSION/package.json/CHANGELOG —
this branch catches the changelog up to commit history.

v0.26.2 release-summary covers the OAuth string-vs-number bug class
fix (5 sites + coerceTimestamp helper), the gbrain auth revoke-client
subcommand landing as a real CLI, and the bun execSync env-inheritance
fix that unblocked full-suite E2E OAuth tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: post-ship updates for v0.26.2

- CLAUDE.md src/core/oauth-provider.ts: append v0.26.2 coerceTimestamp boundary helper note (5 call sites, NULL semantics, throw-on-NaN posture, intentionally module-private)
- CLAUDE.md src/commands/auth.ts: add v0.26.2 revoke-client subcommand with FK CASCADE cleanup
- CLAUDE.md test/oauth.test.ts: bump v0.26.2 case additions (5 coerceTimestamp + NULL-expires_at + cascade-delete contract)
- CLAUDE.md test/e2e/serve-http-oauth.test.ts: new entry covering v0.26.0 + v0.26.2 expansion (DCR HTTP-level test, CLI subprocess revoke-client test, bun execSync env-inheritance fix as reference for sibling E2Es)
- README.md: add gbrain auth revoke-client to command list
- llms-full.txt: regenerate after CLAUDE.md edits

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 3, 2026
Master added v0.26.2 (#593) and v0.26.1 (#577) entries via the OAuth
fix-wave landing path. Resolved three conflicts:

- VERSION: kept 0.26.4 (this branch's version)
- package.json: kept 0.26.4
- CHANGELOG.md: kept both — v0.26.4 entry on top, then master's new
  v0.26.2 + v0.26.1 entries below. Final order: 0.26.4 → 0.26.2 →
  0.26.1 → 0.26.0 (top to bottom, contiguous version sequence).

Also fixed a banned-name reference master shipped in the v0.26.1
CHANGELOG credit line ("Co-authored by Wintermute" → "Co-authored by
your OpenClaw") per CLAUDE.md:550 / scripts/check-privacy.sh.

Regenerated llms-full.txt to match the merged CHANGELOG.

Verified post-merge: typecheck clean, schema-diff unit tests 17/17 pass.
Master's diff did not touch src/schema.sql, src/core/pglite-schema.ts,
or src/core/migrate.ts, so the new schema-drift gate's behavior is
unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants