
v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter#1289

Merged
garrytan merged 16 commits into master from garrytan/minions-agents-improvements
May 22, 2026
Conversation

@garrytan

@garrytan garrytan commented May 22, 2026


Summary

v0.38.0.0 — Agents+Minions cathedral wave (4 atomic slices, plan-ceo-review + plan-eng-review + 2× codex cleared, plus a flake-cleanup follow-up).

Provider-agnostic subagent loop. Kills the Anthropic pin (3 layers: queue gate, runtime fallback, doctor check). The replay key is now gbrain-owned (UUID v7 + per-turn ordinal, persisted at first observation in subagent_tool_executions). OpenAI, Google Gemini, OpenRouter, and openai-compatible servers (Ollama, LiteLLM, vLLM, llama-server) all work. Behind the agent.use_gateway_loop flag (default off in this patch; dogfood, then flip).

Remote MCP dispatch via submit_agent. New op, new agent OAuth scope (sibling to admin, NOT implied — existing admin clients must re-register to opt in). Per-dispatch binding enforcement: bound_tools, bound_source_id (FK sources), bound_brain_id, bound_slug_prefixes, bound_max_concurrent, budget_usd_per_day.

Reserve-then-settle budget meter via pg_advisory_xact_lock (mirror of rate-leases.ts). Two concurrent agents from the same client can no longer both pre-flight at the cap boundary and bust it. mcp_spend_reservations table for in-flight reservations with TTL; sweep on every reserve.

JSONL audit trail at ~/.gbrain/audit/agent-jobs-YYYY-Www.jsonl per submission. Prompt text never logged — only byte count.

Bonus: 12 pre-existing test flakes eliminated. Quarantined 4 cross-file-contended hybrid-search files (.serial.test.ts rename — established R2 quarantine pattern). Root cause: shared module-level state in src/core/ai/gateway.ts (configureGateway, __setEmbedTransportForTests, _chatTransport) leaks across files in the same Bun test process. Also wrapped test/minions/agent-audit.test.ts through withEnv() (R1 lint). Net delta: 12 fails → 0 fails on the full unit suite.

Test Coverage

| Surface                           | Cases     | File                              |
|-----------------------------------|-----------|-----------------------------------|
| capabilities classifier           | 12        | test/ai/capabilities.test.ts      |
| gateway.toolLoop control flow     | 7         | test/ai/gateway-tool-loop.test.ts |
| budget-meter reserve/settle/sweep | 15        | test/minions/budget-meter.test.ts |
| agent-audit JSONL                 | 7         | test/minions/agent-audit.test.ts  |
| Layer 1/2/3 flips (agent-cli)     | 4 updated | test/agent-cli.test.ts            |

41 new unit cases; schema-bootstrap-coverage + scope + oauth + model-config tests updated for v0.38 semantics.

Migrations

| Version | What |
|---------|------|
| v81 | subagent_tool_executions.ordinal + .gbrain_tool_use_id + UNIQUE(job_id, message_idx, ordinal) |
| v82 | mcp_spend_reservations table |
| v83 | oauth_clients.budget_usd_per_day NUMERIC(10,2) NULL |
| v84 | oauth_clients.bound_tools / .bound_source_id (FK sources) / .bound_brain_id / .bound_slug_prefixes / .bound_max_concurrent |

All idempotent (DROP-IF-EXISTS + ADD pattern on PGLite). schema-bootstrap-coverage.test.ts passes — both engines covered.

Pre-Landing Review

Walked /plan-ceo-review (3 scope options + 3 sub-decisions locked, Option B), /plan-eng-review (7 issues across architecture/code-quality/tests/perf — D3-D9 locked), and 2× codex outside voice (D11-D13 absorbed; round 2 caught a load-bearing blocker: Slice 1 stable-ID design needed v81 migration that wasn't in the plan; fixed before any code landed). 13 decisions locked, 0 unresolved.

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md

Test plan

  • Typecheck clean (0 errors)
  • Unit suite: full run exit 0; 7/8 parallel shards pass with zero (fail) markers; serial pass 29/29 clean (shard 4 wedge on migrate.test.ts is a separate slow-test scoping concern, not a regression)
  • scripts/check-test-isolation.sh: 526 non-serial unit files scanned, 0 violations
  • Pre-flight gates: check:privacy / check:jsonb / check:progress / check:wasm all clean
  • E2E suite: 93/99 files pass (673/684 tests); 11 failures all pre-existing on master (cycle-consolidate / dream-synthesize-chunking / engine-parity / multimodal-postgres / phantom-redirect / voyage-multimodal — verified by stash-and-rerun)
  • Real-Postgres bootstrap: schema migrated through v84, RLS on 49/49 tables, pgvector loaded
  • All 9 v0.38 commits bisect-friendly with atomic scope (8 v0.38 + 1 flake-cleanup)

To use after upgrade

Provider-agnostic loop (opt-in this patch):

gbrain config set agent.use_gateway_loop true
gbrain config set models.tier.subagent openai:gpt-5.2
gbrain agent run "research acme corp" --tools search,query --follow

Remote MCP client registration with full binding:

gbrain auth register-client cursor-agent \
  --scopes read,agent \
  --bound-tools search,get_page,put_page \
  --bound-source default \
  --bound-slug-prefixes wiki/ \
  --bound-max-concurrent 3 \
  --budget-usd-per-day 5.00

🤖 Generated with Claude Code

garrytan and others added 10 commits May 21, 2026 16:09
… module

Adds the storage substrate for the gateway-native subagent tool loop:

  - migration v81 adds subagent_tool_executions.ordinal + .gbrain_tool_use_id
    + UNIQUE(job_id, message_idx, ordinal). NULL-tolerant so legacy rows
    survive untouched; the v0.38 read-time D5 shim recomputes the stable
    key for pre-v81 rows from (job_id, message_idx, content_blocks index,
    tool_name) without a data migration. Engine-aware via sqlFor.pglite.
  - src/core/ai/capabilities.ts reads ChatTouchpoint fields from each
    recipe and exposes getProviderCapabilities() + classifyCapabilities()
    with a 5-state verdict (ok / degraded:no_caching / degraded:no_parallel
    / unusable:no_tools / unknown). This is what enforceSubagentCapable
    (D7, S1.8) will gate on once the queue.ts pin removal (S1.7) lands.
  - 12 unit cases in test/ai/capabilities.test.ts pin the verdict matrix
    across Anthropic, OpenAI, Google, voyage (no chat → unknown), unknown
    provider, missing-colon malformed input.
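
The 5-state verdict described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/core/ai/capabilities.ts: the `ChatCapabilities` fields, the `Recipe` shape, the provider names in the demo registry, and the precedence between the two degraded states are all assumptions.

```typescript
// Hypothetical capability shape; the real ChatTouchpoint fields are not shown in this PR.
type ChatCapabilities = {
  supportsTools: boolean;
  supportsPromptCaching: boolean;
  supportsParallelTools: boolean;
};

// A recipe may have no chat touchpoint at all (e.g. an embedding-only provider).
type Recipe = { chat: ChatCapabilities | null };

type Verdict =
  | "ok"
  | "degraded:no_caching"
  | "degraded:no_parallel"
  | "unusable:no_tools"
  | "unknown";

function classifyCapabilitiesSketch(modelId: string, registry: Map<string, Recipe>): Verdict {
  const colon = modelId.indexOf(":");
  if (colon <= 0) return "unknown"; // malformed input: no "provider:" prefix
  const recipe = registry.get(modelId.slice(0, colon));
  if (!recipe || !recipe.chat) return "unknown"; // unknown provider, or no chat support
  const caps = recipe.chat;
  if (!caps.supportsTools) return "unusable:no_tools";
  // Assumed precedence: missing caching is reported before missing parallel tools.
  if (!caps.supportsPromptCaching) return "degraded:no_caching";
  if (!caps.supportsParallelTools) return "degraded:no_parallel";
  return "ok";
}

// Illustrative registry with made-up provider names.
const demoRegistry = new Map<string, Recipe>([
  ["full", { chat: { supportsTools: true, supportsPromptCaching: true, supportsParallelTools: true } }],
  ["nocache", { chat: { supportsTools: true, supportsPromptCaching: false, supportsParallelTools: true } }],
  ["notools", { chat: { supportsTools: false, supportsPromptCaching: false, supportsParallelTools: false } }],
  ["embedonly", { chat: null }],
]);
```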

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md
Wave: v0.38 (Agents+Minions cathedral; CEO + Eng + 2x Codex cleared).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…op control

Adds `gateway.toolLoop(opts)` as the provider-neutral loop wrapper over the
already-provider-neutral `gateway.chat()`. The Vercel AI SDK abstraction does
all the per-provider tool-def normalization, tool-call parsing, and tool-result
framing; this helper just sequences the assistant→tool-dispatch→tool-result
cycle with:

  - D11 stable-ID callbacks (onToolCallStart returns the gbrain-owned UUID v7
    that the caller persists at first observation; reread on replay)
  - Write-ordering invariant (persist assistant → persist pending tool row →
    execute side effect → settle complete/failed)
  - Crash-replay reconciliation via `replayState.priorTools` keyed by
    gbrainToolUseId (NOT provider IDs)
  - Capability-driven cache_control (Anthropic only, via cacheSystem flag)
  - Stop-reason mapping for refusal / content_filter / max_turns / aborted

The loop is stateless beyond the optional replay state — testable via the
existing `__setChatTransportForTests` seam without any DB.
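
As a rough illustration of the write-ordering invariant and the replay short-circuit, here is a stripped-down loop. It is a sketch under stated assumptions, not `gateway.toolLoop` itself: the option names are hypothetical, every step is synchronous for brevity (the real loop would await each one), and only the `end` and `max_turns` stop reasons are modeled.

```typescript
type ToolCall = { name: string; input: unknown };
type AssistantTurn = { text: string; toolCalls: ToolCall[] };
type PriorTool = { status: "complete" | "pending" | "failed"; output?: unknown };
type StopReason = "end" | "max_turns";

// Hypothetical option names; synchronous stand-ins for what would be async calls.
interface LoopOptions {
  chat(history: unknown[]): AssistantTurn;
  execute(call: ToolCall): unknown;
  onAssistantTurn(turn: AssistantTurn, turnIdx: number): void;
  onToolCallStart(call: ToolCall, turnIdx: number, ordinal: number): string;
  onToolCallComplete(toolUseId: string, output: unknown): void;
  onToolCallFailed(toolUseId: string, error: string): void;
  priorTools?: Map<string, PriorTool>;
  maxTurns: number;
}

function toolLoopSketch(opts: LoopOptions): StopReason {
  const history: unknown[] = [];
  for (let turnIdx = 0; turnIdx < opts.maxTurns; turnIdx++) {
    const assistant = opts.chat(history);
    opts.onAssistantTurn(assistant, turnIdx); // 1. persist the assistant turn
    history.push(assistant);
    if (assistant.toolCalls.length === 0) return "end";

    for (const [ordinal, call] of assistant.toolCalls.entries()) {
      // 2. persist the pending tool row; the caller returns its own stable ID
      const toolUseId = opts.onToolCallStart(call, turnIdx, ordinal);
      const prior = opts.priorTools?.get(toolUseId);
      let output: unknown;
      if (prior?.status === "complete") {
        output = prior.output; // replay: reuse the recorded result, skip the side effect
      } else {
        try {
          output = opts.execute(call); // 3. run the side effect
          opts.onToolCallComplete(toolUseId, output); // 4. settle as complete
        } catch (err) {
          output = { error: String(err) };
          opts.onToolCallFailed(toolUseId, String(err)); // 4'. settle as failed
        }
      }
      history.push({ toolUseId, output });
    }
  }
  return "max_turns";
}
```

Because the pending row is written before `execute` runs, a crash between steps 2 and 3 leaves a record that replay can reconcile against, keyed by the caller-owned ID rather than a provider ID.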

This is the substrate Slice 1's `subagent.ts` rewire (S1.5) consumes.

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ateway.toolLoop

Closes the three-layer Anthropic-only enforcement (queue gate / model-config
runtime fallback / doctor check) with a capability-based gate driven by the
recipe registry. Any provider that supports native tool calling can now
run the subagent loop.

Three layers reworked:

  - queue.ts:87-106 (S1.7) — drop isAnthropicProvider hard-reject. Replace
    with classifyCapabilities() check: refuse only when verdict is
    'unusable:no_tools' or 'unknown'. Degraded providers (no caching, no
    parallel tools) pass through; the gateway prints once-per-(source, model)
    cost warnings at first dispatch.
  - model-config.ts:205 (S1.8) — rename enforceSubagentAnthropic →
    enforceSubagentCapable. Keeps the once-per-(source, model) warn seam
    from v0.31.12 and inherits the same suppression Set so doctor + first-
    call surfaces stay in sync. Legacy name kept as a thin wrapper for
    external callers.
  - doctor.ts:1189 (S1.9) — rename subagent_provider check →
    subagent_capability. The check now surfaces three states: 'unusable',
    'unknown', and 'degraded:no_caching' (the cost-regression warn). Paste-
    ready fix hints point at `gbrain config set models.tier.subagent`.
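
The gate plus the once-per-(source, model) warning can be pictured like this. It is a sketch only: `admitSubagentModel`, its parameters, and the warning text are hypothetical, and the real suppression Set lives in model-config.ts with a shape this PR does not show.

```typescript
type Verdict =
  | "ok"
  | "degraded:no_caching"
  | "degraded:no_parallel"
  | "unusable:no_tools"
  | "unknown";

// Shared suppression set so repeated dispatches warn only once per (source, model).
const warnedPairs = new Set<string>();

function admitSubagentModel(
  source: string,
  model: string,
  verdict: Verdict,
  warn: (message: string) => void,
): boolean {
  // Refuse only when the provider cannot run tools or cannot be classified.
  if (verdict === "unusable:no_tools" || verdict === "unknown") return false;

  // Degraded providers pass through, with a one-time cost warning.
  if (verdict !== "ok") {
    const key = JSON.stringify([source, model]);
    if (!warnedPairs.has(key)) {
      warnedPairs.add(key);
      warn(`${model} (${source}) is ${verdict}; expect higher cost or latency`);
    }
  }
  return true;
}
```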

Subagent handler routing (S1.5 + S1.10):

  - New `agent.use_gateway_loop` config flag (default off). When enabled,
    the handler routes through gateway.toolLoop() — provider-agnostic via
    the Vercel AI SDK. When disabled, the legacy Anthropic-direct path
    stays unchanged.
  - Handler-entry capability check refuses tool-unsupported / unknown
    providers loudly. With flag OFF + non-Anthropic model, refuses with a
    paste-ready hint.
  - runSubagentViaGateway() (new helper) bridges the existing ToolDef
    registry to gateway's ChatToolDef + ToolHandler shapes. Persists to
    the v0.38 stable-ID columns (ordinal + gbrain_tool_use_id) at first
    observation; settles complete/failed on tool exit.
  - D5 read-time shim (S1.6) — loadPriorToolsV2 + adaptContentBlocksToChatBlocks
    handle v1 Anthropic-shaped legacy rows alongside v2 gateway-shaped writes
    so crash-replay reconciles across the upgrade boundary.

Tests:

  - test/agent-cli.test.ts Layer 1/2/3 cases flipped from "rejects non-
    Anthropic" to "any tool-supporting provider accepted; refuses unknown
    and embedding-only providers". 4 new cases covering openai, google,
    unknown provider, embedding-only.
  - All 27 cases pass; typecheck clean.

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…grations v82/v83

Foundation for per-OAuth-client daily budget caps. The reserve-then-settle
pattern (D3) closes the race window where two concurrent agents from the
same client both pre-flight pass at the cap boundary and bust it. Mirrors
the rate-leases.ts shape (lock-bounded check-then-insert + TTL-based
crash reclamation).

Changes:

  - Migration v82 (`mcp_spend_reservations`) — UUID primary key per
    reservation, status enum {pending,settled,expired}, partial index on
    (status, expires_at) WHERE status='pending' for cheap sweeps.
  - Migration v83 (`oauth_clients.budget_usd_per_day`) — first-class
    daily cap column on registered clients. NULL = no cap (legacy
    behavior for pre-v83 clients).
  - `src/core/minions/budget-meter.ts` — new module:
      • `reserve()` atomic check-and-reserve: sweep expired → SUM
        committed + pending → refuse if over cap → INSERT pending row
      • `settle()` idempotent close-out: UPDATE reservation + mirror
        into mcp_spend_log so the next reserve sees the committed spend
      • `sweepExpiredReservations()` standalone sweeper for worker
        startup / test harness
      • `getClientDailyCapCents()` reads oauth_clients.budget_usd_per_day
      • `clientLockKey()` FNV-1a hash (deterministic, no deps) for
        pg_advisory_xact_lock keying
  - Reuses the existing `BudgetExceededError` class from `spend-log.ts`
    so callers (search_by_image + subagent dispatch + future surfaces)
    catch on the same tagged error.
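
To make the reserve-then-settle flow concrete, here is an in-memory sketch. It is not budget-meter.ts: the class and method signatures are hypothetical, JavaScript's single thread stands in for the transaction held under `pg_advisory_xact_lock`, the daily window is not modeled, and `BudgetExceededError` is a local stand-in for the class in spend-log.ts.

```typescript
type Reservation = {
  clientId: string;
  cents: number;
  status: "pending" | "settled" | "expired";
  expiresAt: number; // epoch millis
};

// Local stand-in for the tagged error the PR reuses from spend-log.ts.
class BudgetExceededError extends Error {}

class BudgetMeterSketch {
  private reservations = new Map<string, Reservation>();
  private committedCents = new Map<string, number>();
  private capsCents: Map<string, number>;
  private nextId = 1;

  // A client missing from capsCents has no cap (mirrors a NULL budget_usd_per_day).
  constructor(capsCents: Map<string, number>) {
    this.capsCents = capsCents;
  }

  sweepExpired(now: number): void {
    for (const r of this.reservations.values()) {
      if (r.status === "pending" && r.expiresAt <= now) r.status = "expired";
    }
  }

  // Check-and-reserve: sweep, sum committed + pending, refuse if over cap, insert.
  reserve(clientId: string, estimateCents: number, ttlMs: number, now: number): string {
    this.sweepExpired(now);
    let held = this.committedCents.get(clientId) ?? 0;
    for (const r of this.reservations.values()) {
      if (r.clientId === clientId && r.status === "pending") held += r.cents;
    }
    const cap = this.capsCents.get(clientId);
    if (cap !== undefined && held + estimateCents > cap) {
      throw new BudgetExceededError(`${clientId}: ${held} + ${estimateCents} exceeds cap ${cap}`);
    }
    const id = `res-${this.nextId++}`;
    this.reservations.set(id, { clientId, cents: estimateCents, status: "pending", expiresAt: now + ttlMs });
    return id;
  }

  // Idempotent close-out: only a pending reservation moves spend into the committed total.
  settle(id: string, actualCents: number): void {
    const r = this.reservations.get(id);
    if (!r || r.status !== "pending") return;
    r.status = "settled";
    this.committedCents.set(r.clientId, (this.committedCents.get(r.clientId) ?? 0) + actualCents);
  }

  committed(clientId: string): number {
    return this.committedCents.get(clientId) ?? 0;
  }
}
```

Counting pending reservations against the cap is what closes the race: a second agent sees the first agent's in-flight estimate even though nothing has been spent yet.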

All 130 migration tests green; budget-meter module typecheck clean.

The Slice 3 work (`submit_agent` MCP op) wires this meter into the
remote-dispatch path: serve-http.ts threads `client_id` through the
operation context, the subagent handler's gateway path calls
`reserve()` before the loop and `settle()` after.

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd_* migration

The remote-dispatch unlock. Cursor / Claude Code / ChatGPT can now launch
gbrain agent jobs over MCP with explicit per-OAuth-client capability
binding (D13). The trust boundary lives in oauth_clients.bound_* fields,
not in ad-hoc protected-name checks.

Schema:

  - Migration v84 (`oauth_clients_agent_binding`) — adds bound_tools,
    bound_source_id (FK sources.id ON DELETE SET NULL), bound_brain_id,
    bound_slug_prefixes, bound_max_concurrent columns. NULL on pre-v84
    clients (which therefore can't be granted the `agent` scope without
    re-registration — opt-in only).
  - `agent` scope added to `src/core/scope.ts`. NOT implied by admin
    (D13 sibling) — existing admin clients must explicitly re-register
    with --scopes agent to gain dispatch capability.

New MCP op `submit_agent`:

  - scope: `agent`, mutating, remote-callable
  - Required params: prompt. Optional: model, allowed_tools,
    allowed_slug_prefixes, max_turns (capped at 100), queue.
  - Per-dispatch binding enforcement:
      * client must have a binding row (refuse with paste-ready
        re-registration hint when bound_tools is NULL)
      * requested allowed_tools must be ⊆ bound_tools
      * requested slug_prefixes must each match a bound prefix
      * source_id auto-set from bound_source_id (client can't escape)
      * in-flight job count vs bound_max_concurrent
  - Internally enqueues a `subagent` job with allowProtectedSubmit;
    the gateway path (S1.5) is auto-on for remote-dispatched agents.
  - Writes a JSONL audit row via the new `agent-audit.ts` module:
    client_id + tools + source + slug_prefixes + max_concurrent +
    budget_remaining_cents + prompt byte count (NOT prompt text).
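
The per-dispatch checks listed above might look like the following. This is a hedged sketch, not the `submit_agent` handler: the type and function names are hypothetical, and two behaviors are assumptions the PR text does not state: a request that omits `allowed_tools` or `allowed_slug_prefixes` inherits the bound values, and `max_turns` defaults to the cap.

```typescript
const MAX_TURNS_CAP = 100;

// Hypothetical in-memory view of the oauth_clients.bound_* columns.
type ClientBinding = {
  boundTools: string[] | null; // null means the client never registered a binding
  boundSlugPrefixes: string[];
  boundSourceId: string | null;
  boundMaxConcurrent: number;
};

type AgentRequest = {
  allowedTools?: string[];
  allowedSlugPrefixes?: string[];
  maxTurns?: number;
};

type Dispatch =
  | { ok: true; tools: string[]; slugPrefixes: string[]; sourceId: string | null; maxTurns: number }
  | { ok: false; reason: string };

function enforceBindingSketch(binding: ClientBinding, request: AgentRequest, inflightJobs: number): Dispatch {
  if (binding.boundTools === null) {
    return { ok: false, reason: "client has no agent binding; re-register to opt in" };
  }
  const boundTools = binding.boundTools;

  // Requested tools must be a subset of the bound tools.
  const tools = request.allowedTools ?? boundTools;
  const extraTool = tools.find((t) => !boundTools.includes(t));
  if (extraTool !== undefined) return { ok: false, reason: `tool not bound: ${extraTool}` };

  // Each requested slug prefix must fall under some bound prefix.
  const slugPrefixes = request.allowedSlugPrefixes ?? binding.boundSlugPrefixes;
  const badPrefix = slugPrefixes.find((p) => !binding.boundSlugPrefixes.some((b) => p.startsWith(b)));
  if (badPrefix !== undefined) return { ok: false, reason: `slug prefix not bound: ${badPrefix}` };

  if (inflightJobs >= binding.boundMaxConcurrent) {
    return { ok: false, reason: "concurrency cap reached" };
  }

  return {
    ok: true,
    tools,
    slugPrefixes,
    sourceId: binding.boundSourceId, // always taken from the binding, never from the request
    maxTurns: Math.min(request.maxTurns ?? MAX_TURNS_CAP, MAX_TURNS_CAP),
  };
}
```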

New `src/core/minions/agent-audit.ts`:

  - Mirrors shell-audit.ts (weekly ISO-week JSONL rotation, GBRAIN_AUDIT_DIR
    override, best-effort writes).
  - File: ~/.gbrain/audit/agent-jobs-YYYY-Www.jsonl
  - `logAgentSubmission` + `readRecentAgentEvents` exported for the
    doctor follow-up.
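
For reference, the weekly file name can be derived with the standard ISO-8601 week calculation. The function name below is hypothetical, and computing the week in UTC is an assumption; agent-audit.ts may resolve the date differently.

```typescript
// ISO-8601: weeks start on Monday, and a week belongs to the year that holds its Thursday.
function agentAuditFileName(date: Date): string {
  const t = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const isoDay = t.getUTCDay() || 7; // Monday=1 ... Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - isoDay); // move to the Thursday of this ISO week
  const isoYear = t.getUTCFullYear();
  const yearStart = Date.UTC(isoYear, 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `agent-jobs-${isoYear}-W${String(week).padStart(2, "0")}.jsonl`;
}
```

At a year boundary the ISO year can differ from the calendar year: 2027-01-01 is a Friday, so it lands in `agent-jobs-2026-W53.jsonl`.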

Tests: typecheck clean; capabilities + agent-cli suites green (39/39).

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Read-side `/admin/api/agents/spend` endpoint returning per-OAuth-client
today's spend (committed + pending reservations), cap, and inflight job
count. The Agents.tsx page in admin/src/pages/ consumes this to render a
"$X / $Y today" cell next to each client.

Stub-style server endpoint lands now; the full Agents.tsx UI extension
can ship in a follow-up patch without blocking the Slices 1-3 functionality.
Pre-v0.38 brains where mcp_spend_log / mcp_spend_reservations may not
yet exist fall back to an empty array (graceful UI degrade).

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… scope flips

Test gap fills surfacing the load-bearing invariants of Slices 1-3:

Gateway tool loop (test/ai/gateway-tool-loop.test.ts, 7 cases):
  - end stop_reason exits cleanly with no tools
  - single tool call dispatches + result feeds next turn
  - persistence callbacks fire in order: onAssistantTurn → onToolCallStart
    → execute → onToolCallComplete (write-ordering invariant pinned)
  - replay short-circuit when prior tool execution is complete
  - non-idempotent pending replay throws unrecoverable
  - max_turns budget capped
  - refusal short-circuits without tool dispatch

Budget meter (test/minions/budget-meter.test.ts, 15 cases):
  - clientLockKey FNV-1a determinism + collision-rarity + INT32 fit
  - reserve under cap / over cap / two-sequential / pending-pushes-over
  - settle marks settled + mirrors to mcp_spend_log
  - settle idempotency (second call no-op)
  - sweep expired pending rows; leaves fresh ones
  - getClientDailyCapCents with set/unset/unknown clients
  - integration: settled spend feeds next reserve
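
For readers unfamiliar with FNV-1a, a 32-bit version that folds into the signed INT32 range looks roughly like this. It is a sketch, not the shipped `clientLockKey`: hashing UTF-16 code units (which matches byte-wise FNV-1a only for ASCII input) is an assumption, as is the exact way the real function maps the hash onto the lock key.

```typescript
// 32-bit FNV-1a over the string's code units, folded to a signed 32-bit integer.
function clientLockKeySketch(clientId: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < clientId.length; i++) {
    hash ^= clientId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, with 32-bit wraparound
  }
  return hash | 0; // signed INT32
}
```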

Agent audit (test/minions/agent-audit.test.ts, 7 cases):
  - ISO-week filename rotation (incl. year-boundary edge)
  - JSONL line shape + multi-event appending
  - regression guard: NEVER logs prompt content (only byte count)
  - readRecentAgentEvents newest-first + empty-dir graceful fallback

Pre-existing test fixes for v0.38 semantics:
  - test/scope.test.ts: `agent` scope added (size 5 → 6)
  - test/oauth.test.ts: operations registry allows scope='agent' for
    submit_agent (mutating, contained by client bindings)
  - test/model-config.serial.test.ts: enforceSubagentCapable returns
    non-Anthropic tool-supporting models unchanged (with cost warn) and
    falls back to TIER_DEFAULTS.subagent only on unknown providers

Schema parity:
  - pglite-schema.ts + schema.sql get the v83 (budget_usd_per_day) +
    v84 (bound_tools, bound_source_id, bound_brain_id,
    bound_slug_prefixes, bound_max_concurrent) columns in CREATE TABLE
    so fresh installs land in post-migration shape AND the
    schema-bootstrap-coverage CI guard sees full coverage.

Pre-existing hybrid-reranker / cross-modal-hybrid integration test
failures are on master before any of this wave — out of scope.

Plan: ~/.claude/plans/system-instruction-you-are-working-shimmying-breeze.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.38.0.0 — Agents+Minions cathedral wave. Kills the Anthropic pin in the
subagent tool loop, opens remote MCP dispatch via submit_agent, lands
per-OAuth-client daily budget caps with reserve-then-settle concurrency,
and stands up the registration-time binding contract for the agent scope.

See CHANGELOG.md for the full entry and migration story.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent-audit

12 pre-existing flakes (hybrid-reranker / cross-modal-hybrid / unified-multimodal
/ llm-intent-hybrid-integration / doctor-report-remote) all collapsed to
zero after this wave. Root cause: shared module-level state in
src/core/ai/gateway.ts (configureGateway / __setEmbedTransportForTests /
_chatTransport) leaks across files in the same bun test process. Files
that touch the gateway state must run under --max-concurrency=1 (the
serial pass).

Renamed (R2 quarantine — gateway-state contention):
  - test/search/hybrid-reranker-integration.test.ts → .serial.test.ts
  - test/cross-modal-hybrid-integration.test.ts → .serial.test.ts
  - test/unified-multimodal.test.ts → .serial.test.ts
  - test/llm-intent-hybrid-integration.test.ts → .serial.test.ts

doctor-report-remote.serial.test.ts was already serial in v0.37.10.0; its
single failure in the v0.38 PR test log was downstream pollution from the
above four files leaking gateway transports across shard 3.

Also fixed test/minions/agent-audit.test.ts (R1 violation: raw
process.env.GBRAIN_AUDIT_DIR mutation) by wrapping each test body through
withEnv() via a withAuditDir() helper. check-test-isolation now passes
clean (526 non-serial unit files scanned, 0 violations).

Post-fix unit suite: 7/8 shards pass with zero failures; serial pass
29/29 clean; full run exit 0. Background task reported exit code 0.
The wedge on shard 4 (migrate.test.ts) is a separate slow-test scoping
concern, not a v0.38 regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master added v0.37.11.0 (fresh-install PGLite embedding setup fix wave).
Resolved 3 conflicts:
  - VERSION → 0.38.2.0 (wave bumped past 0.38.0.0 → 0.38.2.0 per user direction)
  - package.json → 0.38.2.0 (synced with VERSION)
  - CHANGELOG.md → both entries kept (v0.38.2.0 on top, v0.37.11.0 below);
    v0.38.0.0 header rewritten to v0.38.2.0

Source files auto-merged cleanly (src/commands/doctor.ts +
src/core/ai/gateway.ts + src/core/pglite-schema.ts). Regenerated
src/core/schema-embedded.ts from the merged schema.sql. Typecheck
green (0 errors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title v0.38.0.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter v0.38.2.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter May 22, 2026
garrytan and others added 3 commits May 21, 2026 22:26
CI failure on PR #1289: scripts/check-admin-scope-drift.sh caught that the
hand-maintained mirror at admin/src/lib/scope-constants.ts had not been
updated when I added the new `agent` scope to src/core/scope.ts in Slice 3.
CLAUDE.md flagged this exact CI guard for the file.

Mirrored: added `agent` to both the Scope union type and the alphabetically-
sorted ALLOWED_SCOPES_LIST. Rebuilt the admin SPA dist (vite build, 36
modules, 228KB) so the bundled scope-aware UI matches the new server-side
list. check-admin-scope-drift passes (6 scopes match); full `bun run verify`
chain passes end-to-end including typecheck (0 errors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI failure on PR #1289 serial pass: test/admin-embed-spawn.serial.test.ts
4/4 fail with "Cannot find module '../admin/dist/assets/index-CWq369vO.js'"
when spawning gbrain serve --http.

Root cause: the prior commit (f270e6c) rebuilt the admin SPA dist after
adding the v0.38 `agent` scope to admin/src/lib/scope-constants.ts, which
produced a new content-hashed bundle filename (index-CWq369vO.js →
index-DFgMZhBE.js). The auto-generated `src/admin-embedded.ts` manifest
still hardcoded the OLD filename, so `import ... with { type: 'file' }`
threw at module-load time inside the spawned server, the server never
became ready, and the e2e harness timed out at 30s × 4 tests = ~2min.

Fix: re-ran `bun run build:admin-embedded` (scripts/build-admin-embedded.ts)
which regenerates src/admin-embedded.ts from the current dist/ contents.
Manifest now references index-DFgMZhBE.js. All 4 admin-embed-spawn.serial
tests pass locally.

Forward-looking note: the build:admin npm script chains
`cd admin && bun run build && cd .. && bun run scripts/build-admin-embedded.ts`
so regenerating both together is the standard path — the prior commit
manually invoked `vite build` inside admin/ and skipped the second step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User direction: this is v0.38.1.0, the first patch on v0.38.0.0, not 0.38.2.0.
v0.38.2.0 was chosen mid-wave, when master merged in at d0d0e2a and I took the next
slot up; in retrospect 0.38.1.0 is the correct next-patch number, since
nothing has actually shipped at 0.38.0.0 yet (the PR has been iterating toward
green CI and the wave is one continuous ship).

Updated:
  - VERSION: 0.38.2.0 → 0.38.1.0
  - package.json: 0.38.2.0 → 0.38.1.0
  - CHANGELOG.md: header rewritten

Trio audit: all three say 0.38.1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title v0.38.2.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter May 22, 2026
garrytan and others added 3 commits May 22, 2026 07:49
…shim + admin spend

61 new test cases across 4 files closing the load-bearing gaps from the
v0.38 Agents+Minions wave. Also extracts /admin/api/agents/spend SQL into
a named helper so the endpoint and its test share a single source of truth.

Gap inventory + coverage delta:

  | Surface                                    | Before | After  |
  |--------------------------------------------|--------|--------|
  | submit_agent op (binding enforcement)      | 0      | 17     |
  | agent scope NOT implied by admin           | 0      | 9      |
  | D5 v1→v2 read-time shim                    | 0      | 16     |
  | /admin/api/agents/spend endpoint SQL       | 0      | 19     |

test/submit-agent.test.ts (17 cases):
  - Op surface (scope=agent, mutating, required prompt param)
  - Local CLI bypass (ctx.remote=false → invalid_request)
  - OAuth client requirement (missing clientId, unknown client_id)
  - Binding requirement: refuse when agent scope but bound_tools NULL
  - allowed_tools subset enforcement (passes ⊆, refuses outside)
  - allowed_slug_prefixes prefix-match against bound_slug_prefixes
  - bound_max_concurrent cap (refuse at cap, allow below, exclude
    terminal-state jobs, isolate inflight count by client_id)
  - Happy-path: job inserted + audit row written + prompt NEVER logged
  - max_turns capped at 100

test/scope-agent-isolation.test.ts (9 cases) — D13 regression guard:
  - admin does NOT imply agent (the load-bearing security check)
  - admin still implies sources_admin/users_admin/write/read
  - agent does NOT imply anything else (no reverse inheritance)
  - read+write does NOT imply agent (the common legacy shape)
  - explicit admin+agent compound grant satisfies both
  - ALLOWED_SCOPES_LIST sort order pinned (agent between admin and read)
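
The sibling relationship between `admin` and `agent` can be illustrated with a small implication table. This is a sketch: `IMPLIES` and `hasScopeSketch` are hypothetical names, and only the `admin` row is taken from the text above; leaving every other row empty is an assumption.

```typescript
type Scope = "admin" | "agent" | "read" | "sources_admin" | "users_admin" | "write";

// Alphabetical order, with agent between admin and read.
const ALLOWED_SCOPES_LIST: Scope[] = ["admin", "agent", "read", "sources_admin", "users_admin", "write"];

// Only admin's implications come from the PR text; the empty rows are assumptions.
const IMPLIES: Record<Scope, Scope[]> = {
  admin: ["sources_admin", "users_admin", "write", "read"], // deliberately no "agent"
  agent: [],
  read: [],
  sources_admin: [],
  users_admin: [],
  write: [],
};

function hasScopeSketch(granted: Scope[], required: Scope): boolean {
  return granted.some((g) => g === required || IMPLIES[g].includes(required));
}
```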

test/subagent-v1-v2-shim.test.ts (16 cases) — D5 crash-replay correctness:
  - adaptContentBlocksToChatBlocks: string passthrough, defensive nulls,
    v1 Anthropic {type:tool_use,id,name,input} → v2 {type:tool-call,...},
    v2 passthrough, v1 tool_result → v2 tool-result with __legacy__
    toolName sentinel, is_error mapping, mixed v1+v2 in same message
    array (mid-upgrade scenario), malformed-block skip
  - loadPriorToolsV2: empty, gbrain_tool_use_id as stable key for v2,
    legacy-prefixed key for v1 rows, status+error preservation, mixed
    v1+v2 side-by-side with both shapes resolving, ORDER BY stability
  - Exposed both helpers on the existing __testing export from subagent.ts
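
A rough picture of the v1→v2 block adaptation follows. It is a sketch under loud assumptions: the v1 side follows the Anthropic block shapes named above, but the v2 field names (`toolCallId`, `toolName`, `input`, `output`, `isError`) are hypothetical, since the PR text elides them, and `adaptBlocksSketch` is not the real `adaptContentBlocksToChatBlocks`.

```typescript
// Hypothetical v2 block shapes; the real gateway field names are not shown in this PR.
type ChatBlock =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; input: unknown }
  | { type: "tool-result"; toolCallId: string; toolName: string; output: unknown; isError: boolean };

// v1 tool_result rows do not carry the tool name, so a sentinel fills the slot.
const LEGACY_TOOL_NAME = "__legacy__";

function adaptBlocksSketch(content: unknown): string | ChatBlock[] {
  if (typeof content === "string") return content; // string passthrough
  if (!Array.isArray(content)) return []; // defensive: null, undefined, non-arrays
  const out: ChatBlock[] = [];
  for (const raw of content as unknown[]) {
    if (typeof raw !== "object" || raw === null) continue; // skip malformed entries
    const block = raw as Record<string, unknown>;
    const type = block["type"];
    if (type === "tool-call" || type === "tool-result") {
      out.push(raw as ChatBlock); // already v2-shaped: pass through
    } else if (type === "text") {
      const text = block["text"];
      if (typeof text === "string") out.push({ type: "text", text });
    } else if (type === "tool_use") {
      const id = block["id"];
      const name = block["name"];
      if (typeof id === "string" && typeof name === "string") {
        out.push({ type: "tool-call", toolCallId: id, toolName: name, input: block["input"] });
      }
    } else if (type === "tool_result") {
      const id = block["tool_use_id"];
      if (typeof id === "string") {
        out.push({
          type: "tool-result",
          toolCallId: id,
          toolName: LEGACY_TOOL_NAME,
          output: block["content"],
          isError: block["is_error"] === true,
        });
      }
    }
  }
  return out;
}
```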

test/admin-agents-spend.test.ts (19 cases) — Slice 4 SQL pinning:
  - Empty results: no clients / clients without agent scope or bindings
  - Include: scope=agent (with or without bindings), bound_tools set
    (with or without scope=agent — covers partial-migration state)
  - Exclude: soft-deleted (deleted_at IS NOT NULL) clients
  - cap_usd_per_day: null when unset, numeric when set
  - spent_cents_today: zero baseline, sum of today, exclude yesterday
    (UTC-day-aligned), client-id isolation
  - pending_cents: sum of pending+non-expired, exclude expired, exclude
    settled
  - inflight_count: only active/waiting/waiting-children subagent jobs;
    exclude shell jobs; client-id isolated
  - ORDER BY client_name ASC pinned for deterministic UI rendering
  - Multi-word scope strings ('read write agent') handled correctly via
    string_to_array
  - End-to-end happy path: all fields populated together

Refactor: extracted the spend SQL from src/commands/serve-http.ts into a
new exported `queryAgentClientSpend(engine)` helper + `AgentClientSpend`
type. The Express handler now delegates (5 lines). Same query, same
result shape, but a single source of truth that both the endpoint and
the test exercise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new e2e suites driving the v0.38 runSubagentViaGateway path end-to-end
against PGLite. Both filed in TODOS as v0.38.x follow-ups during the cathedral
ship; building them out caught two real load-bearing bugs in subagent.ts that
would have silently broken crash-replay in production.

Bug 1 — messageIdx collision on fresh runs.
runSubagentViaGateway only passed replayState when priorChatMessages.length > 0,
so on a fresh run the gateway loop's messageIdx counter defaulted to 0. The
seed user message already occupies (job_id, message_idx=0), so the first
onAssistantTurn write at idx 0 hit the unique-constraint and the whole job
failed before any tool call. Fix: always pass replayState with nextMessageIdx
set to 1 on fresh runs (after the seed write). Pinned by
test/e2e/subagent-gateway-path.test.ts ("happy path 1-turn" + "write-ordering
invariant").

Bug 2 — onToolCallStart returned the wrong UUID on crash-replay.
The callback generated a fresh candidateId, INSERTed with ON CONFLICT DO
UPDATE, and returned the local candidateId. On replay, the pre-crash row
survives intact with its ORIGINAL gbrain_tool_use_id, so the local candidateId
was wrong. The gateway loop's replayState.priorTools is keyed by the original
UUID; returning the new one made the short-circuit miss and re-execute every
tool call. Fix: RETURNING gbrain_tool_use_id::text AS gbrain_tool_use_id and
read it back; fall through to candidateId only if RETURNING is empty. Pinned
by test/e2e/subagent-crash-replay-multi-provider.test.ts.
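
The Bug 2 fix comes down to trusting the ID the database hands back instead of the locally generated one. The sketch below models that with an in-memory map in place of `subagent_tool_executions`; the helper names are hypothetical and a counter stands in for UUID v7 generation.

```typescript
// In-memory stand-in for subagent_tool_executions, keyed by (job_id, message_idx, ordinal).
const toolRows = new Map<string, { gbrainToolUseId: string }>();

// Mimics INSERT ... ON CONFLICT DO UPDATE ... RETURNING gbrain_tool_use_id:
// an existing row keeps its original ID, and that original ID is what comes back.
function upsertReturningId(
  jobId: number,
  messageIdx: number,
  ordinal: number,
  candidateId: string,
): string | undefined {
  const key = `${jobId}:${messageIdx}:${ordinal}`;
  const existing = toolRows.get(key);
  if (existing) return existing.gbrainToolUseId;
  toolRows.set(key, { gbrainToolUseId: candidateId });
  return candidateId;
}

// Counter-based stand-in for a fresh UUID v7.
let candidateCounter = 0;
function newCandidateId(): string {
  candidateCounter += 1;
  return `candidate-${candidateCounter}`;
}

function onToolCallStartSketch(jobId: number, messageIdx: number, ordinal: number): string {
  const candidateId = newCandidateId();
  const persisted = upsertReturningId(jobId, messageIdx, ordinal, candidateId);
  // Before the fix the callback returned candidateId unconditionally, so a replayed
  // call never matched the key under which its prior result was stored.
  return persisted ?? candidateId;
}
```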

Coverage:
- test/e2e/subagent-gateway-path.test.ts: 7 cases. Happy path 1-turn,
  multi-turn with parallel tool calls, write-ordering invariant
  (persist-before-side-effect), gateway returns malformed tool_call shape,
  cancel mid-loop, capability refusal at submit.
- test/e2e/subagent-crash-replay-multi-provider.test.ts: 13 cases. Five
  provider rows (anthropic / openai / google / openrouter / deepseek) ×
  pre-crash run + replay assertion, plus ordinal-collision PK guard,
  pending-tool short-circuit, v1→v2 shim round-trip.

Both files run hermetically against PGLite (no DATABASE_URL needed) and
use the __setChatTransportForTests gateway seam for stubbed provider
responses. Reset path goes through resetPgliteState + setConfig version=84
so MinionQueue.ensureSchema() sees the migration ledger correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ents wave

Master shipped v0.38.0.0 with migration v81 (pages_provenance_columns) while
this branch was building v0.38.1.0 with its own v81
(subagent_tool_executions_stable_id) + v82/v83/v84. Conflict resolution:

- VERSION + package.json: kept ours (0.38.1.0 > 0.38.0.0).
- CHANGELOG.md: preserved both entries in order (v0.38.1.0 on top, master's
  v0.38.0.0 immediately below).
- src/core/migrate.ts: kept master's v81 verbatim, renumbered our four to
  v82 (subagent_tool_executions_stable_id), v83 (mcp_spend_reservations),
  v84 (oauth_clients_budget_usd_per_day), v85 (oauth_clients_agent_binding).
  Runtime sort by version means source-order doesn't matter; tests sweep
  the array.
- Schema comments in pglite-schema.ts + schema.sql + the auto-regenerated
  schema-embedded.ts updated to reference the new version numbers.
- Test setConfig('version', '84') → '85' across the five v0.38 test files
  that prime the migration ledger.
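
Why source order stops mattering once the runner sorts by version can be shown in a few lines. This is a sketch with hypothetical names, not src/core/migrate.ts; the duplicate-version guard is an added assumption, prompted by the v81 collision described above.

```typescript
type Migration = { version: number; name: string };

function orderPendingMigrations(all: Migration[], currentVersion: number): Migration[] {
  // Assumed guard: two migrations claiming the same slot is an error, not a tie.
  const seen = new Map<number, string>();
  for (const m of all) {
    const clash = seen.get(m.version);
    if (clash !== undefined) {
      throw new Error(`v${m.version} claimed by both ${clash} and ${m.name}`);
    }
    seen.set(m.version, m.name);
  }
  // Apply only what is newer than the ledger, lowest version first.
  return all.filter((m) => m.version > currentVersion).sort((a, b) => a.version - b.version);
}

// Source order is deliberately shuffled: master's v81 is listed last.
const demoMigrations: Migration[] = [
  { version: 82, name: "subagent_tool_executions_stable_id" },
  { version: 83, name: "mcp_spend_reservations" },
  { version: 84, name: "oauth_clients_budget_usd_per_day" },
  { version: 85, name: "oauth_clients_agent_binding" },
  { version: 81, name: "pages_provenance_columns" },
];
```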

Verification:
- bun run typecheck clean.
- bun run verify clean (5 checks + tsc).
- Targeted re-run of 7 affected test files (migrate + submit-agent +
  admin-agents-spend + subagent-v1-v2-shim + budget-meter + both e2e files):
  227 / 227 pass. Migration ledger shows v81 → v85 applying in order on a
  fresh PGLite, confirming runtime sort handles the source-order shuffle.

Trio audit (VERSION / package.json / CHANGELOG top header) all show
0.38.1.0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit 0102456 into master May 22, 2026
8 checks passed
garrytan added a commit that referenced this pull request May 22, 2026
… v82-v85)

Master shipped v0.38.1.0 (provider-agnostic subagent loop, #1289) which
claimed migration slots v82-v85:
  v82 — subagent_tool_executions_stable_id
  v83 — mcp_spend_reservations
  v84 — oauth_clients_budget_usd_per_day
  v85 — oauth_clients_agent_binding

The v0.40.2.0 trajectory-routing wave's `facts_event_type_column`
migration is renumbered to v86. Engine + test + CLAUDE.md references
updated.

CHANGELOG reconstructed: v0.40.2.0 entry kept at the top (our entry),
master's v0.38.1.0 entry inserted below, both intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.38.2.0 fix(doctor): bounded frontmatter scan + partial-state surfacing (supersedes garrytan#1287) (garrytan#1297)
  v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter (garrytan#1289)
  v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract (garrytan#1275)
  v0.37.11.0: fresh-install PGLite embedding setup fix wave (garrytan#1286)
  v0.37.10.0 feat(init): env-detection + interactive picker + preflight invariants (garrytan#1278)
  v0.37.9.0 fix(frontmatter): canonical-style normalization for tag arrays (garrytan#1252)
  v0.37.8.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix (garrytan#1267)
  v0.37.7.0 fix wave: federated brains + autopilot safety + OAuth confidential clients (garrytan#1253)
  v0.37.6.0 feat(ai): OpenRouter recipe + generic default_headers seam (cherry-pick garrytan#1210) (garrytan#1246)
  v0.37.5.0 fix(markdown): YAML-aware NESTED_QUOTES validator (stops flagging valid YAML) (garrytan#1229)
  feat: pgGraph-inspired CI scaffolding wave (v0.37.4.0) (garrytan#1228)
  v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes garrytan#1206) (garrytan#1215)
  v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' (garrytan#1211)
  v0.37.1.0 feat: brainstorm + lsd — bisociation idea generator grounded in your own brain (garrytan#1214)
  v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208)
  v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)