Skip to content

fix: harden multi-agent MCP access#1316

Open
chipoto69 wants to merge 12 commits into
garrytan:masterfrom
chipoto69:prd/gbrain-multi-agent-hardening-2026-05-06
Open

fix: harden multi-agent MCP access#1316
chipoto69 wants to merge 12 commits into
garrytan:masterfrom
chipoto69:prd/gbrain-multi-agent-hardening-2026-05-06

Conversation

@chipoto69

Copy link
Copy Markdown

Summary

  • Implements Phase 4 multi-agent hardening for GBrain MCP: deny-by-default fine-grained scopes, Phase 4B token presets, and remote bearer enforcement.
  • Removes request payload persistence from MCP audit logging, preserves real operation names, adds constrained audit status taxonomy, and cools access_tokens.last_used_at hot-row writes with an in-process LRU debounce.
  • Cleans up the RLS posture for service-role-only deployments: disables zero-policy RLS trapdoors, drops the legacy auto-RLS trigger/function, adds doctor checks, and documents the expected posture.

Linked design / PRD

  • Phase 4 PRD: /Users/rudlord/wiki/_meta/gbrain-recon-2026-05-06/phase-4-prd.md
  • In-repo PRD mirror: docs/prds/gbrain-multi-agent-hardening-2026-05-06.md
  • Ops note: docs/ops/mcp-request-log-partitioning-2026-05-06.md

Review focus

  • Scope enforcement is deny-by-default for remote MCP calls; missing bearer auth is rejected before handler execution, and every operation declares fine-grained required scopes.
  • No cleartext bearer tokens or request payload values are persisted in code/log tables/SSE events; gitleaks scan is clean.
  • MCP audit operation names use real operation names such as get_page / put_page, not generic transport labels like tools/call.
  • RLS is intentionally disabled/commented for service-role-only tables; legacy auto-RLS trigger/function is removed.
  • last_used_at write cooling uses LRU eviction and per-token cooldown while preserving per-request audit rows.

Test plan

  • bun run typecheck
  • bun test test/scoped-tokens.test.ts test/http-transport.test.ts test/serve-http-audit.test.ts test/token-last-used.test.ts test/rls-posture-migration.test.ts — 56 pass
  • gitleaks detect --no-git --redact --source . — no leaks found
  • Static review of diff for conflict markers, token/payload logging, generic operation audit labels, and RLS posture
  • [!] Full bun test was attempted but timed out at the 600s worker cap after substantial progress; test/cli-options.test.ts has an environment-sensitive skillpack-check spawn timeout when run in this profile. Focused Phase 4 suites above pass.
  • [!] Live multi-agent MCP smoke against Rudy's real GBrain Postgres was attempted but blocked by upstream database authentication circuit breaker: (ECIRCUITBREAKER) too many authentication failures, new connections are temporarily blocked. No tokens were printed; temporary smoke script was removed.

Notes

This branch was pushed from fork chipoto69/gbrain because chipoto69 does not have direct push permission to garrytan/gbrain.

@chipoto69

Copy link
Copy Markdown
Author

Hermes steward review / Phase 4E status

Opened this as the fork PR because chipoto69 cannot push directly to garrytan/gbrain.

Review checks completed:

  • Scope enforcement: focused tests verify all MCP operations declare fine-grained required scopes, missing remote bearer auth is rejected before handler execution, read-only cannot write, writer cannot call admin, and legacy read/write/admin normalize without granting accidental admin content access.
  • Token/log secrecy: gitleaks detect --no-git --redact --source . reports no leaks. Diff review found no real cleartext bearer tokens; apparent hits are docs/test identifiers/env forwarding, not token material.
  • Operation names: audit tests verify tools/call logs real operation names like get_page, not generic transport labels.
  • RLS posture: schema/migration tests verify service-role-only tables are RLS-disabled/commented and the legacy auto-RLS trigger/function is removed.
  • Hot-row cooling: token-last-used tests verify one write per key per cooldown window plus LRU behavior while preserving request audit rows.

Commands run:

  • bun run typecheck — pass
  • bun test test/scoped-tokens.test.ts test/http-transport.test.ts test/serve-http-audit.test.ts test/token-last-used.test.ts test/rls-posture-migration.test.ts — 56 pass
  • gitleaks detect --no-git --redact --source . — no leaks
  • Full bun test attempted; timed out at the 600s worker cap after broad progress. test/cli-options.test.ts also has an environment-sensitive spawn issue in this Hermes profile (bun missing from PATH unless prepended, then skillpack-check times out). Treat focused Phase 4 suites as the reliable gate here.

Blocked items:

  • PR is currently GitHub CONFLICTING against origin/master. I attempted a normal merge from origin/master; it produced conflicts across the hardening surface plus broad upstream churn, so I aborted to avoid leaving a dirty/conflicted branch.
  • Live multi-agent MCP smoke against Rudy's real GBrain Postgres was blocked by upstream DB auth circuit breaker: (ECIRCUITBREAKER) too many authentication failures, new connections are temporarily blocked. No tokens were printed, and the temporary smoke script was removed.

Verdict: implementation surface looks sane from focused review, but do not merge until a conflict-resolution pass rebases/merges current master and live MCP smoke can run against valid DB credentials.

@garrytan

Copy link
Copy Markdown
Owner

Status comment — the v0.41.3.0 security/MCP fix wave (#1403) ran a deliberate scope review of this PR and decided to keep it open as a parking lot rather than cherry-pick wholesale. Three independent wins from your design are filed as TODOS:

  • T13a (P1): Extract deny-by-default fine-grained scope wiring — the per-op requiredScope metadata + dispatch-time gate. Today the OAuth scope string is validated at registration but doesn't actually constrain which ops a token can call. Real security win when landed.
  • T13b (P2): Extract real operation names in mcp_request_log — pre-fix every MCP request logs tools/call generically; your PR carries the real op name (get_page, put_page, etc.). Standalone win, low risk, candidate for next minor.
  • T13c (P2): Extract access_tokens.last_used_at LRU debounce — useful on multi-agent fleets sharing tokens at high rate.

The reason for not cherry-picking the whole PR: your Phase 4 work includes an RLS posture rewrite that drops the v0.26.7 auto-RLS event trigger. That trigger is treated as load-bearing by gbrain doctor's rls_event_trigger check and is the canonical posture for service-role-only deployments. Reversing it deserves its own plan-eng-review + doctor-check rewrite + a breaking-change CHANGELOG note — not a bundle inside a small fix wave.

v0.41.3.0 picks up some adjacent territory you helped surface: the OAuth CORS lockdown (Express server defaulted to cors() wide-open), GBRAIN_HTTP_TRUST_PROXY env unification across both transports, and the ALLOWED_TOKEN_ENDPOINT_AUTH_METHODS validator with client_secret_basic in the allowed set.

Happy to discuss splitting #1316 into the three TODOS above as separate PRs if you want to drive them — that would unbundle the architectural risk and keep the wins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants