docs: agent secret gateway — holistic architecture + adversarial tests #16
Closed
… test plan

Consolidates scattered notes (security-gateway.md, vault-integration-plan.md, model-gateway-primitives.md) into one falsification-friendly document:
- States the threat model and goal explicitly
- Inventories post-consolidation components (one gateway, not two)
- Walks the outbound substitution flow end-to-end
- Describes agent bootstrap via zeroclawed-MCP (discovery only, no `get_secret` surface, by design)
- Describes user-side input via `!secure` chat commands
- Audits install.sh against the full vision (table of gaps per component)
- Sketches 10 adversarial tests (T1–T10) mapped to the claims they try to falsify
- Ends with an explicit skepticism log of claims the author is not yet sure about

Written to be broken. Each § ends with "what could go wrong" so future tests have a concrete target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
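The outbound substitution step summarized above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the gateway's implementation: the full ref grammar `{{secret:NAME}}` (only the `{{secret:` prefix appears in this PR), the `substitute` function, and the dict standing in for the fnox-backed store are all hypothetical.

```python
import re

# ASSUMPTION: refs look like {{secret:NAME}} with NAME made of letters,
# digits and underscores. Only the "{{secret:" prefix is stated in the RFC.
SECRET_REF = re.compile(rb"\{\{secret:([A-Za-z0-9_]+)\}\}")


class SubstitutionError(Exception):
    """Raised when the request must be refused instead of forwarded."""


def substitute(body: bytes, store: dict[str, bytes]) -> bytes:
    """Replace each secret ref in an outbound body with its value.

    `store` is a hypothetical stand-in for the fnox-backed lookup.
    An unknown name fails closed rather than forwarding the literal ref.
    """

    def resolve(match: re.Match) -> bytes:
        name = match.group(1).decode("ascii")
        if name not in store:
            raise SubstitutionError(f"unknown secret ref: {name}")
        return store[name]

    # re.sub does not rescan replacement text, so a value that itself
    # contains "{{secret:" is not expanded a second time.
    return SECRET_REF.sub(resolve, body)
```

For example, `substitute(b'{"key": "{{secret:OPENAI_KEY}}"}', {"OPENAI_KEY": b"sk-test"})` returns `b'{"key": "sk-test"}'`. Note that this sketch substitutes wherever a ref appears and does not yet consider where the request is going.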
Pull request overview
Adds a draft RFC consolidating the “agent secret gateway” architecture into a single, falsifiable design doc, including an explicit threat model and an adversarial test plan to validate key security claims.
Changes:
- Introduces a holistic component inventory (fnox, security-proxy, clashd, zeroclawed, planned zeroclawed-MCP) and intended consolidation plan.
- Documents the intended outbound substitution data flow and the `!secure` user-input flows.
- Adds a T1–T10 adversarial test plan mapped back to specific claims in the document.
Adds §11–§15 to the draft RFC, broadening the scope from code correctness to user-story and indirect threat models, per review:
- §11 Indirect threat models (10 scenarios): substituted-value exfil by upstream, upstream logging, agent-to-agent exfil, pre-substitution artifacts, memory persistence, error-message side-channels, indirect disclosure bypass (chaos-paper #3), adversarial third-party messages, name-leakage as signal, and a mapping of all 8 chaos lessons to our secret-gateway risk surface.
- §12 User story failures (10 scenarios): first-run UX, .env migration, key rotation, "I need the value" (HMAC/JWT signing), non-HTTP protocols, blocked legitimate requests, cross-machine secret sync, mobile-without-LAN `!secure request`, request preview/dry-run, and perceived complexity cost.
- §13 Legitimate cases we struggle with: HMAC/JWT, binary bodies, streams, WebSocket sessions, OAuth device-flow, mTLS certs, per-user per-request secrets.
- §14 Explicitly out of scope: host compromise, fnox root compromise, user-side misuse, compromised model weights, supply chain, timing.
- §15 Research pointers: 1Password `op://` refs, Doppler, AWS Secrets Manager per-secret destination binding, Vault response-wrapping, SPIFFE/SPIRE, chaos-engineering mindset.

Key architectural implication from §11.1 + §11.8: each secret must be bound to its permitted destinations, and eligibility to substitute must be tagged at the ref site in code the agent wrote, rather than applied blindly to every outbound request. This is a spike item to resolve before committing substantial code to §29 (substitution implementation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
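The per-secret destination binding from §11.1 + §11.8 could be sketched as a lookup that refuses to release a value unless the request's host is on that secret's allow-list. A minimal Python sketch, assuming a hypothetical `SecretBinding` record and `resolve_for_destination` helper; neither is an existing gateway or fnox API, and the ref-site eligibility tagging half of the implication is not modeled here.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SecretBinding:
    """Hypothetical per-secret record: the value plus the hosts it may reach."""

    value: bytes
    allowed_hosts: frozenset[str]


class DestinationDenied(Exception):
    """Raised when a secret is requested for a host it is not bound to."""


def resolve_for_destination(
    name: str, url: str, bindings: dict[str, SecretBinding]
) -> bytes:
    """Release a secret's value only if the request host is on its allow-list."""
    # urlsplit().hostname drops userinfo and port and lowercases the host,
    # so "https://api.openai.com@evil.example/" resolves to "evil.example".
    host = urlsplit(url).hostname or ""
    binding = bindings.get(name)
    if binding is None:
        raise DestinationDenied(f"no binding for secret {name}")
    # Exact match only: naive suffix matching on "openai.com" would also
    # accept look-alikes such as "evilopenai.com".
    if host not in binding.allowed_hosts:
        raise DestinationDenied(f"secret {name} is not bound to {host or '<none>'}")
    return binding.value
```

The check looks at the parsed host only; redirects, DNS resolution, and scheme are outside what this sketch covers, so they remain candidate falsification targets.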
Three polish fixes from the Copilot review:
- §3 substitution scope — the original said "unsupported content-types
pass through unchanged, log warning", which creates a §11.8-shaped
bypass (agent claims multipart/form-data with `{{secret:` in the
body). Rewrote the bullet: a cheap raw-bytes scan runs FIRST; if
the bytes contain `{{secret:` we fail-closed. Only bodies with no
ref-shaped content pass through.
- §6 installer audit table — replaced "✅ (this PR)" with concrete
branch/PR references ("feat/fnox-integration, PR #15") so the
attribution stays correct when read outside this PR.
- §§5/7 — path consistency on the `security-gateway.md` reference;
use `docs/security-gateway.md` everywhere so the link is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
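The fail-closed ordering in the §3 fix (raw-bytes scan first, content-type second) might look roughly like this. A minimal Python sketch: `classify_body`, `FailClosed`, and the set of substitutable content types are hypothetical; the fix's own example treats multipart/form-data as the unsupported case.

```python
# HYPOTHETICAL: content types the gateway knows how to parse and substitute into.
SUBSTITUTABLE_TYPES = frozenset({"application/json", "application/x-www-form-urlencoded"})
REF_PREFIX = b"{{secret:"


class FailClosed(Exception):
    """Raised when a body looks ref-bearing but cannot be substituted safely."""


def classify_body(content_type: str, body: bytes) -> str:
    """Return "passthrough" or "substitute"; raise FailClosed otherwise."""
    # Step 1: cheap raw-bytes scan, before trusting the declared content-type.
    if REF_PREFIX not in body:
        return "passthrough"  # no ref-shaped content, forward unchanged
    # Step 2: a ref is present, so the declared type now matters.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in SUBSTITUTABLE_TYPES:
        return "substitute"
    # Ref-shaped bytes in a type we cannot parse: refuse instead of forwarding.
    raise FailClosed(f"secret ref found in unsupported content-type {media_type!r}")
```

The scan is a plain substring test, so it is cheap but only sees refs written literally in the bytes.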
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Subsumed by #44 (squashed to
Summary
Draft RFC pulling together scattered secret/gateway notes into one doc
designed to be falsified:
Each section ends with "what could go wrong" so the planned tests have
concrete targets. Explicitly in draft form — feedback on the attack
surfaces you think I'm missing is the most valuable thing to get now,
before code lands.
Test plan
🤖 Generated with Claude Code