CI-native evidence compiler for MCP and A2A governance
Deterministic policy enforcement, canonical evidence, and reviewable trust artifacts for agent systems.
See It Work · Quick Start · CI Guide · Discussions
Your MCP agent calls read_file, exec, web_search — but should it, and what can you honestly prove about that run afterward?
Assay compiles agent runtime signals into enforceable decisions and portable evidence artifacts. The wedge is familiar: sit between the agent and MCP servers, allow or deny tool calls from policy, and record every decision. The product story is broader: canonical evidence, bounded trust claims (what is verified vs merely visible), and outputs you can hand to CI, security review, or audit — without a hosted backend.
Positioning: Assay is best understood as a CI-native protocol-governance layer: canonical evidence compiler + protocol-aware policy checks. It is not a trust-score engine, a generic eval dashboard, or an observability product with a thin security veneer.
| Enforce | Intercept MCP tool calls, apply policy, ALLOW / DENY deterministically. |
| Compile | Turn traces, decisions, and bundles into canonical evidence — not raw OTel or ad hoc logs as truth. |
| Prove | Export tamper-evident bundles, Trust Basis (trust-basis.json), Trust Card (trustcard.json / trustcard.md), SARIF, and CI gates. |
No hosted backend. No API keys for core flows. Deterministic — same input, same decision, every time.
Trust Compiler line: Release v3.5.0 is the current public trust-compiler line. It carries forward v3.3.0 as the first release that shipped both built-in evidence lint companion packs (
mcp-signal-followup,a2a-signal-followup), v3.4.0 as the public line forG4-APhase 1 (payload.discovery), built-inP2c(a2a-discovery-card-followup), andK1-APhase 1 (payload.handoff), and now also publicly shipsK2-APhase 1 (episode_start.meta.mcp.authorization_discovery). Pack YAML still distinguishes the substrate floor>=3.2.3from the G4-A / P2c floor>=3.3.0— see MIGRATION — Trust Compiler 3.2.
Repository truth:
mainnow tracks the released v3.5.0 trust-compiler surface, including the bounded MCP authorization-discovery seam in imported traces. Future trust-compiler slices may still land onmainbefore the next public cut, so release notes and changelog remain the authority for what is actually public.
Agent ──► Assay ──► MCP Server
│
├─ ✅ ALLOW / ❌ DENY (policy)
├─► 📋 Evidence bundle (verifiable)
└─► 📊 Trust Basis → Trust Card → SARIF / CI
CLI: The
mcpcommand group is hidden from top-levelassay --helpwhile the surface stabilizes; it is supported. Useassay mcp --help,assay mcp wrap …, or follow the MCP Quickstart.
Wedge, not category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.
cargo install assay-cli
mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt
assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
-- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo✅ ALLOW read_file path=/tmp/assay-demo/safe.txt reason=policy_allow
✅ ALLOW list_dir path=/tmp/assay-demo/ reason=policy_allow
❌ DENY read_file path=/tmp/outside-demo.txt reason=path_constraint_violation
❌ DENY exec cmd=ls reason=tool_denied
Inspect the audit artifact:
assay evidence show demo/fixtures/bundle.tar.gzThe bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.
Install from crates.io or source (cargo install --path crates/assay-cli), then:
# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json
# Human + machine Trust Card (schema v2 — seven trust claims; key by `id`, not row count)
assay trustcard generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# → trust-out/trustcard.json , trust-out/trustcard.mdtrust-basis.json emits claims from a bounded, versioned vocabulary for this schema (examples: bundle_verified, delegation_context_visible, authorization_context_visible, containment_degradation_observed, …). Claim id values are stable across runs, but consumers must not rely on row count or ordering; always key by id. It is not a scalar trust score. The Trust Card is a deterministic render of the same claim rows plus frozen non-goals. Contract versions, pack floors, and release checklist: docs/architecture/MIGRATION-TRUST-COMPILER-3.2.md.
| Output | Role |
|---|---|
| Policy gate | MCP wrap — deterministic allow/deny before tools run (see CLI note above the diagram). |
| Evidence bundle | Offline-verifiable, tamper-evident archive for audit and replay. |
| Trust Basis | Canonical trust-basis.json — bounded claim classification from verified bundles. |
| Trust Card | trustcard.json / trustcard.md — same claims, review-friendly artifact. |
| SARIF / CI | GitHub Action, Security tab integration, policy gates on PRs. |
Trust claims use explicit epistemology, not a single “safety score”:
| Level | Meaning |
|---|---|
verified |
Backed by direct evidence or offline verification in the bundle/path |
self_reported |
Emitted by the system without stronger independent corroboration |
inferred |
Derived from bounded, documented rules |
absent |
No trustworthy evidence supports the claim |
Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.
Yes, if you:
- Build with Claude Desktop, Cursor, Windsurf, or any MCP client
- Ship agents that call tools and you need to control which ones
- Want a CI gate that catches tool-call regressions before production
- Need bounded auditability and trust artifacts, not only sampled observability
Not yet, if you:
- Don't use MCP (Assay is MCP-native; other protocols use adapters)
- Need a hosted dashboard (Assay is CLI-first and offline)
- Want a magic trust score or badge as the main output
Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:
assay mcp config-path cursorIt generates JSON like:
{
"filesystem-secure": {
"command": "assay",
"args": [
"mcp",
"wrap",
"--policy",
"/path/to/policy.yaml",
"--",
"npx",
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/you"
]
}
}The same wrapped command works in other MCP clients — see MCP Quick Start.
version: "2.0"
name: "my-policy"
tools:
allow: ["read_file", "list_dir"]
deny: ["exec", "shell", "write_file"]
schemas:
read_file:
type: object
additionalProperties: false
properties:
path:
type: string
pattern: "^/app/.*"
minLength: 1
required: ["path"]Legacy constraints: policies still work. Use assay policy migrate for the v2 JSON Schema form, or assay init --from-trace trace.jsonl to generate from observed behavior.
See Policy Files.
Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.
assay trace ingest-otel \
--input otel-export.jsonl \
--db .eval/eval.db \
--out-trace traces/otel.v2.jsonl# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
contents: read
security-events: write
jobs:
assay:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: Rul1an/assay-action@v2PRs that violate policy get blocked; SARIF can surface in the Security tab.
| Canonical evidence | Assay’s evidence model is the stable contract; OTel and adapters map into it. |
| Deterministic | Same input, same decision — not probabilistic. |
| Portable artifacts | Bundles, Trust Basis, Trust Card, SARIF — for CI, review, audit. |
| Bounded claims | Explicit about what is verified vs visible vs absent — no score-first UX. |
| MCP-native wedge | assay mcp wrap is the fast path (the mcp group is hidden from assay --help; use assay mcp --help). Adapters extend the same engine. |
| Offline-first | No backend required for core enforcement and bundle verification. |
Assay ships adapters that map protocol events into canonical evidence (same policy and evidence story, different transports):
| Protocol | Adapter | What it maps |
|---|---|---|
| ACP (OpenAI/Stripe) | assay-adapter-acp |
Checkout events, payment intents, tool calls |
| A2A (Google) | assay-adapter-a2a |
Agent capabilities, task delegation, artifacts |
| UCP (Google/Shopify) | assay-adapter-ucp |
Discover/buy/post-purchase state transitions |
Adapter crates are workspace / binary–driven (not published as separate crates.io packages); consume them via this repo or released assay builds.
Governance stays protocol-agnostic; the evidence and claim layer stays the same as protocols evolve.
On the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:
- Main protection run:
0.771msp50 /1.913msp95 - Fast-path scenario:
0.345msp50 /1.145msp95
These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)
cargo install assay-cliCI: GitHub Action. Python SDK: pip install assay-it
- MCP Quickstart — filesystem server walkthrough
- Policy Files — YAML schema for
assay mcp wrap - OpenTelemetry & Langfuse — traces → replay and evidence
- CI Guide — GitHub Action
- Evidence Store — S3, B2, MinIO
- ADR-033: Trust compiler positioning
- RFC-005: Trust compiler MVP & Trust Card
Bounded context: numbers below support mapping and experiments, not a product “security score.”
- OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
- Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
- Security experiments — attack vectors and harness notes (methodology matters more than headline counts).
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warningsSee CONTRIBUTING.md. Discussions: GitHub Discussions — seed topics for pinned threads live in docs/community/DISCUSSIONS.md.