Inspiration

Every published defense against prompt injection in 2025 uses AI to police AI, and a paper co-authored by researchers at Anthropic, Google, and OpenAI showed that all of them can be broken. We asked: what if the security layer had no AI in it at all?

What it does

CausalGuard sits as a proxy between Claude Desktop and any MCP server, intercepting every tool return before it reaches the agent's context window. It runs six mathematical detection layers (deterministic finite automata, KL-divergence counterfactual reasoning, cosine-similarity drift detection, tool-invocation monitoring, Neural ODE behavioral dynamics, and information-flow taint tracking) to detect and surgically remove injected instructions before the agent ever sees them.
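A minimal sketch of the layered idea, assuming hypothetical layer names, patterns, and thresholds (this is not CausalGuard's actual code): each layer maps a tool return to a risk score, and the proxy quarantines the return if any layer crosses its threshold.

```python
import re

# Illustrative patterns only; a real pattern layer would compile a proper DFA.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you (must|should) now",
    r"disregard the system prompt",
]

def dfa_layer(text: str) -> float:
    """Layer-1 stand-in: flag imperative injection phrasing in a tool return."""
    hit = any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
    return 1.0 if hit else 0.0

# (layer_function, threshold) pairs; the real system chains six such layers.
LAYERS = [(dfa_layer, 0.5)]

def screen_tool_return(text: str) -> bool:
    """True means quarantine: some layer scored above its threshold."""
    return any(score_fn(text) > threshold for score_fn, threshold in LAYERS)
```

The point of the pipeline shape is that layers are independent scorers, so a new detector is just another `(function, threshold)` pair.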

How we built it

The core is Python middleware that intercepts MCP JSON-RPC messages, with scipy for information-theoretic scoring, sentence-transformers running locally for semantic drift, PyTorch + torchdiffeq for the Neural ODE, and a React + Flask frontend with a live attack simulator. We evaluated against the InjecAgent benchmark (ACL 2024) and trained Layer 5 on clean ToolBench sessions.
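A sketch of the interception hop, assuming MCP tool results carry a `content` array of typed blocks (as in the MCP specification) and using a hypothetical `sanitize` callback standing in for the detection layers:

```python
import json

def intercept(raw: str, sanitize) -> str:
    """Parse one JSON-RPC response from the MCP server, run every text
    content block through the detector, and re-serialize for the client."""
    msg = json.loads(raw)
    result = msg.get("result")
    if isinstance(result, dict):
        for block in result.get("content", []):
            if block.get("type") == "text":
                block["text"] = sanitize(block["text"])
    return json.dumps(msg)
```

Non-result messages (requests, notifications, errors) pass through untouched, which is what keeps the proxy transparent to both sides.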

Challenges we ran into

Calibrating KL-divergence thresholds against LLM output stochasticity was the hardest part: two calls on the same prompt naturally diverge slightly, so distinguishing noise from a real injection required careful threshold tuning. Making the MCP proxy truly transparent to both Claude Desktop and the downstream servers was also harder than expected.
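The calibration idea can be sketched as follows; `kl_score` and `calibrate` are hypothetical names, and the smoothing constant and percentile are illustrative, not our tuned values. Scoring benign call pairs and taking a high percentile as the threshold lets normal stochastic variation pass.

```python
import numpy as np
from scipy.stats import entropy

def kl_score(p, q, eps=1e-9):
    """Symmetrized KL divergence between two output distributions.
    Smoothing with eps avoids infinite divergence on zero-probability events."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return 0.5 * (entropy(p, q) + entropy(q, p))

def calibrate(benign_scores, percentile=99):
    """Threshold = high percentile of scores from benign repeated calls."""
    return float(np.percentile(benign_scores, percentile))
```

Identical distributions score near zero while a genuinely shifted distribution scores well above it, which is the gap the threshold has to sit inside.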

Accomplishments that we're proud of

An 8% attack success rate on InjecAgent, a 66% reduction versus the best published defense. Applying Neural ODEs (NeurIPS 2018 Best Paper) to AI agent security for the first time. A system where every security decision is backed by a named equation from a cited research paper.

What we learned

The most robust security systems don't detect badness; they enforce invariants. Information Flow Control taught us that the right question isn't "is this content malicious?" but "can this data mathematically reach a sensitive sink?" That reframing changed everything about how we designed Layer 6.
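The invariant view can be illustrated with a toy taint-tracking sketch (hypothetical names, not Layer 6's implementation): values derived from untrusted tool output stay tainted through every operation, and the sink refuses them regardless of whether the content looks malicious.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """Wrapper marking a value as derived from untrusted tool output."""
    value: str

def concat(a, b):
    """Taint propagates through any operation touching a tainted operand."""
    av = a.value if isinstance(a, Tainted) else a
    bv = b.value if isinstance(b, Tainted) else b
    out = av + bv
    return Tainted(out) if isinstance(a, Tainted) or isinstance(b, Tainted) else out

def sensitive_sink(x):
    """The invariant: tainted data can never reach a sensitive sink."""
    if isinstance(x, Tainted):
        raise PermissionError("information-flow violation: tainted value at sink")
    return f"sent: {x}"
```

Notice the sink never inspects the string itself; the decision depends only on provenance, which is what makes the property enforceable rather than heuristic.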

What's next for CausalGuard

Running a full InjecAgent evaluation with all six layers active, publishing the benchmark results, adding support for wrapping multiple MCP servers simultaneously, and exploring whether the Neural ODE can be pre-trained on public agent-trajectory datasets to eliminate the cold-start problem for new deployments.

Built With

  • claude-desktop
  • flask
  • gemini
  • hmac-sha256
  • injecagent
  • jensen-shannon-divergence
  • mcp
  • neural-ode
  • numpy
  • python
  • pytorch
  • react
  • rich
  • toolbench
  • torchdiffeq
  • vertex-ai
  • vite