arXiv:2601.05755 — VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit
Key Finding
A verify-before-commit protocol sanitizes tool output streams against user-intent-anchored constraints. The paper reports that this reduces attack success rate by over 22% while more than doubling agent utility under attack, compared to static baselines.
Applicability to Zeph
Tool output sanitization: Zeph's ContentSanitizer currently checks inputs (user messages). VIGIL targets outputs — tool results before they enter the LLM context. This is a gap: a malicious MCP tool response could inject instructions into the next LLM turn.
Intent anchoring: The key insight is that sanitization should be relative to the original user intent, not absolute. A tool result that says "ignore previous instructions" is suspicious in any context; VIGIL formalizes the check by comparing tool output against what the user actually asked for.
Integration point: In agent/tool_execution/, after the executor returns but before tool_result is pushed into context, check the tool output against a cached representation of the user's original intent.
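The integration point above can be sketched roughly as follows. This is a minimal illustration in Python, not Zeph's code and not the algorithm from the paper: `IntentAnchor`, `verify_tool_output`, `commit_tool_result`, the regex patterns, and the message dictionary shape are all hypothetical names and assumptions chosen for the example. It only shows the shape of the idea: capture the user's intent once, verify each tool result against it, and commit the result to context only if verification passes.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny pattern list; a real detector would need far more.
INSTRUCTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(above|system prompt)", re.IGNORECASE),
]

WITHHELD = "[tool output withheld: failed intent verification]"


@dataclass(frozen=True)
class IntentAnchor:
    """Cached representation of the user's original request (assumed shape)."""
    allowed_tools: frozenset[str]  # tools the user's request plausibly needs
    known_tools: frozenset[str]    # every tool registered with the agent


@dataclass
class Verdict:
    commit: bool
    reasons: list[str] = field(default_factory=list)


def verify_tool_output(anchor: IntentAnchor, tool_name: str, output: str) -> Verdict:
    """Check a tool result against the anchored intent before it is committed."""
    reasons: list[str] = []
    # Intent-relative check 1: the tool itself should be one the intent allows.
    if tool_name not in anchor.allowed_tools:
        reasons.append(f"tool {tool_name!r} is outside the user's intent")
    # Intent-relative check 2: the output should not steer toward other tools.
    for other in sorted(anchor.known_tools - anchor.allowed_tools):
        if other in output:
            reasons.append(f"output references out-of-intent tool {other!r}")
    # Absolute check: instruction-like phrasing is suspicious in any context.
    for pattern in INSTRUCTION_PATTERNS:
        if pattern.search(output):
            reasons.append(f"instruction-like text matched {pattern.pattern!r}")
    return Verdict(commit=not reasons, reasons=reasons)


def commit_tool_result(context: list[dict], anchor: IntentAnchor,
                       tool_name: str, output: str) -> Verdict:
    """Verify first; push the raw output into context only if it passes."""
    verdict = verify_tool_output(anchor, tool_name, output)
    content = output if verdict.commit else WITHHELD
    context.append({"role": "tool", "name": tool_name, "content": content})
    return verdict
```

In this sketch a rejected result is replaced with a placeholder rather than dropped, so the model still sees that the tool call completed. Whether to withhold, redact, or ask the user is a design choice the source does not specify.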