Summary
During continuous improvement session CI-15 (2026-03-20, v0.16.0), the ContentSanitizer flagged 2 injection patterns (flags=2) when retrieving memory content that contained legitimate user queries.
Root Cause
assembly.rs::sanitize_memory_message() runs all retrieved memory messages through ContentSanitizer::detect_injections(). When a user query like:
"Use the memory_save tool to save this fact: ... Confirm when done."
is stored in memory and later retrieved via memory_search, the sanitizer flags it because it contains imperative language similar to injection patterns (e.g., instruction-like directives).
Observed Warning
WARN zeph_core::agent::context::assembly: injection patterns detected in memory retrieval flags=2
Impact
- Severity: Low — sanitizer is advisory only (doc comment: "not a security boundary")
- Functional impact: None — retrieval proceeds normally
- Operational impact: Log noise; inflates the sanitizer_injection_flags metric
Expected Behavior
User messages stored in memory (prior conversation turns) should not trigger injection warnings when retrieved. The sanitizer should distinguish between:
- Actual injection: untrusted external content (web scrapes, MCP tool output, documents)
- False positive: prior user conversation turns retrieved from SQLite
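A minimal sketch of that distinction, assuming each retrieved message carries an origin label. The `Origin` enum and `needs_injection_scan` function are hypothetical names for illustration only and are not part of the existing zeph-core API:

```rust
/// Hypothetical origin label for content entering the context window.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Origin {
    WebScrape,
    McpToolOutput,
    Document,
    /// A prior user conversation turn read back from SQLite.
    PriorUserTurn,
}

/// Returns true when content from this origin should be scanned for
/// injection patterns. Only untrusted external content is scanned;
/// the user's own earlier turns are treated as known-safe.
fn needs_injection_scan(origin: Origin) -> bool {
    !matches!(origin, Origin::PriorUserTurn)
}
```

Under this assumption, the warning would still fire for web scrapes, MCP tool output, and documents, but not for user turns retrieved from memory.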
Potential Fix
- Reduce ContentSource specificity for memory retrieval paths (use a lower sensitivity mode)
- Add context label to retrieved messages (e.g., role=user) to skip injection scanning for known-safe sources
- Filter patterns that commonly trigger on instruction-like user queries (e.g., patterns matching common verb phrases)
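The lower-sensitivity and pattern-filtering options could be combined roughly as follows. This is a toy sketch, not the real ContentSanitizer: the `Sensitivity` enum, the `Pattern` struct, the pattern list, and `count_flags` are all assumptions made for illustration, and the real detector's patterns are not shown in this report.

```rust
/// Hypothetical scan mode: Full for untrusted external content,
/// Reduced for memory retrieval paths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Sensitivity {
    Full,
    Reduced,
}

/// Hypothetical pattern entry. `instruction_like` marks patterns that
/// commonly match ordinary imperative user queries.
struct Pattern {
    needle: &'static str,
    instruction_like: bool,
}

// Illustrative patterns only; these are not the real sanitizer's rules.
const PATTERNS: &[Pattern] = &[
    Pattern { needle: "ignore previous instructions", instruction_like: false },
    Pattern { needle: "use the ", instruction_like: true },
    Pattern { needle: "confirm when done", instruction_like: true },
];

/// Counts matching patterns, skipping instruction-like ones in Reduced mode.
fn count_flags(text: &str, mode: Sensitivity) -> usize {
    let lower = text.to_lowercase();
    PATTERNS
        .iter()
        .filter(|p| mode == Sensitivity::Full || !p.instruction_like)
        .filter(|p| lower.contains(p.needle))
        .count()
}
```

With this toy pattern list, the stored user query from the Root Cause section is flagged in `Full` mode but produces no flags in `Reduced` mode, while an explicit override phrase is still caught in both modes. The trade-off is that `Reduced` mode would also miss instruction-like text if untrusted content were ever stored under a memory path, so the mode should be chosen from the content's origin rather than from the retrieval call site alone.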
Evidence
- Session: .local/testing/sessions/2026-03-20-session-ci15.md
- Log: .local/testing/memory-ops-2026-03-20.log
- Code: crates/zeph-core/src/agent/context/assembly.rs::sanitize_memory_message()