fix(security): add advanced multi-turn attack detection#5924
fix(security): add advanced multi-turn attack detection#5924dan-redcupit wants to merge 1 commit intoopenclaw:mainfrom
Conversation
New security module for stateful detection of sophisticated attacks: - src/security/injection-detection.ts: detects many-shot, crescendo, persona hijack, CoT hijack, authority spoof, false memory, and indirect injection attacks - src/security/injection-detection.test.ts: comprehensive tests with ZeroLeaks regression payloads Features: - Single message attack detection - Multi-turn conversation analysis - Confidence scoring based on attack severity - Quick-check function for obvious attacks Attack types detected: - Many-shot priming (3+ examples building pattern) - Crescendo (progressive trust-building) - Persona hijack (DAN, roleplay injection) - Chain-of-thought hijack - Authority spoofing ([ADMIN], [SYSTEM]) - False memory (fabricated prior agreements) - Indirect injection (hidden in code/HTML comments) Part of Operation CLAW FORTRESS security hardening (ZeroLeaks remediation). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| function hasMatch(text: string, patterns: RegExp[]): boolean { | ||
| return patterns.some((p) => p.test(text)); | ||
| } |
There was a problem hiding this comment.
[P0] hasMatch uses RegExp.test on patterns that include the g flag (e.g., MANY_SHOT_INDICATORS), which mutates lastIndex and can cause subsequent test() calls to flip between true/false across messages. This makes detection non-deterministic depending on call order and prior inputs.
This matters when any ..._PATTERNS array accidentally includes a global regex (now or in future), and it already affects any future reuse of MANY_SHOT_INDICATORS via hasMatch.
A safer approach is to avoid /g in patterns used with test, or to reset lastIndex before testing, or to use text.match()/RegExp.exec without shared regex instances.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/security/injection-detection.ts
Line: 133:135
Comment:
[P0] `hasMatch` uses `RegExp.test` on patterns that include the `g` flag (e.g., `MANY_SHOT_INDICATORS`), which mutates `lastIndex` and can cause subsequent `test()` calls to flip between true/false across messages. This makes detection non-deterministic depending on call order and prior inputs.
This matters when any `..._PATTERNS` array accidentally includes a global regex (now or in future), and it already affects any future reuse of `MANY_SHOT_INDICATORS` via `hasMatch`.
A safer approach is to avoid `/g` in patterns used with `test`, or to reset `lastIndex` before testing, or to use `text.match()`/`RegExp.exec` without shared regex instances.
How can I resolve this? If you propose a fix, please make it concise.| const matches = text.match(pattern); | ||
| if (matches) { | ||
| count += matches.length; | ||
| } |
There was a problem hiding this comment.
[P2] countMatches uses text.match(pattern) with patterns that have /g; this allocates arrays of all matches for every pattern. Since this code may run on user-provided content, it could become unnecessarily expensive for large messages/histories.
Consider using pattern.exec in a loop (or matchAll) to count without building a full array, especially for the multi-turn combinedText case.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/security/injection-detection.ts
Line: 122:125
Comment:
[P2] `countMatches` uses `text.match(pattern)` with patterns that have `/g`; this allocates arrays of all matches for every pattern. Since this code may run on user-provided content, it could become unnecessarily expensive for large messages/histories.
Consider using `pattern.exec` in a loop (or `matchAll`) to count without building a full array, especially for the multi-turn `combinedText` case.
How can I resolve this? If you propose a fix, please make it concise.| /\[(SYSTEM|ADMIN|ROOT|SUDO|OVERRIDE|DEVELOPER)(\s+\w+)?\]/i, | ||
| /\[ADMIN\s+OVERRIDE\]/i, | ||
| /\[SYSTEM\s+OVERRIDE\]/i, | ||
| /ADMIN\s+(COMMAND|OVERRIDE|ACCESS):/i, |
There was a problem hiding this comment.
[P3] AUTHORITY_SPOOF_PATTERNS has overlapping entries: the first pattern /(SYSTEM|ADMIN|...)/ already matches [ADMIN OVERRIDE] and [SYSTEM OVERRIDE], making the two explicit override patterns redundant. Not a functional bug, but it increases maintenance cost and makes it harder to reason about coverage.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/security/injection-detection.ts
Line: 72:75
Comment:
[P3] `AUTHORITY_SPOOF_PATTERNS` has overlapping entries: the first pattern `/(SYSTEM|ADMIN|...)/` already matches `[ADMIN OVERRIDE]` and `[SYSTEM OVERRIDE]`, making the two explicit override patterns redundant. Not a functional bug, but it increases maintenance cost and makes it harder to reason about coverage.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| export type AttackType = | ||
| | "many_shot" // 3+ examples in message or across recent turns | ||
| | "crescendo" // Progressive deepening across turns | ||
| | "persona_hijack" // Roleplay/persona injection | ||
| | "cot_hijack" // Chain-of-thought manipulation | ||
| | "authority_spoof" // Fake system/admin messages | ||
| | "false_memory" // Fabricated prior agreements | ||
| | "indirect" // Hidden in documents/code/HTML | ||
| | "encoding_bypass"; // Obfuscation attempts |
There was a problem hiding this comment.
[P3] AttackType includes "encoding_bypass", but there is no detection logic that can emit it. If this is intentional future work, consider omitting it until implemented (or add at least a stub detector + test) to avoid misleading downstream consumers.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/security/injection-detection.ts
Line: 18:26
Comment:
[P3] `AttackType` includes `"encoding_bypass"`, but there is no detection logic that can emit it. If this is intentional future work, consider omitting it until implemented (or add at least a stub detector + test) to avoid misleading downstream consumers.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.|
This pull request has been automatically marked as stale due to inactivity. |
bfc1ccb to
f92900f
Compare
This comment was marked as spam.
This comment was marked as spam.
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
Adds stateful detection for sophisticated multi-turn prompt injection attacks.
Part 3 of 3 from Operation CLAW FORTRESS security hardening (split from #5863 for easier review).
New Files
src/security/injection-detection.tssrc/security/injection-detection.test.tsAttack Types Detected
many_shotcrescendopersona_hijackcot_hijackauthority_spooffalse_memoryindirectAPI
```typescript
// Quick check for obvious attacks
isLikelyAttack(content: string): boolean
// Full analysis with confidence scoring
detectAdvancedInjection(ctx: {
currentMessage: string;
recentHistory?: string[];
}): InjectionDetectionResult
```
ZeroLeaks Findings Addressed
Test Plan
🔒 Generated with Claude Code
Greptile Overview
Greptile Summary
Adds a new
src/security/injection-detection.tsmodule that detects several prompt-injection patterns (single-message and multi-turn), producing adetectedflag,attackTypes,confidence, and human-readabledetails. Addssrc/security/injection-detection.test.tswith unit/regression tests covering each attack type plus multi-turn scenarios, including a small suite of "ZeroLeaks" payload regressions.This fits into the repo’s broader security hardening by providing a standalone classifier that callers can use either as a fast-path (
isLikelyAttack) or a richer analysis (detectAdvancedInjection) that can incorporate recent conversation history.Confidence Score: 3/5
hasMatchrelies onRegExp.testand some pattern sets include global regexes; this can lead to statefullastIndexbehavior and flaky/non-deterministic detections depending on call order.(5/5) You can turn off certain types of comments like style here!
Context used:
dashboard- CLAUDE.md (source)dashboard- AGENTS.md (source)