fix(security): add advanced multi-turn attack detection by dan-redcupit · Pull Request #5924 · openclaw/openclaw

dan-redcupit · 2026-02-01T03:47:50Z

Summary

Adds stateful detection for sophisticated multi-turn prompt injection attacks.

Part 3 of 3 from Operation CLAW FORTRESS security hardening (split from #5863 for easier review).

New Files

File	Purpose
`src/security/injection-detection.ts`	Attack detection logic
`src/security/injection-detection.test.ts`	Comprehensive tests

Attack Types Detected

Type	Description
`many_shot`	3+ examples in message building a pattern
`crescendo`	Progressive trust-building across turns
`persona_hijack`	DAN, roleplay, developer mode injection
`cot_hijack`	Chain-of-thought manipulation
`authority_spoof`	Fake [ADMIN], [SYSTEM] markers
`false_memory`	Fabricated prior agreements
`indirect`	Hidden in code/HTML comments

API

```typescript
// Quick check for obvious attacks
isLikelyAttack(content: string): boolean

// Full analysis with confidence scoring
detectAdvancedInjection(ctx: {
currentMessage: string;
recentHistory?: string[];
}): InjectionDetectionResult
```

ZeroLeaks Findings Addressed

Many-shot priming (3.2, 3.9)
Crescendo attacks (3.3, 3.10)
Persona injection (3.6, 4.1)
Authority spoofing (4.1)

Test Plan

Unit tests for all attack types
Multi-turn conversation tests
Regression tests with ZeroLeaks payloads

🔒 Generated with Claude Code

Greptile Overview

Greptile Summary

Adds a new src/security/injection-detection.ts module that detects several prompt-injection patterns (single-message and multi-turn), producing a detected flag, attackTypes, confidence, and human-readable details. Adds src/security/injection-detection.test.ts with unit/regression tests covering each attack type plus multi-turn scenarios, including a small suite of "ZeroLeaks" payload regressions.

This fits into the repo’s broader security hardening by providing a standalone classifier that callers can use either as a fast-path (isLikelyAttack) or a richer analysis (detectAdvancedInjection) that can incorporate recent conversation history.

Confidence Score: 3/5

Mostly safe to merge, but there is a real determinism bug risk in regex matching that could cause inconsistent detection results.
Core logic is straightforward and well-tested, but hasMatch relies on RegExp.test and some pattern sets include global regexes; this can lead to stateful lastIndex behavior and flaky/non-deterministic detections depending on call order.
src/security/injection-detection.ts

_{(5/5) You can turn off certain types of comments like style here!}

Context used:

Context from dashboard - CLAUDE.md (source)
Context from dashboard - AGENTS.md (source)

New security module for stateful detection of sophisticated attacks: - src/security/injection-detection.ts: detects many-shot, crescendo, persona hijack, CoT hijack, authority spoof, false memory, and indirect injection attacks - src/security/injection-detection.test.ts: comprehensive tests with ZeroLeaks regression payloads Features: - Single message attack detection - Multi-turn conversation analysis - Confidence scoring based on attack severity - Quick-check function for obvious attacks Attack types detected: - Many-shot priming (3+ examples building pattern) - Crescendo (progressive trust-building) - Persona hijack (DAN, roleplay injection) - Chain-of-thought hijack - Authority spoofing ([ADMIN], [SYSTEM]) - False memory (fabricated prior agreements) - Indirect injection (hidden in code/HTML comments) Part of Operation CLAW FORTRESS security hardening (ZeroLeaks remediation). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

greptile-apps

_{1 file reviewed, 4 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-01T05:36:17Z

src/security/injection-detection.ts

+function hasMatch(text: string, patterns: RegExp[]): boolean {
+  return patterns.some((p) => p.test(text));
+}


[P0] hasMatch uses RegExp.test on patterns that include the g flag (e.g., MANY_SHOT_INDICATORS), which mutates lastIndex and can cause subsequent test() calls to flip between true/false across messages. This makes detection non-deterministic depending on call order and prior inputs.

This matters when any ..._PATTERNS array accidentally includes a global regex (now or in future), and it already affects any future reuse of MANY_SHOT_INDICATORS via hasMatch.

A safer approach is to avoid /g in patterns used with test, or to reset lastIndex before testing, or to use text.match()/RegExp.exec without shared regex instances.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/security/injection-detection.ts Line: 133:135 Comment: [P0] `hasMatch` uses `RegExp.test` on patterns that include the `g` flag (e.g., `MANY_SHOT_INDICATORS`), which mutates `lastIndex` and can cause subsequent `test()` calls to flip between true/false across messages. This makes detection non-deterministic depending on call order and prior inputs. This matters when any `..._PATTERNS` array accidentally includes a global regex (now or in future), and it already affects any future reuse of `MANY_SHOT_INDICATORS` via `hasMatch`. A safer approach is to avoid `/g` in patterns used with `test`, or to reset `lastIndex` before testing, or to use `text.match()`/`RegExp.exec` without shared regex instances. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-01T05:36:19Z

src/security/injection-detection.ts

+    const matches = text.match(pattern);
+    if (matches) {
+      count += matches.length;
+    }


[P2] countMatches uses text.match(pattern) with patterns that have /g; this allocates arrays of all matches for every pattern. Since this code may run on user-provided content, it could become unnecessarily expensive for large messages/histories.

Consider using pattern.exec in a loop (or matchAll) to count without building a full array, especially for the multi-turn combinedText case.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/security/injection-detection.ts Line: 122:125 Comment: [P2] `countMatches` uses `text.match(pattern)` with patterns that have `/g`; this allocates arrays of all matches for every pattern. Since this code may run on user-provided content, it could become unnecessarily expensive for large messages/histories. Consider using `pattern.exec` in a loop (or `matchAll`) to count without building a full array, especially for the multi-turn `combinedText` case. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-01T05:36:20Z

src/security/injection-detection.ts

+  /\[(SYSTEM|ADMIN|ROOT|SUDO|OVERRIDE|DEVELOPER)(\s+\w+)?\]/i,
+  /\[ADMIN\s+OVERRIDE\]/i,
+  /\[SYSTEM\s+OVERRIDE\]/i,
+  /ADMIN\s+(COMMAND|OVERRIDE|ACCESS):/i,


[P3] AUTHORITY_SPOOF_PATTERNS has overlapping entries: the first pattern /(SYSTEM|ADMIN|...)/ already matches [ADMIN OVERRIDE] and [SYSTEM OVERRIDE], making the two explicit override patterns redundant. Not a functional bug, but it increases maintenance cost and makes it harder to reason about coverage.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/security/injection-detection.ts Line: 72:75 Comment: [P3] `AUTHORITY_SPOOF_PATTERNS` has overlapping entries: the first pattern `/(SYSTEM|ADMIN|...)/` already matches `[ADMIN OVERRIDE]` and `[SYSTEM OVERRIDE]`, making the two explicit override patterns redundant. Not a functional bug, but it increases maintenance cost and makes it harder to reason about coverage. <sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub> How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-01T05:36:21Z

src/security/injection-detection.ts

+export type AttackType =
+  | "many_shot" // 3+ examples in message or across recent turns
+  | "crescendo" // Progressive deepening across turns
+  | "persona_hijack" // Roleplay/persona injection
+  | "cot_hijack" // Chain-of-thought manipulation
+  | "authority_spoof" // Fake system/admin messages
+  | "false_memory" // Fabricated prior agreements
+  | "indirect" // Hidden in documents/code/HTML
+  | "encoding_bypass"; // Obfuscation attempts


[P3] AttackType includes "encoding_bypass", but there is no detection logic that can emit it. If this is intentional future work, consider omitting it until implemented (or add at least a stub detector + test) to avoid misleading downstream consumers.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/security/injection-detection.ts Line: 18:26 Comment: [P3] `AttackType` includes `"encoding_bypass"`, but there is no detection logic that can emit it. If this is intentional future work, consider omitting it until implemented (or add at least a stub detector + test) to avoid misleading downstream consumers. <sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub> How can I resolve this? If you propose a fix, please make it concise.

openclaw-barnacle · 2026-02-15T04:08:59Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle · 2026-03-07T04:03:06Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

greptile-apps bot reviewed Feb 1, 2026

View reviewed changes

Reapor-Yurnero mentioned this pull request Feb 3, 2026

feat(gateway): support modular guardrails extensions for securing against indirect prompt injections and other agentic threats #6095

Closed

openclaw-barnacle bot added the stale Marked as stale due to inactivity label Feb 15, 2026

thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f Compare February 15, 2026 18:46

openclaw-barnacle bot removed the stale Marked as stale due to inactivity label Feb 16, 2026

This comment was marked as spam.

Sign in to view

openclaw-barnacle bot added the stale Marked as stale due to inactivity label Mar 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(security): add advanced multi-turn attack detection#5924

fix(security): add advanced multi-turn attack detection#5924
dan-redcupit wants to merge 1 commit intoopenclaw:mainfrom
dan-redcupit:fix/advanced-attack-detection

dan-redcupit commented Feb 1, 2026 •

edited by greptile-apps bot

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 1, 2026

Uh oh!

greptile-apps bot Feb 1, 2026

Uh oh!

greptile-apps bot Feb 1, 2026

Uh oh!

greptile-apps bot Feb 1, 2026

Uh oh!

openclaw-barnacle bot commented Feb 15, 2026

Uh oh!

This comment was marked as spam.

openclaw-barnacle bot commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

dan-redcupit commented Feb 1, 2026 • edited by greptile-apps bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New Files

Attack Types Detected

API

ZeroLeaks Findings Addressed

Test Plan

Greptile Overview

Greptile Summary

Confidence Score: 3/5

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

openclaw-barnacle bot commented Feb 15, 2026

Uh oh!

This comment was marked as spam.

openclaw-barnacle bot commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dan-redcupit commented Feb 1, 2026 •

edited by greptile-apps bot

Loading