Skip to content

ReDoS in analyzeInterpreterHeuristicsFromUnquoted causes gateway 100% CPU hang #61881

@keen0206

Description

@keen0206

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

The analyzeInterpreterHeuristicsFromUnquoted() function in pi-embedded-helpers contains two regex patterns with catastrophic backtracking (ReDoS). When the exec tool processes a long shell command containing VAR=value assignments followed by extensive text (e.g. HTML heredocs), the regex engine enters an infinite loop, consuming 100% CPU and permanently blocking the Node.js event loop. The gateway becomes completely unresponsive — no HTTP requests can be served, no new WebSocket connections can be established.

Root cause regex (line ~3084 in pi-embedded-BYdcxQ5A.js):

/(?:^|\s|(?<!\\)[|&;()])(?:[A-Za-z_][A-Za-z0-9_]*=.*\s+)*python/i
/(?:^|\s|(?<!\\)[|&;()])(?:[A-Za-z_][A-Za-z0-9_]*=.*\s+)*node/i

The inner .*\s+ is the problem: both .* and \s+ can match whitespace characters. When surrounded by the ()* outer quantifier, this creates exponentially many ways to partition whitespace, causing catastrophic backtracking on inputs with spaces.

Steps to reproduce

  1. Configure an agent with qwen/glm-5 or qwen/qwen3.5-plus model via dashscope API
  2. Ask the agent to generate a shell command that writes HTML content using a heredoc, e.g.:
    ACCESS_TOKEN=$(curl -s "https://api.example.com/token" | jq -r '.access_token')
    cat > /tmp/content.html << 'HTMLEOF'
    <section style="padding: 30px 20px; font-family: ...">
    ... (4000+ chars of HTML with many spaces) ...
    </section>
    HTMLEOF
  3. The exec tool receives this command and calls analyzeInterpreterHeuristicsFromUnquoted(raw)
  4. The python/node detection regex runs .test() on the full command string
  5. Gateway process hangs at 100% CPU indefinitely

Minimal reproduction:

const re = /(?:^|\s|(?<!\\)[|&;()])(?:[A-Za-z_][A-Za-z0-9_]*=.*\s+)*python(?:3)?(?=$|[\s|&;()])/i;
const cmd = fs.readFileSync("command_with_html_heredoc.txt", "utf8"); // ~5KB shell script
re.test(cmd); // HANGS FOREVER

Expected behavior

The regex should complete in O(n) time regardless of input. The analyzeInterpreterHeuristicsFromUnquoted function should return quickly even for long shell commands with embedded HTML/text content.

Actual behavior

The gateway process enters an infinite CPU loop inside RegExpPrototypeTest (confirmed via macOS sample profiling — 800/800 samples in regex engine). The process:

  • Consumes 100% CPU permanently
  • Cannot serve any HTTP requests (all connections hang)
  • Cannot accept new WebSocket connections
  • Cannot be stopped with SIGTERM (requires SIGKILL)
  • LLM API responses time out because the event loop is blocked

The gateway must be force-killed (kill -9) and restarted.

OpenClaw version

2026.4.2

Operating system

macOS 26.3.1 (Darwin 25.3.0, arm64)

Install method

npm global

Model

qwen/glm-5, qwen/qwen3.5-plus

Provider / routing chain

dashscope (Alibaba Cloud)

Additional provider/model setup details

N/A — the bug is model-independent. It triggers whenever the LLM generates a long shell command with VAR=value assignments followed by text containing many whitespace characters (e.g. HTML heredocs, embedded JSON).

Logs, screenshots, and evidence

**Process sample** (macOS `sample` command on the stuck gateway):

800/800 samples (100%) in:
  node::crypto::TLSWrap::OnStreamRead
    → v8::internal::MicrotaskQueue::RunMicrotasks
      → Builtins_AsyncFunctionAwaitResolveClosure
        → Builtins_RegExpPrototypeTest  ← stuck here


**Error log** shows repeated LLM timeouts caused by blocked event loop:

[diagnostic] lane task error: lane=session:agent:main:main error="FailoverError: LLM request timed out."


**Process state**: `R` (running), 100% CPU, 105+ minutes accumulated CPU time, unresponsive to SIGTERM.

Impact and severity

This is a critical bug:

  • Severity: Gateway becomes permanently unresponsive, requiring force-kill
  • Affected: Any user whose agent generates shell commands with heredocs or long text content containing whitespace
  • Impact: Complete denial of service — all channels (webchat, Feishu, etc.) stop working
  • Trigger: Normal agent usage (not adversarial input) — the triggering command was a WeChat article publishing script with embedded HTML

Additional information

Suggested fix: Replace .* with \S* in both regex patterns to eliminate the overlap between .* and \s+:

- /(?:[A-Za-z_][A-Za-z0-9_]*=.*\s+)*python/i
+ /(?:[A-Za-z_][A-Za-z0-9_]*=\S*\s+)*python/i

- /(?:[A-Za-z_][A-Za-z0-9_]*=.*\s+)*node/i
+ /(?:[A-Za-z_][A-Za-z0-9_]*=\S*\s+)*node/i

This fix:

  • Eliminates catastrophic backtracking (0.15ms vs infinite hang on the same input)
  • Preserves correct detection of VAR=value python patterns
  • Is semantically more accurate: env var values in shell assignments cannot contain unescaped spaces

Source file: src/agents/pi-embedded-helpers/failover-matches.ts (or equivalent source for analyzeInterpreterHeuristicsFromUnquoted)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingregressionBehavior that previously worked and now fails

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions