loopDetection does not catch repeated exec tool calls #34574

## Bug: loopDetection does not catch repeated `exec` tool calls

### Environment
- OpenClaw version: v2026.3.2
- Model: Kimi K2.5 (via DashScope/Bailian, OpenAI-compatible API)
- Channel: webchat (control-ui)

### Description

The `tools.loopDetection` config fires for the `read` tool but **does not detect or block repeated `exec` tool calls**. When a model enters an infinite tool-call loop (calling the same exec command repeatedly with identical results), OpenClaw executes it indefinitely.

In our case, the model called the same shell command **121 times** in one session. Each call returned `exitCode=0` with valid output, but the model kept issuing the same tool call (`stopReason: toolUse`) and never generated text.

### Config

```json
"tools": {
  "exec": { "host": "gateway", "security": "full", "ask": "off" },
  "loopDetection": {
    "enabled": true,
    "historySize": 10,
    "warningThreshold": 2,
    "criticalThreshold": 3,
    "globalCircuitBreakerThreshold": 5,
    "detectors": {
      "genericRepeat": true,
      "knownPollNoProgress": true,
      "pingPong": true
    }
  }
}
```

### Session evidence

The looping session contains 122 assistant messages: one `read` call followed by 121 identical `exec` calls. Abbreviated:

```
#1   model=kimi-k2.5  output=26  stop=toolUse  tools=[read]   ← read SKILL.md (skill activation)
#2   model=kimi-k2.5  output=29  stop=toolUse  tools=[exec]   ← grafana-api.sh datasources
#3   model=kimi-k2.5  output=29  stop=toolUse  tools=[exec]   ← same command (identical)
#4   model=kimi-k2.5  output=29  stop=toolUse  tools=[exec]   ← same command (identical)
...
#122 model=kimi-k2.5  output=29  stop=toolUse  tools=[exec]   ← same command (identical)
```

Every exec `toolResult`:

```json
{
  "role": "toolResult",
  "toolCallId": "functions.exec:0",
  "toolName": "exec",
  "content": [{"type": "text", "text": "{\"id\":2,\"name\":\"Loki\"...}"}],
  "details": {"status": "completed", "exitCode": 0, "durationMs": 136},
  "isError": false
}
```

The result was correct every time (`exitCode=0`, valid JSON, `isError=false`), yet the model never produced text output.
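
For anyone who wants to confirm the same pattern in their own session, here is a minimal sketch that measures the longest streak of identical tool results. It relies only on the `toolResult` shape shown above; how the messages are loaded from the session file is left out because that layout is an assumption on our side, and the helper names `result_fingerprint` and `longest_identical_run` are hypothetical.

```python
import hashlib
import json


def result_fingerprint(message):
    """Hash the tool name, text payload, and error flag of one toolResult message."""
    text = "".join(
        part.get("text", "")
        for part in message.get("content", [])
        if part.get("type") == "text"
    )
    raw = json.dumps([message.get("toolName"), text, message.get("isError", False)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def longest_identical_run(messages):
    """Return the longest streak of consecutive toolResult messages with equal fingerprints."""
    best = current = 0
    previous = None
    for message in messages:
        role = message.get("role")
        if role == "user":
            # A new user turn starts a fresh streak (assumption of this sketch).
            previous, current = None, 0
            continue
        if role != "toolResult":
            continue  # interleaved assistant tool-call messages are skipped
        fingerprint = result_fingerprint(message)
        current = current + 1 if fingerprint == previous else 1
        previous = fingerprint
        best = max(best, current)
    return best
```

A streak well above `criticalThreshold` with no detector action is the symptom described in this report.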

### Expected behavior

`loopDetection` should detect the same `exec` command repeating with identical results and:
1. Warn after `warningThreshold` (2) repetitions
2. Block and force text response after `criticalThreshold` (3)
3. Trip circuit breaker after `globalCircuitBreakerThreshold` (5)
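
The escalation above can be sketched roughly as follows. This is an illustration of the behavior we expected, not OpenClaw's actual implementation: the `RepeatDetector` class, its method names, and the returned action strings are all hypothetical, and we assume a "repetition" means an earlier identical call within the last `historySize` entries.

```python
import hashlib
import json
from collections import deque


class RepeatDetector:
    """Hypothetical detector: counts identical (tool, args, result) triples in a sliding window."""

    def __init__(self, history_size=10, warning=2, critical=3, breaker=5):
        self.history = deque(maxlen=history_size)
        self.warning = warning
        self.critical = critical
        self.breaker = breaker

    def observe(self, tool_name, arguments, result_text):
        """Record one completed tool call and return the action to take."""
        raw = json.dumps([tool_name, arguments, result_text], sort_keys=True)
        fingerprint = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        self.history.append(fingerprint)
        # Repetitions = identical calls already in the window before this one.
        repeats = sum(1 for fp in self.history if fp == fingerprint) - 1
        if repeats >= self.breaker:
            return "circuit_break"
        if repeats >= self.critical:
            return "block"
        if repeats >= self.warning:
            return "warn"
        return "ok"
```

With the thresholds from our config, feeding the same `exec` call and output repeatedly would warn on the third call, block on the fourth, and trip the breaker on the sixth. Nothing in this sketch is specific to `read`, which is why we expected `exec` to be covered too.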

### Actual behavior

No warning, no blocking, no circuit breaker. The loop runs until manual intervention.

### Impact

- Session grows unbounded (183KB+ observed), wasting API tokens
- User gets no response (bot appears frozen)
- **For write operations (e.g., Jira issue creation), the loop creates duplicates**
- Requires manual session file cleanup

### Cross-model verification

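We confirmed that the looping itself is model-specific by testing all available models with the same tool call sequence (`read → exec → expect text`). The loopDetection gap is model-independent, but only Kimi K2.5 currently triggers it. The test stops after 3 rounds by default, so "LOOP (3 rounds)" means the model was still issuing tool calls when the test gave up: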

| Model | Provider | Scenario: read→exec | Scenario: direct exec | Scenario: Jira create |
|-------|----------|---------------------|----------------------|----------------------|
| **kimi-k2.5** | bailian | ❌ LOOP (3 rounds) | ❌ LOOP (3 rounds) | ❌ LOOP (3 rounds) |
| qwen3.5-plus | bailian | ✅ PASS | ✅ PASS | ✅ PASS |
| qwen3-coder-plus | bailian | ✅ PASS | ✅ PASS | ✅ PASS |
| qwen3-coder-next | bailian | ✅ PASS | ✅ PASS | ✅ PASS |
| MiniMax-M2.5 | bailian | ✅ PASS | ✅ PASS | ✅ PASS |
| glm-5 | bailian | ✅ PASS | ✅ PASS | ✅ PASS |
| qwen35-a3b-iq4xs | local (llama.cpp) | ✅ PASS | ✅ PASS | ✅ PASS |

Even though only one model loops today, **any model could develop this behavior after an update**, and loopDetection should catch it regardless.

### Suggestion

- Extend the `genericRepeat` detector to cover the `exec` tool (compare command strings + output)
- Add a hard cap on consecutive tool-only turns without any text output (e.g., max 10)
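
The second suggestion could look something like this. It is a sketch under assumptions: the `ToolOnlyTurnCap` class is hypothetical, and it expects OpenAI-style assistant messages with a string or null `content` and an optional `tool_calls` list, which may not match OpenClaw's internal message type.

```python
class ToolOnlyTurnCap:
    """Hypothetical guard: limits consecutive assistant turns that call tools but emit no text."""

    def __init__(self, max_tool_only_turns=10):
        self.max_tool_only_turns = max_tool_only_turns
        self.streak = 0

    def allow(self, assistant_message):
        """Return False once the streak of tool-only turns exceeds the cap."""
        content = assistant_message.get("content")
        has_text = isinstance(content, str) and bool(content.strip())
        has_tools = bool(assistant_message.get("tool_calls"))
        if has_tools and not has_text:
            self.streak += 1
        else:
            # A turn with text, or one without tool calls, resets the streak.
            self.streak = 0
        return self.streak <= self.max_tool_only_turns
```

Unlike a repeat detector, this cap does not depend on the calls being identical, so it would also stop loops where the arguments drift slightly between turns.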

### Test script

We wrote an automated test to reproduce the model-side looping. API keys are read from environment variables; no credentials are hardcoded:

<details>
<summary>test-model-tool-loop.py</summary>

````python
#!/usr/bin/env python3
"""
Model Tool-Call Loop Detection Script

Tests whether LLM models enter infinite tool-call loops when handling
consecutive tool calls (read → exec → should respond with text).

Usage:
    python3 test-model-tool-loop.py                    # Test all models
    python3 test-model-tool-loop.py kimi-k2.5 glm-5   # Test specific models
    python3 test-model-tool-loop.py --rounds 5         # Max loop detection rounds
    python3 test-model-tool-loop.py --provider local --model default  # One provider/model pair

Environment variables:
    BAILIAN_API_KEY     - DashScope API key
    SILICONFLOW_API_KEY - SiliconFlow API key
    LOCAL_API_KEY       - Local model API key
    LOCAL_API_URL       - Local model endpoint (default: http://localhost:8080/v1)
"""

import json, urllib.request, ssl, sys, time, argparse, os

# ── Provider config (keys from env) ──────────────────────────
PROVIDERS = {
    "bailian": {
        "url": "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions",
        "key_env": "BAILIAN_API_KEY",
        "models": ["kimi-k2.5", "qwen3.5-plus", "qwen3-coder-plus", "qwen3-coder-next", "MiniMax-M2.5", "glm-5"],
    },
    "siliconflow": {
        "url": "https://api.siliconflow.cn/v1/chat/completions",
        "key_env": "SILICONFLOW_API_KEY",
        "models": ["Pro/moonshotai/Kimi-K2.5"],
    },
    "local": {
        "url": os.environ.get("LOCAL_API_URL", "http://localhost:8080/v1") + "/chat/completions",
        "key_env": "LOCAL_API_KEY",
        "models": ["default"],
    },
}

# ── Tool definitions ─────────────────────────────────────────
TOOLS = [
    {"type": "function", "function": {
        "name": "exec",
        "description": "Execute a shell command on the host. Returns stdout/stderr and exit code.",
        "parameters": {"type": "object", "properties": {
            "command": {"type": "string", "description": "Shell command to execute"},
        }, "required": ["command"]}
    }},
    {"type": "function", "function": {
        "name": "read",
        "description": "Read a file from the filesystem. Returns the file content.",
        "parameters": {"type": "object", "properties": {
            "path": {"type": "string", "description": "Absolute file path to read"}
        }, "required": ["path"]}
    }}
]

# ── Test fixtures ────────────────────────────────────────────
SYSTEM_PROMPT = """You are an AI assistant that helps with infrastructure tasks.
Rules: Use exec tool to run commands. After getting tool results, respond to the user with a clear summary."""

SKILL_CONTENT = """---
name: grafana
description: Query Grafana dashboards, alerts, datasources via REST API.
---
# Grafana
## Helper Script
Use `grafana-api.sh` for all operations:
Commands:
grafana-api.sh health
grafana-api.sh datasources
grafana-api.sh dashboards [query]
grafana-api.sh alerts

"""

EXEC_RESULT = """{
  "id": 1, "name": "Prometheus", "type": "prometheus", "url": "http://prometheus:9090"
}
{
  "id": 2, "name": "Loki", "type": "loki", "url": "http://loki:3100"
}
{
  "id": 3, "name": "Tempo", "type": "tempo", "url": "http://tempo:3200"
}"""

JIRA_RESULT = '{"id":"40100","key":"PROJ-42","self":"https://jira.example.com/rest/api/2/issue/40100"}'

# ── Test scenarios ───────────────────────────────────────────
SCENARIOS = {
    "read→exec (skill activation flow)": lambda: [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "List all datasources"},
        {"role": "assistant", "content": None, "tool_calls": [{
            "id": "c1", "type": "function",
            "function": {"name": "read", "arguments": json.dumps({"path": "/skills/grafana/SKILL.md"})}
        }]},
        {"role": "tool", "tool_call_id": "c1", "content": SKILL_CONTENT},
        {"role": "assistant", "content": None, "tool_calls": [{
            "id": "c2", "type": "function",
            "function": {"name": "exec", "arguments": json.dumps({"command": "grafana-api.sh datasources"})}
        }]},
        {"role": "tool", "tool_call_id": "c2", "content": EXEC_RESULT},
    ],
    "direct exec (no read)": lambda: [
        {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + SKILL_CONTENT},
        {"role": "user", "content": "List all datasources"},
        {"role": "assistant", "content": None, "tool_calls": [{
            "id": "c1", "type": "function",
            "function": {"name": "exec", "arguments": json.dumps({"command": "grafana-api.sh datasources"})}
        }]},
        {"role": "tool", "tool_call_id": "c1", "content": EXEC_RESULT},
    ],
    "write op (duplicate risk)": lambda: [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Create a task: Test Grafana alerts"},
        {"role": "assistant", "content": None, "tool_calls": [{
            "id": "c1", "type": "function",
            "function": {"name": "exec", "arguments": json.dumps({"command": "jira-cli create PROJ Task \"Test Grafana alerts\""})}
        }]},
        {"role": "tool", "tool_call_id": "c1", "content": JIRA_RESULT},
    ],
}


def call_api(api_url, api_key, model, messages, timeout=30):
    payload = {"model": model, "messages": messages, "tools": TOOLS, "max_tokens": 2048}
    data = json.dumps(payload).encode('utf-8')
    req = urllib.request.Request(api_url, data=data, method='POST', headers={
        'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json',
    })
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


def test_model(provider_name, api_url, api_key, model, max_rounds=3):
    results = {}
    for scenario_name, build_messages in SCENARIOS.items():
        messages = build_messages()
        looped = False
        rounds = 0
        error = None
        latency = 0
        try:
            t0 = time.time()
            timeout = 60 if provider_name == "local" else 30
            r = call_api(api_url, api_key, model, messages, timeout)
            latency = time.time() - t0
            choice = r['choices'][0]
            msg = choice['message']
            tc = msg.get('tool_calls')
            if tc and choice['finish_reason'] in ('tool_calls', 'tool_call'):
                looped = True
                rounds = 1
                for i in range(2, max_rounds + 1):
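                    messages.append(msg)
                    # Answer every tool call in the turn: OpenAI-compatible APIs generally
                    # expect one tool message per tool_call id. EXEC_RESULT is replayed
                    # for all scenarios, including the Jira one.
                    for call in tc:
                        messages.append({"role": "tool", "tool_call_id": call['id'], "content": EXEC_RESULT})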
                    r = call_api(api_url, api_key, model, messages, timeout)
                    choice = r['choices'][0]
                    msg = choice['message']
                    tc = msg.get('tool_calls')
                    if not tc or choice['finish_reason'] not in ('tool_calls', 'tool_call'):
                        looped = False
                        rounds = i
                        break
                    rounds = i
        except Exception as e:
            error = str(e)[:80]
        results[scenario_name] = {"looped": looped, "rounds": rounds, "error": error, "latency": round(latency, 1)}
    return results


def main():
    parser = argparse.ArgumentParser(description="Model Tool-Call Loop Detection")
    parser.add_argument("models", nargs="*", help="Model IDs to test (default: all)")
    parser.add_argument("--rounds", type=int, default=3, help="Max loop detection rounds (default: 3)")
    parser.add_argument("--provider", type=str, help="Provider name (with --model)")
    parser.add_argument("--model", type=str, help="Model ID (with --provider)")
    args = parser.parse_args()

    # Build test list
    test_list = []
    if args.provider and args.model:
        p = PROVIDERS.get(args.provider)
        if not p:
            print(f"Unknown provider: {args.provider}"); sys.exit(1)
        key = os.environ.get(p["key_env"], "")
        if not key:
            print(f"Set {p['key_env']} env var"); sys.exit(1)
        test_list.append((args.provider, p["url"], key, args.model))
    elif args.models:
        for model_id in args.models:
            for pname, pconf in PROVIDERS.items():
                if model_id in pconf["models"]:
                    key = os.environ.get(pconf["key_env"], "")
                    if key:
                        test_list.append((pname, pconf["url"], key, model_id))
                    else:
                        print(f"Skip {pname}/{model_id}: {pconf['key_env']} not set")
                    break
    else:
        for pname, pconf in PROVIDERS.items():
            key = os.environ.get(pconf["key_env"], "")
            if not key:
                print(f"Skip {pname}: {pconf['key_env']} not set")
                continue
            for m in pconf["models"]:
                test_list.append((pname, pconf["url"], key, m))

    if not test_list:
        print("No models to test. Set API key env vars."); sys.exit(1)

    print(f"Model Tool-Call Loop Detection")
    print(f"Rounds: {args.rounds} | Models: {len(test_list)} | Scenarios: {len(SCENARIOS)}")
    print(f"{'='*75}")

    all_results = {}
    for pname, url, key, model in test_list:
        label = f"{pname}/{model}"
        print(f"\nTesting {label} ...", end="", flush=True)
        results = test_model(pname, url, key, model, args.rounds)
        all_results[label] = results
        has_loop = any(r["looped"] for r in results.values())
        has_error = any(r["error"] for r in results.values())
        print(" FAIL (loop)" if has_loop else " ERROR" if has_error else " PASS")

    # Summary
    print(f"\n\n{'='*75}")
    print("SUMMARY")
    print(f"{'='*75}\n")

    for scenario_name in SCENARIOS:
        print(f"Scenario: {scenario_name}")
        print(f"{'-'*60}")
        print(f"{'Model':<40} {'Result':<10} {'Rounds':<8} {'Latency'}")
        print(f"{'-'*60}")
        for label, results in all_results.items():
            r = results[scenario_name]
            if r["error"]:
                status = "ERROR"
            elif r["looped"]:
                status = "FAIL"
            elif r["rounds"] > 0:
                status = "WARN"
            else:
                status = "PASS"
            latency = f"{r['latency']}s" if r['latency'] else "-"
            print(f"{label:<40} {status:<10} {r['rounds']:<8} {latency}")
        print()

    loop_models = [l for l, res in all_results.items() if any(r["looped"] for r in res.values())]
    safe_models = [l for l, res in all_results.items()
                   if not any(r["looped"] for r in res.values()) and not any(r["error"] for r in res.values())]

    print(f"{'='*75}")
    if loop_models:
        print(f"FAIL: {', '.join(loop_models)}")
    if safe_models:
        print(f"PASS: {', '.join(safe_models)}")
    print(f"{'='*75}")
    sys.exit(1 if loop_models else 0)


if __name__ == "__main__":
    main()
````

</details>
