Skip to content

bug: runaway tool loop — model repeats completed work, ignores user corrections #350

@Aaronontheweb

Description

@Aaronontheweb

Problem

In a long-running Slack session, the bot successfully scaffolded a GitHub repository (mkdir, file_write, git push — all completed by tool iteration 5-6), then continued for 15+ more iterations redoing its own work: git ls-files three times, ls -la keywords/ config/ twice, re-creating .gitignore, re-committing files, etc.

User sent multiple correction messages ("woah stop", "the repo already exists", "you already pushed to here") but they were buffered and never seen by the LLM because the tool loop was still in progress. The session grew from 44 to 124 messages within a single turn before being manually killed.

Root Cause (Two Issues)

1. Model fails to self-terminate after completing the task
The LLM completed the goal around iteration 5-6 but kept going — verifying, re-checking, and eventually re-doing work. This is a model quality issue exacerbated by using a smaller model (Qwen 3.5 35B). The current MaxToolCallsPerTurn=30 is too permissive for models with weak self-termination behavior.

2. User messages are buffered during tool loops, not injected
LlmSessionActor buffers all incoming user messages with "Buffering user message (LLM call in progress)" while a tool loop runs. The user's corrections are never seen by the LLM until the turn completes. If the user says "stop" at iteration 6, the LLM doesn't see it until after iteration 30.

Session Evidence

Session D0AC6CKBK5K/1773983518.003979 on 2026-03-21:

13:35:49 — turn starts, messages=44
13:36:11 — iter 2: mkdir -p pb-marketing-automations/{scripts,keywords,docs,tests,config}
13:37:07 — iter 3: file_write README.md (large, detailed)
13:37:30 — iter 4: cp reddit-opportunity-scanner.py to repo
13:37:54 — iter 5: file_write requirements.txt
           ^^^ TASK IS DONE HERE ^^^
13:45:28 — iter ??: ls -la keywords/ config/  (re-checking)
13:45:58 — git remote add origin (already set up)
13:46:53 — git push -u origin master (already pushed)
13:48:07 — git remote remove origin (push failed, starts over)
13:48:39 — git ls-files (third time)
13:49:48 — file_write .gitignore (already created)
13:50:19 — git add .gitignore
13:50:50 — git add keywords/ config/
13:51:21 — git commit "Add keywords and configuration files"
13:51:57 — git log --oneline
13:52:29 — git status
13:53:03 — git ls-files (fourth time)
13:53:39 — ls -la keywords/ config/ (second time)
13:54:16 — file_read .gitignore (reading what it just wrote)
13:55:00 — messages=118, still going
13:56:43 — messages=124, manually killed

5 user messages were buffered and never processed during this loop.

Current Tool Loop Architecture

From SessionConfig.cs and LlmSessionActor.cs:

Industry Research

Framework Default Cap Loop Detection
OpenAI Agents SDK 10 turns None
LangChain AgentExecutor 15 iterations early_stopping_method="generate"
LangGraph 25 recursion (~12 effective) None
Cursor (standard) 25 tool calls None, user clicks "Continue"
Cursor MAX 200 tool calls None
Gemini CLI 5 consecutive identical calls SHA-256 hash of tool+args, LLM-as-judge every 10 turns, content repetition detection
Claude Code SDK No default limit max_turns, max_budget_usd params
AutoGen/AG2 100 auto-replies is_termination_msg callback

Notable: Only Gemini CLI implements semantic loop detection (duplicate hashing + LLM judge). All others rely on hard caps. Gemini's approach has high false-positive rates on legitimate iterative work.

Proposed Fixes (Layered)

Fix 1: Drain buffered user messages between tool iterations (high impact, moderate effort)

After turn_tool_execution_complete, before the next turn_llm_call_start, check if user messages are buffered. If so, inject them into the conversation history before the next LLM call. This lets the user steer the agent mid-loop.

This would have directly fixed the observed incident — user said "stop" at iteration 6, and the LLM would have seen it before iteration 7.

Fix 2: Duplicate tool call detection (moderate impact, low effort)

Track hash(toolName + argsJson) for each tool call within a turn. If the same hash appears N times (e.g., 3), inject a warning: "You've called {tool} with the same arguments {N} times this turn. If the task is complete, produce your final response."

In this session, git ls-files was called 4 times with identical args — a clear signal.

Fix 3: Configurable cap with model-aware defaults (low impact, low effort)

Rather than one global MaxToolCallsPerTurn, allow per-model defaults. Frontier models (Claude, GPT-4) can handle 30+. Smaller models (Qwen, Llama) should default lower (15-20). The existing 75% nudge + 100% force-stop framework is good — just adjust the number.

Fix 4: Progress detection heuristic (moderate impact, higher effort)

If the last N tool iterations produced no new text output to the user (only tool calls and results), inject a progress check: "You've made {N} tool calls without updating the user. Summarize your progress or confirm the task is complete."

Fix 5 (deferred): LLM-as-judge convergence check

Like Gemini CLI's approach — every N iterations, run a cheap sidecar LLM call asking "Is the agent making progress or looping?" Only worthwhile if simpler heuristics (Fix 2, Fix 4) prove insufficient.

Priority Recommendation

Fix 1 (drain buffered messages) is the highest-value change. It doesn't just solve loops — it makes the agent responsive to user steering at all times, which is a fundamental UX improvement. The user should never have to wait for a 30-iteration tool loop to finish before their "stop" message is heard.

Fix 2 (duplicate detection) is cheap insurance on top.

Relevant Code

  • src/Netclaw.Configuration/SessionConfig.cs:58MaxToolCallsPerTurn = 30
  • src/Netclaw.Actors/Sessions/LlmSessionActor.cs:616-639 — tool budget enforcement
  • src/Netclaw.Actors/Sessions/LlmSessionActor.cs:265-266 — counter reset logic
  • src/Netclaw.Actors/Sessions/LlmSessionActor.cs — "Buffering user message" logic
  • src/Netclaw.Actors.Tests/Sessions/MaxToolIterationTests.cs — existing test coverage

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions