[Bug] Bash tool metadata drops loop diagnostics so repeated commands bypass the loop gate #489

@Astro-Han

What happened?

A Kimi K2.6 session entered a loop of repeated, successful Bash calls while reviewing PR #487. The same command was executed 39 times:

git diff dev..origin/codex/i477-auth-client-hotfix -- packages/app/src/utils/server.ts

The loop gate did not block or stop the run because the completed Bash tool parts in the exported session had no state.metadata.diagnostics.loop. Earlier read tool calls in the same session did carry loop diagnostics and injected repeat reminders, so the diagnostics path itself was active. The failure is specific to Bash metadata overwriting or dropping the previously attached loop diagnostics.

Root-cause evidence from local inspection:

  • SessionPrompt.applyLoopGate runs before tool execution.
  • SessionProcessor attaches SessionDiagnostics.observeToolCall(...) metadata on tool-call.
  • BashTool.run calls ctx.metadata({ metadata: { output, description } }) during execution and returns final metadata with only output, exit, description, and truncated.
  • The exported session shows 63 Bash parts with diagnostics.loop == null, while 14 Read parts have loop diagnostics.
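
A minimal sketch of the suspected failure mode, assuming a metadata update replaces the stored object wholesale rather than merging into it. The type shapes and the `replaceMetadata` helper are hypothetical and are not taken from the PawWork code:

```typescript
// Hypothetical shapes; the real PawWork types may differ.
type LoopDiagnostics = { repeatCount: number };
type ToolMetadata = {
  output?: string;
  description?: string;
  diagnostics?: { loop?: LoopDiagnostics };
};

// Assumed behavior: an update replaces the stored metadata object wholesale.
function replaceMetadata(_current: ToolMetadata, update: ToolMetadata): ToolMetadata {
  return update;
}

// Metadata as attached on tool-call by the diagnostics observer.
const attached: ToolMetadata = { diagnostics: { loop: { repeatCount: 3 } } };

// Bash-style update that carries only output and description.
const afterBashUpdate = replaceMetadata(attached, {
  output: "diff --git ...",
  description: "Show diff for server.ts",
});

// The loop diagnostics are gone, so nothing downstream can count the repeat.
const loopAfterUpdate = afterBashUpdate.diagnostics?.loop ?? null;
```

If the real update path behaves this way, any tool that reports its own metadata would show the same loss, which matches Read parts (no mid-run update observed) keeping their diagnostics.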

Which area seems affected?

Model harness, prompts, tools, or session mechanics

How much does this affect you?

Breaks an important workflow

Steps to reproduce

  1. Start a PawWork session with Kimi K2.6.

  2. Ask it to perform a risk-driven PR review that requires inspecting diffs.

  3. Let it repeatedly call Bash with the same successful command, for example:

    git diff dev..origin/codex/i477-auth-client-hotfix -- packages/app/src/utils/server.ts

  4. Observe that the same command can repeat far past the loop gate thresholds without a synthetic block or stop.

Concrete local export used for diagnosis:

/Users/yuhan/Downloads/pawwork-session-quiet-harbor-2026-05-07-09-10-02-Kimi-K2.6-Deathloop.json

What did you expect to happen?

Repeated successful Bash calls with identical normalized input should follow the same loop-gate policy as other tools:

  • 3rd occurrence: inject a reminder.
  • 4th occurrence: block the repeated request.
  • 5th occurrence: stop the turn with a user-facing loop summary.

Tool-specific metadata updates should preserve existing diagnostics.loop rather than replacing it.
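
The expected policy can be sketched as a small counter keyed on normalized input. This is an illustration only: `gateActionFor`, `normalizeKey`, and `LoopCounter` are hypothetical names, the whitespace normalization is an assumption, and the real gate may count occurrences differently (for example, only consecutive repeats):

```typescript
type GateAction = "allow" | "remind" | "block" | "stop";

// Thresholds taken from the expected behavior listed above.
function gateActionFor(occurrence: number): GateAction {
  if (occurrence >= 5) return "stop";
  if (occurrence === 4) return "block";
  if (occurrence === 3) return "remind";
  return "allow";
}

// Hypothetical normalization: trim and collapse whitespace so that
// trivially different spellings count as the same call.
function normalizeKey(tool: string, command: string): string {
  return `${tool}:${command.trim().replace(/\s+/g, " ")}`;
}

class LoopCounter {
  private counts = new Map<string, number>();

  // Records one occurrence and returns the action the gate should take.
  observe(tool: string, command: string): GateAction {
    const key = normalizeKey(tool, command);
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return gateActionFor(next);
  }
}

const counter = new LoopCounter();
const repeated =
  "git diff dev..origin/codex/i477-auth-client-hotfix -- packages/app/src/utils/server.ts";
const actions = Array.from({ length: 5 }, () => counter.observe("bash", repeated));
// actions is ["allow", "allow", "remind", "block", "stop"]
```

Under this policy the run in this report would have stopped at the 5th call instead of reaching 39.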

PawWork version

local

OS version

macOS / Darwin 25.3.0

Can you reproduce it again?

Only once so far

Diagnostics

Observed from the session export and current code inspection:

  • Repeated command count: 39 identical Bash calls.
  • Bash loop metadata: null for all Bash tool parts in the export.
  • Read loop metadata: present, including repeat counts and injected reminders.
  • Likely files:
    • packages/opencode/src/session/prompt.ts
    • packages/opencode/src/session/processor.ts
    • packages/opencode/src/session/diagnostics.ts
    • packages/opencode/src/tool/bash.ts
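
For anyone re-checking an export, the per-tool tally above could be reproduced with something like the following sketch. Only the `state.metadata.diagnostics.loop` path comes from this report; the `tool` field name and the overall part shape are assumptions, and the sample data is synthetic:

```typescript
// Assumed part shape: only state.metadata.diagnostics.loop is taken from
// the report; the `tool` field name is hypothetical.
type ExportedPart = {
  tool: string;
  state?: { metadata?: { diagnostics?: { loop?: unknown } } };
};

type Tally = { withLoop: number; withoutLoop: number };

// Counts, per tool, how many parts carry loop diagnostics and how many do not.
function tallyLoopDiagnostics(parts: ExportedPart[]): Map<string, Tally> {
  const tallies = new Map<string, Tally>();
  for (const part of parts) {
    const tally = tallies.get(part.tool) ?? { withLoop: 0, withoutLoop: 0 };
    const loop = part.state?.metadata?.diagnostics?.loop;
    if (loop === undefined || loop === null) tally.withoutLoop += 1;
    else tally.withLoop += 1;
    tallies.set(part.tool, tally);
  }
  return tallies;
}

// Tiny synthetic sample, not data from the real export.
const sampleParts: ExportedPart[] = [
  { tool: "bash", state: { metadata: {} } },
  { tool: "bash", state: { metadata: { diagnostics: { loop: null } } } },
  { tool: "read", state: { metadata: { diagnostics: { loop: { repeatCount: 2 } } } } },
];
const sampleTallies = tallyLoopDiagnostics(sampleParts);
```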

Suggested fix boundary:

  • Preserve/merge diagnostics.loop when ctx.metadata() updates a running tool part.
  • Preserve/merge diagnostics.loop when completing a tool call with tool-returned metadata.
  • Add a regression test proving repeated successful Bash input triggers reminder, block, then stop.
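
One possible shape for the merge, shown as a sketch rather than the actual fix. `mergeToolMetadata` and the types are hypothetical; the idea is that both the running-update path and the completion path go through a helper that never lets a tool-supplied update erase an existing `diagnostics.loop`:

```typescript
// Hypothetical shapes for illustration only.
type LoopState = { repeatCount: number };
type MergeableDiagnostics = { loop?: LoopState; [key: string]: unknown };
type MergeableMetadata = { diagnostics?: MergeableDiagnostics; [key: string]: unknown };

// Merges a tool-supplied update into existing metadata while keeping
// previously attached diagnostics, in particular diagnostics.loop.
function mergeToolMetadata(
  current: MergeableMetadata,
  update: MergeableMetadata,
): MergeableMetadata {
  const merged: MergeableMetadata = { ...current, ...update };
  const diagnostics: MergeableDiagnostics = {
    ...current.diagnostics,
    ...update.diagnostics,
  };
  // An update without a loop entry (or with an undefined one) must not erase it.
  const loop = update.diagnostics?.loop ?? current.diagnostics?.loop;
  if (loop !== undefined) diagnostics.loop = loop;
  if (Object.keys(diagnostics).length > 0) merged.diagnostics = diagnostics;
  return merged;
}

// Running update from the tool, then final metadata on completion.
const onToolCall: MergeableMetadata = { diagnostics: { loop: { repeatCount: 3 } } };
const whileRunning = mergeToolMetadata(onToolCall, {
  output: "partial output",
  description: "Show diff",
});
const onComplete = mergeToolMetadata(whileRunning, {
  output: "full output",
  exit: 0,
  description: "Show diff",
  truncated: false,
});
```

A regression test could then assert that `diagnostics.loop` survives both steps before checking the reminder, block, and stop sequence end to end.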

Labels

  • P1: High priority
  • bug: Something isn't working
  • harness: Model harness, prompts, tool descriptions, and session mechanics
