Bug: OpenCode enters infinite loop after tool calls complete (Zen/big-pickle)

# Bug: OpenCode Enters Infinite Loop After Tool Calls Complete

## Description

OpenCode (opencode) enters an infinite loop and stops responding to user input after completing tool calls. The process stays alive but never exits or continues meaningfully.

## Affected Versions

- **Big Pickle (opencode/big-pickle) with OpenCode Zen provider** ⚠️ **PRIMARY ISSUE**
- v1.1.60+ (GitHub Copilot provider with claude-sonnet-4.6) - similar symptoms
- v1.3.0+ (with OpenAI-compatible providers) - similar symptoms
- Issue #17516 confirmed in v1.1.65 through v1.3.0

## Symptoms

1. **Process never exits**: `opencode run` hangs indefinitely after tool calls complete
2. **Silent infinite loop**: The session processor keeps calling the LLM and receiving empty responses
3. **Near-idle CPU**: Process stays alive at 0-2% CPU, doing nothing useful
4. **No error output**: Logs show no errors, no timeouts
5. **Parent sessions block**: When used as subagent, parent hangs forever waiting

## Root Cause (from issue #17516)

**OpenCode doesn't detect "done" state properly.**

After the model finishes its tool calls and produces its final text response:
1. The session processor should detect `finish_reason=stop` or a text-only response with no tool calls
2. Instead, it treats the response as requiring another iteration
3. It calls the LLM again with the same context
4. This creates an infinite loop of empty LLM calls
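
To make the failure mode concrete, here is a minimal, self-contained simulation of the loop described above. All names (`ModelResponse`, `runSession`, `buggyDecide`, `fixedDecide`) are hypothetical and not taken from OpenCode's source. The faulty check is modeled as "always continue", which is an assumption, since the exact faulty condition is not identified in this report.

```typescript
// Hypothetical shapes for illustration; these are not OpenCode's real types.
type ToolCall = { name: string };
type ModelResponse = {
  finishReason: "stop" | "tool_calls" | "length";
  text: string;
  toolCalls: ToolCall[];
};
type Decision = "continue" | "stop";

// Assumed model of the bug: a terminal response is never recognized.
const buggyDecide = (_r: ModelResponse): Decision => "continue";

// Intended check: stop on finish_reason=stop or when no tool calls were requested.
const fixedDecide = (r: ModelResponse): Decision =>
  r.finishReason === "stop" || r.toolCalls.length === 0 ? "stop" : "continue";

// Replays scripted responses and returns how many LLM calls were made.
function runSession(
  responses: ModelResponse[],
  decide: (r: ModelResponse) => Decision,
  maxSteps: number,
): number {
  const empty: ModelResponse = { finishReason: "stop", text: "", toolCalls: [] };
  let llmCalls = 0;
  for (let step = 0; step < maxSteps; step++) {
    // Once the script runs out, the model returns empty responses.
    const response = responses[step] ?? empty;
    llmCalls++;
    if (decide(response) === "stop") break;
  }
  return llmCalls;
}

const script: ModelResponse[] = [
  { finishReason: "tool_calls", text: "", toolCalls: [{ name: "read" }, { name: "write" }] },
  { finishReason: "stop", text: "Done.", toolCalls: [] },
];

console.log(runSession(script, fixedDecide, 42)); // 2
console.log(runSession(script, buggyDecide, 42)); // 42
```

With the intended check the session ends after two LLM calls. With the modeled bug it only ends when the step cap is reached, which matches the empty iterations seen in the logs.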

### Timeline from debug logs:
```
18:53:24 | step=0  | Session created, prompt resolved
18:53:27 | step=1  | Model calls Read/Write → both succeed
18:53:32 | step=2  | LLM stream → NO tool call, NO visible output
18:53:35 | step=3  | Same pattern — empty loop iteration
...
18:55:21 | step=42 | Still looping when killed at 120s
```

### Pattern per step (steps 2-42):
```
session.prompt step=N
session.prompt status=started resolveTools
tool.registry status=started/completed (all 12 tools)
permission evaluate (task general, task explore)
session.prompt status=completed resolveTools
session.processor process
llm providerID=opencode/big-pickle modelID=zen stream
→ next step (infinite loop)
```
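
A quick way to confirm this pattern in a captured `debug.log` is to count step markers and LLM stream lines. The sketch below assumes that each step logs a line containing `session.prompt` and `step=N`, and that each LLM call logs a line containing `llm` and `stream`, as in the summarized pattern above. The real log lines may carry extra fields, so the regular expressions may need adjusting. `summarizeLog` is a hypothetical helper, not part of OpenCode.

```typescript
type LoopSummary = { steps: number; lastStep: number; llmStreams: number };

// Counts distinct session.prompt steps and LLM stream lines in a debug log.
function summarizeLog(log: string): LoopSummary {
  const seen: Record<number, boolean> = {};
  let steps = 0;
  let lastStep = -1;
  let llmStreams = 0;
  for (const line of log.split("\n")) {
    // Assumed format: "... session.prompt ... step=N ..."
    const m = /session\.prompt\b.*\bstep=(\d+)/.exec(line);
    if (m) {
      const n = Number(m[1]);
      if (!seen[n]) {
        seen[n] = true;
        steps++;
      }
      lastStep = Math.max(lastStep, n);
    }
    // Assumed format: "llm providerID=... modelID=... stream"
    if (/\bllm\b.*\bstream\b/.test(line)) llmStreams++;
  }
  return { steps, lastStep, llmStreams };
}

const sample = [
  "session.prompt step=2",
  "session.prompt status=started resolveTools",
  "session.prompt status=completed resolveTools",
  "session.processor process",
  "llm providerID=opencode/big-pickle modelID=zen stream",
  "session.prompt step=3",
  "llm providerID=opencode/big-pickle modelID=zen stream",
].join("\n");

console.log(summarizeLog(sample)); // { steps: 2, lastStep: 3, llmStreams: 2 }
```

A step count that keeps climbing long after the last tool call, with one LLM stream per step, is the signature of the loop.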

## Related Issues Found

| Issue | Description | Status |
|--------|-------------|--------|
| #17516 | `opencode run` hangs after completing tool calls | Open |
| #11153 | Session loop doesn't stop when model returns finish_reason=stop | Closed (fixed in #11152) |
| #19208 | opencode run adds empty stop-turn after tool-calls completion | Closed (dup of #17982) |
| #21250 | TUI/task runs stall after last completed tool; child tasks spawn empty assistant run | Open |
| #20096 | Tool and task execution can hang indefinitely with no timeout | Open |
| #13841 | Explore subagent hangs indefinitely with Anthropic Claude Opus 4.6 | Open |
| #11865 | Tasks/Subagents with Codex/OpenAI get stuck with no timeout/retry | Open |

## Reproduction Steps

```bash
# Create a prompt that uses tools
cat > /tmp/test-prompt.md << 'EOF'
Read compress.py and replace the RLE implementation with zlib.
Edit the file directly. Do not run any commands.
EOF

# Run with debug logging - using Big Pickle + Zen provider
opencode run --print-logs --log-level DEBUG \
  -m opencode/big-pickle \
  < /tmp/test-prompt.md 2>debug.log

# Process hangs after tool calls complete
# debug.log shows session.prompt stepping infinitely with no tool calls
# LLM provider: opencode/big-pickle with Zen
```

## Expected Behavior

After the model's final text response (post-tool-call), OpenCode should:
1. Detect that the model has finished (finish_reason=stop or no tool calls)
2. Exit cleanly with code 0
3. NOT call LLM again

## Actual Behavior

OpenCode:
1. Treats final response as requiring another iteration
2. Calls LLM again with same context
3. Loops infinitely with empty LLM calls every ~2.5s
4. Never exits, never produces output
5. Must be killed manually (Ctrl+C or kill -9)

## Workarounds

1. **Switch models**: Use `gpt-5.3-codex` instead of `opencode/big-pickle` or `claude-sonnet-4.6` (no hang observed with it)
2. **Switch providers**: Use different provider that sends proper stop signals
3. **Manual kill + restart**: `Ctrl+C` and rephrase prompt
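
For scripts that spawn `opencode run` as a subprocess, the manual kill can be automated with a wall-clock limit. The following is a minimal sketch using Node's `spawnSync` with its `timeout` and `killSignal` options. `runWithTimeout` and `RunResult` are hypothetical names, and the 120-second budget in the usage comment is only an example.

```typescript
import { spawnSync } from "node:child_process";

type RunResult = { timedOut: boolean; exitCode: number | null; stdout: string };

// Runs a command with the given stdin input and kills it if it exceeds timeoutMs.
function runWithTimeout(
  cmd: string,
  args: string[],
  input: string,
  timeoutMs: number,
): RunResult {
  const result = spawnSync(cmd, args, {
    input,
    encoding: "utf8",
    timeout: timeoutMs,
    killSignal: "SIGKILL", // same effect as the manual `kill -9`
  });
  const err = result.error as NodeJS.ErrnoException | undefined;
  return {
    timedOut: err?.code === "ETIMEDOUT",
    exitCode: result.status, // null when the process was killed by a signal
    stdout: result.stdout ?? "",
  };
}

// Example usage (not executed here):
// const r = runWithTimeout("opencode", ["run", "-m", "opencode/big-pickle"], promptText, 120_000);
// if (r.timedOut) console.error("opencode run exceeded the time budget and was killed");
```

`spawnSync` blocks the calling process until the child exits or is killed, which is acceptable for simple CLI wrappers. This bounds the hang and the wasted API calls but does not fix the underlying loop.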

## Suggested Fixes

### Fix 1: Detect finish_reason=stop (Primary)

In `processor.ts`, the `process()` function should return `"stop"` when `finish_reason=stop`:

```typescript
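// In processor.ts process() function
// An empty array is truthy, so check the length rather than `!response.tool_calls`
const hasToolCalls = (response.tool_calls?.length ?? 0) > 0;
if (response.finish_reason === "stop" && !hasToolCalls) {
  return "stop"; // Currently returns "continue"
}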
```

### Fix 2: Break on Empty Responses

If LLM is called but produces NO tool calls and NO text, treat as done:

```typescript
// After LLM stream completes
if (!response.content && (!response.tool_calls || response.tool_calls.length === 0)) {
  logger.info('Empty response detected, stopping loop');
  return "stop";
}
```

### Fix 3: Add Loop Detection

Detect when same prompt step repeats without progress:

```typescript
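// Track consecutive empty responses (declare outside the loop so it persists across steps)
let consecutiveEmptyResponses = 0;

if (isEmptyResponse(response)) {
  consecutiveEmptyResponses++;
  if (consecutiveEmptyResponses > 3) {
    logger.warn('Infinite loop detected, forcing stop');
    return "stop";
  }
} else {
  consecutiveEmptyResponses = 0; // Progress was made, so reset the counter
}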
```

### Fix 4: Add Timeout Protection

Add a configurable timeout for headless mode. The key below is a proposal, with the value in milliseconds (300000 = 5 minutes):

```json
{
  "experimental": {
    "headless_timeout": 300000
  }
}
```
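
One way such a timeout could be enforced is a deadline check around the session loop. The sketch below is an assumption about how the proposed `headless_timeout` might be applied, not existing OpenCode behavior. `runWithDeadline`, `StepResult`, and `Outcome` are hypothetical names, and the clock is injectable so the logic can be exercised without real waiting.

```typescript
type StepResult = "continue" | "stop";
type Outcome = { reason: "stop" | "timeout"; steps: number };

// Runs step() until it reports "stop" or the wall-clock budget is spent.
function runWithDeadline(
  step: () => StepResult,
  timeoutMs: number,
  now: () => number = Date.now,
): Outcome {
  const deadline = now() + timeoutMs;
  let steps = 0;
  while (now() < deadline) {
    steps++;
    if (step() === "stop") return { reason: "stop", steps };
  }
  return { reason: "timeout", steps };
}

// Simulated clock: each stuck iteration costs about 2.5 s, as in this report.
let clock = 0;
const stuck = runWithDeadline(
  () => {
    clock += 2500;
    return "continue";
  },
  300000,
  () => clock,
);
console.log(stuck); // { reason: 'timeout', steps: 120 }
```

With a 5-minute budget and roughly 2.5 s per empty iteration, the loop would be cut off after about 120 LLM calls instead of running until killed. A real implementation would be asynchronous and would also need to abort an in-flight LLM stream.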

## Environment

- **Model**: big-pickle (`opencode/big-pickle`)
- **Affected providers**: **OpenCode Zen** (primary issue)
- **Also affected**: claude-sonnet-4.6 via github-copilot provider (similar symptoms)
- **Works fine**: gpt-5.3-codex, simple prompts without tool calls
- **Process state**: Alive, 0-2% CPU, consuming 500-900MB RSS
- **Key difference**: Zen provider with big-pickle model causes mid-process stopping/looping

## Impact

1. **Breaks automation**: CLI tools that spawn `opencode run` as subprocess hang forever
2. **Wastes API calls**: Infinite loop consumes quota/credits
3. **Blocks parent sessions**: Subagent hangs block parent indefinitely
4. **Poor UX**: No error message, no indication of what's wrong

## References

- Original issue: https://github.com/anomalyco/opencode/issues/17516
- Related fix attempt: https://github.com/anomalyco/opencode/pull/11152
- Session loop fix: https://github.com/anomalyco/opencode/pull/18500
- Compaction loop fix: https://github.com/anomalyco/opencode/pull/19424

Bug: OpenCode enters infinite loop after tool calls complete (Zen/big-pickle) #26220
