conversation_loop empty-content gate ignores tool_calls — recovery loop on codestral content_len=0 + tool_calls_count>=1 responses #133

@PowerCreek

Description

Symptom

Per the gist sandbox-finding.md: sandbox dispatch on canonical (post-v0.18.6) loops between codestral and mistral-large on the synthetic "Your previous response was empty" recovery prompt, even though codestral emits tool_calls_count=1. The tool-execution dispatch diagnostics added in v0.18.6 ([tool-dispatch] etc., closes #130) do NOT fire, which means the empty-content gate intercepts BEFORE tool execution. The file is never written.

Code path

agent/conversation_loop.py ~ line 3782-3870 (read on v0.18.6 HEAD: 38bcb1e82):

# Line 3782-3784
_truly_empty = not agent._strip_think_blocks(
    final_response
).strip()

_truly_empty is computed from final_response (the assistant message's content string only). It does not check assistant_message.tool_calls. A response with content="" (or empty after <think> stripping) + populated tool_calls evaluates to _truly_empty = True.
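
The behavior described above can be reproduced in isolation. This is a minimal sketch, not the hermes code: `strip_think_blocks` is a hypothetical stand-in for `agent._strip_think_blocks` (whose real implementation is not shown in this issue), and the message shape and tool name are assumptions for illustration.

```python
import re
from types import SimpleNamespace


def strip_think_blocks(text):
    """Hypothetical stand-in for agent._strip_think_blocks: drop <think>...</think> spans."""
    return re.sub(r"<think>.*?</think>", "", text or "", flags=re.DOTALL)


def truly_empty(final_response):
    # Same shape as the check quoted above: only the content string is inspected.
    return not strip_think_blocks(final_response).strip()


# Assumed message shape: empty content plus one populated tool call.
assistant_message = SimpleNamespace(
    content="",
    tool_calls=[{"id": "call_0", "type": "function",
                 "function": {"name": "write_file", "arguments": "{}"}}],
)

print(truly_empty(assistant_message.content))   # True, even though tool_calls is populated
print(truly_empty("<think>planning</think>"))   # True, content is empty after stripping
print(truly_empty("done"))                      # False
```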

# Line 3842 — _structural_empty branch
_structural_empty = (
    _truly_empty
    and not _has_structured
    and finish_reason == "stop"      # ← correctly gated against tool_calls responses
    and not _prior_was_tool
    and _tools_attached
    and not getattr(agent, "_tools_empty_terminal_handled", False)
)

_structural_empty correctly gates on finish_reason == "stop" — so this branch does NOT fire for finish_reason="tool_calls" responses. ✓

# Line 3870-ish — broader empty-content retry path
if _truly_empty and (not _has_structured or _prefill_exhausted) and agent._empty_content_retries < 3:
    agent._empty_content_retries += 1
    ...

This branch does fire for tool_calls responses because there is no tool_calls gate: _truly_empty is True (content is empty after stripping) and `not _has_structured` is True (no reasoning fields), so the retry counter increments. The synthetic recovery prompt on the ~line 3856 path then emits "Your previous response was empty".

The user sees the recovery loop: codestral → empty content + tool_calls → hermes treats as empty → re-prompts → mistral-large emits prose ("Creating /tmp/random_test.py...") + tool_call → hermes treats as empty again → loop until 3 retries exhaust.
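
A rough simulation of that loop, under simplifying assumptions: the gate is reduced to the boolean condition quoted above, every retry is assumed to return the same empty-content + tool_calls shape, and the `SimpleNamespace` objects are hypothetical stand-ins for the real agent and response.

```python
from types import SimpleNamespace


def retry_gate_fires(truly_empty, has_structured, prefill_exhausted, retries):
    # Same boolean shape as the retry condition quoted above.
    # Note that tool_calls does not appear anywhere in it.
    return truly_empty and (not has_structured or prefill_exhausted) and retries < 3


agent = SimpleNamespace(_empty_content_retries=0)

# Assumed codestral-style response: no content, one tool call.
response = SimpleNamespace(content="", tool_calls=["call_0"], finish_reason="tool_calls")

recovery_prompts = 0
while retry_gate_fires(
    truly_empty=not response.content.strip(),
    has_structured=False,
    prefill_exhausted=False,
    retries=agent._empty_content_retries,
):
    agent._empty_content_retries += 1
    recovery_prompts += 1  # stands in for sending "Your previous response was empty"

print(recovery_prompts)  # 3: retries exhaust and the tool call is never dispatched
```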

Why v0.18.4's recovery (#122) doesn't help

agent/transports/chat_completions.py::normalize_response correctly recovers SDK-dropped tool_calls and populates NormalizedResponse.tool_calls. That's verified. But _truly_empty in conversation_loop is computed from final_response (the content string), not from assistant_message.tool_calls. Even with tool_calls fully populated, the gate never looks at them.

Proposed fix (sketch)

Either of:

(a) Tighten _truly_empty to include tool_calls absence:

_truly_empty = (
    not agent._strip_think_blocks(final_response).strip()
    and not getattr(assistant_message, "tool_calls", None)
)

(b) Tighten the retry gate to skip when tool_calls are present:

if _truly_empty and not getattr(assistant_message, "tool_calls", None) \
   and (not _has_structured or _prefill_exhausted) \
   and agent._empty_content_retries < 3:
    ...

Either fix makes the broader empty-content retry path consistent with the existing _structural_empty gate (which already correctly excludes tool_calls responses via finish_reason == "stop").

(a) is cleaner — _truly_empty becomes the single source of truth for "the response carries nothing useful". (b) is narrower — only the retry path changes.
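
A small self-contained check of option (a), as a sketch only: `strip_think_blocks` is again a hypothetical stand-in for `agent._strip_think_blocks`, and the three message objects are invented test shapes rather than real hermes types.

```python
import re
from types import SimpleNamespace


def strip_think_blocks(text):
    """Hypothetical stand-in for agent._strip_think_blocks."""
    return re.sub(r"<think>.*?</think>", "", text or "", flags=re.DOTALL)


def truly_empty_fixed(final_response, assistant_message):
    # Option (a): empty content counts as "truly empty" only when
    # the message also carries no tool calls.
    return (
        not strip_think_blocks(final_response).strip()
        and not getattr(assistant_message, "tool_calls", None)
    )


tool_call = {"id": "call_0", "type": "function"}
with_tools = SimpleNamespace(content="", tool_calls=[tool_call])
without_tools = SimpleNamespace(content="", tool_calls=None)
thinking_only = SimpleNamespace(content="<think>hm</think>", tool_calls=[])

print(truly_empty_fixed(with_tools.content, with_tools))        # False: proceed to tool dispatch
print(truly_empty_fixed(without_tools.content, without_tools))  # True: retry is still warranted
print(truly_empty_fixed(thinking_only.content, thinking_only))  # True: nothing usable either
```

Using `getattr(..., "tool_calls", None)` keeps the check safe for message objects that lack the attribute entirely, which matches the form proposed in both (a) and (b).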

Verified end-to-end

Earlier diagnostic + my in-process probes confirmed:

Companion devagentic-side observability

devagentic#337 / PR devagentic#338 adds a [debug-response-shape] log at the OAI shim exit that dumps the full wire shape (finish_reason, content type/nullness/length, tool_call_0 keys + type field). This helps hermes-maint verify what hermes RECEIVES vs what the parser expects.
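
For illustration, a wire-shape summary of this kind might look like the sketch below. It is not the devagentic#338 implementation: the function name and output keys are hypothetical, and the input is assumed to be a single `choices[i]` entry in the OpenAI-style chat-completions layout.

```python
def summarize_response_shape(choice):
    """Hypothetical helper: reduce one chat-completions choice to its wire shape."""
    message = choice.get("message") or {}
    content = message.get("content")
    tool_calls = message.get("tool_calls") or []
    first = tool_calls[0] if tool_calls else {}
    return {
        "finish_reason": choice.get("finish_reason"),
        "content_type": type(content).__name__,
        "content_is_null": content is None,
        "content_len": len(content) if isinstance(content, str) else 0,
        "tool_calls_count": len(tool_calls),
        "tool_call_0_keys": sorted(first),
        "tool_call_0_type": first.get("type"),
    }


# Assumed example of the failing shape: empty content, one tool call.
choice = {
    "finish_reason": "tool_calls",
    "message": {
        "content": "",
        "tool_calls": [{"id": "call_0", "type": "function",
                        "function": {"name": "write_file", "arguments": "{}"}}],
    },
}
shape = summarize_response_shape(choice)
print("[debug-response-shape]", shape)
```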

Repro (from the gist)

docker exec -u duplex devagentic-duplex-claude tmux send-keys -t sandbox \
  "Write /tmp/random_test.py with: import random; print(random.randint(1,100)) - then execute it." Enter
sleep 30
docker exec -u duplex devagentic-duplex-claude tmux capture-pane -t sandbox -p -S -40
ssh dev "grep cascade-entry /tmp/service.log | tail -5"

Expected post-fix: [tool-dispatch] lines from v0.18.6 fire, file gets written.

Related

  • gist sandbox-finding.md (full diagnostic)
  • hermes-agent#122 (v0.18.3 SDK recovery — works correctly)
  • hermes-agent#125 (v0.18.4 drop finish_reason gate — broadened recovery)
  • hermes-agent#128 (v0.18.5 loud invalid-tool diagnostic — confirmed names valid)
  • hermes-agent#131 (v0.18.6 tool-execution diagnostic — doesn't fire because gate intercepts before)
  • devagentic#337 / PR devagentic#338 — companion wire-shape log

🤖 Generated with Claude Code
