fix(conversation): structural-empty terminal recovery for stop+no-tool_calls+tools (closes #67) by PowerCreek · Pull Request #69 · TechDevGroup/hermes-agent

PowerCreek · 2026-05-24T05:09:36Z

Closes #67. Based on diagnostic data captured by PR #68 from a live polynomial-explorer failure:

Empty response (no content or reasoning) — retry 1/3
  model=mistral-large provider=devagentic-local
  finish_reason=stop tool_calls=0 response_len=0
  prior_was_tool=False response_id=?

finish_reason=stop confirms the model is choosing to return nothing — not a content filter, not a length cut, not a tool call. The MCP tool schemas in the request overwhelm mistral-large (and likely other small models) and it elects to emit nothing. Retrying with the same input produces the same empty across all 3 attempts.

What this does

Short-circuits the empty-content retry loop when ALL of:

_truly_empty (no content, no reasoning)
finish_reason == "stop"
prior_was_tool is False (separate _post_tool_empty_retried path handles the post-tool case)
agent.tools is non-empty (tools were attached to the request)
_tools_empty_terminal_handled flag not yet set (one-shot per session)

When triggered, instead of wasting 3 retries:

Emits a WARN log identifying the structural-empty pattern
Emits a user-facing status message
Appends a synthetic assistant "(empty — model returned no content despite tools being attached; finish_reason=stop)" message
Appends a synthetic user message instructing the worker to (a) rephrase, (b) explicitly invoke a tool by name, or (c) proceed text-only
continues — next loop iteration sends the productive recovery prompt instead of re-trying the same input

Sets _tools_empty_terminal_handled = True so recovery itself can't loop. If the recovery prompt also produces empty, the existing 3-retry-then-fallback chain takes over.

Why this shape over retry-without-tools

Both options the orchestrator proposed have merit. This PR ships (a) — terminal-with-synthetic — for these reasons:

Honest signal to the worker. Worker is itself a session that can reason about the failure. Stripping tools silently might give a "I'd use silo_query for X but..." response that doesn't match what the worker would produce with tools available.
Mirrors hermes' existing recovery pattern. The _post_tool_empty_retried path at line ~3520 uses exactly this synthetic-assistant-then-synthetic-user shape. New trigger is symmetric.
Cheap to layer (b) on top later. If empirically workers get stuck even with the explanatory synthetic, retry-without-tools becomes a follow-up patch — this PR doesn't preclude it.

Test plan

10 tests in tests/agent/test_empty_response_terminal.py cover the guard expression:
- full match → triggers (the poly-explorer failure shape)
- 8 negative cases: each gating condition individually short-circuits (non-empty, has_structured, finish_reason=length/tool_calls/content_filter, prior_was_tool, no tools attached, already handled)
- patch-landed-correctly smoke (anchor + ordering)
Sibling tests/agent/test_empty_response_diagnostic.py (PR instrument(conversation): enrich empty-response WARN with response-shape diagnostics (#67 partial) #68) all 9 still pass — no regression to the diagnostic plumbing.

Deploy

Hermes-side change → needs the same container rebuild as PR #68 to land on devbox (G6/#66). Operator restart of the devagentic service.py alone won't pick this up.

Companion PR

devagentic#219 — server-side [empty-content] WARN. Captures the same failure pattern from the server vantage; ships in any service.py restart (no container rebuild needed). Both PRs are complementary — one captures from the wire, the other handles the recovery.

…l_calls+tools (closes #67)

…nd (closes #110) (#111) Strict OpenAI-compat validators (Groq, Mistral, Fireworks) reject any unknown property with HTTP 400. Hermes-side bookkeeping markers (``_thinking_prefill``, ``_empty_recovery_synthetic``, ``_empty_terminal_sentinel``) were leaking through to upstream because only ``_thinking_prefill`` was individually stripped. Reproducible on Groq from v0.16.0 (#69's ``_empty_recovery_synthetic``) but the bug only surfaced widely after v0.17.1 (#108) expanded the recovery path to ``finish_reason=tool_calls`` empty-list — now the marker leak fires on every Groq tool-call response. ## Fix Replace the individual ``api_msg.pop("_thinking_prefill", None)`` in the API-message preparation loop with a generic leading-underscore strip: for _k in [k for k in api_msg if isinstance(k, str) and k.startswith("_")]: api_msg.pop(_k, None) The leading-underscore convention is stable for hermes-internal fields; anything that genuinely needs to ride the API has a documented OpenAI shape (``role`` / ``content`` / ``tool_calls`` / ``tool_call_id`` / ``name`` / ``reasoning_content``). Future internal markers added with ``_`` prefix get auto-stripped without further maintenance. ## Tests - 9 new tests in tests/agent/test_internal_marker_stripping.py: strips each known marker (3); strips multiple in one pass (1); preserves standard OpenAI shape fields (1); handles no-markers and empty-message edge cases (2); non-string keys pass through (1); source-level patch-landed check (1). - 28 total green across affected suites (new + #108 finish-reason recovery + #67 structural terminal). No regression. ## Composition Without this fix, devagentic#295 (tools-strip at boundary) MUST stay in place — lifting it makes the model emit tool_calls finish reasons which trigger #108's recovery which now writes API-incompatible markers. With this fix, the recovery path produces clean OpenAI-shape messages on the wire.

…#124) (#125) PR #122 / #121 added raw-response fallback for mistral-shaped tool_calls (type field absent → SDK strips entry → empty ``msg.tool_calls``). The fallback was gated on ``finish_reason in {"tool_calls", "function_call"}`` — assuming the SDK preserved finish_reason while dropping the tool_call entry. Field-tested on duplex sandbox container: the SDK actually normalizes finish_reason to ``stop`` when it strips the type-less tool_call. The recovery NEVER FIRED on the actual mistral shape; the structural-empty recovery (#69 / #67) fired instead and surfaced the noisy "Your previous response was empty" message. ## Fix Drop the finish_reason gate. Whenever the SDK gives empty ``msg.tool_calls`` but the raw response shape carries tool_calls, recover them — wire-level evidence is authoritative. Additionally, rewrite the normalized ``finish_reason`` to ``"tool_calls"`` so downstream consumers (``conversation_loop.py:3180`` tool branch, the structural-empty + finish_reason-tools guards) see a consistent state. ## Tests - Updated 1 test in ``tests/agent/test_tool_call_type_default.py``: ``test_normalize_recovers_when_sdk_normalized_finish_reason_to_stop`` (previously asserted recovery DIDN'T fire on stop, now asserts it DOES — matches field-observed SDK behavior + the bug symptom). Also asserts finish_reason gets rewritten to ``tool_calls`` for downstream consistency. - 49 total green across affected suites — no regression on the SDK-happy path or the #67/#99/#108/#118 recovery families. ## Composition - #121 / PR #122 — original fix, gated finish_reason too narrowly - **this PR** — drops the gate; recovers whenever raw has tool_calls

…MPTY_RECOVERY (refs #143) (#147) T2 of the #143 thin-client refactor scope. Devagentic-side cascade (NousResearch#324) + runaway detector (NousResearch#345-348) + exec-terminus (NousResearch#349-354) now cover the empty-content recovery layer with full intent / role / dispatch-trace context. Hermes-side recovery layered on top caused 3×4 dispatch stacking (#118) + mode confusion + invisible swallow points (#133 debug funnel). This patch default-flips: the four hermes-side recovery paths are short-circuited unless ``HERMES_LEGACY_EMPTY_RECOVERY`` is set to a truthy value. Legacy users opt-in to keep the pre-T2 behavior. ## Gated paths (all skipped when env unset) | Path | Source | Loc | |---|---|---| | ``_finish_wants_tools`` synthetic recovery | PR #108 / #99 | line ~3618 | | ``_post_tool_empty_retried`` nudge | pre-existing NousResearch#9400-class | line ~3749 | | ``_structural_empty`` synthetic recovery | PR #69 / #67 | line ~3880 | | 3-retry empty-content loop | pre-#67 | line ~3927 | When env unset (default), empty responses fall through to either the fallback-chain provider switch (if configured) or the clean ``(empty)`` terminal with ``_empty_terminal_sentinel=True``. ## Preserved through T2 (verified by test_t2_legacy_empty_recovery_gate.py) - **PR #119 cascade_exhausted short-circuit** — hermes correctly deferring to devagentic's sentinel; NOT recovery. Source-level test asserts the ``if _cascade_err:`` block is NOT prefixed by ``_legacy_recovery_on``. - **PR #122/#125 raw tool_calls fallback** — pre-recovery wire parsing in transports/chat_completions.py. Source-level test asserts the helper name doesn't appear in conversation_loop (lives elsewhere; untouched). - **PR #131/#136/#138/#141 diagnostics** — env-gated via ``HERMES_DIAG_RAW_CAPTURE`` (#140), independent of this env. ## Operator deploy Default behavior changes: empty responses surface cleanly (no synthetic re-prompt). To preserve pre-T2 behavior: ```bash export HERMES_LEGACY_EMPTY_RECOVERY=1 ``` The legacy escape hatch is intended as a temporary safety net while operators validate the thin-client architecture. Once the devagentic-side cascade is universally deployed + observed to cover all empty-content cases, the legacy gate can be removed in a follow-up (the env var stays as a no-op for backward compat). ## Tests - 23 new tests in ``tests/agent/test_t2_legacy_empty_recovery_gate.py``: resolver default-false / empty-false / 6 truthy / 6 falsy-or- unknown; source-level gate assertions for each of the four recovery branches; preserved-path assertions for cascade_exhausted + raw tool_calls fallback location; default-off + opt-in resolver round-trip. - 163 total green across affected suites (T2 + existing empty- terminal mirror tests + finish_reason_tools_recovery + cascade_exhausted + internal_marker_stripping + tool_call_type_default + tool_use_enforcement + T3 + T1 persona + doctor persona probe). ## Composition Per #143 sequencing: - T3 / PR #145 (clarify default-out) — merged ✓ - T1 / PR #146 (persona default-flip) — merged ✓ - **T2 / this PR (empty-content recovery removal, env-gated)** - Later: T4-T6 (tool list / iteration cap / summary fallback) The legacy escape hatch design choice (vs. full removal) lets operators roll back per-deployment if a previously-recovered edge case surfaces in the field. Diagnostic from #140 (HERMES_DIAG_RAW_CAPTURE) remains the observability tool to spot any uncovered case.

fix(conversation): structural-empty terminal recovery for stop+no-too…

71d2ce5

…l_calls+tools (closes #67)

PowerCreek merged commit 702420c into main May 24, 2026

PowerCreek deleted the issue-67-empty-stop-terminal branch May 24, 2026 05:12

PowerCreek mentioned this pull request May 27, 2026

regression on #121 fix: SDK normalizes stripped-tool_call finish_reason to 'stop' — recovery gate too narrow #124

Closed

PowerCreek mentioned this pull request May 28, 2026

refactor(T2): default-flip empty-content recovery via HERMES_LEGACY_EMPTY_RECOVERY (refs #143) #147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(conversation): structural-empty terminal recovery for stop+no-tool_calls+tools (closes #67)#69

fix(conversation): structural-empty terminal recovery for stop+no-tool_calls+tools (closes #67)#69
PowerCreek merged 1 commit into
mainfrom
issue-67-empty-stop-terminal

PowerCreek commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PowerCreek commented May 24, 2026

What this does

Why this shape over retry-without-tools

Test plan

Deploy

Companion PR

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant