Skip to content

fix(conversation): structural-empty terminal recovery for stop+no-tool_calls+tools (closes #67)#69

Merged
PowerCreek merged 1 commit into
mainfrom
issue-67-empty-stop-terminal
May 24, 2026
Merged

fix(conversation): structural-empty terminal recovery for stop+no-tool_calls+tools (closes #67)#69
PowerCreek merged 1 commit into
mainfrom
issue-67-empty-stop-terminal

Conversation

@PowerCreek

Copy link
Copy Markdown

Closes #67. Based on diagnostic data captured by PR #68 from a live polynomial-explorer failure:

Empty response (no content or reasoning) — retry 1/3
  model=mistral-large provider=devagentic-local
  finish_reason=stop tool_calls=0 response_len=0
  prior_was_tool=False response_id=?

finish_reason=stop confirms the model is choosing to return nothing — not a content filter, not a length cut, not a tool call. The MCP tool schemas in the request overwhelm mistral-large (and likely other small models) and it elects to emit nothing. Retrying with the same input produces the same empty across all 3 attempts.

What this does

Short-circuits the empty-content retry loop when ALL of:

  1. _truly_empty (no content, no reasoning)
  2. finish_reason == "stop"
  3. prior_was_tool is False (separate _post_tool_empty_retried path handles the post-tool case)
  4. agent.tools is non-empty (tools were attached to the request)
  5. _tools_empty_terminal_handled flag not yet set (one-shot per session)

When triggered, instead of wasting 3 retries:

  • Emits a WARN log identifying the structural-empty pattern
  • Emits a user-facing status message
  • Appends a synthetic assistant "(empty — model returned no content despite tools being attached; finish_reason=stop)" message
  • Appends a synthetic user message instructing the worker to (a) rephrase, (b) explicitly invoke a tool by name, or (c) proceed text-only
  • continues — next loop iteration sends the productive recovery prompt instead of re-trying the same input

Sets _tools_empty_terminal_handled = True so recovery itself can't loop. If the recovery prompt also produces empty, the existing 3-retry-then-fallback chain takes over.

Why this shape over retry-without-tools

Both options the orchestrator proposed have merit. This PR ships (a) — terminal-with-synthetic — for these reasons:

  1. Honest signal to the worker. Worker is itself a session that can reason about the failure. Stripping tools silently might give a "I'd use silo_query for X but..." response that doesn't match what the worker would produce with tools available.
  2. Mirrors hermes' existing recovery pattern. The _post_tool_empty_retried path at line ~3520 uses exactly this synthetic-assistant-then-synthetic-user shape. New trigger is symmetric.
  3. Cheap to layer (b) on top later. If empirically workers get stuck even with the explanatory synthetic, retry-without-tools becomes a follow-up patch — this PR doesn't preclude it.

Test plan

  • 10 tests in tests/agent/test_empty_response_terminal.py cover the guard expression:
    • full match → triggers (the poly-explorer failure shape)
    • 8 negative cases: each gating condition individually short-circuits (non-empty, has_structured, finish_reason=length/tool_calls/content_filter, prior_was_tool, no tools attached, already handled)
    • patch-landed-correctly smoke (anchor + ordering)
  • Sibling tests/agent/test_empty_response_diagnostic.py (PR instrument(conversation): enrich empty-response WARN with response-shape diagnostics (#67 partial) #68) all 9 still pass — no regression to the diagnostic plumbing.

Deploy

Hermes-side change → needs the same container rebuild as PR #68 to land on devbox (G6/#66). Operator restart of the devagentic service.py alone won't pick this up.

Companion PR

devagentic#219 — server-side [empty-content] WARN. Captures the same failure pattern from the server vantage; ships in any service.py restart (no container rebuild needed). Both PRs are complementary — one captures from the wire, the other handles the recovery.

@PowerCreek PowerCreek merged commit 702420c into main May 24, 2026
@PowerCreek PowerCreek deleted the issue-67-empty-stop-terminal branch May 24, 2026 05:12
PowerCreek added a commit that referenced this pull request May 26, 2026
…nd (closes #110) (#111)

Strict OpenAI-compat validators (Groq, Mistral, Fireworks) reject any
unknown property with HTTP 400. Hermes-side bookkeeping markers
(``_thinking_prefill``, ``_empty_recovery_synthetic``,
``_empty_terminal_sentinel``) were leaking through to upstream because
only ``_thinking_prefill`` was individually stripped.

Reproducible on Groq from v0.16.0 (#69's ``_empty_recovery_synthetic``)
but the bug only surfaced widely after v0.17.1 (#108) expanded the
recovery path to ``finish_reason=tool_calls`` empty-list — now the
marker leak fires on every Groq tool-call response.

## Fix

Replace the individual ``api_msg.pop("_thinking_prefill", None)`` in
the API-message preparation loop with a generic leading-underscore
strip:

  for _k in [k for k in api_msg if isinstance(k, str) and k.startswith("_")]:
      api_msg.pop(_k, None)

The leading-underscore convention is stable for hermes-internal
fields; anything that genuinely needs to ride the API has a
documented OpenAI shape (``role`` / ``content`` / ``tool_calls`` /
``tool_call_id`` / ``name`` / ``reasoning_content``). Future internal
markers added with ``_`` prefix get auto-stripped without further
maintenance.

## Tests

- 9 new tests in tests/agent/test_internal_marker_stripping.py:
  strips each known marker (3); strips multiple in one pass (1);
  preserves standard OpenAI shape fields (1); handles
  no-markers and empty-message edge cases (2); non-string keys
  pass through (1); source-level patch-landed check (1).
- 28 total green across affected suites (new + #108 finish-reason
  recovery + #67 structural terminal). No regression.

## Composition

Without this fix, devagentic#295 (tools-strip at boundary) MUST stay
in place — lifting it makes the model emit tool_calls finish reasons
which trigger #108's recovery which now writes API-incompatible
markers. With this fix, the recovery path produces clean OpenAI-shape
messages on the wire.
PowerCreek added a commit that referenced this pull request May 27, 2026
…#124) (#125)

PR #122 / #121 added raw-response fallback for mistral-shaped
tool_calls (type field absent → SDK strips entry → empty
``msg.tool_calls``). The fallback was gated on
``finish_reason in {"tool_calls", "function_call"}`` — assuming the
SDK preserved finish_reason while dropping the tool_call entry.

Field-tested on duplex sandbox container: the SDK actually
normalizes finish_reason to ``stop`` when it strips the type-less
tool_call. The recovery NEVER FIRED on the actual mistral shape;
the structural-empty recovery (#69 / #67) fired instead and
surfaced the noisy "Your previous response was empty" message.

## Fix

Drop the finish_reason gate. Whenever the SDK gives empty
``msg.tool_calls`` but the raw response shape carries tool_calls,
recover them — wire-level evidence is authoritative. Additionally,
rewrite the normalized ``finish_reason`` to ``"tool_calls"`` so
downstream consumers (``conversation_loop.py:3180`` tool branch,
the structural-empty + finish_reason-tools guards) see a
consistent state.

## Tests

- Updated 1 test in
  ``tests/agent/test_tool_call_type_default.py``:
  ``test_normalize_recovers_when_sdk_normalized_finish_reason_to_stop``
  (previously asserted recovery DIDN'T fire on stop, now asserts
  it DOES — matches field-observed SDK behavior + the bug
  symptom). Also asserts finish_reason gets rewritten to
  ``tool_calls`` for downstream consistency.
- 49 total green across affected suites — no regression on the
  SDK-happy path or the #67/#99/#108/#118 recovery families.

## Composition

- #121 / PR #122 — original fix, gated finish_reason too narrowly
- **this PR** — drops the gate; recovers whenever raw has tool_calls
PowerCreek added a commit that referenced this pull request May 28, 2026
…MPTY_RECOVERY (refs #143) (#147)

T2 of the #143 thin-client refactor scope. Devagentic-side cascade
(NousResearch#324) + runaway detector (NousResearch#345-348) + exec-terminus (NousResearch#349-354)
now cover the empty-content recovery layer with full intent /
role / dispatch-trace context. Hermes-side recovery layered on top
caused 3×4 dispatch stacking (#118) + mode confusion + invisible
swallow points (#133 debug funnel).

This patch default-flips: the four hermes-side recovery paths are
short-circuited unless ``HERMES_LEGACY_EMPTY_RECOVERY`` is set to
a truthy value. Legacy users opt-in to keep the pre-T2 behavior.

## Gated paths (all skipped when env unset)

| Path | Source | Loc |
|---|---|---|
| ``_finish_wants_tools`` synthetic recovery | PR #108 / #99 | line ~3618 |
| ``_post_tool_empty_retried`` nudge | pre-existing NousResearch#9400-class | line ~3749 |
| ``_structural_empty`` synthetic recovery | PR #69 / #67 | line ~3880 |
| 3-retry empty-content loop | pre-#67 | line ~3927 |

When env unset (default), empty responses fall through to either
the fallback-chain provider switch (if configured) or the clean
``(empty)`` terminal with ``_empty_terminal_sentinel=True``.

## Preserved through T2 (verified by test_t2_legacy_empty_recovery_gate.py)

- **PR #119 cascade_exhausted short-circuit** — hermes correctly
  deferring to devagentic's sentinel; NOT recovery. Source-level
  test asserts the ``if _cascade_err:`` block is NOT prefixed by
  ``_legacy_recovery_on``.
- **PR #122/#125 raw tool_calls fallback** — pre-recovery wire
  parsing in transports/chat_completions.py. Source-level test
  asserts the helper name doesn't appear in conversation_loop
  (lives elsewhere; untouched).
- **PR #131/#136/#138/#141 diagnostics** — env-gated via
  ``HERMES_DIAG_RAW_CAPTURE`` (#140), independent of this env.

## Operator deploy

Default behavior changes: empty responses surface cleanly (no
synthetic re-prompt). To preserve pre-T2 behavior:

```bash
export HERMES_LEGACY_EMPTY_RECOVERY=1
```

The legacy escape hatch is intended as a temporary safety net
while operators validate the thin-client architecture. Once the
devagentic-side cascade is universally deployed + observed to
cover all empty-content cases, the legacy gate can be removed in
a follow-up (the env var stays as a no-op for backward compat).

## Tests

- 23 new tests in
  ``tests/agent/test_t2_legacy_empty_recovery_gate.py``:
  resolver default-false / empty-false / 6 truthy / 6 falsy-or-
  unknown; source-level gate assertions for each of the four
  recovery branches; preserved-path assertions for cascade_exhausted
  + raw tool_calls fallback location; default-off + opt-in
  resolver round-trip.
- 163 total green across affected suites (T2 + existing empty-
  terminal mirror tests + finish_reason_tools_recovery +
  cascade_exhausted + internal_marker_stripping +
  tool_call_type_default + tool_use_enforcement + T3 + T1 persona
  + doctor persona probe).

## Composition

Per #143 sequencing:
- T3 / PR #145 (clarify default-out) — merged ✓
- T1 / PR #146 (persona default-flip) — merged ✓
- **T2 / this PR (empty-content recovery removal, env-gated)**
- Later: T4-T6 (tool list / iteration cap / summary fallback)

The legacy escape hatch design choice (vs. full removal) lets
operators roll back per-deployment if a previously-recovered edge
case surfaces in the field. Diagnostic from #140
(HERMES_DIAG_RAW_CAPTURE) remains the observability tool to spot
any uncovered case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: devagentic-local provider returns empty content; raw API works

1 participant