fix(response): recover on finish_reason=tool_calls with empty list (closes #99) by PowerCreek · Pull Request #108 · TechDevGroup/hermes-agent

PowerCreek · 2026-05-26T22:45:00Z

Summary

Closes #99 (unblocks #100 R12). When the model emits a tool-call-specific finish reason (tool_calls / function_call) but the normalized assistant_message.tool_calls reaches the response handler empty, the previous behavior burned 3 empty-retries before surfacing (empty). This adds a synthetic recovery branch keyed on tool-call finish reasons — same shape as #67's structural recovery but for the tool-call case.

Background

Per #99: hermes-cli reports "returned no content" + retries when an upstream OpenAI-compat response has message.content == "" AND message.tool_calls != []. Investigation traced this to an SDK-normalization gap — when raw message.tool_calls doesn't make it cleanly to the NormalizedResponse.tool_calls shape (or the model emits an unparseable function-call form), the else branch at conversation_loop.py:3493 is entered with content empty + tool_calls falsy, and _truly_empty triggers the retry loop.

Devagentic#295 currently works around this by stripping tools from the request boundary — once that workaround lifts (post-redeploy), this PR's branch catches any remaining edge cases.

Fix shape

_finish_wants_tools = finish_reason in {"tool_calls", "function_call"}
_tool_calls_absent = not (assistant_message.tool_calls or [])
if _finish_wants_tools and _tool_calls_absent and not _already_handled:
    # WARN log + status emit + synthetic recovery prompt

One-shot semantics via _finish_reason_tools_handled flag (mirrors #67's _tools_empty_terminal_handled).

Test plan

9 new tests in test_finish_reason_tools_recovery.py: positive triggers for tool_calls + function_call with None/empty tool_calls; negative for stop/length/content_filter finish reasons; negative when tool_calls populated; one-shot already-handled gate; source-level patch-landed check
28 total green across affected suites — no regression
After redeploy: poly-explorer's enumeration prompt ("Name three polynomial transformations.") no longer produces 3 silent retries before (empty)

Composition

Unblocks #100 (R12 run_pipeline / propose_pipeline MCP tools) — that issue explicitly recommended "Stay Backlog until hermes#99 lands". Will pick up #100 next iteration.

Devagentic#295 (tools-strip at boundary) can be lifted once this PR + a redeploy is in place; the client-side fix is now the authoritative path.

…loses #99) When the model emits a tool-call-specific finish reason (``tool_calls`` / ``function_call``) but the normalized ``assistant_message.tool_calls`` reaches the response handler empty (SDK normalization gap, upstream shape issue, etc.), the previous behavior: 1. Line 3180 ``if assistant_message.tool_calls:`` evaluates falsy 2. Falls into the ``else: # No tool calls`` branch 3. ``final_response = ""`` (empty content) 4. ``_truly_empty`` becomes True; ``finish_reason="tool_calls"`` is NOT ``"stop"`` so the #67 structural-empty recovery doesn't apply 5. ``_empty_content_retries`` burns 3 retries returning the same empty shape 6. Falls through to ``"(empty)"`` terminal This produces the user-visible "returned no content" + retry loop documented in hermes-agent#99. Devagentic-side workaround NousResearch#295 strips tools at the request boundary to avoid the model emitting the problematic finish-reason — once that workaround lifts, this branch catches any remaining edge cases. ## Fix In the else branch at conversation_loop.py:3493, before the empty-retry logic, check: _finish_wants_tools = finish_reason in {"tool_calls", "function_call"} _tool_calls_absent = not (assistant_message.tool_calls or []) When both hold (and ``_finish_reason_tools_handled`` flag isn't set — one-shot semantics like the #67 path), surface synthetic recovery instead of the retry loop: - Emit WARNING log with the diagnostic shape (finish_reason, model, provider, response_id) so operators see the structural mismatch - Emit user-visible status (``⚠️ Model emitted finish_reason=tool_calls but no tool_calls parsed — surfacing recovery prompt``) - Append synthetic assistant message + user-level recovery prompt that tells the model the shape was broken + how to retry Composes with #67's structural-empty recovery (different finish_reason gates them). ## Tests - 9 new tests in ``tests/agent/test_finish_reason_tools_recovery.py``: positive for ``tool_calls`` + ``function_call`` finish reasons with None / empty-list tool_calls (3); negative for ``stop`` / ``length`` / ``content_filter`` finish reasons (3); negative when tool_calls populated (1); already-handled one-shot (1); source-level patch-landed check (1). - 28 total green across affected suites (new + #67 terminal + empty-response-diagnostic). No regression. ## Composition Unblocks hermes-agent#100 (R12 worker tool surface for ``run_pipeline`` / ``propose_pipeline``) — that issue's "Recommendation: Stay Backlog until hermes#99 lands" gate clears with this PR. Devagentic#295 (tools-strip at boundary) can be lifted once this PR + a redeploy is in place; the client-side fix is now the authoritative path.

…nd (closes #110) (#111) Strict OpenAI-compat validators (Groq, Mistral, Fireworks) reject any unknown property with HTTP 400. Hermes-side bookkeeping markers (``_thinking_prefill``, ``_empty_recovery_synthetic``, ``_empty_terminal_sentinel``) were leaking through to upstream because only ``_thinking_prefill`` was individually stripped. Reproducible on Groq from v0.16.0 (#69's ``_empty_recovery_synthetic``) but the bug only surfaced widely after v0.17.1 (#108) expanded the recovery path to ``finish_reason=tool_calls`` empty-list — now the marker leak fires on every Groq tool-call response. ## Fix Replace the individual ``api_msg.pop("_thinking_prefill", None)`` in the API-message preparation loop with a generic leading-underscore strip: for _k in [k for k in api_msg if isinstance(k, str) and k.startswith("_")]: api_msg.pop(_k, None) The leading-underscore convention is stable for hermes-internal fields; anything that genuinely needs to ride the API has a documented OpenAI shape (``role`` / ``content`` / ``tool_calls`` / ``tool_call_id`` / ``name`` / ``reasoning_content``). Future internal markers added with ``_`` prefix get auto-stripped without further maintenance. ## Tests - 9 new tests in tests/agent/test_internal_marker_stripping.py: strips each known marker (3); strips multiple in one pass (1); preserves standard OpenAI shape fields (1); handles no-markers and empty-message edge cases (2); non-string keys pass through (1); source-level patch-landed check (1). - 28 total green across affected suites (new + #108 finish-reason recovery + #67 structural terminal). No regression. ## Composition Without this fix, devagentic#295 (tools-strip at boundary) MUST stay in place — lifting it makes the model emit tool_calls finish reasons which trigger #108's recovery which now writes API-incompatible markers. With this fix, the recovery path produces clean OpenAI-shape messages on the wire.

…) (#113) Wraps devagentic's ``executeWorkflowPipeline`` (NousResearch#258) + ``writeWorkflowPipeline`` (NousResearch#261) mutations as MCP tools so workers can compose + execute pipelines directly through their tool surface. R12 of devagentic#210 (flow-aware router umbrella). ## Tools registered | Tool | Wraps | Returns | |---|---|---| | ``run_pipeline(pipeline_id, user_id)`` | ``executeWorkflowPipeline`` | ``PipelineRun {runId, status, nodeOutcomes, ...}`` | | ``propose_pipeline(name, intent, version, nodes, edges, user_id, lineage_ref?, produced_by?)`` | ``writeWorkflowPipeline`` | ``Doc {id, content, tags, source, ts}`` | Composes naturally: ``propose_pipeline → doc_id → run_pipeline``. ## Devagentic-side resolver verification Schemas read directly from devagentic ``api.py`` via gh api: - ``executeWorkflowPipeline(pipelineId: String!, userId: String!) -> PipelineRun`` (api.py:21628) - ``writeWorkflowPipeline(name, intent, version, nodes, edges, userId, lineageRef?, producedBy?) -> Doc`` (api.py:21576) Arg-name camelCase conversion (``pipeline_id`` → ``pipelineId``, ``lineage_ref`` → ``lineageRef``, etc.) matches strawberry's default. Note: #100's body referenced ``ShadowNodeRef`` as the propose_pipeline return — the actual mutation returns ``Doc``. Implementing the real shape. ## Tests - 10 new client-level tests in ``tests/plugins/devagentic-mutations/test_client.py``: * ``run_pipeline``: empty-args fail-soft (1), happy path with arg threading + nested nodeOutcomes (1), non-dict fail-soft (1). * ``propose_pipeline``: empty-args fail-soft (3), invalid-version-type fail-soft (3 parametrized), invalid nodes/edges-type fail-soft (2), happy path with all args threaded (1), optional-args-null-when-omitted (1), non-dict fail-soft (1). - Contract test in ``tests/test_mcp_defensive_registrars.py`` extended: ``run_pipeline`` + ``propose_pipeline`` added to the required-G2/R12 tool set (now 17 named devagentic tools). - 184 total green across affected suites — no regression. ## Composition Unblocks the worker-side pipeline-economy use case in devagentic#210 R12. Workers can now author + execute pipelines without shelling out to ``python api.py``. The recent #99 / #110 fixes (PRs #108 + #111) ensure tool_calls with empty content + populated tool_calls on Groq/strict-validator paths don't cascade through the empty-retry loop, so R12 tools are usable from client-tier dispatches.

…#121) (#122) OpenAI-spec ``tool_call.type`` is required with exactly one valid value (``"function"``). Mistral-large (and likely others) emit tool_calls without the ``type`` field. The OpenAI Python SDK's strict Pydantic validation drops these entries from ``response.choices[0].message.tool_calls`` — hermes then sees an empty list + ``finish_reason=tool_calls`` and falls into PR #108's synthetic-recovery path, where the tool the model wanted to call is lost entirely. ## Fix Add raw-response fallback in ``ChatCompletionsTransport.normalize_response``: - When the SDK-parsed ``msg.tool_calls`` is empty AND ``finish_reason in {"tool_calls", "function_call"}``, inspect the raw response via ``response.model_dump()`` (Pydantic v2), attribute walk, or dict shape. - Iterate the raw tool_calls list, default missing ``type`` to ``"function"`` per spec semantics (the field has only ONE valid value), and synthesize ``ToolCall`` entries. - ``ToolCall``'s existing ``type`` property already returns ``"function"`` unconditionally, so downstream consumers see the recovered call as identical to a normally-parsed one. Strictly additive: only fires when (a) SDK gave us nothing AND (b) the finish_reason signals tools. The SDK-happy-path is untouched. ## Implementation detail Three helpers in ``chat_completions.py``: - ``_DEFAULT_TOOL_CALL_TYPE = "function"`` constant - ``_raw_tool_calls_from_response(response)`` — extracts the raw tool_calls list via model_dump / dict; fail-soft on any errors - ``_normalize_raw_tool_call(tc_raw)`` — builds a ``ToolCall`` from a raw dict entry; returns ``None`` for unrecoverable entries (no function.name); re-serializes dict-arguments back to JSON string per OpenAI contract ## Tests - 17 new tests in tests/agent/test_tool_call_type_default.py: * 6 ``_normalize_raw_tool_call`` tests: with-type, without-type- defaults-function, no-function-block, no-name, not-dict, dict-arguments-reserialized, empty-arguments * 6 ``_raw_tool_calls_from_response`` tests: model_dump path, dict path, no-tool_calls, no-choices, model_dump-raises, non-dict-entries-filtered * 3 end-to-end normalize_response tests: mistral-shape recovery, no-fallback-on-stop-finish_reason, no-double-up-when-sdk-parsed * 2 source-level patch-landed checks - 76 total green across affected suites — no regression on the SDK-happy path or the #67/#99/#108 recovery families. ## Composition Same family of bugs as #99 / #108 / #111 — response handler being over-strict on the OpenAI spec. This is the next-uncovered case (tool_calls present in wire response but stripped by SDK before reaching us). Devagentic#NN (the companion devagentic-side normalization at the OAI-shim boundary) becomes redundant defense after this lands — still harmless to keep.

…#124) (#125) PR #122 / #121 added raw-response fallback for mistral-shaped tool_calls (type field absent → SDK strips entry → empty ``msg.tool_calls``). The fallback was gated on ``finish_reason in {"tool_calls", "function_call"}`` — assuming the SDK preserved finish_reason while dropping the tool_call entry. Field-tested on duplex sandbox container: the SDK actually normalizes finish_reason to ``stop`` when it strips the type-less tool_call. The recovery NEVER FIRED on the actual mistral shape; the structural-empty recovery (#69 / #67) fired instead and surfaced the noisy "Your previous response was empty" message. ## Fix Drop the finish_reason gate. Whenever the SDK gives empty ``msg.tool_calls`` but the raw response shape carries tool_calls, recover them — wire-level evidence is authoritative. Additionally, rewrite the normalized ``finish_reason`` to ``"tool_calls"`` so downstream consumers (``conversation_loop.py:3180`` tool branch, the structural-empty + finish_reason-tools guards) see a consistent state. ## Tests - Updated 1 test in ``tests/agent/test_tool_call_type_default.py``: ``test_normalize_recovers_when_sdk_normalized_finish_reason_to_stop`` (previously asserted recovery DIDN'T fire on stop, now asserts it DOES — matches field-observed SDK behavior + the bug symptom). Also asserts finish_reason gets rewritten to ``tool_calls`` for downstream consistency. - 49 total green across affected suites — no regression on the SDK-happy path or the #67/#99/#108/#118 recovery families. ## Composition - #121 / PR #122 — original fix, gated finish_reason too narrowly - **this PR** — drops the gate; recovers whenever raw has tool_calls

…MPTY_RECOVERY (refs #143) (#147) T2 of the #143 thin-client refactor scope. Devagentic-side cascade (NousResearch#324) + runaway detector (NousResearch#345-348) + exec-terminus (NousResearch#349-354) now cover the empty-content recovery layer with full intent / role / dispatch-trace context. Hermes-side recovery layered on top caused 3×4 dispatch stacking (#118) + mode confusion + invisible swallow points (#133 debug funnel). This patch default-flips: the four hermes-side recovery paths are short-circuited unless ``HERMES_LEGACY_EMPTY_RECOVERY`` is set to a truthy value. Legacy users opt-in to keep the pre-T2 behavior. ## Gated paths (all skipped when env unset) | Path | Source | Loc | |---|---|---| | ``_finish_wants_tools`` synthetic recovery | PR #108 / #99 | line ~3618 | | ``_post_tool_empty_retried`` nudge | pre-existing NousResearch#9400-class | line ~3749 | | ``_structural_empty`` synthetic recovery | PR #69 / #67 | line ~3880 | | 3-retry empty-content loop | pre-#67 | line ~3927 | When env unset (default), empty responses fall through to either the fallback-chain provider switch (if configured) or the clean ``(empty)`` terminal with ``_empty_terminal_sentinel=True``. ## Preserved through T2 (verified by test_t2_legacy_empty_recovery_gate.py) - **PR #119 cascade_exhausted short-circuit** — hermes correctly deferring to devagentic's sentinel; NOT recovery. Source-level test asserts the ``if _cascade_err:`` block is NOT prefixed by ``_legacy_recovery_on``. - **PR #122/#125 raw tool_calls fallback** — pre-recovery wire parsing in transports/chat_completions.py. Source-level test asserts the helper name doesn't appear in conversation_loop (lives elsewhere; untouched). - **PR #131/#136/#138/#141 diagnostics** — env-gated via ``HERMES_DIAG_RAW_CAPTURE`` (#140), independent of this env. ## Operator deploy Default behavior changes: empty responses surface cleanly (no synthetic re-prompt). To preserve pre-T2 behavior: ```bash export HERMES_LEGACY_EMPTY_RECOVERY=1 ``` The legacy escape hatch is intended as a temporary safety net while operators validate the thin-client architecture. Once the devagentic-side cascade is universally deployed + observed to cover all empty-content cases, the legacy gate can be removed in a follow-up (the env var stays as a no-op for backward compat). ## Tests - 23 new tests in ``tests/agent/test_t2_legacy_empty_recovery_gate.py``: resolver default-false / empty-false / 6 truthy / 6 falsy-or- unknown; source-level gate assertions for each of the four recovery branches; preserved-path assertions for cascade_exhausted + raw tool_calls fallback location; default-off + opt-in resolver round-trip. - 163 total green across affected suites (T2 + existing empty- terminal mirror tests + finish_reason_tools_recovery + cascade_exhausted + internal_marker_stripping + tool_call_type_default + tool_use_enforcement + T3 + T1 persona + doctor persona probe). ## Composition Per #143 sequencing: - T3 / PR #145 (clarify default-out) — merged ✓ - T1 / PR #146 (persona default-flip) — merged ✓ - **T2 / this PR (empty-content recovery removal, env-gated)** - Later: T4-T6 (tool list / iteration cap / summary fallback) The legacy escape hatch design choice (vs. full removal) lets operators roll back per-deployment if a previously-recovered edge case surfaces in the field. Diagnostic from #140 (HERMES_DIAG_RAW_CAPTURE) remains the observability tool to spot any uncovered case.

PowerCreek merged commit 0b24be0 into main May 26, 2026

PowerCreek deleted the response-handler-finish-reason-tools branch May 26, 2026 22:45

PowerCreek mentioned this pull request May 26, 2026

chore: bump pyproject.toml + uv.lock to 0.17.1 #109

Merged

PowerCreek added a commit that referenced this pull request May 26, 2026

chore: bump pyproject.toml + uv.lock to 0.17.1 (#108 release) (#109)

78cc803

PowerCreek mentioned this pull request May 26, 2026

feat(mcp): run_pipeline + propose_pipeline MCP tools — R12 (closes #100) #113

Merged

This was referenced May 27, 2026

response handler: treat missing tool_call.type as 'function' (OpenAI-spec semantics) — mistral-shaped tool_calls rejected today #121

Closed

fix(tool_calls): recover SDK-dropped mistral-shaped tool_calls (closes #121) #122

Merged

PowerCreek mentioned this pull request May 28, 2026

refactor(T2): default-flip empty-content recovery via HERMES_LEGACY_EMPTY_RECOVERY (refs #143) #147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(response): recover on finish_reason=tool_calls with empty list (closes #99)#108

fix(response): recover on finish_reason=tool_calls with empty list (closes #99)#108
PowerCreek merged 1 commit into
mainfrom
response-handler-finish-reason-tools

PowerCreek commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PowerCreek commented May 26, 2026

Summary

Background

Fix shape

Test plan

Composition

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant