fix(codex): treat reasoning-only responses as incomplete, not stop by teknium1 · Pull Request #2070 · NousResearch/hermes-agent

teknium1 · 2026-03-19T17:00:59Z

Summary

Fixes the bug Nester reported in Discord where Codex responses containing only reasoning/thinking blocks (no visible content) would trigger the empty-content retry loop, burning 3 retries and failing with Max retries (3) for empty content exceeded.

Root Cause

_normalize_codex_response() was setting finish_reason='stop' for responses that contained only reasoning items (encrypted thinking state) with no message text. This is incorrect — the model is still thinking and needs another turn.

Changes (2 commits)

Commit 1: Core fix

run_agent.py — two fixes:

_normalize_codex_response: Added a new branch — when reasoning_items_raw is non-empty but final_text is empty (and no tool calls), set finish_reason='incomplete' instead of 'stop'. This routes the response to the Codex continuation path.
Incomplete handling: Also checks for codex_reasoning_items when deciding whether to preserve an interim message.

Commit 2: Replay path hardening (found via research)

After researching how OpenCode, Clawdbot/KiloCode, and OpenHands handle reasoning-only Responses API responses, found 2 additional bugs:

_chat_messages_to_responses_input: Reasoning-only interim messages were converted to API input with the reasoning item as the LAST item — no following item. The Responses API requires a following item after each reasoning item (missing_following_item error, as OpenHands discovered in their feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro #11406). Now emits an empty assistant message as the required following item.
Duplicate detection: Two consecutive reasoning-only incomplete messages with different codex_reasoning_items but identical empty content/reasoning were treated as duplicates, silently dropping the second response's encrypted state. Fixed by including codex_reasoning_items in the comparison.

Comparison with other agents:

OpenCode: Uses Vercel AI SDK stream abstraction. No retry loop, so reasoning-only responses don't cascade. For no-text results, sends a follow-up prompt asking the model to summarize.
Clawdbot: Drops orphaned reasoning blocks entirely from transcript history (downgradeOpenAIReasoningBlocks). Defensive but loses reasoning continuity.
OpenHands: Hit missing_following_item and invalid_encrypted_content errors. Their fix: treat reasoning items as output-only artifacts.
Our approach: Preserves reasoning continuity by routing through the Codex continuation path, while ensuring the API input satisfies the required-following-item constraint.

Tests (8 new)

Unit: reasoning-only → incomplete, reasoning+content → stop
E2E: reasoning-only → continuation → final answer succeeds
E2E: encrypted reasoning items preserved in interim messages
API input: reasoning items always have a following item
Duplicate detection: different codex_reasoning_items not collapsed

python -m pytest tests/test_run_agent_codex_responses.py -n0 -q  # 33 passed
python -m pytest tests/test_run_agent.py tests/test_provider_parity.py -n0 -q  # 250 passed

When a Codex Responses API response contains only reasoning items (encrypted thinking state) with no message text or tool calls, the _normalize_codex_response method was setting finish_reason='stop'. This sent the response into the empty-content retry loop, which burned 3 retries and then failed — exactly the pattern Nester reported in Discord. Two fixes: 1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw non-empty but no final_text) now get finish_reason='incomplete', routing them to the Codex continuation path instead of the retry loop. 2. Incomplete handling: also checks for codex_reasoning_items when deciding whether to preserve an interim message, so encrypted reasoning state is not silently dropped when there is no visible reasoning text. Adds 4 regression tests covering: - Unit: reasoning-only → incomplete, reasoning+content → stop - E2E: reasoning-only → continuation → final answer succeeds - E2E: encrypted reasoning items preserved in interim messages

…I input Follow-up to the reasoning-only response fix. Three additional issues found by tracing the full replay path: 1. _chat_messages_to_responses_input: when a reasoning-only interim message was converted to Responses API input, the reasoning items were emitted as the last items with no following item. The Responses API requires a following item after each reasoning item (otherwise: 'missing_following_item' error, as seen in OpenHands #11406). Now emits an empty assistant message as the required following item when content is empty but reasoning items were added. 2. Duplicate detection: two consecutive reasoning-only incomplete messages with identical empty content/reasoning but different encrypted codex_reasoning_items were incorrectly treated as duplicates, silently dropping the second response's reasoning state. Now includes codex_reasoning_items in the duplicate comparison. 3. Added tests for both the API input conversion path and the duplicate detection edge case. Research context: verified against OpenCode (uses Vercel AI SDK, no retry loop so avoids the issue), Clawdbot (drops orphaned reasoning blocks entirely), and OpenHands (hit the missing_following_item error). Our approach preserves reasoning continuity while satisfying the API constraint.

…arch#2070) * fix(codex): treat reasoning-only responses as incomplete, not stop When a Codex Responses API response contains only reasoning items (encrypted thinking state) with no message text or tool calls, the _normalize_codex_response method was setting finish_reason='stop'. This sent the response into the empty-content retry loop, which burned 3 retries and then failed — exactly the pattern Nester reported in Discord. Two fixes: 1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw non-empty but no final_text) now get finish_reason='incomplete', routing them to the Codex continuation path instead of the retry loop. 2. Incomplete handling: also checks for codex_reasoning_items when deciding whether to preserve an interim message, so encrypted reasoning state is not silently dropped when there is no visible reasoning text. Adds 4 regression tests covering: - Unit: reasoning-only → incomplete, reasoning+content → stop - E2E: reasoning-only → continuation → final answer succeeds - E2E: encrypted reasoning items preserved in interim messages * fix(codex): ensure reasoning items have required following item in API input Follow-up to the reasoning-only response fix. Three additional issues found by tracing the full replay path: 1. _chat_messages_to_responses_input: when a reasoning-only interim message was converted to Responses API input, the reasoning items were emitted as the last items with no following item. The Responses API requires a following item after each reasoning item (otherwise: 'missing_following_item' error, as seen in OpenHands NousResearch#11406). Now emits an empty assistant message as the required following item when content is empty but reasoning items were added. 2. Duplicate detection: two consecutive reasoning-only incomplete messages with identical empty content/reasoning but different encrypted codex_reasoning_items were incorrectly treated as duplicates, silently dropping the second response's reasoning state. Now includes codex_reasoning_items in the duplicate comparison. 3. Added tests for both the API input conversion path and the duplicate detection edge case. Research context: verified against OpenCode (uses Vercel AI SDK, no retry loop so avoids the issue), Clawdbot (drops orphaned reasoning blocks entirely), and OpenHands (hit the missing_following_item error). Our approach preserves reasoning continuity while satisfying the API constraint. --------- Co-authored-by: Test <test@test.com>

Test added 2 commits March 19, 2026 10:00

teknium1 merged commit e84d952 into main Mar 19, 2026
1 check passed

bigph00t mentioned this pull request Mar 20, 2026

Bug: Empty assistant message from reasoning-only fix leaks into Chat Completions API, causing prefill rejection #2128

Closed

kshitijk4poor mentioned this pull request Apr 2, 2026

fix(agent): classify think-only empty responses before retrying #4552

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(codex): treat reasoning-only responses as incomplete, not stop#2070

fix(codex): treat reasoning-only responses as incomplete, not stop#2070
teknium1 merged 2 commits into
mainfrom
hermes/hermes-f1230adf

teknium1 commented Mar 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teknium1 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Changes (2 commits)

Commit 1: Core fix

Commit 2: Replay path hardening (found via research)

Comparison with other agents:

Tests (8 new)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

teknium1 commented Mar 19, 2026 •

edited

Loading