Skip to content

fix: move _last_content_with_tools fallback after retry mechanisms#8024

Closed
tomqiaozc wants to merge 1 commit into
NousResearch:mainfrom
tomqiaozc:fix/last-content-with-tools-retry-bypass
Closed

fix: move _last_content_with_tools fallback after retry mechanisms#8024
tomqiaozc wants to merge 1 commit into
NousResearch:mainfrom
tomqiaozc:fix/last-content-with-tools-retry-bypass

Conversation

@tomqiaozc

Copy link
Copy Markdown

Summary

  • Moves the _last_content_with_tools fallback to fire after all retry mechanisms (thinking-prefill retry, empty-content retry ×3, fallback provider), instead of before them. Previously the fallback short-circuited retries, causing silent mid-task termination with stale content.
  • Clears stale _last_content_with_tools on tool-only turns (no real content) to prevent old content from persisting across many turns.
  • Adds 3 tests covering: retries fire before fallback, fallback fires after retries exhausted, stale capture cleared on tool-only turns.

Fixes #7968

Test plan

  • TestLastContentWithToolsFallback::test_retries_before_fallback_on_empty_followup — verifies retries produce fresh content before fallback is consulted
  • TestLastContentWithToolsFallback::test_fallback_used_after_all_retries_exhausted — verifies fallback IS used when all retries return empty
  • TestLastContentWithToolsFallback::test_stale_fallback_cleared_on_tool_only_turn — verifies tool-only turn clears stale capture
  • Full test_run_agent.py suite: 251/251 passed

🤖 Generated with Claude Code

The _last_content_with_tools fallback was firing before empty-response
retries (thinking-prefill, empty-content, fallback provider), causing
the agent to silently terminate mid-task with stale content instead of
retrying. Move the fallback to after all retry mechanisms are exhausted
so it only fires as a true last resort.

Also clear stale _last_content_with_tools on tool-only turns to prevent
old content from persisting across many turns.

Fixes NousResearch#7968

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teknium1

Copy link
Copy Markdown
Contributor

Thanks for the thorough write-up and repro in #7968, @tomqiaozc — this is a real and well-documented bug.

However, the core issue was independently fixed on main three days after this PR was opened, via commit 0d25e1c14 (PR #10472, merged Apr 15 2026). That commit took a slightly different approach — rather than reordering the fallback after all retry mechanisms, it introduced a _last_content_tools_all_housekeeping gate that restricts the fallback to turns where all tools were housekeeping (memory, todo, etc.). Substantive-tool turns now fall through to the existing post-tool nudge path, which achieves the same goal of preventing premature loop exit on mid-task narration content.

The secondary fix (clearing stale _last_content_with_tools on tool-only turns) is also covered: run_agent.py:12152 clears the field in the post-tool-empty-retried branch.

Evidence:

  • run_agent.py:12112 — housekeeping gate (_last_content_tools_all_housekeeping) is live on main
  • run_agent.py:12152 — stale-clear on tool-only turns is live on main
  • Commit 0d25e1c146092b36f26205f65e93c25e5147a646 — the implementing fix

Note: your three TestLastContentWithToolsFallback tests were not ported to main. If you'd like to open a focused follow-up PR that adds just those tests (adapted to the current housekeeping-gate design), that would be a net positive for regression coverage.

This is an automated hermes-sweeper review.

@teknium1 teknium1 closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: _last_content_with_tools fallback bypasses empty-response retries, causing silent agent loop termination mid-task

2 participants