fix(goals): add consecutive API error auto-pause to prevent goal spam (#27585) #27760
zccyman wants to merge 2 commits into
CI failure is branch-local. The new Failing tests:
I could not push to the fork (
python -m pytest tests/hermes_cli/test_goals.py tests/cli/test_cli_goal_interrupt.py::TestHealthyTurnStillRuns tests/gateway/test_goal_verdict_send.py -q -o 'addopts='
# 60 passed
…NousResearch#27585)

When the goal judge API is unreachable (network errors, 429s, provider outages), judge_goal() returns ("continue", ..., parse_failed=False). The evaluate_after_turn() loop only tracked consecutive parse failures, not API errors, so a persistent outage caused the agent to repeat its terminal response indefinitely.

Changes:
- judge_goal() now returns a 4-tuple with an api_error flag
- GoalState adds consecutive_api_errors counter (persisted to disk)
- evaluate_after_turn() tracks consecutive API errors and auto-pauses after DEFAULT_MAX_CONSECUTIVE_API_ERRORS=5 turns
- New auto-pause message directs user to check auxiliary config
- 3 new tests covering API error auto-pause, counter reset, and independence from parse failure counter

Fixes NousResearch#27585
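The counter behaviour described in this commit message can be illustrated with a minimal sketch. It is not the code from hermes_cli/goals.py: GoalStateSketch and record_judge_result are hypothetical names, and only the 4-tuple shape, the consecutive_api_errors counter, and the threshold of 5 come from the commit message.

```python
from dataclasses import dataclass

# Threshold value taken from the commit message.
DEFAULT_MAX_CONSECUTIVE_API_ERRORS = 5


@dataclass
class GoalStateSketch:
    """Hypothetical stand-in for GoalState; the real class has more fields."""
    consecutive_api_errors: int = 0
    paused: bool = False


def record_judge_result(state, judge_result,
                        max_api_errors=DEFAULT_MAX_CONSECUTIVE_API_ERRORS):
    """Update the API error counter; return True if the loop should continue.

    judge_result is assumed to be the 4-tuple
    (verdict, reason, parse_failed, api_error).
    """
    verdict, _reason, _parse_failed, api_error = judge_result
    if api_error:
        state.consecutive_api_errors += 1
    else:
        # Any successful judge call resets the streak.
        state.consecutive_api_errors = 0
    if state.consecutive_api_errors >= max_api_errors:
        # Persistent outage: pause instead of emitting another continuation.
        state.paused = True
        return False
    return verdict == "continue"


state = GoalStateSketch()
outage = ("continue", "judge unreachable", False, True)
outcomes = [record_judge_result(state, outage) for _ in range(5)]
```

With five consecutive outage results, the first four calls keep the loop going and the fifth pauses it, which is the spam-prevention behaviour the commit describes.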
BoardJames-Bot flagged stale mock return_values in tests/gateway/test_goal_verdict_send.py (4 sites) and tests/cli/test_cli_goal_interrupt.py (2 sites) that still return the old 3-tuple (verdict, reason, parse_failed) instead of the new 4-tuple (verdict, reason, parse_failed, api_error). All 65 tests pass: test_goals (53) + test_goal_verdict_send (5) + test_cli_goal_interrupt (7).
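Why a stale 3-tuple mock matters can be shown with a small sketch, assuming the caller unpacks the judge result into four names. The mock objects and the reason strings below are illustrative only and are not taken from the test files.

```python
from unittest.mock import MagicMock

# Updated mock: matches the new (verdict, reason, parse_failed, api_error) shape.
fresh_judge = MagicMock(return_value=("continue", "judge unavailable", False, True))
verdict, reason, parse_failed, api_error = fresh_judge()

# Stale mock: still returns the old (verdict, reason, parse_failed) shape.
stale_judge = MagicMock(return_value=("continue", "still working", False))
try:
    verdict2, reason2, parse_failed2, api_error2 = stale_judge()
    unpack_error = None
except ValueError as exc:
    # Unpacking three values into four names fails at the call site.
    unpack_error = str(exc)
```

Under that assumption, any test still using the old return_value fails with a ValueError during unpacking rather than exercising the code path it was written for.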
644a1ec to e253ad0
Thanks for taking on the remaining /goal judge-error spam gap. The premise still reproduces on current main: judge_goal() fails open on API exceptions at hermes_cli/goals.py:451-453, and evaluate_after_turn() can still return should_continue=True with a continuation prompt at hermes_cli/goals.py:727-736.
Problems
Suggested changes
This is an automated hermes-sweeper review.
Summary
Fixes #27585: the /goal loop can spam repeated completion messages when the goal judge API is unreachable.
Root Cause
judge_goal() returns ("continue", ..., parse_failed=False) on API/transport errors. evaluate_after_turn() only tracked consecutive parse failures for auto-pause. API errors reset the parse counter to 0, so a persistent outage caused infinite continuation loops.
Fix
Mirror the existing consecutive_parse_failures pattern for API errors:
- judge_goal() returns 4-tuple: new api_error flag distinguishes API failures from parse failures
- GoalState.consecutive_api_errors: persisted counter (backward-compatible via from_json default)
- DEFAULT_MAX_CONSECUTIVE_API_ERRORS = 5: auto-pause threshold, same structure as parse failure guard
- goal_judge config
Testing
53/53 tests pass, including 3 new tests covering API error auto-pause, counter reset, and independence from parse failure counter.
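The backward-compatible from_json default listed under Fix could look roughly like the sketch below. This is an assumption about the approach, not the PR's code: GoalStateSketch and the goal field are hypothetical, and only the two counter names come from the description.

```python
import json
from dataclasses import dataclass


@dataclass
class GoalStateSketch:
    """Hypothetical stand-in for GoalState with only the fields needed here."""
    goal: str
    consecutive_parse_failures: int = 0
    consecutive_api_errors: int = 0

    @classmethod
    def from_json(cls, raw: str) -> "GoalStateSketch":
        data = json.loads(raw)
        return cls(
            goal=data["goal"],
            consecutive_parse_failures=data.get("consecutive_parse_failures", 0),
            # State files written before this change lack the key; default to 0.
            consecutive_api_errors=data.get("consecutive_api_errors", 0),
        )


# A payload as an older version might have persisted it (no API error counter).
old_payload = json.dumps({"goal": "ship the release", "consecutive_parse_failures": 2})
restored = GoalStateSketch.from_json(old_payload)
```

Reading the counter with a default means previously persisted goal state loads without a migration step, and the parse failure counter is carried over unchanged.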
Relationship to PR #27752
PR #27752 (briandevans) takes a different approach: detecting terminal response content in the agent output. This PR addresses the general case — any consecutive API error triggers auto-pause regardless of response content. The approaches are complementary.