
fix(goals): add consecutive API error auto-pause to prevent goal spam (#27585) #27760

Open
zccyman wants to merge 2 commits into NousResearch:main from atyou2happy:fix/goal-judge-fail-open-27585


@zccyman (Contributor) commented May 18, 2026

Summary

Fixes #27585 — the /goal loop can spam repeated completion messages when the goal judge API is unreachable.

Root Cause

judge_goal() returns ("continue", ..., parse_failed=False) on API/transport errors. evaluate_after_turn() only tracks consecutive parse failures for auto-pause. Because an API error is not a parse failure, it resets the parse counter to 0, so a persistent outage causes an unbounded continuation loop.
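A minimal sketch of the fail-open path described above, under stated assumptions: the function bodies, the `MAX_PARSE_FAILURES` value, and the `unreachable_judge` helper are hypothetical and only mirror the control flow the PR describes, not the actual code in hermes_cli/goals.py. It shows why the parse-failure guard alone never fires during an outage.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model of the pre-fix control flow. Names echo the PR text;
# bodies are illustrative, not copied from hermes_cli/goals.py.
MAX_PARSE_FAILURES = 3  # assumed threshold for illustration only


@dataclass
class OldGoalState:
    consecutive_parse_failures: int = 0
    paused: bool = False


def judge_goal_old(call_api: Callable[[], str]) -> tuple[str, str, bool]:
    """Pre-fix contract: (verdict, reason, parse_failed)."""
    try:
        raw = call_api()
    except ConnectionError as exc:
        # Fail open: an outage looks like an ordinary "continue" verdict.
        return "continue", f"judge error: {exc}", False
    verdict = raw.strip().lower()
    if verdict not in ("done", "continue"):
        return "continue", "unparseable reply", True
    return verdict, "judged", False


def evaluate_after_turn_old(state: OldGoalState, call_api: Callable[[], str]) -> bool:
    """Return True when another goal turn would be enqueued."""
    verdict, _reason, parse_failed = judge_goal_old(call_api)
    if parse_failed:
        state.consecutive_parse_failures += 1
        if state.consecutive_parse_failures >= MAX_PARSE_FAILURES:
            state.paused = True
            return False
        return True
    # API errors arrive here with parse_failed=False and reset the only guard.
    state.consecutive_parse_failures = 0
    return verdict == "continue"


def unreachable_judge() -> str:
    raise ConnectionError("goal_judge endpoint unreachable")


state = OldGoalState()
continuations = sum(evaluate_after_turn_old(state, unreachable_judge) for _ in range(100))
print(continuations, state.paused)  # 100 False: the loop never pauses
```

Every simulated turn is counted as a continuation, which is the repeated-completion spam reported in #27585.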

Fix

Mirror the existing consecutive_parse_failures pattern for API errors:

  1. judge_goal() returns a 4-tuple: a new api_error flag distinguishes API failures from parse failures
  2. GoalState.consecutive_api_errors: a persisted counter, backward-compatible because from_json() defaults it when the field is absent
  3. DEFAULT_MAX_CONSECUTIVE_API_ERRORS = 5: the auto-pause threshold, structured like the parse-failure guard
  4. Auto-pause message: directs the user to check the auxiliary goal_judge config
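A minimal sketch of the four changes, assuming simplified shapes for `GoalState`, `judge_goal()` and `evaluate_after_turn()`. The threshold of 5 and the 4-tuple order come from the PR; the parse-failure threshold, the exception types treated as API errors, the JSON layout, and the rule that only a clean judgment resets both counters are illustrative assumptions, not the merged implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

DEFAULT_MAX_CONSECUTIVE_PARSE_FAILURES = 3  # assumed; not stated in the PR
DEFAULT_MAX_CONSECUTIVE_API_ERRORS = 5      # value from the PR


@dataclass
class GoalState:
    goal: str
    consecutive_parse_failures: int = 0
    consecutive_api_errors: int = 0
    paused: bool = False

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "GoalState":
        data = json.loads(text)
        # State files written before this change lack the new field; default to 0.
        return cls(
            goal=data["goal"],
            consecutive_parse_failures=data.get("consecutive_parse_failures", 0),
            consecutive_api_errors=data.get("consecutive_api_errors", 0),
            paused=data.get("paused", False),
        )


def judge_goal(call_api: Callable[[], str]) -> tuple[str, str, bool, bool]:
    """Return (verdict, reason, parse_failed, api_error)."""
    try:
        raw = call_api()
    except (ConnectionError, TimeoutError) as exc:
        return "continue", f"judge API error: {exc}", False, True
    verdict = raw.strip().lower()
    if verdict not in ("done", "continue"):
        return "continue", f"unparseable judge reply: {raw!r}", True, False
    return verdict, "judged", False, False


def evaluate_after_turn(state: GoalState, call_api: Callable[[], str]) -> bool:
    """Return True if another goal turn should be enqueued."""
    verdict, _reason, parse_failed, api_error = judge_goal(call_api)
    if api_error:
        state.consecutive_api_errors += 1
        if state.consecutive_api_errors >= DEFAULT_MAX_CONSECUTIVE_API_ERRORS:
            state.paused = True  # real code would also emit the goal_judge config hint
            return False
        return True
    if parse_failed:
        state.consecutive_parse_failures += 1
        if state.consecutive_parse_failures >= DEFAULT_MAX_CONSECUTIVE_PARSE_FAILURES:
            state.paused = True
            return False
        return True
    # A clean judgment resets both streaks; each failure kind leaves the other alone.
    state.consecutive_api_errors = 0
    state.consecutive_parse_failures = 0
    return verdict == "continue"


def unreachable_judge() -> str:
    raise ConnectionError("goal_judge endpoint unreachable")


state = GoalState(goal="ship the release")
outcomes = [evaluate_after_turn(state, unreachable_judge) for _ in range(5)]
print(outcomes, state.paused)  # [True, True, True, True, False] True
```

With a threshold of 5, this sketch still enqueues four continuations before pausing. Keeping the two counters independent means a run of parse failures cannot reset an API-error streak, and vice versa.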

Testing

53/53 tests pass, including 3 new tests covering API error auto-pause, counter reset, and independence from the parse-failure counter.

Relationship to PR #27752

PR #27752 (briandevans) takes a different approach: detecting terminal response content in the agent output. This PR addresses the general case: a run of consecutive API errors that reaches the threshold triggers auto-pause regardless of response content. The approaches are complementary.

@alt-glitch added the labels type/bug (Something isn't working), comp/cli (CLI entry point, hermes_cli/, setup wizard) and P2 (Medium: degraded but workaround exists) on May 18, 2026
@BoardJames-Bot


CI failure is branch-local. The new judge_goal() contract now returns four values (verdict, reason, parse_failed, api_error), but a few existing mocked call sites in this branch still return the old 3-tuple, so GoalManager.evaluate_after_turn() raises:

ValueError: not enough values to unpack (expected 4, got 3)

Failing tests:

  • tests/cli/test_cli_goal_interrupt.py::TestHealthyTurnStillRuns::test_clean_response_enqueues_continuation_when_judge_says_continue
  • tests/cli/test_cli_goal_interrupt.py::TestHealthyTurnStillRuns::test_clean_response_marks_done_when_judge_says_done
  • 4 tests in tests/gateway/test_goal_verdict_send.py

I could not push to the fork (Permission to atyou2happy/hermes-agent.git denied to BoardJames-Bot), but the local fix is just to add the fourth False value to those mocks. Verified locally with:

python -m pytest tests/hermes_cli/test_goals.py tests/cli/test_cli_goal_interrupt.py::TestHealthyTurnStillRuns tests/gateway/test_goal_verdict_send.py -q -o 'addopts='
# 60 passed
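The mock fix described above can be illustrated with `unittest.mock`. Here `consume_verdict()` is a hypothetical stand-in for any caller that unpacks the new 4-tuple; the real tests patch `judge_goal()` at their own targets, which are not reproduced here.

```python
from unittest import mock


def consume_verdict(judge) -> str:
    # Hypothetical caller using the post-PR contract:
    # (verdict, reason, parse_failed, api_error).
    verdict, _reason, _parse_failed, _api_error = judge()
    return verdict


# Stale mock: still returns the old 3-tuple.
stale_judge = mock.Mock(return_value=("continue", "keep going", False))
# Fixed mock: the fourth value (api_error=False) is appended.
fixed_judge = mock.Mock(return_value=("continue", "keep going", False, False))

try:
    consume_verdict(stale_judge)
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 4, got 3)

print(consume_verdict(fixed_judge))  # continue
```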

zccyman added 2 commits May 21, 2026 15:35
…NousResearch#27585)

When the goal judge API is unreachable (network errors, 429s, provider
outages), judge_goal() returns ("continue", ..., parse_failed=False).
The evaluate_after_turn() loop only tracked consecutive parse failures,
not API errors, so a persistent outage caused the agent to repeat its
terminal response indefinitely.

Changes:
- judge_goal() now returns a 4-tuple with an api_error flag
- GoalState adds consecutive_api_errors counter (persisted to disk)
- evaluate_after_turn() tracks consecutive API errors and auto-pauses
  after DEFAULT_MAX_CONSECUTIVE_API_ERRORS=5 turns
- New auto-pause message directs user to check auxiliary config
- 3 new tests covering API error auto-pause, counter reset, and
  independence from parse failure counter

Fixes NousResearch#27585
BoardJames-Bot flagged stale mock return_values in tests/gateway/test_goal_verdict_send.py (4 sites) and tests/cli/test_cli_goal_interrupt.py (2 sites) that still return the old 3-tuple (verdict, reason, parse_failed) instead of the new 4-tuple (verdict, reason, parse_failed, api_error).

All 65 tests pass: test_goals (53) + test_goal_verdict_send (5) + test_cli_goal_interrupt (7).
@zccyman force-pushed the fix/goal-judge-fail-open-27585 branch from 644a1ec to e253ad0 on May 21, 2026 07:36
@teknium1 (Contributor)

Thanks for taking on the remaining /goal judge-error spam gap. The premise still reproduces on current main: judge_goal() fails open on API exceptions at hermes_cli/goals.py:451-453, and evaluate_after_turn() can still return should_continue=True with a continuation prompt at hermes_cli/goals.py:727-736.

Problems

  • The PR changes judge_goal() to return a 4-tuple, but current main now has another caller: run_kanban_goal_loop() still unpacks three values at hermes_cli/goals.py:854. That path was added in 0cd7d54, so salvaging this needs the same contract update there.
  • The new threshold still permits several repeated continuations before pause. In gateway sessions, should_continue=True enqueues the next goal turn at gateway/run.py:9528-9545, so an explicit terminal response can still repeat until the API-error counter reaches 5.
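The first problem is the same contract change seen from the other side: a 4-tuple reaching a caller that still unpacks three names. A minimal sketch, in which `outage_judge()`, `kanban_step_before()` and `kanban_step_after()` are hypothetical stand-ins for `judge_goal()` and the `run_kanban_goal_loop()` call site; what the real loop should do with `api_error` is left open.

```python
from typing import Callable

# (verdict, reason, parse_failed, api_error): the order this PR introduces.
JudgeResult = tuple[str, str, bool, bool]


def outage_judge() -> JudgeResult:
    """Hypothetical stand-in for judge_goal() while the judge API is down."""
    return "continue", "judge API error: 429", False, True


def kanban_step_before(judge: Callable[[], JudgeResult]) -> str:
    # Three-name unpacking, as at the call site the review points to.
    verdict, _reason, _parse_failed = judge()
    return verdict


def kanban_step_after(judge: Callable[[], JudgeResult]) -> tuple[str, bool]:
    # Updated unpacking; api_error is passed back so the loop can decide
    # how to react instead of treating the outage as a plain "continue".
    verdict, _reason, _parse_failed, api_error = judge()
    return verdict, api_error


try:
    kanban_step_before(outage_judge)
except ValueError:
    print("three-name unpacking fails on the new 4-tuple")

print(kanban_step_after(outage_judge))  # ('continue', True)
```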

Suggested changes

This is an automated hermes-sweeper review.



Development

Successfully merging this pull request may close these issues.

[Bug]: /goal can spam repeated completion messages when goal_judge errors fail-open to continue

4 participants