fix(goals): add consecutive API error auto-pause to prevent goal spam (#27585) by zccyman · Pull Request #27760 · NousResearch/hermes-agent

zccyman · 2026-05-18T04:41:02Z

Summary

Fixes #27585 — the /goal loop can spam repeated completion messages when the goal judge API is unreachable.

Root Cause

judge_goal() returns ("continue", ..., parse_failed=False) on API/transport errors. evaluate_after_turn() only tracked consecutive parse failures for auto-pause — API errors reset the parse counter to 0, so a persistent outage caused infinite continuation loops.
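The failure mode described above can be reproduced with a minimal sketch. Everything here is a hypothetical stand-in rather than the code in hermes_cli/goals.py: the State class, the MAX_PARSE_FAILURES value, and the simplified evaluate_after_turn() only mirror the behavior the description attributes to the old code.

```python
from dataclasses import dataclass

MAX_PARSE_FAILURES = 3  # hypothetical threshold, chosen only for this sketch


@dataclass
class State:
    consecutive_parse_failures: int = 0
    paused: bool = False


def judge_goal_unreachable():
    """Stand-in for a judge whose API call failed: it fails open to 'continue'."""
    return ("continue", "judge unavailable", False)  # parse_failed=False


def evaluate_after_turn(state):
    """Old-style evaluation: only parse failures count toward auto-pause."""
    verdict, reason, parse_failed = judge_goal_unreachable()
    if parse_failed:
        state.consecutive_parse_failures += 1
    else:
        # An API error looks like a healthy parse, so the counter resets.
        state.consecutive_parse_failures = 0
    if state.consecutive_parse_failures >= MAX_PARSE_FAILURES:
        state.paused = True
        return False
    return verdict == "continue"


state = State()
# Simulate 20 turns during a persistent judge outage.
continuations = sum(evaluate_after_turn(state) for _ in range(20))
```

Under these assumptions every simulated turn asks for another continuation and the pause branch is never reached, which is the loop reported in #27585.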

Fix

Mirror the existing consecutive_parse_failures pattern for API errors:

judge_goal() returns 4-tuple — new api_error flag distinguishes API failures from parse failures
GoalState.consecutive_api_errors — persisted counter (backward-compatible via from_json default)
DEFAULT_MAX_CONSECUTIVE_API_ERRORS = 5 — auto-pause threshold, same structure as parse failure guard
Auto-pause message — directs user to check auxiliary goal_judge config
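A rough sketch of how these pieces could fit together follows. The names judge_goal, GoalState, consecutive_api_errors, from_json, and DEFAULT_MAX_CONSECUTIVE_API_ERRORS = 5 come from the description above; the field layout, the pause message wording, and the simplified evaluate_after_turn() signature are assumptions made for illustration. How the real patch lets the two counters interact is not shown in this thread, so the sketch only tracks the API-error counter.

```python
import json
from dataclasses import dataclass, asdict

DEFAULT_MAX_CONSECUTIVE_API_ERRORS = 5  # threshold stated in the PR description


@dataclass
class GoalState:
    # Field layout is an assumption; only consecutive_api_errors is named in the PR.
    goal: str
    status: str = "active"
    consecutive_parse_failures: int = 0
    consecutive_api_errors: int = 0

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw):
        data = json.loads(raw)
        return cls(
            goal=data["goal"],
            status=data.get("status", "active"),
            consecutive_parse_failures=data.get("consecutive_parse_failures", 0),
            # State saved before the new field existed loads with a zero counter.
            consecutive_api_errors=data.get("consecutive_api_errors", 0),
        )


def evaluate_after_turn(state, judge_result):
    """Return (should_continue, message) for one judge result (4-tuple)."""
    verdict, reason, parse_failed, api_error = judge_result
    if api_error:
        state.consecutive_api_errors += 1
    else:
        state.consecutive_api_errors = 0
    if state.consecutive_api_errors >= DEFAULT_MAX_CONSECUTIVE_API_ERRORS:
        state.status = "paused"
        # Hypothetical wording; the PR only says the message points at the
        # auxiliary goal_judge config.
        return False, "Goal paused: judge API keeps failing, check the goal_judge config."
    if verdict == "done":
        state.status = "done"
        return False, reason
    return True, reason
```

With this shape, a state file written before the change still loads, and five consecutive results flagged api_error=True pause the goal instead of continuing indefinitely.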

Testing

53/53 tests pass, including 3 new tests covering API error auto-pause, counter reset, and independence from parse failure counter.

Relationship to PR #27752

PR #27752 (briandevans) takes a different approach: detecting terminal response content in the agent output. This PR addresses the general case — any consecutive API error triggers auto-pause regardless of response content. The approaches are complementary.

BoardJames-Bot · 2026-05-18T05:00:20Z

CI failure is branch-local. The new judge_goal() contract now returns four values (verdict, reason, parse_failed, api_error), but a few existing mocked call sites in this branch still return the old 3-tuple, so GoalManager.evaluate_after_turn() raises:

ValueError: not enough values to unpack (expected 4, got 3)

Failing tests:

tests/cli/test_cli_goal_interrupt.py::TestHealthyTurnStillRuns::test_clean_response_enqueues_continuation_when_judge_says_continue
tests/cli/test_cli_goal_interrupt.py::TestHealthyTurnStillRuns::test_clean_response_marks_done_when_judge_says_done
4 tests in tests/gateway/test_goal_verdict_send.py

I could not push to the fork (Permission to atyou2happy/hermes-agent.git denied to BoardJames-Bot), but the local fix is just to add the fourth False value to those mocks. Verified locally with:

python -m pytest tests/hermes_cli/test_goals.py tests/cli/test_cli_goal_interrupt.py::TestHealthyTurnStillRuns tests/gateway/test_goal_verdict_send.py -q -o 'addopts='
# 60 passed
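The mock mismatch can be illustrated with a small self-contained sketch. The goals namespace and the simplified evaluate_after_turn() below are hypothetical stand-ins for the real modules under test; only the 3-tuple versus 4-tuple contract is taken from the comment above.

```python
import types
from unittest.mock import patch

# Hypothetical stand-in for the module that exposes judge_goal().
goals = types.SimpleNamespace(
    judge_goal=lambda goal, response: ("continue", "real call", False, False)
)


def evaluate_after_turn(goal, response):
    # New contract: (verdict, reason, parse_failed, api_error)
    verdict, reason, parse_failed, api_error = goals.judge_goal(goal, response)
    return verdict == "continue"


stale_error = None
# A stale mock that still returns the old 3-tuple breaks the unpacking.
with patch.object(goals, "judge_goal", return_value=("continue", "keep going", False)):
    try:
        evaluate_after_turn("goal", "response")
    except ValueError as exc:
        stale_error = str(exc)

# Adding the fourth value (api_error=False) restores the healthy path.
with patch.object(goals, "judge_goal", return_value=("continue", "keep going", False, False)):
    fixed_result = evaluate_after_turn("goal", "response")
```

The stale mock raises the same kind of ValueError quoted in the CI output, while the 4-tuple mock lets the evaluation proceed.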

…NousResearch#27585)

When the goal judge API is unreachable (network errors, 429s, provider outages), judge_goal() returns ("continue", ..., parse_failed=False). The evaluate_after_turn() loop only tracked consecutive parse failures, not API errors, so a persistent outage caused the agent to repeat its terminal response indefinitely.

Changes:
- judge_goal() now returns a 4-tuple with an api_error flag
- GoalState adds consecutive_api_errors counter (persisted to disk)
- evaluate_after_turn() tracks consecutive API errors and auto-pauses after DEFAULT_MAX_CONSECUTIVE_API_ERRORS=5 turns
- New auto-pause message directs user to check auxiliary config
- 3 new tests covering API error auto-pause, counter reset, and independence from parse failure counter

Fixes NousResearch#27585

BoardJames-Bot flagged stale mock return_values in tests/gateway/test_goal_verdict_send.py (4 sites) and tests/cli/test_cli_goal_interrupt.py (2 sites) that still return the old 3-tuple (verdict, reason, parse_failed) instead of the new 4-tuple (verdict, reason, parse_failed, api_error). All 65 tests pass: test_goals (53) + test_goal_verdict_send (5) + test_cli_goal_interrupt (7).

teknium1 · 2026-06-13T12:14:43Z

Thanks for taking on the remaining /goal judge-error spam gap. The premise still reproduces on current main: judge_goal() fails open on API exceptions at hermes_cli/goals.py:451-453, and evaluate_after_turn() can still return should_continue=True with a continuation prompt at hermes_cli/goals.py:727-736.

Problems

The PR changes judge_goal() to return a 4-tuple, but current main now has another caller: run_kanban_goal_loop() still unpacks three values at hermes_cli/goals.py:854. That path was added in 0cd7d54, so salvaging this needs the same contract update there.
The new threshold still permits several repeated continuations before pause. In gateway sessions, should_continue=True enqueues the next goal turn at gateway/run.py:9528-9545, so an explicit terminal response can still repeat until the API-error counter reaches 5.
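The first problem is the mirror image of the earlier mock failure: a caller that still unpacks three values cannot consume a 4-tuple. A minimal sketch, using hypothetical stub functions rather than the real run_kanban_goal_loop():

```python
def judge_goal_stub():
    # Hypothetical stand-in for the new contract:
    # (verdict, reason, parse_failed, api_error)
    return ("continue", "judge API unreachable", False, True)


def old_style_caller():
    # Mirrors a call site that still expects the old 3-tuple.
    verdict, reason, parse_failed = judge_goal_stub()
    return verdict


def updated_caller():
    # Same call site after adopting the 4-tuple contract.
    verdict, reason, parse_failed, api_error = judge_goal_stub()
    return verdict, api_error


try:
    old_style_caller()
    old_caller_error = None
except ValueError as exc:
    old_caller_error = str(exc)

updated_result = updated_caller()
```

The old-style caller fails with a "too many values to unpack" ValueError, so any call site added on main after this branch was cut needs the same four-value unpacking.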

Suggested changes

Update the kanban goal-mode caller and add coverage for judge API errors in that loop.
Consider a fast-stop for terminal completion phrasing, or combine the API-error counter with a lower duplicate/terminal-response guard, so the gateway spam reported in #27585 ([Bug]: /goal can spam repeated completion messages when goal_judge errors fail-open to continue) is stopped earlier.
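One possible shape for the second suggestion is sketched below. This is not code from the PR or from main: GuardState, should_pause(), the normalization step, and the duplicate threshold of 2 are all hypothetical, and only the API-error threshold of 5 is taken from the PR.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_API_ERRORS = 5        # threshold from the PR
MAX_DUPLICATE_RESPONSES_ON_ERROR = 2  # hypothetical lower guard


@dataclass
class GuardState:
    consecutive_api_errors: int = 0
    last_response: str = ""
    duplicate_responses: int = 0


def _normalize(text):
    """Collapse case and whitespace so near-identical replies compare equal."""
    return " ".join(text.lower().split())


def should_pause(state, response, api_error):
    """Pause early when the judge keeps failing and the agent repeats itself."""
    if not api_error:
        # A healthy judge call clears both guards.
        state.consecutive_api_errors = 0
        state.duplicate_responses = 0
        state.last_response = ""
        return False
    state.consecutive_api_errors += 1
    normalized = _normalize(response)
    if normalized == state.last_response:
        state.duplicate_responses += 1
    else:
        state.duplicate_responses = 1
        state.last_response = normalized
    return (
        state.duplicate_responses >= MAX_DUPLICATE_RESPONSES_ON_ERROR
        or state.consecutive_api_errors >= MAX_CONSECUTIVE_API_ERRORS
    )
```

In this sketch a repeated response during a judge outage pauses on the second occurrence, while varied responses still fall back to the five-error threshold.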

This is an automated hermes-sweeper review.

alt-glitch added the labels type/bug (Something isn't working), comp/cli (CLI entry point, hermes_cli/, setup wizard), and P2 (Medium: degraded but workaround exists) on May 18, 2026

zccyman added 2 commits May 21, 2026 15:35

zccyman force-pushed the fix/goal-judge-fail-open-27585 branch from 644a1ec to e253ad0 May 21, 2026 07:36

zccyman wants to merge 2 commits into NousResearch:main from atyou2happy:fix/goal-judge-fail-open-27585
