fix(cron): surface agent run_conversation failure flags as job failure by briandevans · Pull Request #17859 · NousResearch/hermes-agent

briandevans · 2026-04-30T09:13:32Z

Summary

Cron jobs were silently marked last_status=\"ok\" when the agent's API call exhausted retries — the failure was never delivered to the user even though the agent reported it.
run_job() now consults the failed / completed flags that agent.run_conversation populates on those paths and raises so the existing except handler emits the proper failure tuple.
Adds a parametrized regression covering API exhaustion, mid-run interrupts, and partial-reply failure shapes; plus a success-path guard.

The bug

agent.run_conversation() returns {\"final_response\": \"API call failed after 3 retries: Request timed out.\", \"failed\": True, \"completed\": False, \"error\": \"...\"} on hard failure paths in run_agent.py (multiple paths around lines 10811–12292 set this shape). cron/scheduler.py:run_job only read final_response and returned True, output, final_response, None regardless. The downstream _process_job empty-response soft-fail at line 1342 only triggers when final_response == \"\" — but in this scenario the error text is the final_response, so:

should_deliver = bool(\"API call failed...\") → True, so _deliver_result ships the error text to the user as the agent's reply.
success stays True, so mark_job_run records last_status=\"ok\" with last_error=null.

Production repro from the issue: a job pointed at a slow self-hosted endpoint timed out after 3 retries; output file showed the error, status showed "ok", no notification fired.

The fix

Right after the existing dict-shape guard in run_job, check result.get(\"failed\") is True or result.get(\"completed\") is False and raise RuntimeError with the agent's error (falling back to the trimmed final_response, then a generic string). The pre-existing except block already builds the FAILED output template and returns False, output, \"\", error_msg, so mark_job_run sees the failure and _process_job builds the user-visible error notification (\"⚠️ Cron job '…' failed: …\").

The check uses is True / is False so a result with neither flag (older or simpler success paths) keeps its current success behavior; only explicit failure markers trip it.

Test plan

Focused regression: tests/cron/test_scheduler.py::test_run_job_treats_agent_failure_flag_as_failure (parametrized — 4 cases: API-retry exhaustion with error text in final_response; failed+completed=False with no final_response; completed=False without an explicit failed flag; partial-reply + failed=True).
Success-path guard: test_run_job_completed_true_without_failed_flag_succeeds confirms a normal {completed: True, final_response: ...} result still succeeds.
Adjacent suite: full tests/cron/test_scheduler.py (103 passed). No prior tests stub failed=True, so no existing assertion changes.
Regression guard: with the production fix reverted, all 4 parametrized cases fail with assert True is False (success was wrongly True); restoring the fix passes all 5.

run_job() ignored the result's `failed=True` / `completed=False` flags that agent.run_conversation populates on API exhaustion, mid-run interrupts, and model aborts. Because final_response on those paths is often a non-empty error string ("API call failed after 3 retries: Request timed out."), the existing empty-response soft-fail in _process_job did not trip either: the error text was delivered as if it were the agent's reply and last_status was set to "ok" with no error notification. Detect those flags right after the dict-shape guard and raise so the existing except handler builds the proper failure tuple, preserving the agent's error message via result["error"]. Adds a parametrized regression covering: API-retry-exhausted with error text in final_response, completed=False with no final_response, completed=False without an explicit failed flag, and the partial-reply plus failed=True case. Plus a guard that a normal completed=True success result is still treated as success. Fixes NousResearch#17855 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR fixes cron job status reporting by ensuring cron.scheduler.run_job() treats agent-reported failures (failed=True / completed=False) as actual job failures, so cron last_status and user notifications correctly reflect API retry exhaustion and mid-run aborts.

Changes:

Update run_job() to raise on agent-returned failure flags, allowing the existing failure handler to generate a failure tuple.
Add regression tests covering multiple agent failure-result shapes plus a success-path guard.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`cron/scheduler.py`	Detects `failed` / `completed` flags from `run_conversation()` results and converts them into a job failure.
`tests/cron/test_scheduler.py`	Adds parametrized coverage for agent-flagged failure results and ensures normal success results still succeed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings April 30, 2026 09:13

Copilot started reviewing on behalf of briandevans April 30, 2026 09:14 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cron Cron scheduler and job management labels Apr 30, 2026

alt-glitch mentioned this pull request Apr 30, 2026

fix(cron): report agent API failure correctly, never hide error notif… #17882

Closed

19 tasks

teknium1 mentioned this pull request Apr 30, 2026

Fix/nameerror prompt defined 17787 #17886

Closed

19 tasks

teknium1 merged commit f549357 into NousResearch:main Apr 30, 2026
9 of 11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cron): surface agent run_conversation failure flags as job failure#17859

fix(cron): surface agent run_conversation failure flags as job failure#17859
teknium1 merged 1 commit into
NousResearch:mainfrom
briandevans:fix/cron-detect-agent-failure-17855

briandevans commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

briandevans commented Apr 30, 2026

Summary

The bug

The fix

Test plan

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants