fix(cron): surface agent run_conversation failure flags as job failure#17859
Merged
teknium1 merged 1 commit intoApr 30, 2026
Merged
Conversation
run_job() ignored the result's `failed=True` / `completed=False` flags
that agent.run_conversation populates on API exhaustion, mid-run
interrupts, and model aborts. Because final_response on those paths is
often a non-empty error string ("API call failed after 3 retries:
Request timed out."), the existing empty-response soft-fail in
_process_job did not trip either: the error text was delivered as if it
were the agent's reply and last_status was set to "ok" with no error
notification. Detect those flags right after the dict-shape guard and
raise so the existing except handler builds the proper failure tuple,
preserving the agent's error message via result["error"].
Adds a parametrized regression covering: API-retry-exhausted with error
text in final_response, completed=False with no final_response,
completed=False without an explicit failed flag, and the partial-reply
plus failed=True case. Plus a guard that a normal completed=True success
result is still treated as success.
Fixes NousResearch#17855
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes cron job status reporting by ensuring cron.scheduler.run_job() treats agent-reported failures (failed=True / completed=False) as actual job failures, so cron last_status and user notifications correctly reflect API retry exhaustion and mid-run aborts.
Changes:
- Update
run_job()to raise on agent-returned failure flags, allowing the existing failure handler to generate a failure tuple. - Add regression tests covering multiple agent failure-result shapes plus a success-path guard.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
cron/scheduler.py |
Detects failed / completed flags from run_conversation() results and converts them into a job failure. |
tests/cron/test_scheduler.py |
Adds parametrized coverage for agent-flagged failure results and ensures normal success results still succeed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
19 tasks
19 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
last_status=\"ok\"when the agent's API call exhausted retries — the failure was never delivered to the user even though the agent reported it.run_job()now consults thefailed/completedflags thatagent.run_conversationpopulates on those paths and raises so the existing except handler emits the proper failure tuple.The bug
agent.run_conversation()returns{\"final_response\": \"API call failed after 3 retries: Request timed out.\", \"failed\": True, \"completed\": False, \"error\": \"...\"}on hard failure paths inrun_agent.py(multiple paths around lines 10811–12292 set this shape).cron/scheduler.py:run_jobonly readfinal_responseand returnedTrue, output, final_response, Noneregardless. The downstream_process_jobempty-response soft-fail at line 1342 only triggers whenfinal_response == \"\"— but in this scenario the error text is the final_response, so:should_deliver = bool(\"API call failed...\")→True, so_deliver_resultships the error text to the user as the agent's reply.successstaysTrue, somark_job_runrecordslast_status=\"ok\"withlast_error=null.Production repro from the issue: a job pointed at a slow self-hosted endpoint timed out after 3 retries; output file showed the error, status showed "ok", no notification fired.
The fix
Right after the existing dict-shape guard in
run_job, checkresult.get(\"failed\") is True or result.get(\"completed\") is Falseand raiseRuntimeErrorwith the agent'serror(falling back to the trimmedfinal_response, then a generic string). The pre-existing except block already builds the FAILED output template and returnsFalse, output, \"\", error_msg, somark_job_runsees the failure and_process_jobbuilds the user-visible error notification (\"⚠️ Cron job '…' failed: …\").The check uses
is True/is Falseso a result with neither flag (older or simpler success paths) keeps its current success behavior; only explicit failure markers trip it.Test plan
tests/cron/test_scheduler.py::test_run_job_treats_agent_failure_flag_as_failure(parametrized — 4 cases: API-retry exhaustion with error text in final_response; failed+completed=False with no final_response; completed=False without an explicit failed flag; partial-reply + failed=True).test_run_job_completed_true_without_failed_flag_succeedsconfirms a normal{completed: True, final_response: ...}result still succeeds.tests/cron/test_scheduler.py(103 passed). No prior tests stubfailed=True, so no existing assertion changes.assert True is False(success was wrongly True); restoring the fix passes all 5.Related
Fixes #17855