Skip to content

fix(cron): surface agent run_conversation failure flags as job failure#17859

Merged
teknium1 merged 1 commit into
NousResearch:mainfrom
briandevans:fix/cron-detect-agent-failure-17855
Apr 30, 2026
Merged

fix(cron): surface agent run_conversation failure flags as job failure#17859
teknium1 merged 1 commit into
NousResearch:mainfrom
briandevans:fix/cron-detect-agent-failure-17855

Conversation

@briandevans

Copy link
Copy Markdown
Contributor

Summary

  • Cron jobs were silently marked last_status=\"ok\" when the agent's API call exhausted retries — the failure was never delivered to the user even though the agent reported it.
  • run_job() now consults the failed / completed flags that agent.run_conversation populates on those paths and raises so the existing except handler emits the proper failure tuple.
  • Adds a parametrized regression covering API exhaustion, mid-run interrupts, and partial-reply failure shapes; plus a success-path guard.

The bug

agent.run_conversation() returns {\"final_response\": \"API call failed after 3 retries: Request timed out.\", \"failed\": True, \"completed\": False, \"error\": \"...\"} on hard failure paths in run_agent.py (multiple paths around lines 10811–12292 set this shape). cron/scheduler.py:run_job only read final_response and returned True, output, final_response, None regardless. The downstream _process_job empty-response soft-fail at line 1342 only triggers when final_response == \"\" — but in this scenario the error text is the final_response, so:

  1. should_deliver = bool(\"API call failed...\")True, so _deliver_result ships the error text to the user as the agent's reply.
  2. success stays True, so mark_job_run records last_status=\"ok\" with last_error=null.

Production repro from the issue: a job pointed at a slow self-hosted endpoint timed out after 3 retries; output file showed the error, status showed "ok", no notification fired.

The fix

Right after the existing dict-shape guard in run_job, check result.get(\"failed\") is True or result.get(\"completed\") is False and raise RuntimeError with the agent's error (falling back to the trimmed final_response, then a generic string). The pre-existing except block already builds the FAILED output template and returns False, output, \"\", error_msg, so mark_job_run sees the failure and _process_job builds the user-visible error notification (\"⚠️ Cron job '…' failed: …\").

The check uses is True / is False so a result with neither flag (older or simpler success paths) keeps its current success behavior; only explicit failure markers trip it.

Test plan

  • Focused regression: tests/cron/test_scheduler.py::test_run_job_treats_agent_failure_flag_as_failure (parametrized — 4 cases: API-retry exhaustion with error text in final_response; failed+completed=False with no final_response; completed=False without an explicit failed flag; partial-reply + failed=True).
  • Success-path guard: test_run_job_completed_true_without_failed_flag_succeeds confirms a normal {completed: True, final_response: ...} result still succeeds.
  • Adjacent suite: full tests/cron/test_scheduler.py (103 passed). No prior tests stub failed=True, so no existing assertion changes.
  • Regression guard: with the production fix reverted, all 4 parametrized cases fail with assert True is False (success was wrongly True); restoring the fix passes all 5.

Related

Fixes #17855

run_job() ignored the result's `failed=True` / `completed=False` flags
that agent.run_conversation populates on API exhaustion, mid-run
interrupts, and model aborts. Because final_response on those paths is
often a non-empty error string ("API call failed after 3 retries:
Request timed out."), the existing empty-response soft-fail in
_process_job did not trip either: the error text was delivered as if it
were the agent's reply and last_status was set to "ok" with no error
notification. Detect those flags right after the dict-shape guard and
raise so the existing except handler builds the proper failure tuple,
preserving the agent's error message via result["error"].

Adds a parametrized regression covering: API-retry-exhausted with error
text in final_response, completed=False with no final_response,
completed=False without an explicit failed flag, and the partial-reply
plus failed=True case. Plus a guard that a normal completed=True success
result is still treated as success.

Fixes NousResearch#17855

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 09:13

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes cron job status reporting by ensuring cron.scheduler.run_job() treats agent-reported failures (failed=True / completed=False) as actual job failures, so cron last_status and user notifications correctly reflect API retry exhaustion and mid-run aborts.

Changes:

  • Update run_job() to raise on agent-returned failure flags, allowing the existing failure handler to generate a failure tuple.
  • Add regression tests covering multiple agent failure-result shapes plus a success-path guard.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cron/scheduler.py Detects failed / completed flags from run_conversation() results and converts them into a job failure.
tests/cron/test_scheduler.py Adds parametrized coverage for agent-flagged failure results and ensures normal success results still succeed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cron Cron scheduler and job management labels Apr 30, 2026
@teknium1 teknium1 mentioned this pull request Apr 30, 2026
19 tasks
@teknium1 teknium1 merged commit f549357 into NousResearch:main Apr 30, 2026
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cron Cron scheduler and job management P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cron: API failure incorrectly reported as last_status=ok, no error notification delivered

4 participants