fix(cron): prevent parallel job result loss on exception #27048

Closed
pr7426 wants to merge 1 commit into
NousResearch:mainfrom
pr7426:fix/cron-parallel-job-result-loss
pr7426 (Contributor) commented May 16, 2026

Problem

In cron/scheduler.py, parallel cron jobs collect results via a generator expression that has two issues:

1. No timeout: f.result() blocks indefinitely. If one job hangs (e.g., on an LLM API timeout), result collection for every parallel job in the same tick is blocked.

2. Exception propagation: If f.result() raises (e.g., a BaseException subclass escaping the except Exception handler in _process_job), the generator stops iterating. The remaining futures' results are lost, and mark_job_run() is never called for those jobs, leaving them in a stuck state.
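
The second issue can be reproduced outside the scheduler. The following is a minimal sketch, not the code in cron/scheduler.py: run_job and collect_with_generator are hypothetical names, and a plain RuntimeError stands in for whatever exception escapes the job.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id: int) -> str:
    # Hypothetical stand-in for a cron job; job 1 fails on purpose.
    if job_id == 1:
        raise RuntimeError(f"job {job_id} failed")
    return f"job {job_id} ok"


def collect_with_generator(job_ids):
    """Collect results the way the 'before' line does and report what survived."""
    results = []
    error = None
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_job, j) for j in job_ids]
        try:
            # The first f.result() that raises aborts the whole extend() call,
            # so later futures are never asked for their result.
            results.extend(f.result() for f in futures)
        except RuntimeError as exc:
            error = exc
    return results, error
```

Calling collect_with_generator([0, 1, 2]) returns the RuntimeError from job 1, and the result of job 2 never reaches the list even though job 2 itself ran to completion.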

Fix

Replace the generator expression with concurrent.futures.as_completed() + per-future try/except:

  • Each future is processed independently — one failure doesn't affect others
  • 600s timeout prevents indefinite blocking
  • Failed futures are logged and counted as failures
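
The pattern described above can be sketched as follows. This is an illustration under assumptions, not the merged diff: collect_independently, its return shape, and the worker count are hypothetical, and the timeout is applied to as_completed() as a whole, which may differ from how the PR applies its 600s limit.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeoutError

logger = logging.getLogger(__name__)


def collect_independently(fn, job_ids, timeout=600.0):
    """Run fn(job_id) in parallel; return {job_id: (ok, value_or_error)}."""
    outcomes = {}
    pool = ThreadPoolExecutor(max_workers=4)
    futures = {pool.submit(fn, j): j for j in job_ids}
    try:
        # as_completed() yields futures as they finish and raises a timeout
        # error once `timeout` seconds have passed with futures still pending.
        for fut in as_completed(futures, timeout=timeout):
            job_id = futures[fut]
            try:
                outcomes[job_id] = (True, fut.result())
            except BaseException as exc:  # assumption: count any escape as a failure
                logger.error("job %s failed: %r", job_id, exc)
                outcomes[job_id] = (False, exc)
    except FuturesTimeoutError:
        # Everything not yet recorded is still running or queued.
        for fut, job_id in futures.items():
            if job_id not in outcomes:
                fut.cancel()  # only succeeds if the job has not started yet
                logger.error("job %s timed out after %ss", job_id, timeout)
                outcomes[job_id] = (False, TimeoutError(f"job {job_id} timed out"))
    finally:
        # Do not wait for hung workers; a running thread cannot be interrupted.
        pool.shutdown(wait=False)
    return outcomes
```

Every job ends up with an entry in the returned mapping, so a caller can record a run for each one regardless of how its neighbours finished. Note that the timeout only stops the collector from waiting: a job that has already started keeps running in its worker thread.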

Testing

  • Verified with 22 active cron jobs, multiple jobs due simultaneously
  • Simulated job failure: other jobs in the same tick completed normally

Replace generator-based result collection with explicit per-future
handling. Each future is now processed independently with a 600s timeout.

Before: _results.extend(f.result() for f in _futures)
- One exception stops the generator, remaining results are lost
- No timeout: one hung job blocks the entire tick

After: as_completed() + per-future try/except
- Each future handled independently
- 600s timeout prevents indefinite blocking
- Failed futures are logged and counted as failures

alt-glitch added labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/cron (Cron scheduler and job management) on May 16, 2026
teknium1 added a commit that referenced this pull request May 17, 2026
…tors

Adds release-note attribution mappings for 9 contributors from group 3:
- @darvsum (PR #26766)
- @hueilau (PR #26498)
- @Timur00Kh (PR #27114)
- @Grogger (PR #27061)
- @lemassykoi (PR #27042)
- @draplater (PR #26707)
- @pr7426 (PR #27048)
- @therahul-yo (PR #26215)
- @flamiinngo (PR #27205)

#27154 dropped from this batch — already landed on main as 4e9cedc.

teknium1 (Contributor) commented:

Merged via PR #27302 — your commit was cherry-picked onto current main as part of a batch salvage of low-risk new-contributor PRs. Authorship preserved (fix(cron): prevent parallel job result loss on exception). Thanks for the contribution.

teknium1 closed this May 17, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026