feat(cron): parallel job execution within a single tick #9965

Closed
trueice wants to merge 1 commit into NousResearch:main from trueice:feat/cron-parallel-execution

@trueice commented Apr 15, 2026

Problem

Currently all due jobs in tick() run serially in a for-loop. If one job takes 9 minutes (e.g. a daily data fetch), all other due jobs are blocked until it finishes.

Solution

Split tick() into two phases:

  1. Phase 1 (under file lock): advance next_run_at for all due jobs
  2. Phase 2 (outside lock): execute jobs in parallel via ThreadPoolExecutor
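
The two phases might look roughly like the sketch below. It is not the PR's code: the in-memory job dicts (`id`, `next_run_at`, `interval`), the `run_job` callable, and the `threading.Lock` standing in for the cross-process file lock are assumptions made for illustration. Only `tick()`, `next_run_at`, `ThreadPoolExecutor`, and the `HERMES_CRON_MAX_WORKERS` variable with its default of 4 come from this description.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: an in-process lock stands in for the cross-process file lock.
_tick_lock = threading.Lock()


def max_workers() -> int:
    """Read HERMES_CRON_MAX_WORKERS, falling back to 4 on missing or bad input."""
    try:
        return max(1, int(os.environ.get("HERMES_CRON_MAX_WORKERS", "4")))
    except ValueError:
        return 4


def tick(jobs, run_job, now=None):
    """Run every due job once and return {job id: result or exception}."""
    now = time.time() if now is None else now

    # Phase 1 (under lock): claim due jobs by advancing next_run_at,
    # so a concurrent tick cannot pick the same jobs up again.
    with _tick_lock:
        due = [job for job in jobs if job["next_run_at"] <= now]
        for job in due:
            job["next_run_at"] = now + job["interval"]

    if not due:
        return {}

    # Phase 2 (outside lock): a slow job no longer blocks the others.
    results = {}
    with ThreadPoolExecutor(max_workers=min(max_workers(), len(due))) as pool:
        futures = {job["id"]: pool.submit(run_job, job) for job in due}
        for job_id, future in futures.items():
            try:
                results[job_id] = future.result()
            except Exception as exc:  # one failing job must not abort the tick
                results[job_id] = exc
    return results
```

With `HERMES_CRON_MAX_WORKERS=1` the pool has a single worker, so jobs run one after another as they do today.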

Key changes

  • HERMES_CRON_MAX_WORKERS env var (default 4) controls max parallel jobs
  • os.environ → contextvars for session/delivery injection (thread-safe, no cross-job leakage)
  • load_dotenv() serialized with a threading lock
  • Backward compatible: HERMES_CRON_MAX_WORKERS=1 = serial behavior (identical to current code)
  • 155 lines of new tests for parallel execution
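
As an illustration of why `contextvars` avoids cross-job leakage where `os.environ` does not, here is a minimal sketch. The `session_id` variable and the `run_job`, `current_session`, and `run_all` helpers are hypothetical names, not taken from the PR. Pool threads are reused between jobs, so this sketch runs each job inside its own copied context; whether the PR does the same or resets values explicitly is not stated here.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variable; the PR's real ContextVar names are not shown here.
session_id = contextvars.ContextVar("session_id", default=None)


def current_session():
    """What a tool called deep inside a job would read."""
    return session_id.get()


def run_job(job_id):
    # Visible only inside this job's context, unlike a write to os.environ.
    session_id.set(f"cron-{job_id}")
    return current_session()


def run_all(job_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # copy_context() gives every job a private context, so a value set
        # by one job cannot persist on a reused worker thread.
        futures = [
            pool.submit(contextvars.copy_context().run, run_job, job_id)
            for job_id in job_ids
        ]
        return [future.result() for future in futures]
```

Calling `run_all(["a", "b", "c"], workers=2)` gives each job its own session value, and the caller's context stays untouched.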

Files changed

| File | Change |
| --- | --- |
| cron/scheduler.py | Parallel tick + ContextVars |
| cron/jobs.py | Thread-safe jobs.json writes |
| gateway/session_context.py | New ContextVars for session state |
| tools/send_message_tool.py | Read from ContextVars with os.environ fallback |
| tests/cron/test_scheduler.py | 155 lines of new tests |

Safety

  • Each job already creates its own AIAgent instance — no shared state
  • File lock still prevents concurrent ticks (gateway + daemon + systemd)
  • save_job_output() writes to per-job directories — no conflicts
  • mark_job_run() uses file-level locking for jobs.json updates
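
To show why the jobs.json update needs a lock once jobs finish in parallel, here is a sketch of a guarded read-modify-write. It uses a hypothetical `record_run` helper rather than the project's `mark_job_run()`, an in-process `threading.Lock` as a stand-in for the file-level lock, and an assumed JSON layout of `{job id: {...}}`.

```python
import json
import os
import tempfile
import threading

# Assumption: in-process lock standing in for the file-level lock.
_jobs_lock = threading.Lock()


def record_run(path, job_id, status):
    """Update one job's status in the JSON file without losing other updates."""
    with _jobs_lock:
        # Read, modify, and write inside the lock so two finishing jobs
        # cannot overwrite each other's changes.
        with open(path, "r", encoding="utf-8") as fh:
            jobs = json.load(fh)
        entry = jobs[job_id]
        entry["last_status"] = status
        entry["run_count"] = entry.get("run_count", 0) + 1

        # Write to a temp file in the same directory, then swap it in
        # atomically so readers never see a half-written file.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(jobs, fh)
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
```

Without the lock, two workers could both read the old file and the later write would silently drop the earlier job's update.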

Currently all due jobs run serially in tick(), so a slow job (e.g. 9min
data fetch) blocks everything else. This change submits jobs to a
ThreadPoolExecutor (default 4 workers, configurable via
HERMES_CRON_MAX_WORKERS env var).

Key changes:
- tick() splits into Phase 1 (advance next_run under lock) and Phase 2
  (parallel execution outside lock)
- os.environ session injection replaced with contextvars (thread-safe)
- load_dotenv() serialized with _dotenv_lock
- Backward compatible: HERMES_CRON_MAX_WORKERS=1 = serial behavior
- 155 lines of new tests for parallel execution

@alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), and comp/cron (Cron scheduler and job management) labels on Apr 26, 2026
@alt-glitch (Collaborator)

Likely duplicate of merged PR #13021 which already implements parallel cron job execution to prevent serial tick starvation.

@teknium1 (Contributor)

Thanks for the contribution @trueice! This feature has already landed on main via PR #13021.

Automated hermes-sweeper review found the following evidence that this PR is superseded:

@alt-glitch also noted this duplication in their review comment. Closing as implemented.

@teknium1 teknium1 closed this Apr 27, 2026