Skip to content

fix(cron): run due jobs in parallel to prevent serial tick starvation#9169

Closed
Venomoth1 wants to merge 1 commit into
NousResearch:mainfrom
Venomoth1:fix/cron-parallel-tick-execution
Closed

fix(cron): run due jobs in parallel to prevent serial tick starvation#9169
Venomoth1 wants to merge 1 commit into
NousResearch:mainfrom
Venomoth1:fix/cron-parallel-tick-execution

Conversation

@Venomoth1

Copy link
Copy Markdown

What does this PR do?

The serial job loop inside tick() caused all jobs after a long-running
one to be silently delayed or skipped. If any job ran longer than the 60s
gateway tick interval, the next tick would fail to acquire the file lock
and return 0 with no warning — jobs that became due in the meantime were
never executed.

This replaces the serial for loop with a ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others.

Related Issue

Fixes #9086

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • cron/scheduler.py: replaced serial for job in due_jobs loop in tick() with
    ThreadPoolExecutor.map(_process_job, due_jobs)

  • Max parallelism configurable via HERMES_CRON_MAX_PARALLEL env var or cron.max_parallel_jobs in config.yaml

  • Set max_parallel_jobs: 1 to restore old serial behaviour if needed

How to Test

  1. Create two cron jobs — one with a long-running prompt (e.g. sleep 120), one short
  2. Set both to the same schedule so they fire together in one tick
  3. Confirm both jobs complete in the same tick instead of the short one waiting behind the long one

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform:

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

Screenshots / Logs

hermes-agent-dhabibi pushed a commit to hermes-agent-dhabibi/hermes-agent that referenced this pull request Apr 14, 2026
… (upstream PR NousResearch#9169, indentation fix)

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
hermes-agent-dhabibi pushed a commit to hermes-agent-dhabibi/hermes-agent that referenced this pull request Apr 16, 2026
… (upstream PR NousResearch#9169, indentation fix)

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
teknium1 added a commit that referenced this pull request Apr 20, 2026
Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue #9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR #9169 by @VenomMoth1.

Fixes #9086
teknium1 added a commit that referenced this pull request Apr 20, 2026
…#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue #9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR #9169 by @VenomMoth1.

Fixes #9086
@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #13021 (#13021). Your approach was correct — parallel ThreadPoolExecutor in tick(). The salvage added thread-safety fixes for os.environ races (migrated to ContextVars) and jobs.json concurrent write protection (threading.Lock). Thanks for the contribution @VenomMoth1!

@teknium1 teknium1 closed this Apr 20, 2026
brucephaner pushed a commit to brucephaner/xiashou-agent that referenced this pull request Apr 25, 2026
…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086

(cherry picked from commit c869150)
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086
Luminet2023 pushed a commit to Luminet2023/hermes-agent that referenced this pull request May 1, 2026
…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

# [Bug]: Serial cron tick execution causes scheduled jobs to be silently skipped when preceding jobs run long

2 participants