fix(cron): release tick lock before job execution to prevent scheduler starvation#38624
Open
liuhao1024 wants to merge 2 commits into
Open
fix(cron): release tick lock before job execution to prevent scheduler starvation#38624liuhao1024 wants to merge 2 commits into
liuhao1024 wants to merge 2 commits into
Conversation
…r starvation The tick() function held a LOCK_EX file lock for the entire duration of all job executions, not just the scheduling decision. When a cron job ran a long-running task (e.g. an Opus delegation lasting 2-4 min), every subsequent 60s tick attempt would hit the lock, skip, and return 0. Combined with the grace-window logic in compute_next_run, missed runs were silently dropped with no error logged. Fix: release the file lock immediately after advance_next_run() completes. The at-most-once semantics are preserved because next_run_at is already advanced before any job starts executing. Fixes NousResearch#27485
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Releases the tick file lock before executing cron jobs, preventing scheduler starvation when jobs run long. Previously,
tick()heldLOCK_EXfor the entire duration of all job executions — not just the scheduling decision — causing every subsequent 60s tick to skip and miss scheduled runs.Related Issue
Fixes #27485
Type of Change
Changes Made
cron/scheduler.py: Restructuredtick()to release the file lock immediately afteradvance_next_run()completes. The lock is now held only during the critical section (get_due_jobs+advance_next_run), not during job execution. At-most-once semantics are preserved becausenext_run_atis advanced before any job starts.How to Test
pytest tests/cron/ -q— all existing tests should pass (1 pre-existing failure intest_all_token_case_insensitiveis unrelated)Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/cron/ -qand all tests pass (1 pre-existing failure unrelated to this change)Documentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/ACode Intelligence
cron/scheduler.py::tick()(callers: gateway ticker thread, standalone daemon, manualhermes cron run)advance_next_run()already setsnext_run_atbefore execution, preserving at-most-once semantics;mark_job_run()uses its own_jobs_file_lockfor thread safety