fix(cron): release tick lock before job execution #43652
Duplicate of #21901: both PRs narrow the cron tick() file lock to the scheduling critical section (get_due_jobs + advance_next_run) and release it before job execution, which fixes tick starvation while a long job holds the lock. This is a saturated cluster: the change also overlaps open #38624 and #27492, and #21901 is the earliest open canonical.
Verification: Lock release before job execution (clean concurrency fix)

The change correctly moves [...]

Key correctness points: [...]
The fix directly addresses the scenario where a long-running cron job (e.g., a contribution cron with 10-minute git operations) holds the tick lock and blocks all subsequent gateway ticks, so newly due jobs stay stuck with a next-run time in the past.
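The narrowed critical section described in this thread can be illustrated with a minimal sketch. It is not the PR's actual code: the `fcntl.flock`-based lock (Unix only), the in-memory job dicts, the `run_job` callback, and the bodies of `get_due_jobs` and `advance_next_run` are all assumptions made for illustration; only the two function names and the overall ordering come from the discussion above.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def tick_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the flock


def get_due_jobs(jobs, now):
    # Hypothetical body: a job is due once its next_run_at has passed.
    return [job for job in jobs if job["next_run_at"] <= now]


def advance_next_run(job, now):
    # Hypothetical body: advance before running, so a job is never re-selected
    # while it is still executing.
    job["next_run_at"] = now + job["interval"]


def tick(lock_path, jobs, now, run_job):
    """Select and advance due jobs under the lock, then run them unlocked."""
    with tick_lock(lock_path):  # scheduling critical section only
        due = get_due_jobs(jobs, now)
        for job in due:
            advance_next_run(job, now)
    for job in due:  # the lock has already been released here
        run_job(job)
        job["last_run_at"] = now
    return [job["name"] for job in due]
```

Because `next_run_at` is advanced while the lock is still held, a concurrent tick that acquires the lock during a long `run_job` call sees the job as no longer due, so releasing the lock early does not cause the same job to be picked up twice.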
## Summary
- release the cross-process cron tick lock after due-job selection and pre-run advancement instead of holding it through job execution
- keep the existing at-most-once recurring-job advancement semantics
- add a regression test proving a second tick can run a newly-due job while a previous long-running cron job is still executing

## Test plan
- `python -m pytest tests/cron -q -o 'addopts='`

## Context
Observed in gateway: a smoke-test cron job stayed scheduled with `next_run_at` in the past and `last_run_at=null` while another long-running cron job held `~/.hermes/cron/.tick.lock`.
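The shape of the regression test named in the summary might look roughly like the sketch below. It is an illustration under stated assumptions, not the test added by this PR: `tick` here is a compact stand-in for the real scheduler, a `threading.Lock` substitutes for the cross-process file lock, and the job names `slow` and `smoke` are hypothetical.

```python
import threading


def tick(lock, jobs, now, run_job):
    # Stand-in scheduler tick: select and advance under the lock,
    # then execute after the lock has been released.
    with lock:
        due = [job for job in jobs if job["next_run_at"] <= now]
        for job in due:
            job["next_run_at"] = now + job["interval"]
    for job in due:
        run_job(job)
        job["last_run_at"] = now
    return [job["name"] for job in due]


def test_second_tick_runs_while_first_job_is_executing():
    lock = threading.Lock()  # stand-in for the cross-process tick lock
    started, release = threading.Event(), threading.Event()
    ran = []

    def run_job(job):
        ran.append(job["name"])
        if job["name"] == "slow":
            started.set()
            assert release.wait(timeout=5)  # simulate a long-running job

    jobs = [{"name": "slow", "next_run_at": 0, "interval": 3600, "last_run_at": None}]
    first = threading.Thread(target=tick, args=(lock, jobs, 0, run_job))
    first.start()
    assert started.wait(timeout=5)  # the slow job is now mid-execution

    # A second job becomes due while the slow one is still running.
    jobs.append({"name": "smoke", "next_run_at": 1, "interval": 60, "last_run_at": None})
    assert tick(lock, jobs, 1, run_job) == ["smoke"]  # not starved, "slow" not re-run

    release.set()
    first.join(timeout=5)
    assert ran == ["slow", "smoke"]
    assert jobs[1]["last_run_at"] == 1
```

If the stand-in `tick` held the lock through `run_job`, the second call would block until the slow job finished, which is the starvation described under Context. Using events rather than sleeps keeps the ordering deterministic.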