Summary
Long-running cron jobs currently run inline inside cron.scheduler.tick(), while the scheduler lock is still held. Because the gateway cron ticker calls tick() from a single background thread every 60 seconds, one slow job can block the entire scheduler loop and delay unrelated due jobs.
Code path
gateway/run.py _start_cron_ticker() calls cron_tick(verbose=False) in a single loop
cron/scheduler.py tick() acquires .tick.lock, then iterates due_jobs and calls run_job(job) inline before releasing the lock
run_job() executes a full AIAgent.run_conversation(prompt) synchronously
Why this is a problem
This means cron execution is effectively serialized under the scheduler lock for the full runtime of each job. If one cron job takes several minutes:
- later due jobs in the same tick are delayed behind it
- the next 60-second gateway tick cannot make progress while the previous tick is still running
- a second process trying to tick will skip entirely because .tick.lock is still held
In practice, this can turn one slow monitoring/reporting job into global scheduler starvation.
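The effect can be reproduced with a small in-process model. This is only a sketch: the real .tick.lock is assumed here to behave like a non-blocking lock, and tick_inline, slow_job, and demo are hypothetical names for illustration, not the actual code in cron/scheduler.py.

```python
import threading

# Stand-in for .tick.lock (assumption: an in-process lock models it well enough).
tick_lock = threading.Lock()


def tick_inline(due_jobs, run_job):
    """Model of the current behavior: every job runs while the lock is held."""
    if not tick_lock.acquire(blocking=False):
        return None  # another tick is in progress, so this tick is skipped
    try:
        return [run_job(job) for job in due_jobs]
    finally:
        tick_lock.release()


def demo():
    started = threading.Event()
    release = threading.Event()

    def slow_job(job):
        started.set()            # signal that the slow job now holds the lock
        release.wait(timeout=5)  # simulate a long agent conversation
        return job

    first = threading.Thread(target=tick_inline, args=(["slow-report"], slow_job))
    first.start()
    started.wait(timeout=5)

    # A second tick arrives while the slow job is still running: it is skipped.
    skipped = tick_inline(["unrelated-job"], lambda job: job)

    release.set()
    first.join()

    # Only after the slow job finishes can the unrelated job run.
    after = tick_inline(["unrelated-job"], lambda job: job)
    return skipped, after
```

In this model the unrelated job is never attempted while the slow job holds the lock, which mirrors the starvation described above.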
Suggested direction
Keep the lock scoped to due-job selection / scheduling state updates, then run jobs outside that global tick lock. Another option is to move execution to per-job workers with job-level in-flight tracking so a slow job does not block unrelated jobs.
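One possible shape for this, combining both options, is sketched below. It assumes jobs can be identified by a hashable id and that a thread pool is an acceptable execution model; tick_scoped, in_flight, and executor are hypothetical names, not existing APIs in the codebase.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

tick_lock = threading.Lock()        # stand-in for .tick.lock (assumption)
in_flight = set()                   # ids of jobs that are currently running
in_flight_lock = threading.Lock()   # guards in_flight
executor = ThreadPoolExecutor(max_workers=4)  # pool size is an arbitrary choice


def tick_scoped(get_due_jobs, run_job):
    """Hold the tick lock only while selecting jobs, then run them outside it."""
    if not tick_lock.acquire(blocking=False):
        return []  # another tick is selecting jobs right now
    try:
        with in_flight_lock:
            # Skip jobs that are still running from an earlier tick.
            selected = [job for job in get_due_jobs() if job not in in_flight]
            in_flight.update(selected)
    finally:
        tick_lock.release()  # released before any job starts executing

    def worker(job):
        try:
            return run_job(job)
        finally:
            with in_flight_lock:
                in_flight.discard(job)

    # Each job runs on its own worker, so a slow job does not block the rest.
    return [executor.submit(worker, job) for job in selected]
```

With this arrangement a tick returns as soon as the due jobs are handed off, and the in-flight set keeps a slow job from being started a second time while its previous run is still active. Scheduling state updates (such as recording the next run time) would still need to happen under the lock; that part is omitted here.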
Relevant references
cron/scheduler.py: tick() and run_job()
gateway/run.py: _start_cron_ticker()