Summary
Manual/isolated cron jobs can time out before doing useful work because the cron job timeout clock starts before the job actually begins running on the cron lane.
Observed behavior
A cron job with sessionTarget="isolated" and payload.kind="agentTurn" can sit queued on the cron lane for several minutes and then immediately fail with:
- cron: job execution timed out
- embedded run cleanup shows aborted=true timedOut=true
- prompt phase may end in only a few milliseconds, meaning the job never really got to execute
Relevant log pattern seen in production:
- lane wait exceeded: lane=cron waitedMs=599950 queueAhead=1
- then later the run starts
- then run cleanup ... aborted=true timedOut=true
- and the overall cron lane task duration roughly matches the configured timeout budget
Root cause
In src/cron/service/timer.ts the timeout is enforced in executeJobCoreWithTimeout() by racing executeJobCore(...) against a timer immediately:
    return await Promise.race([
      executeJobCore(state, job, runAbortController.signal),
      new Promise<never>((_, reject) => {
        timeoutId = setTimeout(() => {
          runAbortController.abort(timeoutErrorMessage());
          reject(new Error(timeoutErrorMessage()));
        }, jobTimeoutMs);
      }),
    ]);
But executeJobCore() for isolated agent jobs may still need to wait for the shared cron lane / downstream lane acquisition before useful work begins.
So queue wait time is effectively charged against the job execution timeout.
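The effect can be reproduced in isolation. The sketch below is not the project's code: `sleep`, `fakeJobCore`, and `raceFromEnqueue` are hypothetical stand-ins, and the durations are scaled down to milliseconds. It only illustrates that when the timer starts at enqueue time, a long enough queue wait makes the race reject before any work has begun.

```typescript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for executeJobCore(): it first waits for a lane
// (simulated with a sleep) and only then starts the real work.
async function fakeJobCore(queueWaitMs: number, workMs: number, log: string[]): Promise<string> {
  await sleep(queueWaitMs); // simulated lane wait
  log.push("work started");
  await sleep(workMs); // simulated useful work
  return "ok";
}

// Same shape as the race in the issue: the timer starts immediately,
// so the simulated lane wait is charged against jobTimeoutMs.
async function raceFromEnqueue(
  queueWaitMs: number,
  workMs: number,
  jobTimeoutMs: number,
  log: string[],
): Promise<string> {
  let timeoutId: ReturnType<typeof setTimeout> | undefined;
  try {
    return await Promise.race([
      fakeJobCore(queueWaitMs, workMs, log),
      new Promise<never>((_, reject) => {
        timeoutId = setTimeout(
          () => reject(new Error("cron: job execution timed out")),
          jobTimeoutMs,
        );
      }),
    ]);
  } finally {
    clearTimeout(timeoutId);
  }
}
```

With a simulated queue wait longer than the timeout, for example `raceFromEnqueue(150, 10, 50, log)`, the call rejects while `log` is still empty, which mirrors the "timed out before doing useful work" symptom.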
Why this is a bug
The configured cron timeout reads like an execution timeout, but in practice it becomes a queue wait + execution timeout.
This causes false failures under contention and makes manual cron run --force behavior confusing.
Expected behavior
One of these should happen:
- timeout should start after the job actually starts executing on its effective lane, or
- queue wait and execution should have separate budgets / error messages.
At minimum, queued time should not silently consume the whole execution timeout budget.
Suggested fixes
- Start the timeout clock only after lane acquisition / actual execution start.
- Or split timeout into:
  - queue wait timeout
  - execution timeout
- Or preserve the current behavior but surface a distinct error such as
  cron: job timed out while waiting in queue.
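One possible shape for the split-budget option is sketched below. This is a minimal illustration under assumptions, not the actual cron service API: `withTimeout`, `runWithSplitBudgets`, `acquireLane`, and the `Release` type are hypothetical names, and the real lane acquisition may not be exposed as a promise that yields a release function. The queue error text is the one suggested above.

```typescript
type Release = () => void;

// Hypothetical helper: races `work` against a timer and clears the timer afterwards.
async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  message: string,
  onTimeout?: () => void,
): Promise<T> {
  let id: ReturnType<typeof setTimeout> | undefined;
  const timer = new Promise<never>((_, reject) => {
    id = setTimeout(() => {
      onTimeout?.();
      reject(new Error(message));
    }, ms);
  });
  try {
    return await Promise.race([work, timer]);
  } finally {
    clearTimeout(id);
  }
}

// Hypothetical wrapper: the queue wait and the execution get separate budgets,
// and the execution clock starts only after the lane has been acquired.
async function runWithSplitBudgets<T>(opts: {
  acquireLane: () => Promise<Release>; // assumed: resolves once the lane is granted
  run: (signal: AbortSignal) => Promise<T>;
  queueWaitMs: number;
  executionMs: number;
}): Promise<T> {
  const pending = opts.acquireLane();
  const release = await withTimeout(
    pending,
    opts.queueWaitMs,
    "cron: job timed out while waiting in queue",
    () => {
      // If the lane is granted after we gave up, hand it back right away.
      pending.then((late) => late(), () => {});
    },
  );

  const controller = new AbortController();
  try {
    return await withTimeout(
      opts.run(controller.signal),
      opts.executionMs,
      "cron: job execution timed out",
      () => controller.abort(),
    );
  } finally {
    release();
  }
}
```

In this sketch a job that waits a long time in the queue but then runs quickly still succeeds, and a job that never gets the lane fails with the distinct queue-wait message instead of the execution timeout.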
Notes
This was reproduced while testing an isolated daily digest cron with a 600s timeout. Increasing to 1800s works around the symptom, but does not fix the semantics.