fix(cron): start isolated job timeout after queue wait#41796
fix(cron): start isolated job timeout after queue wait#41796ayanesakura wants to merge 4 commits intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4b80a7769
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Greptile SummaryThis PR fixes a legitimate bug where isolated cron job timeout timers started too early (during queue wait on the shared cron lane), causing jobs to appear to time out immediately upon receiving CPU. The core fix — moving the Key concerns:
Confidence Score: 2/5
Last reviewed commit: d4b80a7 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e8c0e36fb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Addressed the timeout review follow-up in 50dfd02. Main-session due jobs keep the outer timeout, isolated execution timeout now starts when execution actually begins, timer-driven isolated runs keep a separate hard guard, and cron timer delays are clamped to the Node timer max. |
|
Tested locally on the PR head:
This looks like the right fix for the cron-lane queue-wait timeout semantics (isolated timeout starts after execution begins), and the added regression coverage is solid. |
Summary
Fix isolated cron job timeout semantics so queue wait on the shared
cronlane does not consume the entire execution timeout budget.Closes #41783.
Root cause
Before this change,
executeJobCoreWithTimeout()started the cron job timeout timer before the isolated agent job actually got CPU on the downstream cron lane. Under contention, a job could sit queued for minutes and then immediately fail withcron: job execution timed outonce execution finally began.What changed
onTimerexecuteJobCore()runIsolatedAgentJob(...)AbortSignal.any(...)cron: job execution timed outValidation
Ran targeted tests locally:
pnpm vitest run src/cron/service.issue-regressions.test.ts -t "suppresses isolated follow-up side effects after timeout"pnpm vitest run src/cron/service.issue-regressions.test.ts src/cron/service/timeout-policy.test.tsBoth passed.