fix: add task-level timeout to lane queue to prevent permanent session blocking#899
Open
BingqingLyu wants to merge 2 commits into
…n blocking

When an enqueued task's promise never settles (e.g. hung upstream API call), the lane is permanently jammed because `pump()` is never called again. Session lanes use maxConcurrent=1, so one stuck task blocks all future messages for that session with no automatic recovery; only a full gateway restart (SIGUSR1) clears the stale state.

Wrap each dequeued task in `Promise.race` against a configurable timeout (default 5 minutes). When the timeout wins, reject the task with `CommandLaneTaskTimeoutError`, clean up `activeTaskIds`, log a diagnostic warning, and call `pump()` to unblock the lane.

Callers can set per-task timeouts via `taskTimeoutMs` on `enqueueCommandInLane` opts. Pass `0` or `Infinity` to opt out.

Closes openclaw#48488
Related: openclaw#42883, openclaw#42960, openclaw#42997, openclaw#29601

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
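A minimal sketch of the race this commit message describes, not the PR's actual code. Only `CommandLaneTaskTimeoutError`, `DEFAULT_TASK_TIMEOUT_MS`, and `taskTimeoutMs` are names from the PR; the helper `raceTaskAgainstTimeout`, its signature, and the error's fields and message are assumptions made for illustration.

```typescript
// Hypothetical sketch. The 5-minute default is the first commit's value;
// the second commit in this PR raises it to 15 minutes.
const DEFAULT_TASK_TIMEOUT_MS = 5 * 60 * 1000;

class CommandLaneTaskTimeoutError extends Error {
  readonly lane: string; // assumed field, not confirmed by the PR
  readonly timeoutMs: number; // assumed field, not confirmed by the PR

  constructor(lane: string, timeoutMs: number) {
    super(`lane task timed out: lane=${lane} timeoutMs=${timeoutMs}`);
    this.name = "CommandLaneTaskTimeoutError";
    this.lane = lane;
    this.timeoutMs = timeoutMs;
  }
}

// Hypothetical helper: settle with the task's result, or reject once the
// timeout elapses. Passing 0 or Infinity skips the race entirely.
async function raceTaskAgainstTimeout<T>(
  lane: string,
  task: () => Promise<T>,
  taskTimeoutMs: number = DEFAULT_TASK_TIMEOUT_MS,
): Promise<T> {
  if (taskTimeoutMs === 0 || !Number.isFinite(taskTimeoutMs)) {
    return await task();
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new CommandLaneTaskTimeoutError(lane, taskTimeoutMs)),
      taskTimeoutMs,
    );
    // In Node the timer is an object with unref(), which stops it from
    // keeping the process alive; elsewhere it is a number and this is a no-op.
    (timer as unknown as { unref?: () => void }).unref?.();
  });
  try {
    return await Promise.race([task(), timeout]);
  } finally {
    clearTimeout(timer); // always release the timer once the race settles
  }
}
```

Note that losing the race does not cancel the underlying task; the sketch only stops waiting for it, which is what lets the lane move on.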
…efault timeout

- Gate timeout diag.warn on completedCurrentGeneration; stale post-reset timeouts downgrade to diag.debug to avoid misleading on-call noise
- Capture activeTaskIds.size before completeTask() removal so the timeout warning reports the pre-removal active count
- Increase DEFAULT_TASK_TIMEOUT_MS from 5 to 15 minutes: the lane timeout is a last-resort safety net above the agent-level timeout (default 600s / 10 min), so it must be higher to avoid killing legitimate long-running tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
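The first two points can be sketched with a toy lane, under loud assumptions: `SketchLane`, its `reset()` method, and the in-memory `warnings`/`debugs` arrays (standing in for `diag.warn` and `diag.debug`) are hypothetical, and the real `pump()` and `completeTask()` may be structured differently. The sketch also leaves out rejecting the caller's promise.

```typescript
// Hypothetical, simplified lane with maxConcurrent fixed at 1.
type Entry = { id: number; task: () => Promise<unknown>; timeoutMs: number };

class SketchLane {
  readonly warnings: string[] = []; // stand-in for diag.warn
  readonly debugs: string[] = []; // stand-in for diag.debug
  private readonly queue: Entry[] = [];
  private readonly activeTaskIds = new Set<number>();
  private generation = 0;
  private nextId = 1;

  enqueue(task: () => Promise<unknown>, timeoutMs: number): void {
    this.queue.push({ id: this.nextId++, task, timeoutMs });
    this.pump();
  }

  // Assumed stand-in for a lane reset: forget active tasks, bump the generation.
  reset(): void {
    this.generation += 1;
    this.activeTaskIds.clear();
    this.pump();
  }

  private pump(): void {
    while (this.activeTaskIds.size < 1 && this.queue.length > 0) {
      const entry = this.queue.shift()!;
      const startedGeneration = this.generation;
      this.activeTaskIds.add(entry.id);

      let timer: ReturnType<typeof setTimeout> | undefined;
      const timedOut = new Promise<"timeout">((resolve) => {
        timer = setTimeout(() => resolve("timeout"), entry.timeoutMs);
      });
      const settled = entry.task().then(
        () => "done" as const,
        () => "done" as const,
      );

      void Promise.race([settled, timedOut]).then((outcome) => {
        clearTimeout(timer);
        const completedCurrentGeneration = startedGeneration === this.generation;
        // Capture the count before removal so the log shows the pre-removal value.
        const activeBefore = this.activeTaskIds.size;
        if (completedCurrentGeneration) this.activeTaskIds.delete(entry.id);
        if (outcome === "timeout") {
          const msg = `lane task timed out: timeoutMs=${entry.timeoutMs} active=${activeBefore}`;
          // Timeouts from before a reset are stale, so they only reach the debug log.
          (completedCurrentGeneration ? this.warnings : this.debugs).push(msg);
        }
        if (completedCurrentGeneration) this.pump(); // unblock the lane
      });
    }
  }
}
```

In this sketch a task that times out in the current generation produces one warning and frees the lane for the next entry, while a timeout that fires after `reset()` is recorded only at debug level.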
Summary
- `Promise.race` timeout wrapper around `await entry.task()` in `pump()` to prevent hung promises from permanently jamming session lanes
- `CommandLaneTaskTimeoutError` error class for timed-out tasks
- `taskTimeoutMs` option on `enqueueCommandInLane` (default: 5 minutes, raised to 15 minutes in the follow-up commit; pass `0` or `Infinity` to disable)
- Diagnostic warning on timeout (`lane task timed out: lane=... timeoutMs=...`)
- `timer.unref()` prevents timeout timers from keeping the process alive

Problem
When an enqueued task's promise never settles (hung upstream API call, dropped WebSocket, unhandled exception), `completeTask()` never runs, `pump()` is never called again, and the session lane is permanently blocked. Session lanes use `maxConcurrent=1`, so one stuck task blocks all future messages for that session. The only recovery is a full gateway restart via SIGUSR1 (`resetAllLanes()`).

This affects all messaging channels (WhatsApp, Telegram, Discord, webchat) and cron jobs. See openclaw#48488 for full root cause analysis with live diagnostic evidence.
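The failure mode can be reproduced with a toy queue. This is a sketch under assumptions: `NaiveLane` is a hypothetical stand-in, not the code in `src/process/command-queue.ts`, and it mirrors only the property that matters here, namely that `pump()` runs again only when the active task settles.

```typescript
type Task = () => Promise<void>;

// Hypothetical lane without any timeout: the next task starts only from the
// completion handler of the current one.
class NaiveLane {
  private readonly queue: Task[] = [];
  private active = 0;
  private readonly maxConcurrent = 1; // session lanes use maxConcurrent=1

  get pending(): number {
    return this.queue.length;
  }

  enqueue(task: Task): void {
    this.queue.push(task);
    this.pump();
  }

  private pump(): void {
    while (this.active < this.maxConcurrent && this.queue.length > 0) {
      const task = this.queue.shift()!;
      this.active += 1;
      const done = (): void => {
        this.active -= 1;
        this.pump(); // the only place the lane advances
      };
      task().then(done, done);
    }
  }
}

const lane = new NaiveLane();
const ran: string[] = [];

// First message: a promise that never settles, like a hung upstream call.
lane.enqueue(() => new Promise<void>(() => {}));
// Second message for the same session: stays queued indefinitely.
lane.enqueue(async () => {
  ran.push("second message");
});
```

Because the first promise never resolves or rejects, `done` is never invoked, `active` stays at 1, and the second task never leaves the queue.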
Changes
`src/process/command-queue.ts`:
- `CommandLaneTaskTimeoutError` error class
- `DEFAULT_TASK_TIMEOUT_MS` constant (5 minutes, raised to 15 minutes in the follow-up commit)
- `QueueEntry` with optional `taskTimeoutMs` field
- `pump()`: race each task against a timeout promise; on timeout, reject the entry, clear `activeTaskIds`, log warning, and call `pump()` to unblock
- `enqueueCommandInLane` opts with `taskTimeoutMs`

`src/process/command-queue.test.ts`:
- New tests, including opt-out (`taskTimeoutMs: 0`), diagnostic logging, and safe interaction with `resetAllLanes` generation bumps

Test plan
- `pnpm build` passes
- `pnpm test -- src/process/command-queue.test.ts`: all 23 tests pass (17 existing + 6 new)
- `pnpm check`: lint/format clean

Closes openclaw#48488
Related: openclaw#42883, openclaw#42960, openclaw#42997, openclaw#29601
🤖 Generated with Claude Code