Closed as not planned
Bug Description
After an LLM request times out (180s), the session lane can get stuck indefinitely. Subsequent messages are enqueued but never dequeued or processed, and recovery requires a gateway restart.
Steps to Reproduce
- Send a message that triggers a long-running LLM request
- Wait for the request to time out (180s) with FailoverError: LLM request timed out.
- Immediately after the timeout, if certain hooks are running (e.g., /new command with slug generation), the next task in the session lane starts but never completes
- All subsequent messages pile up in the queue indefinitely
Log Evidence
12:31:14 lane task error: lane=session:qq:dm:... error="FailoverError: LLM request timed out."
12:31:14 lane enqueue: lane=session:qq:dm:... queueSize=1
12:31:14 lane dequeue: lane=session:qq:dm:... queueSize=0
# ^^^ This dequeued task NEVER completes - no "lane task done" or "lane task error"
12:37:14 lane enqueue: lane=session:qq:dm:... queueSize=2
12:38:21 lane enqueue: lane=session:qq:dm:... queueSize=3
12:38:32 lane enqueue: lane=session:qq:dm:... queueSize=4
12:42:32 lane enqueue: lane=session:qq:dm:... queueSize=5
# Queue keeps growing, no dequeue ever happens again
Meanwhile, other lanes (cron, etc.) continue working normally, showing this is a per-session deadlock, not a global hang.
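The pattern in the log above (a dequeue with no matching completion line) can be checked for mechanically. The following is a rough sketch only: `findStuckLanes` is a hypothetical helper, and it assumes that "lane task done" lines have the same shape as the "lane task error" line shown in the log, which the issue does not confirm.

```javascript
// Assumption: "lane task done" lines look like the "lane task error" line
// in the log excerpt, i.e. "lane task done: lane=<name> ...".
const LANE_EVENT = /lane (enqueue|dequeue|task done|task error): lane=(\S+)/;

// Returns the lanes that have a dequeued task with no completion line yet.
function findStuckLanes(logLines) {
  const inFlight = new Map(); // lane name -> dequeued but unfinished tasks
  for (const line of logLines) {
    const match = LANE_EVENT.exec(line);
    if (!match) continue; // not a lane event (e.g. a comment line)
    const [, event, lane] = match;
    const count = inFlight.get(lane) ?? 0;
    if (event === "dequeue") {
      inFlight.set(lane, count + 1);
    } else if (event === "task done" || event === "task error") {
      inFlight.set(lane, Math.max(0, count - 1));
    }
  }
  return [...inFlight].filter(([, n]) => n > 0).map(([lane]) => lane);
}
```

A lane reported here is only a candidate: a task that is still legitimately running at the end of the log looks the same, so the result would need to be combined with an age threshold.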
Root Cause Analysis
The nested queue pattern in run.js:
return enqueueSession(() => enqueueGlobal(async () => { ... }));
If the inner task (after session lane dequeue) encounters an unhandled exception or a Promise that never resolves, the session lane's active count is never decremented, blocking all subsequent messages.
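The blocking behavior described here can be reproduced in isolation. The sketch below is not the project's queue: `createLane`, `state.active`, `state.queue`, and `pump` are hypothetical names modeled on the terms used in this issue, and the lane is assumed to run one task at a time.

```javascript
// Hypothetical minimal lane, assuming a concurrency limit of 1.
function createLane(maxConcurrent = 1) {
  const state = { active: 0, queue: [] };

  // Starts queued tasks while there is a free slot.
  function pump() {
    while (state.active < maxConcurrent && state.queue.length > 0) {
      const { task, resolve, reject } = state.queue.shift();
      state.active += 1;
      // The slot is released only when the task's promise settles.
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          state.active -= 1;
          pump();
        });
    }
  }

  function enqueue(task) {
    return new Promise((resolve, reject) => {
      state.queue.push({ task, resolve, reject });
      pump();
    });
  }

  return { enqueue, state };
}

const sessionLane = createLane(1);

// A task whose promise never settles, standing in for a hung hook.
sessionLane.enqueue(() => new Promise(() => {}));

// These are queued behind it and never start.
sessionLane.enqueue(async () => "second");
sessionLane.enqueue(async () => "third");

console.log(sessionLane.state.active, sessionLane.state.queue.length);
```

In this sketch a rejected task still frees its slot through `finally`, so only the never-settling promise wedges the lane. Whether the real queue releases the slot on every error path is something the issue does not establish.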
Environment
- Version: 2026.1.24-3
- Channel: QQ (custom plugin)
- OS: macOS
Workaround
Added a 5-minute timeout to command-queue.js that forcibly releases the lane if a task doesn't complete:
const LANE_TASK_TIMEOUT_MS = 5 * 60 * 1000;
// ... timeout logic that calls state.active -= 1 and pump() after timeout
Suggested Fix
- Add a built-in timeout for lane tasks (configurable)
- Investigate why certain message processing chains (especially /new command hooks) can leave tasks in a hung state
- Consider adding a lane health check that detects and recovers from stuck lanes
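One possible shape for the configurable timeout suggested above is a wrapper that races each lane task against a timer, so the promise the lane waits on always settles. This is a sketch under assumptions: `withLaneTimeout` is a hypothetical helper, and the 5-minute default is taken from the workaround in this issue, not from the project.

```javascript
// Default taken from the workaround described in this issue.
const LANE_TASK_TIMEOUT_MS = 5 * 60 * 1000;

// Hypothetical helper: wraps a task so that it rejects after timeoutMs
// if the task's own promise has not settled by then.
function withLaneTimeout(task, timeoutMs = LANE_TASK_TIMEOUT_MS) {
  return () => {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`lane task timed out after ${timeoutMs}ms`)),
        timeoutMs,
      );
    });
    // Whichever settles first wins; the timer is always cleaned up.
    return Promise.race([Promise.resolve().then(task), timeout])
      .finally(() => clearTimeout(timer));
  };
}

// Usage: a task that never settles is rejected after 50 ms instead of hanging.
withLaneTimeout(() => new Promise(() => {}), 50)()
  .catch((err) => console.log(err.message)); // prints "lane task timed out after 50ms"
```

The wrapper only unblocks the lane. It does not cancel the underlying work, which may still be running after the timeout, so a real fix would likely also need an abort signal or similar cleanup for the hung task.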
Labels: bug, reliability