fix(feishu): cap per-chat queue task wait so a single hang doesn't starve later messages (#70133)#76687
Conversation
|
Codex review: needs changes before merge. Summary Reproducibility: yes. Current main can be source-reproduced by enqueuing a never-settling first task and a second task with the same Feishu sequential key; the second remains chained behind the first, and #70133 provides matching Feishu logs. Next step before merge Security Review findings
Review detailsBest possible solution: Keep the Feishu message receive timeout, but make timeout release explicit for that caller and preserve unbounded completion semantics for existing no-option queue users. Do we have a high-confidence way to reproduce the issue? Yes. Current main can be source-reproduced by enqueuing a never-settling first task and a second task with the same Feishu sequential key; the second remains chained behind the first, and #70133 provides matching Feishu logs. Is this the best way to solve the issue? No as submitted. The timeout release guard is the right narrow direction for Feishu message receive, but the helper default should not make existing no-option callers report completion before their task actually finishes. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against d6900ee50037. |
…arve later messages Per-chat sequential queue had no timeout: if a single dispatch hung (e.g. an agent call that never resolved), every subsequent message in the same chat stayed `queued` until the gateway was restarted. Add an optional `taskTimeoutMs` (default 5 min) to `createSequentialQueue`. After the cap, the in-flight task is evicted from the blocking chain so newer same-key tasks can proceed. The original task is NOT aborted — it continues running in the background; we just stop starving the queue. A warning log surfaces the eviction with the offending key. `taskTimeoutMs: 0` restores legacy unbounded behavior. Same-chat FIFO ordering for normal-cadence messages is preserved (see openclaw#64324) — only pathologically slow tasks get evicted. Fixes openclaw#70133.
3d3b1f3 to
1a9bb64
Compare
|
Landed via rebase onto main.
Thanks @martingarramon! |
Summary
Per-chat sequential queue had no timeout: a single hung dispatch (e.g. an agent call that never resolved) kept every later message in the same Feishu chat in
queuedstate until gateway restart. See @bek91's report in #70133.This PR adds an optional
taskTimeoutMstocreateSequentialQueue(default 5 min). After the cap, the in-flight task is evicted from the blocking chain so newer same-key tasks proceed. The original task is not aborted — it continues running in the background; the queue just stops starving. A warning log surfaces the eviction with the offending key.taskTimeoutMs: 0restores legacy unbounded behavior.Same-chat FIFO ordering for normal-cadence messages is preserved (the
/btwout-of-band routing from #64324 is unchanged) — only pathologically slow tasks get evicted.Change Type
Scope
Linked Issue
Root Cause
extensions/feishu/src/sequential-queue.ts'screateSequentialQueuechains promises by key with no escape hatch. Iftasknever resolves,nextnever resolves, and every subsequentenqueue(sameKey, ...)inherits that stuck promise. No recovery short of gateway restart.What changed
sequential-queue.ts— addSequentialQueueOptions(taskTimeoutMs,onTaskTimeout); wrap each task inPromise.race([task(), timeoutPromise]).monitor.message-handler.ts— passonTaskTimeoutcallback that logs the eviction withaccountId+key+ cap.taskTimeoutMs;taskTimeoutMs: 0disables the cap (legacy behavior).What did NOT change
/btwout-of-band design).Trade-off acknowledged
If the original task eventually resolves AND a follow-up is already running, both will be in flight for the same key concurrently. Acceptable because the alternative (current behavior) is indefinite hang requiring restart. For strict single-flight semantics even on hang, set `taskTimeoutMs: 0`.
Tests
```
$ pnpm vitest run extensions/feishu/src/sequential-queue.test.ts
Test Files 1 passed (1)
Tests 5 passed (5)
Duration 399ms
```
Related
#54409 (Feishu collect-mode bypass) is a different design call — deliberately scoped out. Happy to open a discussion with @bek91 + @ngutman if there's interest.