Skip to content

fix(feishu): cap per-chat queue task wait so a single hang doesn't starve later messages (#70133)#76687

Merged
steipete merged 3 commits into
openclaw:mainfrom
martingarramon:fix/feishu-sequential-queue-task-timeout
May 3, 2026
Merged

fix(feishu): cap per-chat queue task wait so a single hang doesn't starve later messages (#70133)#76687
steipete merged 3 commits into
openclaw:mainfrom
martingarramon:fix/feishu-sequential-queue-task-timeout

Conversation

@martingarramon

Copy link
Copy Markdown
Contributor

Summary

Per-chat sequential queue had no timeout: a single hung dispatch (e.g. an agent call that never resolved) kept every later message in the same Feishu chat in queued state until gateway restart. See @bek91's report in #70133.

This PR adds an optional taskTimeoutMs to createSequentialQueue (default 5 min). After the cap, the in-flight task is evicted from the blocking chain so newer same-key tasks proceed. The original task is not aborted — it continues running in the background; the queue just stops starving. A warning log surfaces the eviction with the offending key.

taskTimeoutMs: 0 restores legacy unbounded behavior.

Same-chat FIFO ordering for normal-cadence messages is preserved (the /btw out-of-band routing from #64324 is unchanged) — only pathologically slow tasks get evicted.

Change Type

  • Bug fix

Scope

  • Integrations

Linked Issue

Root Cause

extensions/feishu/src/sequential-queue.ts's createSequentialQueue chains promises by key with no escape hatch. If task never resolves, next never resolves, and every subsequent enqueue(sameKey, ...) inherits that stuck promise. No recovery short of gateway restart.

What changed

  1. sequential-queue.ts — add SequentialQueueOptions (taskTimeoutMs, onTaskTimeout); wrap each task in Promise.race([task(), timeoutPromise]).
  2. monitor.message-handler.ts — pass onTaskTimeout callback that logs the eviction with accountId + key + cap.
  3. Tests — 2 new cases: stuck task evicted after taskTimeoutMs; taskTimeoutMs: 0 disables the cap (legacy behavior).
  4. CHANGELOG entry under `## Unreleased / Fixes`.

What did NOT change

Trade-off acknowledged

If the original task eventually resolves AND a follow-up is already running, both will be in flight for the same key concurrently. Acceptable because the alternative (current behavior) is indefinite hang requiring restart. For strict single-flight semantics even on hang, set `taskTimeoutMs: 0`.

Tests

```
$ pnpm vitest run extensions/feishu/src/sequential-queue.test.ts
Test Files 1 passed (1)
Tests 5 passed (5)
Duration 399ms
```

Related

#54409 (Feishu collect-mode bypass) is a different design call — deliberately scoped out. Happy to open a discussion with @bek91 + @ngutman if there's interest.

@openclaw-barnacle openclaw-barnacle Bot added channel: feishu Channel integration: feishu size: S labels May 3, 2026
@clawsweeper

clawsweeper Bot commented May 3, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge.

Summary
The PR adds a default 5-minute timeout and timeout logging to Feishu's sequential queue, wires Feishu message receive logging, adds queue regression tests, and adds a changelog entry.

Reproducibility: yes. Current main can be source-reproduced by enqueuing a never-settling first task and a second task with the same Feishu sequential key; the second remains chained behind the first, and #70133 provides matching Feishu logs.

Next step before merge
A narrow automated repair can update the PR branch by making the timeout explicit for message receive while preserving legacy completion semantics for existing queue callers.

Security
Cleared: No concrete security or supply-chain issue found; the diff touches Feishu TypeScript, tests, and changelog only.

Review findings

  • [P2] Make the timeout opt-in for existing queue callers — extensions/feishu/src/sequential-queue.ts:41
Review details

Best possible solution:

Keep the Feishu message receive timeout, but make timeout release explicit for that caller and preserve unbounded completion semantics for existing no-option queue users.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main can be source-reproduced by enqueuing a never-settling first task and a second task with the same Feishu sequential key; the second remains chained behind the first, and #70133 provides matching Feishu logs.

Is this the best way to solve the issue?

No as submitted. The timeout release guard is the right narrow direction for Feishu message receive, but the helper default should not make existing no-option callers report completion before their task actually finishes.

Full review comments:

  • [P2] Make the timeout opt-in for existing queue callers — extensions/feishu/src/sequential-queue.ts:41
    createFeishuDriveCommentNoticeHandler still calls createSequentialQueue() with no options and records the synthetic event as processed immediately after await enqueue(...). Since this line now gives no-option queues a 5-minute timeout, a slow handleFeishuCommentEvent can time out, mark the event processed, and hide a later failure from replay. Preserve the timeout for Feishu message receive, but keep legacy unbounded completion for existing no-option callers or opt those callers out explicitly.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm test extensions/feishu/src/sequential-queue.test.ts extensions/feishu/src/sequential-key.test.ts
  • pnpm exec oxfmt --check --threads=1 extensions/feishu/src/sequential-queue.ts extensions/feishu/src/sequential-queue.test.ts extensions/feishu/src/monitor.message-handler.ts extensions/feishu/src/monitor.comment-notice-handler.ts CHANGELOG.md
  • pnpm check:changed in Testbox before handoff

What I checked:

Likely related people:

  • ngutman: Merged fix(feishu): route /btw through out-of-band lanes #64324 introduced the Feishu sequential-key serializer and documented the normal same-chat FIFO versus out-of-band lane boundary that this PR modifies. (role: introduced behavior; confidence: high; commits: 4b4ec4dbc298, 3b5b3c2ba35c, 9a9a99815430; files: extensions/feishu/src/sequential-queue.ts, extensions/feishu/src/sequential-key.ts, extensions/feishu/src/sequential-queue.test.ts)
  • vincentkoc: Recent Feishu queue/comment replay work touched sequential-queue.ts and its tests, which are central to the remaining completion-semantics blocker. (role: recent adjacent maintainer; confidence: medium; commits: 6d38bd476893; files: extensions/feishu/src/sequential-queue.ts, extensions/feishu/src/sequential-queue.test.ts, extensions/feishu/src/monitor.account.ts)
  • steipete: Recent Feishu handler maintenance touched message and comment notice surfaces, and the latest PR branch updates were force-pushed by this maintainer. (role: recent maintainer; confidence: medium; commits: 68b9ad4205aa, bb5e278f63bd, 164f0feddff5; files: extensions/feishu/src/monitor.message-handler.ts, extensions/feishu/src/monitor.comment-notice-handler.ts)

Remaining risk / open question:

  • No live Feishu account validation was run in this read-only review; the verdict is based on current source, the PR head, and the linked report logs.

Codex review notes: model gpt-5.5, reasoning high; reviewed against d6900ee50037.

martingarramon and others added 3 commits May 3, 2026 15:20
…arve later messages

Per-chat sequential queue had no timeout: if a single dispatch hung
(e.g. an agent call that never resolved), every subsequent message in
the same chat stayed `queued` until the gateway was restarted.

Add an optional `taskTimeoutMs` (default 5 min) to `createSequentialQueue`.
After the cap, the in-flight task is evicted from the blocking chain so
newer same-key tasks can proceed. The original task is NOT aborted —
it continues running in the background; we just stop starving the queue.
A warning log surfaces the eviction with the offending key.

`taskTimeoutMs: 0` restores legacy unbounded behavior.

Same-chat FIFO ordering for normal-cadence messages is preserved
(see openclaw#64324) — only pathologically slow tasks get evicted.

Fixes openclaw#70133.
@steipete steipete force-pushed the fix/feishu-sequential-queue-task-timeout branch from 3d3b1f3 to 1a9bb64 Compare May 3, 2026 14:21
@steipete steipete merged commit ffdc76b into openclaw:main May 3, 2026
176 of 179 checks passed
@steipete

steipete commented May 3, 2026

Copy link
Copy Markdown
Contributor

Landed via rebase onto main.

  • Gate: GitHub CI green after rerun; local pnpm test extensions/feishu/src/sequential-queue.test.ts extensions/feishu/src/sequential-key.test.ts, pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/feishu/src/sequential-queue.ts extensions/feishu/src/sequential-queue.test.ts extensions/feishu/src/monitor.message-handler.ts, git diff --check
  • Land commit: 1a9bb64
  • Merge commit: ffdc76b

Thanks @martingarramon!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: feishu Channel integration: feishu size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Feishu per-chat queue can hang indefinitely, leaving later P2P messages queued

2 participants