
[Bug]: Feishu plugin sendMessageFeishu lacks HTTP timeout, leading to permanent per-chat queue deadlock #36412

@guyil

Description


Bug type

Regression (worked before, now fails)

Summary

The Feishu plugin's message-sending logic lacks an explicit HTTP timeout. When an API call to Feishu hangs (the connection stays open but no response ever arrives), the sendChain never settles, so the per-chat queue for that chatId stays in the processing state forever and blocks every subsequent message in that chat.

Steps to reproduce
1. Configure the Feishu (Lark) channel.
2. Send a message in a group chat (e.g., "Preliminary completion of L3...").
3. Trigger a scenario where the Feishu API endpoint experiences high latency or a "hanging" TCP connection during the sendMessageFeishu call.
4. Observe that the bot successfully sends the initial reply, but the internal sendChain remains pending.
5. Send any subsequent messages to the same group chat.
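Step 3 is hard to trigger on demand against the real Feishu API. A local server that accepts the connection but never answers can stand in for it. This is a minimal sketch using only the Node.js built-in node:http module; startHangingServer and post are hypothetical helpers, and the client is plain node:http rather than the plugin's axios client. Without a deadline the request stays pending; with req.setTimeout plus req.destroy it rejects.

```typescript
import * as http from "node:http";
import type { AddressInfo, Socket } from "node:net";

// Hypothetical stand-in for a Feishu endpoint that accepts the TCP connection
// and reads the request, but never sends a response (a "hanging" connection).
async function startHangingServer(): Promise<{ url: string; close: () => Promise<void> }> {
  const sockets = new Set<Socket>();
  const server = http.createServer((req) => {
    // Track the connection so close() can drop it; intentionally never respond.
    sockets.add(req.socket);
    req.socket.on("close", () => sockets.delete(req.socket));
  });
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", () => resolve()));
  const { port } = server.address() as AddressInfo;
  return {
    url: `http://127.0.0.1:${port}/`,
    close: () =>
      new Promise<void>((resolve) => {
        // Destroy stuck connections first; otherwise close() would wait forever.
        for (const socket of sockets) socket.destroy();
        server.close(() => resolve());
      }),
  };
}

// POSTs `body` and resolves with the status code. With no `timeoutMs` the
// request has no deadline, like an HTTP client configured without a timeout.
function post(url: string, body: string, timeoutMs?: number): Promise<number> {
  return new Promise((resolve, reject) => {
    const req = http.request(url, { method: "POST" }, (res) => {
      res.resume(); // discard the response body
      resolve(res.statusCode ?? 0);
    });
    req.on("error", reject);
    if (timeoutMs !== undefined) {
      // setTimeout() only emits 'timeout'; destroy() is what fails the request.
      req.setTimeout(timeoutMs, () => req.destroy(new Error(`timed out after ${timeoutMs} ms`)));
    }
    req.end(body);
  });
}
```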

Expected behavior
The HTTP request should have a reasonable timeout (e.g., 10-30 seconds). If the request fails or times out, the sendChain should settle (reject/resolve), allowing the createChatQueue for that chatId to process the next message in the queue.
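A minimal sketch of the expected behavior, assuming the send path can be written as a function that accepts an AbortSignal (both fetch and recent axios versions accept one through a signal option). withTimeout and TimeoutError are hypothetical helpers, not part of the Feishu plugin or the Lark SDK. The returned promise always settles, so anything chained after it, such as a sendChain, can move on.

```typescript
// Hypothetical error raised when an operation misses its deadline.
class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} did not settle within ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Runs `op` with a deadline. The returned promise always settles: with the
// op's own result or error, or with TimeoutError after `ms`. On timeout the
// AbortSignal handed to `op` is aborted so a cooperative HTTP call can cancel
// its request instead of leaving the socket open.
function withTimeout<T>(
  op: (signal: AbortSignal) => Promise<T>,
  ms: number,
  label = "request",
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new TimeoutError(label, ms));
    }, ms);
    // Promise.resolve().then(...) also turns a synchronous throw in op into a rejection.
    Promise.resolve()
      .then(() => op(controller.signal))
      .then(
        (value) => {
          clearTimeout(timer);
          resolve(value);
        },
        (err: unknown) => {
          clearTimeout(timer);
          reject(err);
        },
      );
  });
}
```

For example, withTimeout((signal) => fetch(url, { method: "POST", body, signal }), 15_000, "sendMessageFeishu") would cap a send at 15 s, inside the 10-30 s range suggested above. Aborting the signal matters: racing against a timer alone would let the queue advance but leave the hung connection open.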

Actual behavior
Queue Deadlock: The handleFeishuMessage() function never returns because it is awaiting waitForIdle(), which is waiting for a sendChain that never settles.

Scope: This is a per-chat deadlock. Direct Messages (DMs) continue to work because they use different chatId queues, but the affected group chat becomes completely unresponsive.

Symptoms: Webhooks are received by the gateway, but because the queue is locked, handleFeishuMessage is never called for new messages. No deduplication records are created, and no LLM calls are triggered.

Recovery: Only a manual restart of the gateway clears the in-memory Promise queue and restores functionality.
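The chain described above can be modeled in a few lines. This is a hypothetical sketch, assuming a dispatcher that appends each send to a sendChain promise and exposes waitForIdle(); createReplyDispatcher and its internals are illustrative, not OpenClaw's code. A send that delivers its message but never settles leaves waitForIdle() pending, which is what keeps handleFeishuMessage() from returning.

```typescript
// Hypothetical reply dispatcher: sends are chained so replies go out in order,
// and waitForIdle() resolves once every send queued so far has settled.
function createReplyDispatcher(send: (text: string) => Promise<void>) {
  let sendChain: Promise<void> = Promise.resolve();
  return {
    enqueue(text: string): void {
      // Errors are swallowed so one failed send does not poison later sends,
      // but a send that never settles still blocks everything behind it.
      sendChain = sendChain.then(() => send(text)).catch(() => {});
    },
    waitForIdle(): Promise<void> {
      return sendChain;
    },
  };
}
```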

Technical Analysis
The root cause is an implementation flaw in the Feishu (Lark) SDK integration:

The plugin uses axios for API calls but does not configure a timeout (axios defaults to timeout: 0, meaning no timeout).

In OpenClaw, message processing is serialized per chatId to preserve message order.

Because the axios request hangs, its Promise remains pending indefinitely, and the per-chat queue never advances past the current message.

Diagnostic Conclusion: This is an architectural "platform-level" bug where the gateway's reliability is tied to the response time of external third-party APIs.
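The per-chat serialization can be sketched as a map of promise tails keyed by chatId. The issue refers to OpenClaw's createChatQueue; createPerChatQueue below is a hypothetical, minimal stand-in for illustration, not the actual implementation. It shows why the failure is scoped to one chat: a never-settling task blocks later tasks for the same chatId, while other chatIds (such as a DM) keep flowing.

```typescript
type Task = () => Promise<void>;

// Hypothetical per-chat serializer: tasks that share a chatId run strictly one
// after another, while different chatIds proceed independently.
function createPerChatQueue() {
  // Last queued task per chatId, stored in a form that never rejects.
  const tails = new Map<string, Promise<void>>();
  return (chatId: string, task: Task): Promise<void> => {
    const prev = tails.get(chatId) ?? Promise.resolve();
    // Start only after the previous task for this chatId has settled.
    const run = prev.then(task);
    // Swallowing errors keeps one failed task from blocking the chat, but a
    // task that never settles leaves every later task for this chatId waiting.
    tails.set(chatId, run.catch(() => {}));
    return run;
  };
}
```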

Impact and severity
Affected Users: All users utilizing the Feishu/Lark adapter.

Severity: High. It causes a silent failure where the bot appears "alive" but ignores all messages in specific channels.

Frequency: Intermittent (dependent on network stability/Feishu API performance), but the consequence of a single failure is a permanent hang until restart.

Suggested Fix
Implement a global or instance-level timeout for the axios client within the Feishu plugin.

(Optional) Add a "Watchdog" or health check mechanism to the ChatQueue to auto-release or alert when a task stays in processing state for longer than a defined threshold (e.g., 5 minutes).
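The optional watchdog could look roughly like this. It is a hedged sketch, assuming a promise-tail queue keyed by chatId; createWatchdogChatQueue and onStuck are hypothetical names, not existing OpenClaw APIs. It covers both halves of the suggestion: onStuck is the alert hook, and rejecting the stuck slot is the auto-release.

```typescript
type Task = () => Promise<void>;

// Hypothetical per-chat queue with a watchdog. Tasks for one chatId still run
// in order, but if a task has not settled after `thresholdMs` the watchdog
// calls `onStuck` (the alert) and rejects that slot (the auto-release), so the
// next task for the chat can start. The stuck task itself is not cancelled.
function createWatchdogChatQueue(thresholdMs: number, onStuck: (chatId: string) => void) {
  const tails = new Map<string, Promise<void>>();
  return (chatId: string, task: Task): Promise<void> => {
    const prev = tails.get(chatId) ?? Promise.resolve();
    const run = prev.then(
      () =>
        new Promise<void>((resolve, reject) => {
          const timer = setTimeout(() => {
            onStuck(chatId);
            reject(new Error(`task for ${chatId} exceeded ${thresholdMs} ms`));
          }, thresholdMs);
          // Promise.resolve().then(task) also captures a synchronous throw.
          Promise.resolve()
            .then(task)
            .then(
              () => {
                clearTimeout(timer);
                resolve();
              },
              (err: unknown) => {
                clearTimeout(timer);
                reject(err);
              },
            );
        }),
    );
    // The stored tail never rejects, so a failed or released task does not
    // block the chat; later tasks chain onto it.
    tails.set(chatId, run.catch(() => {}));
    return run;
  };
}
```

In production the threshold would be closer to the 5 minutes suggested above. Releasing the slot does not cancel the hung HTTP request, so this complements the client timeout rather than replacing it; the sketch also never prunes the tails map, which a long-running gateway would need to do for idle chats.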


OpenClaw version

OpenClaw 2026.3.2 (85377a2)

Operating system

Mac mini

Install method

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), regression (Behavior that previously worked and now fails)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests