Description
Summary
When the gateway restarts in-process (SIGUSR1, triggered by config changes, model switches, or /new), the inboundDedupeCache is re-initialized as an empty Map. The Feishu/Lark SDK's WebSocket client then reconnects and replays recent events. Since the cache is empty, previously-processed messages pass shouldSkipDuplicateInbound() and get dispatched to the agent again — resulting in duplicate replies to the user.
Reproduction:
1. Send a few messages via Feishu DM
2. Trigger a gateway restart (e.g. config.patch to change model)
3. Observe: one or more of the recently-sent messages get re-dispatched and replied to again
Log evidence:
02:59:09.082 received message from ou_xxx (p2p)
02:59:09.087 dispatching to agent
02:59:09.095 received message from ou_xxx (p2p) ← same event, 13ms later
02:59:09.098 dispatching to agent
02:59:09.100 dispatch complete (replies=0) ← second copy gets no reply (race)
02:59:20.156 dispatch complete (replies=1) ← first copy gets replied
In other cases the duplicate does get a full reply, causing the user to see a response to a message they didn't just send.
Root cause:
inboundDedupeCache in src/auto-reply/reply/inbound-dedupe.ts is a pure in-memory Map (via createDedupeCache). It does not survive process restarts. The Lark SDK WSClient reconnects after restart and re-delivers recent events (standard at-least-once semantics). With an empty cache, all replayed events are treated as new.
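The failure mode can be modelled in a few lines. This is a simplified sketch, not the actual code in inbound-dedupe.ts: createCache and shouldSkip are hypothetical stand-ins for createDedupeCache and shouldSkipDuplicateInbound(), and the event id and timestamps are made up for illustration.

```typescript
// Hypothetical stand-in for the real cache: event id -> time first seen (ms).
type DedupeCache = Map<string, number>;

function createCache(): DedupeCache {
  return new Map<string, number>();
}

// Returns true when the event was already seen; otherwise records it.
function shouldSkip(cache: DedupeCache, eventId: string, now: number): boolean {
  if (cache.has(eventId)) return true;
  cache.set(eventId, now);
  return false;
}

let cache = createCache();
const firstDelivery = shouldSkip(cache, "evt-1", 1000); // not skipped: dispatched
const sameProcessReplay = shouldSkip(cache, "evt-1", 1013); // skipped as duplicate

// In-process restart: the cache is re-initialized as an empty Map.
cache = createCache();

// The SDK replays the same event; the empty cache treats it as new.
const replayAfterRestart = shouldSkip(cache, "evt-1", 2000);
```

In this model the replay within one process lifetime is skipped, while the same replay after the cache is re-created is dispatched again.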
Suggested fix (any of):
1. Persist the dedup cache to SQLite (already available in the codebase) and restore on startup
2. On SIGUSR1 in-process restart, preserve the inboundDedupeCache instance across the reload cycle instead of re-initializing
3. Add a startup grace period: after Feishu WS reconnects, ignore events with timestamps older than N seconds before the restart
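As a rough illustration of the second option (preserving the cache across the reload), here is a minimal sketch. GatewayState, startGateway, prune and the 60-second TTL are all assumptions made for this example; the real reload path and cache API may look quite different.

```typescript
// Hypothetical shapes; the real gateway state and reload hook may differ.
type DedupeCache = Map<string, number>; // event id -> time first seen (ms)

interface GatewayState {
  inboundDedupe: DedupeCache;
}

// On a cold start `previous` is undefined and a fresh cache is created.
// On an in-process reload the caller passes the old state so the cache survives.
function startGateway(previous?: GatewayState): GatewayState {
  return { inboundDedupe: previous?.inboundDedupe ?? new Map<string, number>() };
}

// Drops entries older than ttlMs so a long-lived cache stays bounded.
function prune(cache: DedupeCache, now: number, ttlMs: number): void {
  cache.forEach((seenAt, id) => {
    if (now - seenAt > ttlMs) cache.delete(id);
  });
}

function shouldSkipInbound(
  gateway: GatewayState,
  eventId: string,
  now: number,
  ttlMs: number = 60_000, // assumed TTL, not taken from the codebase
): boolean {
  prune(gateway.inboundDedupe, now, ttlMs);
  if (gateway.inboundDedupe.has(eventId)) return true;
  gateway.inboundDedupe.set(eventId, now);
  return false;
}

let gateway = startGateway();
const firstSeen = shouldSkipInbound(gateway, "evt-1", 1_000);

gateway = startGateway(gateway); // simulated SIGUSR1 reload keeps the cache
const replayed = shouldSkipInbound(gateway, "evt-1", 2_000);

// Well past the TTL the entry has been pruned, so the id is accepted again.
const afterTtl = shouldSkipInbound(gateway, "evt-1", 120_000);
```

Passing the old state into the restart keeps the change local to the reload path. The TTL only needs to be longer than the window of events the SDK replays on reconnect, which this issue does not quantify.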
Impact: Affects any channel using WebSocket with at-least-once delivery (Feishu confirmed; potentially others). Users see "ghost replies" to messages they didn't just send, eroding trust in the system.
Environment: OpenClaw 2026.2.9, Feishu channel (websocket mode), macOS