Summary
After multiple rapid gateway restarts (especially involving kill -9 or quick SIGTERM/restart cycles), DingTalk stops routing inbound messages to the gateway entirely. The documented 30–60s recovery window does not apply here: routing stays broken for an indeterminate period (observed: 40+ minutes with no messages received across 6+ restart cycles).
Symptoms
- Gateway logs show a successful WebSocket connection (ticket registered, ✓ dingtalk connected)
- TCP connection is ESTAB (confirmed via ss -tp)
- No inbound message log entries, despite the user sending messages
- DingTalk app shows messages are sent (no error on sender side)
- Reactions and outbound sends still work (session_webhooks from earlier in the session remain valid)
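The log symptom can be checked mechanically instead of by eye. This is a minimal sketch, assuming that gateway.log lines contain the literal substrings "dingtalk connected" and "inbound message" (both taken from the symptoms above; the real log format may differ). It counts inbound entries that appear after the most recent connect line, so "connected but zero inbound" is the state this issue describes.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Marker substrings are assumptions based on the symptoms above; adjust them
# to the gateway's actual log format.
CONNECTED_MARKER = "dingtalk connected"
INBOUND_MARKER = "inbound message"


def inbound_since_last_connect(lines: Iterable[str]) -> Tuple[bool, int]:
    """Return (connect_line_seen, inbound_entries_after_last_connect)."""
    connected_seen = False
    inbound_count = 0
    for line in lines:
        if CONNECTED_MARKER in line:
            # A fresh connection resets the count: only traffic after the
            # most recent connect is relevant.
            connected_seen = True
            inbound_count = 0
        elif INBOUND_MARKER in line and connected_seen:
            inbound_count += 1
    return connected_seen, inbound_count


def check_log(path: Optional[Path] = None) -> str:
    """Summarize the gateway log (default path taken from this issue)."""
    if path is None:
        path = Path.home() / ".hermes" / "logs" / "gateway.log"
    with path.open(encoding="utf-8", errors="replace") as fh:
        connected, inbound = inbound_since_last_connect(fh)
    if not connected:
        return "no connect line found"
    if inbound == 0:
        return "connected but no inbound messages (matches this issue)"
    return f"{inbound} inbound message(s) since last connect"
```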
Reproduction Steps
- Start gateway, confirm messages arrive
- Rapidly restart the gateway 5–10 times within a few minutes (simulating active development)
- Send a message from DingTalk and observe that nothing appears in ~/.hermes/logs/gateway.log
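The restart step can be scripted so the timing is repeatable between attempts. This is a minimal sketch, assuming a caller-supplied restart_gateway callable (hypothetical; it stands in for however the gateway is actually stopped and relaunched, for example kill -9 followed by a fresh start). It only drives and timestamps the restarts; sending the test message and checking the log remain manual.

```python
import time
from typing import Callable, List


def rapid_restarts(
    restart_gateway: Callable[[], None],  # hypothetical hook: stop + start the gateway
    cycles: int = 8,                      # the issue reports 5-10 restarts
    interval_s: float = 20.0,             # 8 cycles x 20 s stays "within a few minutes"
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Restart the gateway `cycles` times; return the start time of each cycle."""
    if cycles < 1:
        raise ValueError("cycles must be >= 1")
    started: List[float] = []
    for i in range(cycles):
        started.append(clock())
        restart_gateway()
        # No pause after the final restart: the test message is sent next.
        if i < cycles - 1:
            sleep(interval_s)
    return started
```

Injecting sleep and clock keeps the harness testable without waiting in real time; in actual use the defaults apply.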
Root Cause Hypothesis
DingTalk's stream routing appears to track a "preferred" connection per app credential. After many quick disconnect/reconnect cycles, the server may apply backpressure or enter a confused state where it doesn't route to any of the new connections. Unlike a single restart (which recovers in ~30–60s via keepalive timeout), multiple rapid restarts may require a longer cool-down or a DingTalk console action to reset.
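If this hypothesis holds, one way to avoid reaching the stuck state during development is to rate-limit reconnects on our side. This is a minimal sketch of a sliding-window restart guard. RestartGuard is a hypothetical helper (not part of the adapter or the DingTalk SDK), and its thresholds (3 restarts per 120 s, then a 90 s cool-down) are illustrative guesses, not limits documented by DingTalk.

```python
import time
from collections import deque
from typing import Callable, Deque


class RestartGuard:
    """Sliding-window limiter for gateway (re)connects.

    Hypothetical helper; the default thresholds are guesses, not DingTalk limits.
    """

    def __init__(
        self,
        max_restarts: int = 3,
        window_s: float = 120.0,
        cooldown_s: float = 90.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._recent: Deque[float] = deque()

    def delay_before_restart(self) -> float:
        """Seconds to wait before the next restart (0.0 if allowed now)."""
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._recent and now - self._recent[0] > self.window_s:
            self._recent.popleft()
        if len(self._recent) < self.max_restarts:
            return 0.0
        # Too many recent restarts: hold off until cooldown_s after the latest one.
        return max(0.0, self._recent[-1] + self.cooldown_s - now)

    def record_restart(self) -> None:
        self._recent.append(self.clock())
```

A supervisor would call delay_before_restart(), sleep for the returned value, restart, and then call record_restart(). This only reduces the chance of triggering the suspected server-side state; it cannot repair routing that is already stuck.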
The ghost-connection fix (monkey-patching open_connection to raise KeyboardInterrupt during shutdown) prevents duplicate ticket registration but does not help when routing is already stuck.
Investigation Needed
Note
This is a DingTalk platform behavior issue, not a bug in our adapter logic per se. The open_connection ghost-fix is correct and prevents the immediate ghost-connection problem on a single restart. The issue is the cumulative effect of many rapid restarts.