Skip to content

fix(dingtalk): adapt message handler to dingtalk-stream SDK CallbackMessage format#9609

Closed
audanye-sudo wants to merge 1 commit into
NousResearch:mainfrom
audanye-sudo:fix/dingtalk-stream-adapter
Closed

fix(dingtalk): adapt message handler to dingtalk-stream SDK CallbackMessage format#9609
audanye-sudo wants to merge 1 commit into
NousResearch:mainfrom
audanye-sudo:fix/dingtalk-stream-adapter

Conversation

@audanye-sudo

Copy link
Copy Markdown

Summary

The DingTalk platform adapter fails to process any incoming messages in stream mode because it reads business fields (msgId, senderId, sessionWebhook, text, etc.) from the wrong layer of the SDK's message object. This PR fixes the data extraction to match the actual dingtalk-stream SDK wire format.

Root Cause

The dingtalk-stream Python SDK delivers incoming messages as CallbackMessage objects. The actual business payload (sender info, message text, conversation context, session webhook) is nested inside message.data (a dict/JSON string), not exposed as top-level attributes.

The existing code used getattr(message, "sender_id", ""), getattr(message, "text", ""), etc. — all of which silently fell through to empty defaults. The result:

Symptom Cause
All messages logged as "Empty message, skipping" _extract_text() read message.text → always empty
Reply webhooks never stored message.session_webhook → always empty
Sender info all blank message.sender_id, message.sender_nick → defaults

Additionally, DingTalkStreamClient.start() is a native async coroutine in the current SDK, but the code wrapped it in asyncio.to_thread() as if it were blocking — this caused event loop conflicts and the _CallbackHandler.process() had to bridge between threads unnecessarily.

Fix

Data extraction — Read all business fields from message.data dict instead of top-level attributes:

# Before (broken)
msg_id = getattr(message, "message_id", None)
text = getattr(message, "text", None)
session_webhook = getattr(message, "session_webhook", None)

# After (correct)
data = getattr(message, "data", None) or {}
msg_id = data.get("msgId")
text = data.get("text", "")
session_webhook = data.get("sessionWebhook", "")

Async handling — Removed thread-bridging (asyncio.to_thread, run_coroutine_threadsafe) since start() is already async:

# Before
await asyncio.to_thread(self._stream_client.start)
# _CallbackHandler.process() used run_coroutine_threadsafe

# After
await self._stream_client.start()
# _CallbackHandler.process() is async, awaits directly

Diagnostic logging — Added session_webhook logging to aid future debugging of reply routing issues.

Changed Methods

Method Change
_on_message() Extract fields from message.data dict instead of top-level attrs
_extract_text()_extract_text_from_data() Renamed; takes data: Dict instead of ChatbotMessage
_run_stream() await self._stream_client.start() directly (no to_thread)
_CallbackHandler.process() Now async def; awaits _on_message() directly

Commits

Commit Scope
fix(dingtalk): adapt message handler to dingtalk-stream SDK CallbackMessage format gateway/platforms/dingtalk.py

Test Plan

  • Verified CallbackMessage.data dict contains all expected fields (msgId, senderId, senderNick, text, sessionWebhook, conversationId, conversationType, conversationTitle, createAt, senderStaffId, richText)
  • Tested 1:1 DM → bot receives text, extracts sender info, replies successfully
  • Tested group @mention → bot processes mention, replies via session webhook
  • Verified rich text fallback (richText field) still works
  • Verified deduplication by msgId still functions correctly
  • Confirmed _stream_client.start() runs as native coroutine without event loop conflicts
  • Checked diagnostic logging output shows correct session_webhook and chat_id values

Risk Assessment

Medium risk — changes the core message processing path for the DingTalk adapter. However:

  • The fix aligns the code with the actual SDK wire format (the current code is completely non-functional in stream mode)
  • All field names match the DingTalk Robot callback documentation
  • The async refactoring removes a layer of complexity (thread bridging) rather than adding it
  • Backward compatibility: data extraction falls back to {} if the attribute is missing, and getattr fallbacks remain for message_id

Recommendation: This PR should be merged together with the webhook domain fix PR (which adds oapi.dingtalk.com to the allowed webhook regex), as both are required for end-to-end DingTalk stream functionality.

…essage format

The dingtalk-stream SDK passes CallbackMessage objects whose business fields
(msgId, senderId, sessionWebhook, etc.) live inside message.data dict, not as
top-level attributes. The existing code used getattr() on the top-level object,
which always fell through to defaults — causing empty text, missing sender info,
and broken reply routing.

Key changes:
- Extract message.data dict from CallbackMessage and parse fields from it
- Rename _extract_text() to _extract_text_from_data() to work with data dict
- Make _on_message handler and _CallbackHandler.process() async — the SDK's
  start() is a native coroutine, so no thread bridging is needed
- Add diagnostic logging for session webhook routing
@meng93

meng93 commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

nice change

@audanye-sudo

Copy link
Copy Markdown
Author

Some context on this one — this is the second of three DingTalk PRs (#9608, this one, #9610).

Why this matters

As I mentioned in #9608, DingTalk is the largest enterprise messaging platform in China (700M+ users, think "Chinese Slack/Teams"). The current DingTalk adapter in Hermes has a critical bug that makes it completely non-functional in stream mode — which is the primary way DingTalk bots work.

The core problem

The dingtalk-stream Python SDK delivers messages as CallbackMessage objects, but the business payload (who sent it, what they said, where to reply) lives inside message.data as a dict. The current code does getattr(message, "sender_id", "") on the top-level object — which always returns empty strings.

The result? Every single incoming message gets logged as "Empty message, skipping" and silently dropped. From the user's perspective: the bot is online, they send a message, and... nothing. No error, no response, just silence.

What this PR does

  • Reads all fields from message.data dict instead of top-level attributes
  • Fixes the async handling — DingTalkStreamClient.start() is actually a native coroutine, so we don't need the asyncio.to_thread() wrapper and the cross-thread run_coroutine_threadsafe dance. That whole thread-bridging layer was adding complexity for no reason.
  • Adds some diagnostic logging so when something goes wrong with webhook routing, you can actually see what's happening

Together with #9608 (webhook domain fix), these two PRs take DingTalk from "completely broken" to "actually works." They're independent commits but both are needed for a working end-to-end experience.

@RuckVibeCodes RuckVibeCodes left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[gus-first-pass] fix(dingtalk): adapt message handler to dingtalk-stream SDK CallbackMessage format - Clear fix, no issues found.

@Jiangxuejian Jiangxuejian left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works on my WSL2. (Hermes Agent v0.9.0 (2026.4.13))

@teknium1

Copy link
Copy Markdown
Contributor

Closing — CallbackMessage → ChatbotMessage conversion and the broader SDK-compat rewrite are now on main as of #11471 (#11471). Thanks for the investigation!

@teknium1 teknium1 closed this Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants