[DingTalk] Add file/image sending support and fix 5 bugs in DingTalk adapter #9149

@chenyf1010

Feature Description

Add file and image sending support to the DingTalk platform adapter (gateway/platforms/dingtalk.py).

Currently the adapter only supports sending markdown text replies via the session webhook. It should also support sending:

  • Images (jpg, png, gif, bmp — up to 20MB)
  • Files (doc, docx, xls, xlsx, ppt, pptx, zip, pdf, rar — up to 20MB)
  • Voice messages (amr, mp3, wav — up to 2MB)

Motivation

  • Users want to receive generated documents, images, and other files from the agent through DingTalk
  • The DingTalk API already supports these message types via session webhook
  • The media upload permission (/media/upload) is enabled by default for enterprise internal apps, so no additional permission is needed
  • Other platforms (Discord, Telegram) already support file sending

Proposed Solution

  1. Upload flow: Use DingTalk's /media/upload API to upload the file and obtain a media_id
  2. Send flow: Send image/file/voice messages via the session webhook (same endpoint, different msgtype)
  3. Integration: Hook into the existing send() method or add a separate send_file() method

Session webhook already accepts these payloads:

// Image
{"msgtype": "image", "image": {"media_id": "@xxx"}}

// File  
{"msgtype": "file", "file": {"media_id": "@xxx"}}
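A minimal sketch of how the adapter might choose a msgtype and build the payload, assuming the extension lists and size caps given earlier in this issue. The names LIMITS, classify and build_payload are hypothetical and not part of the existing adapter. Only the image and file payload shapes are shown above, so the voice payload is deliberately left unimplemented here.

```python
from pathlib import Path

MB = 1024 * 1024

# Extension sets and size caps copied from the lists in this issue.
LIMITS = {
    "image": ({"jpg", "png", "gif", "bmp"}, 20 * MB),
    "file": ({"doc", "docx", "xls", "xlsx", "ppt", "pptx", "zip", "pdf", "rar"}, 20 * MB),
    "voice": ({"amr", "mp3", "wav"}, 2 * MB),
}


def classify(filename: str, size_bytes: int) -> str:
    """Return the msgtype for a file, or raise ValueError if it cannot be sent."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for msgtype, (extensions, max_bytes) in LIMITS.items():
        if ext in extensions:
            if size_bytes > max_bytes:
                raise ValueError(f"{filename} exceeds the {max_bytes // MB}MB {msgtype} limit")
            return msgtype
    raise ValueError(f"unsupported extension: {ext!r}")


def build_payload(msgtype: str, media_id: str) -> dict:
    """Build the session webhook body for the two shapes shown in this issue."""
    if msgtype not in ("image", "file"):
        # The voice payload shape is not given in the issue, so it is not guessed here.
        raise ValueError(f"payload shape for {msgtype!r} is not covered by this sketch")
    return {"msgtype": msgtype, msgtype: {"media_id": media_id}}
```

The media_id passed to build_payload would come from the /media/upload step, which is omitted because the request format is not described here.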

Bugs Found During Setup

While setting this up, I also found five bugs in the current DingTalk adapter:

Bug 1: start() should be start_forever()

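start() is a coroutine function, so asyncio.to_thread(self._stream_client.start) only creates a coroutine object in the worker thread and never awaits it. The thread returns immediately and the stream client never actually runs. The adapter should pass the blocking start_forever() method instead.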
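The failure mode can be reproduced without the SDK. The sketch below uses FakeStreamClient, a hypothetical stand-in rather than the real client class, and its start_forever() runs the coroutine once instead of looping forever as the real method presumably does.

```python
import asyncio
import inspect


class FakeStreamClient:
    """Hypothetical stand-in for the SDK stream client."""

    def __init__(self):
        self.ran = False

    async def start(self):
        self.ran = True

    def start_forever(self):
        # Blocking entry point: drives start() on an event loop of its own.
        asyncio.run(self.start())


async def main():
    broken = FakeStreamClient()
    # The worker thread calls start(), which merely returns a coroutine object.
    result = await asyncio.to_thread(broken.start)
    got_coroutine = inspect.iscoroutine(result)
    result.close()  # discard it explicitly to avoid a "never awaited" warning

    fixed = FakeStreamClient()
    # The worker thread now runs the coroutine to completion.
    await asyncio.to_thread(fixed.start_forever)
    return got_coroutine, broken.ran, fixed.ran


outcome = asyncio.run(main())
```

With the stand-in, the first call hands back an unexecuted coroutine and leaves ran unset, while the second call actually runs the client body.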

Bug 2: process() must be async

The SDK's ChatbotHandler.process() is an async method, and raw_process awaits its result. The override must therefore also be async. With a plain def, process() returns a tuple directly and raw_process fails with "object tuple can't be used in 'await' expression".
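A small reproduction, assuming raw_process awaits process() and unpacks a two-element result. BaseHandler is a hypothetical stand-in for the SDK base class, and the (200, "OK") return value is only illustrative.

```python
import asyncio


class BaseHandler:
    """Hypothetical stand-in mimicking how the SDK is assumed to call process()."""

    async def raw_process(self, message):
        code, text = await self.process(message)
        return code, text


class SyncHandler(BaseHandler):
    def process(self, message):  # bug: a plain def returns the tuple itself
        return 200, "OK"


class AsyncHandler(BaseHandler):
    async def process(self, message):  # fix: returns an awaitable
        return 200, "OK"


def run(handler):
    """Return the handler result, or the TypeError text if awaiting fails."""
    try:
        return asyncio.run(handler.raw_process({}))
    except TypeError as exc:
        return str(exc)


sync_result = run(SyncHandler())
async_result = run(AsyncHandler())
```

Here sync_result holds the TypeError message (its exact wording varies between Python versions) and async_result holds the tuple.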

Bug 3: _extract_text() wrong type handling

message.text is a TextContent object that exposes a .content attribute; it is not a dict or a str. The current code produces garbled output such as TextContent(content=hello) instead of extracting hello.
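One possible shape for a more tolerant extractor is sketched below. TextContent is a hypothetical stand-in for the SDK class, and extract_text is an illustrative free function rather than the adapter's actual _extract_text() method. The dict and str branches are kept only as fallbacks.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TextContent:
    """Hypothetical stand-in for the SDK's TextContent object."""
    content: str


def extract_text(text: Any) -> str:
    """Pull the plain message text out of whatever message.text holds."""
    if text is None:
        return ""
    # Preferred path: an object exposing .content (the TextContent case).
    content = getattr(text, "content", None)
    if content is not None:
        return str(content).strip()
    # Fallbacks for dict-shaped or plain-string payloads.
    if isinstance(text, dict):
        return str(text.get("content", "")).strip()
    return str(text).strip()
```

Checking for the attribute first avoids importing the SDK type into the helper and keeps the function usable in unit tests.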

Bug 4: CallbackMessage vs ChatbotMessage

The SDK passes a CallbackMessage to process(), not a ChatbotMessage. The adapter needs to build the ChatbotMessage from CallbackMessage.data via ChatbotMessage.from_dict(data).
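The unwrapping step might look like the sketch below. Both classes are hypothetical stand-ins that model only what this issue describes (a data dict on the envelope and a from_dict constructor); the two fields and their key names are illustrative assumptions, not the SDK's full schema.

```python
from dataclasses import dataclass, field


@dataclass
class CallbackMessage:
    """Hypothetical stand-in: the envelope the SDK hands to process()."""
    data: dict = field(default_factory=dict)


@dataclass
class ChatbotMessage:
    """Hypothetical stand-in with two illustrative fields."""
    sender_nick: str = ""
    session_webhook: str = ""

    @classmethod
    def from_dict(cls, data: dict) -> "ChatbotMessage":
        return cls(
            sender_nick=data.get("senderNick", ""),
            session_webhook=data.get("sessionWebhook", ""),
        )


def unwrap(message) -> ChatbotMessage:
    """Accept either type so callers need not care which one arrives."""
    if isinstance(message, ChatbotMessage):
        return message
    return ChatbotMessage.from_dict(message.data)


envelope = CallbackMessage(data={"senderNick": "alice", "sessionWebhook": "https://oapi.dingtalk.com/"})
chat = unwrap(envelope)
```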

Bug 5: Webhook URL regex too strict

_DINGTALK_WEBHOOK_RE only matches api.dingtalk.com, but session webhooks use oapi.dingtalk.com. Fix: r'^https://(api|oapi)\.dingtalk\.com/'
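A quick check of the proposed pattern follows. OLD_RE is an assumed reconstruction of the current pattern based on the description above, and the sample URLs are illustrative rather than real webhooks.

```python
import re

# Assumed shape of the current pattern (it only accepts api.dingtalk.com).
OLD_RE = re.compile(r"^https://api\.dingtalk\.com/")
# Proposed pattern from this issue.
NEW_RE = re.compile(r"^https://(api|oapi)\.dingtalk\.com/")

session_webhook = "https://oapi.dingtalk.com/robot/sendBySession?session=xxx"
lookalike = "https://oapi.dingtalk.com.evil.example/robot/sendBySession"

old_accepts = OLD_RE.match(session_webhook) is not None
new_accepts = NEW_RE.match(session_webhook) is not None
# The trailing slash in the pattern still rejects look-alike hosts.
new_accepts_lookalike = NEW_RE.match(lookalike) is not None
```

Keeping the pattern anchored at the start and ending the host with a slash means the relaxed rule still refuses any other hostname.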

Environment

  • Hermes Agent (latest)
  • DingTalk Stream Mode (dingtalk-stream SDK 0.24.3)
  • Docker on Linux
