Skip to content

feat(gateway/dingtalk): download inbound pictures for vision models#8988

Closed
sputnicyoji wants to merge 5 commits into
NousResearch:mainfrom
sputnicyoji:feat/dingtalk-picture-download
Closed

feat(gateway/dingtalk): download inbound pictures for vision models#8988
sputnicyoji wants to merge 5 commits into
NousResearch:mainfrom
sputnicyoji:feat/dingtalk-picture-download

Conversation

@sputnicyoji

Copy link
Copy Markdown

Summary

Closes the last gap preventing DingTalk vision use cases: previously every picture (and inline richText image) was replaced with the literal text [图片], so vision tools had no image bytes to work with.

This PR exchanges each downloadCode for the temporary CDN URL via `/v1.0/robot/messageFiles/download`, fetches the bytes, caches them locally, and populates MessageEvent.media_urls — exactly the pattern Telegram/Slack/WhatsApp already use.

Implementation notes

  • Piggybacks on dingtalk-stream's built-in helpers: register_callback_handler wires handler.dingtalk_client = stream_client so the SDK's access-token plumbing and get_image_download_url are reachable without re-implementing token caching. Adapter keeps the handler reference on self._handler.
  • get_image_download_url is sync (requests-based) — offloaded via asyncio.to_thread to keep the event loop clean.
  • Failures at any stage (missing code, empty URL, HTTP error) degrade gracefully: the event still dispatches with the [图片] text placeholder so the agent knows something arrived but couldn't be decoded.
  • message.get_image_list() handles both picture and richText msgtypes, so inline pictures in rich messages are handled the same way.
  • On successful download the placeholder text is swapped for [图片 × N] as a short caption that accompanies the attached media list.
  • No DingTalk permission point needs to be enabled for this — the robot app's existing AppKey/AppSecret scope already covers messageFiles download for its own inbound messages.

Test plan

  • Unit tests (29 total, +5 new):
    • missing handler → empty result
    • single image end-to-end success
    • SDK returns empty URL → skipped
    • HTTP failure on first image, success on second → continues
    • _on_message picture dispatch → populates MessageEvent.media_urls and rewrites text to [图片 × 1]
  • Live-tested locally: sending a photo to the bot now reaches vision_analyze with real bytes via onehub/gemini-3-pro-latest.

Stacking

Depends on:

This branch merges #8954 inline so it applies against main directly, but reviewers should merge in order #8954#8957#8960 → this one to keep history tidy.

Yoji and others added 5 commits April 13, 2026 17:59
The adapter was written against an older dingtalk-stream release and no
longer runs on the current PyPI version:

- `DingTalkStreamClient.start()` became a coroutine; wrapping it in
  `asyncio.to_thread` raises `RuntimeWarning: coroutine ... was never
  awaited` and the stream never connects. Detect via
  `iscoroutinefunction` and await or thread-dispatch accordingly.
- `ChatbotHandler.process()` became an async callback that receives a
  `CallbackMessage` (raw envelope), not a `ChatbotMessage`. Make the
  handler async and parse `callback.data` into `ChatbotMessage` via
  `from_dict`. Drops the now-unused `run_coroutine_threadsafe` plumbing
  and `_loop` field.
- `ChatbotMessage.text` is now a `TextContent` dataclass; the old
  `str(text).strip()` path produces `<TextContent object at 0x...>`.
  Check `hasattr(text, "content")` first, keep dict/str fallbacks for
  older SDKs and custom handlers.

Tests stay green against the existing test doubles (dict-shaped `text`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aligns the DingTalk adapter with the env-var bootstrap pattern other
platforms use, and extends inbound handling beyond the `text` msgtype.

Environment variables (gateway/config.py):
- `DINGTALK_CLIENT_ID` / `DINGTALK_CLIENT_SECRET` enable the platform
- `DINGTALK_ALLOWED_USERS` (comma-separated staff ids) populates the
  allow-list under `extra.allowed_users`
- `DINGTALK_HOME_CHANNEL` / `DINGTALK_HOME_CHANNEL_NAME` set the home
  conversation
- `get_connected_platforms` now recognizes DingTalk via `extra.client_id`

Message-type dispatch (gateway/platforms/dingtalk.py):
- DingTalk Stream delivers exactly three chatbot msgtypes: `text`,
  `picture`, `richText`. Dispatch each explicitly and render unknown
  types as a labelled placeholder rather than silently dropping.
- Picture messages map to `MessageType.PHOTO` with `[图片]` placeholder
  (the download API requires an OpenAPI scope this adapter doesn't call).
- `richText` renders each segment: plain text, `@user` mentions, and
  inline pictures as `[图片]`.
- `_extract_text` now takes an explicit `msgtype` to keep callers
  honest; falls back to `message.message_type` when omitted.

Tests updated to cover each dispatch path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ync process handler

# Conflicts:
#	gateway/platforms/dingtalk.py
Previously the adapter replaced any `picture` or inline `richText` image
with the literal string `[图片]`; vision tools therefore never saw the
actual bytes. This patch exchanges each `downloadCode` for the temporary
CDN URL via `/v1.0/robot/messageFiles/download`, fetches the bytes, and
caches them locally so `MessageEvent.media_urls` points at a real file.

Implementation notes:
- Piggyback on the dingtalk-stream SDK: `register_callback_handler`
  wires `handler.dingtalk_client = stream_client`, which gives us the
  SDK's access-token helper and `get_image_download_url`. Keep the
  handler reference on the adapter (`self._handler`).
- `get_image_download_url` is sync (requests-based) — offload via
  `asyncio.to_thread` to avoid blocking the event loop.
- Failures at any stage (missing code, empty URL, HTTP error) are
  logged but degrade gracefully: the event still dispatches with the
  `[图片]` text placeholder so the agent knows something arrived.
- `message.get_image_list()` handles both `picture` and `richText`
  shapes, so the same path supports inline images in rich messages.
- On successful download the placeholder text is replaced with
  `[图片 × N]` as a short caption alongside the attached media list.

Tests cover: missing handler, single-image success, empty downloadUrl,
partial HTTP failure (continue on remaining codes), and full end-to-end
picture dispatch through `_on_message`.

Depends on NousResearch#8954 (SDK 0.24 async process) and NousResearch#8960 (msgtype dispatch).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code review feedback on the initial picture-download implementation:

- Hand-rolled httpx.get + cache_image_from_bytes duplicated
  `cache_image_from_url` from gateway/platforms/base.py, which already
  ships retry-with-backoff, SSRF guarding, and magic-byte validation.
  Swap to the shared helper and drop the dependency on `self._http_client`.
- Sequential for-loop serialized N × (token lookup + HTTP) for richText
  messages carrying multiple inline images. Split the per-code work into
  a private `_fetch_one` and fan out via `asyncio.gather`.
- Narrow `except Exception` on the cache step to `(httpx.HTTPError,
  ValueError)` so genuine bugs surface instead of being swallowed.

Tests updated to patch the new `cache_image_from_url` async helper
instead of the previous bytes+http-client combo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alt-glitch alt-glitch added type/feature New feature or request P2 Medium — degraded but workaround exists platform/dingtalk DingTalk adapter comp/gateway Gateway runner, session dispatch, delivery tool/vision Vision analysis and image generation labels Apr 27, 2026
@teknium1

Copy link
Copy Markdown
Contributor

Thanks for this contribution! The picture-download feature you implemented here has since landed on main via a related salvage, so this PR is now superseded.

Automated hermes-sweeper review

  • _resolve_media_codes + _fetch_download_url are both present in gateway/platforms/dingtalk.py (lines 1163 and 1202 on current main)
  • MessageEvent.media_urls is populated via the same flow (line 628)
  • The implementing commit is 4459913f4 (feat(dingtalk): AI Cards streaming, emoji reactions, and media handling, merged Apr 17 2026), which explicitly notes "alibabacloud-dingtalk for media URL resolution (replaces raw HTTP that was 403-ing)"
  • That commit covers the same three surfaces changed here: dingtalk.py, gateway/config.py extras, and tests/gateway/test_dingtalk.py

The main-branch implementation uses the alibabacloud-dingtalk robot SDK async API (robot_message_file_download_with_options_async) rather than the stream-client helper proposed here — a slightly different path but functionally equivalent and verified working in production.

Closing as implemented. If you see any gaps in the current implementation compared to what you had here, feel free to open a targeted follow-up issue!

@teknium1 teknium1 closed this Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/dingtalk DingTalk adapter tool/vision Vision analysis and image generation type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants