feat(gateway/dingtalk): download inbound pictures for vision models#8988
Closed
sputnicyoji wants to merge 5 commits into
Closed
feat(gateway/dingtalk): download inbound pictures for vision models#8988sputnicyoji wants to merge 5 commits into
sputnicyoji wants to merge 5 commits into
Conversation
The adapter was written against an older dingtalk-stream release and no longer runs on the current PyPI version: - `DingTalkStreamClient.start()` became a coroutine; wrapping it in `asyncio.to_thread` raises `RuntimeWarning: coroutine ... was never awaited` and the stream never connects. Detect via `iscoroutinefunction` and await or thread-dispatch accordingly. - `ChatbotHandler.process()` became an async callback that receives a `CallbackMessage` (raw envelope), not a `ChatbotMessage`. Make the handler async and parse `callback.data` into `ChatbotMessage` via `from_dict`. Drops the now-unused `run_coroutine_threadsafe` plumbing and `_loop` field. - `ChatbotMessage.text` is now a `TextContent` dataclass; the old `str(text).strip()` path produces `<TextContent object at 0x...>`. Check `hasattr(text, "content")` first, keep dict/str fallbacks for older SDKs and custom handlers. Tests stay green against the existing test doubles (dict-shaped `text`). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aligns the DingTalk adapter with the env-var bootstrap pattern other platforms use, and extends inbound handling beyond the `text` msgtype. Environment variables (gateway/config.py): - `DINGTALK_CLIENT_ID` / `DINGTALK_CLIENT_SECRET` enable the platform - `DINGTALK_ALLOWED_USERS` (comma-separated staff ids) populates the allow-list under `extra.allowed_users` - `DINGTALK_HOME_CHANNEL` / `DINGTALK_HOME_CHANNEL_NAME` set the home conversation - `get_connected_platforms` now recognizes DingTalk via `extra.client_id` Message-type dispatch (gateway/platforms/dingtalk.py): - DingTalk Stream delivers exactly three chatbot msgtypes: `text`, `picture`, `richText`. Dispatch each explicitly and render unknown types as a labelled placeholder rather than silently dropping. - Picture messages map to `MessageType.PHOTO` with `[图片]` placeholder (the download API requires an OpenAPI scope this adapter doesn't call). - `richText` renders each segment: plain text, `@user` mentions, and inline pictures as `[图片]`. - `_extract_text` now takes an explicit `msgtype` to keep callers honest; falls back to `message.message_type` when omitted. Tests updated to cover each dispatch path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ync process handler # Conflicts: # gateway/platforms/dingtalk.py
Previously the adapter replaced any `picture` or inline `richText` image with the literal string `[图片]`; vision tools therefore never saw the actual bytes. This patch exchanges each `downloadCode` for the temporary CDN URL via `/v1.0/robot/messageFiles/download`, fetches the bytes, and caches them locally so `MessageEvent.media_urls` points at a real file. Implementation notes: - Piggyback on the dingtalk-stream SDK: `register_callback_handler` wires `handler.dingtalk_client = stream_client`, which gives us the SDK's access-token helper and `get_image_download_url`. Keep the handler reference on the adapter (`self._handler`). - `get_image_download_url` is sync (requests-based) — offload via `asyncio.to_thread` to avoid blocking the event loop. - Failures at any stage (missing code, empty URL, HTTP error) are logged but degrade gracefully: the event still dispatches with the `[图片]` text placeholder so the agent knows something arrived. - `message.get_image_list()` handles both `picture` and `richText` shapes, so the same path supports inline images in rich messages. - On successful download the placeholder text is replaced with `[图片 × N]` as a short caption alongside the attached media list. Tests cover: missing handler, single-image success, empty downloadUrl, partial HTTP failure (continue on remaining codes), and full end-to-end picture dispatch through `_on_message`. Depends on NousResearch#8954 (SDK 0.24 async process) and NousResearch#8960 (msgtype dispatch). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code review feedback on the initial picture-download implementation: - Hand-rolled httpx.get + cache_image_from_bytes duplicated `cache_image_from_url` from gateway/platforms/base.py, which already ships retry-with-backoff, SSRF guarding, and magic-byte validation. Swap to the shared helper and drop the dependency on `self._http_client`. - Sequential for-loop serialized N × (token lookup + HTTP) for richText messages carrying multiple inline images. Split the per-code work into a private `_fetch_one` and fan out via `asyncio.gather`. - Narrow `except Exception` on the cache step to `(httpx.HTTPError, ValueError)` so genuine bugs surface instead of being swallowed. Tests updated to patch the new `cache_image_from_url` async helper instead of the previous bytes+http-client combo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6 tasks
19 tasks
Contributor
|
Thanks for this contribution! The picture-download feature you implemented here has since landed on Automated hermes-sweeper review
The main-branch implementation uses the Closing as implemented. If you see any gaps in the current implementation compared to what you had here, feel free to open a targeted follow-up issue! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the last gap preventing DingTalk vision use cases: previously every
picture(and inlinerichTextimage) was replaced with the literal text[图片], so vision tools had no image bytes to work with.This PR exchanges each
downloadCodefor the temporary CDN URL via `/v1.0/robot/messageFiles/download`, fetches the bytes, caches them locally, and populatesMessageEvent.media_urls— exactly the pattern Telegram/Slack/WhatsApp already use.Implementation notes
register_callback_handlerwireshandler.dingtalk_client = stream_clientso the SDK's access-token plumbing andget_image_download_urlare reachable without re-implementing token caching. Adapter keeps the handler reference onself._handler.get_image_download_urlis sync (requests-based) — offloaded viaasyncio.to_threadto keep the event loop clean.[图片]text placeholder so the agent knows something arrived but couldn't be decoded.message.get_image_list()handles bothpictureandrichTextmsgtypes, so inline pictures in rich messages are handled the same way.[图片 × N]as a short caption that accompanies the attached media list.Test plan
_on_messagepicture dispatch → populatesMessageEvent.media_urlsand rewrites text to[图片 × 1]vision_analyzewith real bytes via onehub/gemini-3-pro-latest.Stacking
Depends on:
This branch merges #8954 inline so it applies against
maindirectly, but reviewers should merge in order #8954 → #8957 → #8960 → this one to keep history tidy.