feat(dingtalk): rich-media inbound pipeline — download, parse, persist #14335
Open
meng93 wants to merge 1 commit into
This was referenced Apr 23, 2026
Branch updated from 891bcf2 to 56a3d97.
- Add file content parsers (`_parse_text_file`, `_parse_docx_file`, `_parse_pdf_file`, `_parse_excel_file`) with graceful fallbacks when optional dependencies are missing. Parsed text is injected into the agent context so the LLM can reason over document contents.
- Add inbound media directory helpers (`_inbound_media_dir`, `_cleanup_inbound_media`) with a 24 h automatic purge so downloaded attachments do not accumulate on disk.
- Add `_download_file_to_inbox()`: authenticated download via DingTalk `/v1.0/robot/messageFiles/download` with resumable-range support and a `_DINGTALK_MEDIA_MAX_SIZE` (20 MB) guard.
- Add `_download_images_to_local()`: batch download of photo/rich-text image `download_code`s to local paths for multi-modal agent input.
- Add `_extract_and_parse_file_attachments()`: walk the raw inbound extensions payload, resolve download URLs, download, parse, and return structured attachment metadata for the agent context.
- Integrate the media pipeline into `_on_message()`; add @mention support to `send()`.
- Add `python-docx`, `pdfplumber`, `openpyxl` to the dingtalk optional deps in `pyproject.toml`.
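For reviewers, a minimal standard-library sketch of the download step behind `_download_file_to_inbox()`. It assumes DingTalk's public request shape for `/v1.0/robot/messageFiles/download` (a POST with `downloadCode` and `robotCode`, authenticated via the `x-acs-dingtalk-access-token` header, returning a `downloadUrl`) and reads "20 MB" as 20 MiB. All helper names here are hypothetical, and the resumable-range support mentioned above is not shown.

```python
import json
import urllib.request
from pathlib import Path
from typing import BinaryIO, Optional

# Assumption: 20 MB interpreted as 20 MiB; the adapter's real value is _DINGTALK_MEDIA_MAX_SIZE.
MEDIA_MAX_SIZE = 20 * 1024 * 1024
DOWNLOAD_ENDPOINT = "https://api.dingtalk.com/v1.0/robot/messageFiles/download"
CHUNK_SIZE = 64 * 1024


class MediaTooLargeError(Exception):
    """Raised when an attachment exceeds the size cap."""


def build_download_request(download_code: str, robot_code: str, access_token: str) -> urllib.request.Request:
    # Exchange a message's downloadCode for a short-lived downloadUrl (assumed request shape).
    body = json.dumps({"downloadCode": download_code, "robotCode": robot_code}).encode("utf-8")
    return urllib.request.Request(
        DOWNLOAD_ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-acs-dingtalk-access-token": access_token},
    )


def save_with_size_cap(stream: BinaryIO, dest: Path, declared_length: Optional[int] = None,
                       max_size: int = MEDIA_MAX_SIZE) -> int:
    # Fail fast when the server already declares an oversized body.
    if declared_length is not None and declared_length > max_size:
        raise MediaTooLargeError(f"declared {declared_length} bytes > {max_size}")
    partial = dest.with_name(dest.name + ".part")
    written = 0
    try:
        with partial.open("wb") as fh:
            while True:
                chunk = stream.read(CHUNK_SIZE)
                if not chunk:
                    break
                written += len(chunk)
                # Enforce the cap while streaming too: Content-Length may be absent or wrong.
                if written > max_size:
                    raise MediaTooLargeError(f"body exceeded {max_size} bytes")
                fh.write(chunk)
        partial.replace(dest)  # only a complete file ever appears under its final name
    except BaseException:
        partial.unlink(missing_ok=True)
        raise
    return written


def download_file_to_inbox(download_code: str, robot_code: str, access_token: str, dest: Path) -> int:
    # Network glue (untested here): resolve the URL, then stream the file to disk under the cap.
    request = build_download_request(download_code, robot_code, access_token)
    with urllib.request.urlopen(request, timeout=15) as resp:
        download_url = json.load(resp)["downloadUrl"]
    with urllib.request.urlopen(download_url, timeout=60) as resp:
        length = resp.headers.get("Content-Length")
        return save_with_size_cap(resp, dest, int(length) if length else None)
```

Writing to a `.part` file and renaming on success means a rejected or interrupted download never leaves a truncated attachment for the parsers to pick up.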
Branch updated from 56a3d97 to 680c1bb.
This was referenced May 24, 2026
Summary
Add a full inbound rich-media pipeline for the DingTalk adapter: download images / files from DingTalk's CDN, parse document contents (docx / pdf / xlsx / txt), persist to a local inbox directory with automatic 24 h purge, and inject parsed text into the agent context so the LLM can reason over attachments.
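The persist-and-purge step can be sketched as follows. This illustrates the idea rather than reproducing the adapter's `_inbound_media_dir()` / `_cleanup_inbound_media()`: the `dingtalk/inbound` layout is hypothetical, and the sketch assumes a file's modification time records when it was downloaded.

```python
import time
from pathlib import Path
from typing import Optional

INBOUND_MAX_AGE_SECONDS = 24 * 60 * 60  # the 24 h retention window


def inbound_media_dir(base_dir: Path) -> Path:
    # Hypothetical layout; the real helper may use a different root.
    path = base_dir / "dingtalk" / "inbound"
    path.mkdir(parents=True, exist_ok=True)
    return path


def cleanup_inbound_media(directory: Path, max_age: float = INBOUND_MAX_AGE_SECONDS,
                          now: Optional[float] = None) -> int:
    """Delete regular files older than max_age seconds; return how many were removed."""
    now = time.time() if now is None else now
    removed = 0
    for entry in directory.iterdir():
        try:
            # Only purge regular files; leave subdirectories and symlinks alone.
            if entry.is_symlink() or not entry.is_file():
                continue
            if now - entry.stat().st_mtime > max_age:
                entry.unlink()
                removed += 1
        except FileNotFoundError:
            # A concurrent cleanup may have removed the file first.
            continue
    return removed
```

Passing `now` explicitly keeps the purge deterministic in tests; in the adapter it would simply default to the current time on each inbound message or on a timer.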
Motivation
DingTalk users regularly share screenshots, PDFs, spreadsheets, and Word documents in chats. Without media handling, the agent only sees a placeholder such as `[文件]` ("[File]") or `[图片]` ("[Image]") and cannot act on the content. This PR gives the agent the same document-understanding capability that the Telegram and Discord adapters already have.
Changes
| File | Change |
|---|---|
| `gateway/platforms/dingtalk.py` | `_parse_text_file`, `_parse_docx_file`, `_parse_pdf_file`, `_parse_excel_file` with graceful fallbacks |
| `gateway/platforms/dingtalk.py` | `_inbound_media_dir()` + `_cleanup_inbound_media()`: 24 h auto-purge |
| `gateway/platforms/dingtalk.py` | `_download_file_to_inbox()`: authenticated download via DingTalk `/v1.0/robot/messageFiles/download` with 20 MB guard |
| `gateway/platforms/dingtalk.py` | `_download_images_to_local()`: batch download of photo/rich-text images for multi-modal input |
| `gateway/platforms/dingtalk.py` | `_extract_and_parse_file_attachments()`: walk raw extensions payload, resolve URLs, download, parse |
| `gateway/platforms/dingtalk.py` | `_resolve_single_download_url()`: thin wrapper for the download endpoint |
| `gateway/platforms/dingtalk.py` | Integrate media pipeline into `_on_message()`; `send()` @mention support |
| `pyproject.toml` | Add `python-docx`, `pdfplumber`, `openpyxl` to dingtalk optional deps |

Test Plan
`tests/gateway/test_dingtalk.py` suite passes.
Risk Assessment
Low-Medium. The file-parsing dependencies (`python-docx`, `pdfplumber`, `openpyxl`) are optional: the parsers degrade gracefully, logging a warning when a dependency is not installed. Downloaded files are capped at 20 MB and auto-purged after 24 h.
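A minimal sketch of the graceful-degradation pattern described above: each parser imports its optional dependency lazily and logs a warning instead of raising when it is missing, and a corrupt file yields no text rather than an error. The names mirror the PR's `_parse_*_file` helpers but are illustrative; the extension map and the tab-separated spreadsheet rendering are assumptions, not the adapter's actual output format.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


def parse_text_file(path: Path) -> str:
    # Undecodable bytes are replaced rather than aborting the whole message.
    return path.read_text(encoding="utf-8", errors="replace")


def parse_docx_file(path: Path) -> str:
    try:
        import docx  # provided by python-docx
    except ImportError:
        logger.warning("python-docx not installed; cannot parse %s", path.name)
        return ""
    document = docx.Document(str(path))
    return "\n".join(p.text for p in document.paragraphs)


def parse_pdf_file(path: Path) -> str:
    try:
        import pdfplumber
    except ImportError:
        logger.warning("pdfplumber not installed; cannot parse %s", path.name)
        return ""
    with pdfplumber.open(str(path)) as pdf:
        # extract_text() returns None for pages without a text layer (e.g. scans).
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_excel_file(path: Path) -> str:
    try:
        from openpyxl import load_workbook
    except ImportError:
        logger.warning("openpyxl not installed; cannot parse %s", path.name)
        return ""
    wb = load_workbook(str(path), read_only=True, data_only=True)
    try:
        lines = []
        for ws in wb.worksheets:
            lines.append(f"## {ws.title}")
            for row in ws.iter_rows(values_only=True):
                lines.append("\t".join("" if v is None else str(v) for v in row))
        return "\n".join(lines)
    finally:
        wb.close()  # read-only workbooks keep the file handle open until closed


# Hypothetical extension map covering the four types named in this PR.
PARSERS: Dict[str, Callable[[Path], str]] = {
    ".txt": parse_text_file,
    ".docx": parse_docx_file,
    ".pdf": parse_pdf_file,
    ".xlsx": parse_excel_file,
}


def parse_attachment(path: Path) -> Optional[str]:
    """Return extracted text, or None when the type is unsupported, empty, or unparseable."""
    parser = PARSERS.get(path.suffix.lower())
    if parser is None:
        return None
    try:
        text = parser(path)
    except Exception:
        # A corrupt attachment should not take down message handling.
        logger.warning("failed to parse %s", path.name, exc_info=True)
        return None
    return text or None
```

Importing inside each parser keeps the adapter importable on installs without the dingtalk extras; after the first call the module is cached in `sys.modules`, so the per-parse cost is negligible.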