fix(gateway): classify document attachments as DOCUMENT on Signal, Email, and SimpleX #44695

Merged
teknium1 merged 7 commits into main from hermes/hermes-fd5f7237 on Jun 12, 2026

@teknium1

Contributor

Summary

Inbound document attachments now classify as MessageType.DOCUMENT on Signal, Email, and SimpleX — previously they were cached into media_urls but left as TEXT, so run.py's document-context injection (gated strictly on DOCUMENT) silently dropped the file and the agent never saw it. Fixes #12845.
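
The gating described above can be illustrated with a minimal sketch. This is not the actual run.py code: `InboundMessage` and `build_agent_context` are hypothetical names, and the only behavior taken from this PR is that document injection checks `message_type` rather than the presence of cached files in `media_urls`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MessageType(Enum):
    TEXT = "text"
    DOCUMENT = "document"


@dataclass
class InboundMessage:
    """Hypothetical stand-in for the gateway's inbound event object."""
    text: str
    message_type: MessageType
    media_urls: List[str] = field(default_factory=list)


def build_agent_context(msg: InboundMessage) -> str:
    """Append cached file paths only when the message is typed DOCUMENT."""
    context = msg.text
    # The gate looks at message_type only, so a cached file on a TEXT
    # message is never mentioned to the agent.
    if msg.message_type is MessageType.DOCUMENT:
        for path in msg.media_urls:
            context += f"\n[attached document: {path}]"
    return context


# Same cached file, different classification.
before_fix = InboundMessage("see attached", MessageType.TEXT, ["/cache/report.pdf"])
after_fix = InboundMessage("see attached", MessageType.DOCUMENT, ["/cache/report.pdf"])
```

With `before_fix` the context is just the message text, which is the silent drop reported in #12845; with `after_fix` the cached path is included.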

Salvages PR #12851 by @kdunn926 (authorship preserved via cherry-pick) and widens the fix to the whole bug class per the contribution rubric.

Changes

  • gateway/platforms/signal.py: application/* and text/* -> DOCUMENT (@kdunn926's fix), widened to video/* -> VIDEO and catch-all -> DOCUMENT, matching the WhatsApp/Slack/BlueBubbles/Mattermost pattern
  • gateway/platforms/email.py: document attachments -> DOCUMENT (was image-only classification); DOCUMENT wins for mixed image+doc emails since image handling keys off per-path mime types while doc injection gates on message_type
  • plugins/platforms/simplex/adapter.py: non-image/non-audio files -> DOCUMENT (was audio/image only)
  • Tests: 3 from the original PR + 7 new (Signal video/unknown-MIME, email doc/mixed, simplex doc/photo-regression)
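
For readers skimming the change list, the widened classification can be sketched roughly as follows. The function name `classify_attachment`, the enum members, and the audio -> VOICE mapping are assumptions for illustration; the real adapters may structure this differently. The point taken from this PR is the catch-all: anything that is not image, audio, or video resolves to DOCUMENT instead of staying TEXT.

```python
from enum import Enum
from typing import Optional


class MessageType(Enum):
    TEXT = "text"
    PHOTO = "photo"
    VOICE = "voice"
    VIDEO = "video"
    DOCUMENT = "document"


def classify_attachment(mime_type: Optional[str]) -> MessageType:
    """Map an attachment MIME type to a message type (illustrative only)."""
    mime = (mime_type or "").lower()
    if mime.startswith("image/"):
        return MessageType.PHOTO
    if mime.startswith("audio/"):
        return MessageType.VOICE  # assumed mapping, not confirmed by the diff
    if mime.startswith("video/"):
        return MessageType.VIDEO
    # Catch-all: application/*, text/*, missing, or exotic MIME types
    # all become DOCUMENT rather than being left as TEXT.
    return MessageType.DOCUMENT
```

Under this sketch, `application/pdf`, `text/plain`, `application/octet-stream`, and a missing MIME type all resolve to DOCUMENT.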

Cross-platform audit

| Adapter | DOCUMENT classification |
| --- | --- |
| telegram, whatsapp, slack, matrix, bluebubbles, wecom, weixin, dingtalk, feishu, yuanbao, discord, mattermost, google_chat, line, photon | already correct |
| signal, email, simplex | fixed here |
| sms, webhook | text-only by design |
| teams | only caches images at all; bigger gap, tracked separately |

Validation

scripts/run_tests.sh tests/gateway/test_signal.py tests/gateway/test_email.py tests/gateway/test_simplex_plugin.py — 200 passed, 0 failed.

Infographic

Document attachment classification infographic

kdunn926 and others added 7 commits June 11, 2026 22:54
…fallback

Widen the salvaged #12851 fix to match the established classification
pattern (WhatsApp/Slack/BlueBubbles/Mattermost): video/* -> VIDEO, and
any remaining MIME type falls through to DOCUMENT instead of TEXT, so
exotic types still trigger run.py's document-context injection.

Email cached document attachments and placed them in media_urls, but
msg_type only flipped on image attachments — documents stayed TEXT and
run.py's document-context injection (gated on MessageType.DOCUMENT)
silently dropped them. Same bug class as Signal #12845. DOCUMENT wins
over PHOTO for mixed attachments since image handling keys off per-path
mime types while document injection gates strictly on message_type.

SimpleX tagged unknown files application/octet-stream in media_types
but classification only handled audio/image, leaving msg_type TEXT —
run.py never injected the document context. Same bug class as #12845.

@liuhao1024

Contributor

✅ Verified — document attachment classification on Signal, Email, and SimpleX

Reviewed the diff for MessageType.DOCUMENT classification across three platform adapters.

  • Signal (signal.py): Added VIDEO type for video/* MIME and DOCUMENT catch-all for application/*, text/*, and unknown types. Previously these fell through to TEXT, causing run.py's document-context injection to silently drop the file.
  • Email (email.py): Document attachments now set DOCUMENT type; for mixed image+document attachments, DOCUMENT wins because run.py's image handling keys off per-path MIME types regardless of message_type, but document injection gates strictly on MessageType.DOCUMENT.
  • SimpleX (simplex/adapter.py): Same catch-all pattern — non-image/non-audio files classified as DOCUMENT.
  • Test coverage: Comprehensive — PDF, text/plain, text/html, video/mp4, unknown MIME, mixed attachments, and image regression guard. All three platforms have dedicated test classes.

The fix is correct and consistent. The catch-all else branch ensures any attachment type that isn't image/voice/video gets properly surfaced to the agent. No issues found.

alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), platform/signal (Signal CLI adapter), and platform/email (Email (IMAP/SMTP) adapter) labels on Jun 12, 2026
teknium1 merged commit 8b2a3c9 into main on Jun 12, 2026
28 checks passed
teknium1 deleted the hermes/hermes-fd5f7237 branch on June 12, 2026 08:07
teknium1 added a commit that referenced this pull request Jun 12, 2026
…CUMENT (#44778)

The Teams adapter only handled image/* attachments — documents (the
application/vnd.microsoft.teams.file.download.info consent-free download
payload and any direct-URL non-image attachment) never reached media_urls
at all, so run.py's document-context injection had nothing to surface.
Completes the class-wide sweep from PR #44695 (Signal/Email/SimpleX).

- download.info attachments: fetch the pre-authed SharePoint downloadUrl
  (SSRF-guarded, same guard chain as base.py cache_*_from_url) and route
  through cache_media_bytes
- direct-URL non-image attachments: same fetch + classify path
- skip Teams' text/html message-body mirror and adaptive-card attachments
- DOCUMENT > PHOTO > VIDEO > AUDIO precedence for mixed attachments,
  matching the Email precedence rationale from #44695
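
The skip rule and the DOCUMENT > PHOTO > VIDEO > AUDIO precedence in the commit message above could look something like the sketch below. All names here (`resolve_message_type`, `SKIPPED_CONTENT_TYPES`, `PRECEDENCE`) are hypothetical, and the adaptive-card content type string is an assumption about what the adapter filters on; the Teams adapter itself is not shown in this PR.

```python
from enum import Enum
from typing import Iterable


class MessageType(Enum):
    TEXT = "text"
    DOCUMENT = "document"
    PHOTO = "photo"
    VIDEO = "video"
    AUDIO = "audio"


# Highest priority first. DOCUMENT leads because document injection
# gates on message_type, while image handling works per file path.
PRECEDENCE = [
    MessageType.DOCUMENT,
    MessageType.PHOTO,
    MessageType.VIDEO,
    MessageType.AUDIO,
]

# Assumed content types for the message-body mirror and adaptive cards.
SKIPPED_CONTENT_TYPES = {"text/html", "application/vnd.microsoft.card.adaptive"}


def classify_one(content_type: str) -> MessageType:
    """Classify a single attachment by MIME prefix, DOCUMENT as catch-all."""
    mime = content_type.lower()
    if mime.startswith("image/"):
        return MessageType.PHOTO
    if mime.startswith("video/"):
        return MessageType.VIDEO
    if mime.startswith("audio/"):
        return MessageType.AUDIO
    return MessageType.DOCUMENT


def resolve_message_type(content_types: Iterable[str]) -> MessageType:
    """Pick one message type for a message with zero or more attachments."""
    kinds = {
        classify_one(ct)
        for ct in content_types
        if ct.lower() not in SKIPPED_CONTENT_TYPES
    }
    for kind in PRECEDENCE:
        if kind in kinds:
            return kind
    return MessageType.TEXT  # nothing usable attached
```

A message carrying only the text/html body mirror stays TEXT, while an image plus a PDF resolves to DOCUMENT so the PDF is not dropped.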

Development

Successfully merging this pull request may close these issues.

[Bug]: Document attachments from Signal not detected

4 participants