Skip to content

fix(signal): classify inbound document attachments#20347

Open
bricolage wants to merge 3 commits into
NousResearch:mainfrom
bricolage:pr-signal-pdf-fix-20260505
Open

fix(signal): classify inbound document attachments#20347
bricolage wants to merge 3 commits into
NousResearch:mainfrom
bricolage:pr-signal-pdf-fix-20260505

Conversation

@bricolage

@bricolage bricolage commented May 5, 2026

Copy link
Copy Markdown

What does this PR do?

Fixes Signal inbound attachment handling so PDFs and other non-image/non-audio/non-video files arrive as MessageType.DOCUMENT instead of MessageType.TEXT. That restores the document branch in gateway/run.py, so Hermes can surface the saved file path to the agent instead of silently dropping the attachment context.

This also preserves inbound document filenames and extensions when caching Signal attachments. If Signal does not provide a filename, the adapter falls back to Hermes' shared SUPPORTED_DOCUMENT_TYPES mapping before using generic MIME guessing. video/* attachments still remain MessageType.VIDEO.

This supersedes #12851 by centralizing the classification logic, adding regression coverage for mixed media and OOXML attachments, and preserving video/* as MessageType.VIDEO instead of leaving it as plain text.

Related Issue

Fixes #12845

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • Added _detect_inbound_message_type() in gateway/platforms/signal.py to classify inbound Signal attachments from MIME type.
  • Routed non-image/non-audio/non-video attachments to MessageType.DOCUMENT so document context injection runs.
  • Routed video/* attachments to MessageType.VIDEO so broader document handling does not regress video behavior.
  • Passed inbound attachment metadata into _fetch_attachment() so Signal document caching can preserve the user-visible filename when available.
  • Added MIME-based fallback naming for inbound documents using Hermes' shared SUPPORTED_DOCUMENT_TYPES mapping when Signal omits the filename.
  • Added regression tests for helper-level classification (image, video, application/pdf, mixed media).
  • Added regression tests for inbound PDF and DOCX envelope handling so cached document paths and MessageType.DOCUMENT are preserved.
  • Added regression tests for DOCX filename preservation and content-type-based extension fallback when the inbound attachment has no filename.

How to Test

  1. Run python -m pytest tests/gateway/test_signal.py -q in a normal repo dev environment.
  2. Send a Signal message with a PDF attachment and confirm the dispatched event uses MessageType.DOCUMENT instead of MessageType.TEXT.
  3. Send a Signal message with a DOCX attachment and confirm the cached file retains a .docx suffix instead of degrading to .zip.
  4. Confirm the gateway enters the document branch in gateway/run.py and injects the saved file path into agent context.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Ubuntu 24.04 in Docker

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

$ /root/hermes/src/.venv-hermes-tests/bin/python -m pytest -o addopts='' tests/gateway/test_signal.py -q
110 passed, 1 warning in 1.79s

@alt-glitch alt-glitch added type/bug Something isn't working comp/gateway Gateway runner, session dispatch, delivery platform/signal Signal CLI adapter P2 Medium — degraded but workaround exists labels May 5, 2026
@0xtotaylor

Copy link
Copy Markdown

I’m hit by the exact issue this PR fixes. Can we get it across the finish line, @alt-glitch?

@bricolage

Copy link
Copy Markdown
Author

I just updated it again to be even more robust for other attachments, and to align more closely with existing patterns.

@0xtotaylor

Copy link
Copy Markdown

@alt-glitch, would love to get this pushed through. Big lift for privacy-focused users, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/signal Signal CLI adapter type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Document attachments from Signal not detected

3 participants