feat(gateway): add more file extensions to MEDIA delivery regex #33454

Closed
anxkhn wants to merge 1 commit into NousResearch:main from anxkhn:feat/media-more-file-extensions
@anxkhn anxkhn commented May 27, 2026

What does this PR do?

The gateway's MEDIA:<path> file delivery regex only recognized a hardcoded set of extensions (images, video, audio, pdf, zip, office docs). When the agent tried to send other file types like .vcf contacts, .html, .md, or .json via MEDIA:<path>, the tag was silently ignored — the file never reached the user on Telegram/Discord/etc.

I ran into this firsthand: I asked Hermes to send me a .vcf contacts file over Telegram and it just... didn't show up. The text arrived but the file was dropped silently. Turns out the extension wasn't in the allowlist.

This adds 11 new extensions to all 3 regex patterns that handle MEDIA tag extraction:

  • Contact/calendar: vcf, ics
  • Data/config: json, yaml, yml, toml
  • Document/web: md, html/htm, xml, svg

All of these are file types that Telegram (and other messaging platforms) already handle natively as document attachments — we just weren't passing them through.
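To make the failure mode concrete, here is a minimal sketch of an extension-allowlisted MEDIA pattern. The names `OLD_EXTS`, `NEW_EXTS`, `build_media_pattern`, and `extract_media_paths` are hypothetical, and `OLD_EXTS` is only a stand-in subset; the actual patterns in `gateway/platforms/base.py` and `gateway/run.py` may be shaped differently.

```python
import re

# Hypothetical stand-in for the previous allowlist (the real list is longer).
OLD_EXTS = ("png", "jpg", "jpeg", "gif", "mp4", "mp3", "pdf", "zip", "docx")

# The 11 extensions this PR adds.
NEW_EXTS = ("vcf", "ics", "json", "yaml", "yml", "toml",
            "md", "html", "htm", "xml", "svg")


def build_media_pattern(exts):
    """Compile a MEDIA:<path> pattern that only accepts the given extensions."""
    # Longest first so "html" is tried before "htm".
    ordered = sorted((re.escape(e) for e in exts), key=len, reverse=True)
    alternation = "|".join(ordered)
    return re.compile(
        rf"MEDIA:\s*(\S+\.(?:{alternation}))(?=\s|$)", re.IGNORECASE
    )


def extract_media_paths(text, pattern):
    """Return every path that appears in an allowlisted MEDIA: tag."""
    return pattern.findall(text)


old_pattern = build_media_pattern(OLD_EXTS)
new_pattern = build_media_pattern(OLD_EXTS + NEW_EXTS)
reply = "Here are your contacts. MEDIA:/tmp/contacts.vcf"

print(extract_media_paths(reply, old_pattern))  # no match: tag is ignored
print(extract_media_paths(reply, new_pattern))  # path is picked up
```

With the narrower list the `.vcf` tag produces no match at all, which is consistent with the file being dropped without any error.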

Related Issue

N/A — hit this while using Hermes day-to-day.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)

Changes Made

  • gateway/platforms/base.py — extended media_pattern regex in _extract_media_paths()
  • gateway/run.py — extended _TOOL_MEDIA_RE in 2 spots (history dedup scan + tool result media scan)

How to Test

  1. Create a test .vcf, .html, or .json file on the host
  2. Have the agent respond with MEDIA:/path/to/file.vcf on Telegram
  3. Verify the file arrives as a downloadable document attachment (previously: silently dropped)

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run ruff check and all checks pass
  • I've added tests for my changes — see note below
  • I've tested on my platform: Ubuntu 24.04 via Telegram gateway

Documentation & Housekeeping

  • N/A — no new config keys, no architecture changes, no doc changes needed

Note on tests

The three regex patterns have identical structure and only differ in file extension coverage. I'm happy to add a unit test if the maintainers would find it useful — just wasn't sure if there's an existing test module for the MEDIA extraction logic I should extend vs. creating a new one.
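If a test is wanted, one possible shape is a drift check that runs the same sample tags through each pattern and reports which extensions it misses. This is only a sketch: `missing_extensions`, `up_to_date`, and `stale` are hypothetical names, and the two compiled patterns are stand-ins for the real ones in the gateway.

```python
import re

NEW_EXTS = ["vcf", "ics", "json", "yaml", "yml", "toml",
            "md", "html", "htm", "xml", "svg"]


def missing_extensions(pattern, exts):
    """Return the extensions for which a sample MEDIA tag does not match."""
    return [
        ext for ext in exts
        if not pattern.search(f"MEDIA:/tmp/sample.{ext}")
    ]


# Stand-in patterns: one covering every new extension, one left behind.
_alternation = "|".join(NEW_EXTS)
up_to_date = re.compile(rf"MEDIA:\s*(\S+\.(?:{_alternation}))\b", re.IGNORECASE)
stale = re.compile(r"MEDIA:\s*(\S+\.(?:json|md))\b", re.IGNORECASE)

print(missing_extensions(up_to_date, NEW_EXTS))
print(missing_extensions(stale, NEW_EXTS))
```

Running the helper against all three real patterns in one test would flag the case where one of them is extended and the others are forgotten.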

The MEDIA tag regex only recognized a narrow set of extensions (images,
video, audio, pdf, zip, office docs). Files like .vcf, .html, .md, .json
were silently dropped instead of being sent as attachments on Telegram
and other platforms.

Add 11 new extensions: vcf, ics, json, yaml, yml, toml, md, html/htm,
xml, svg — across all 3 regex patterns in the gateway.
@alt-glitch added the type/feature, comp/gateway, duplicate, and P3 labels May 27, 2026
@alt-glitch

Collaborator

Duplicate of #29609 which dynamically derives the extension set from SUPPORTED_DOCUMENT_TYPES — the preferred approach. This PR adds static extensions to the regex, same as the 10+ other competing PRs in this cluster (#32294, #33244, #32398, #33127, etc.). See #29609 for the canonical fix.
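For readers comparing the two approaches, the dynamic variant could look roughly like the sketch below. The shape of `SUPPORTED_DOCUMENT_TYPES` here (a dict from dotted extension to MIME type) and its entries are assumptions made for illustration, and `media_pattern_from` is a hypothetical helper; #29609 is the authoritative reference for how it is actually done.

```python
import re

# Assumed shape: dotted extension -> MIME type. Entries are illustrative only.
SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".vcf": "text/vcard",
    ".json": "application/json",
    ".html": "text/html",
}


def media_pattern_from(types):
    """Build the MEDIA pattern from the keys of a document-type mapping."""
    exts = sorted((ext.lstrip(".") for ext in types), key=len, reverse=True)
    alternation = "|".join(re.escape(ext) for ext in exts)
    return re.compile(
        rf"MEDIA:\s*(\S+\.(?:{alternation}))(?=\s|$)", re.IGNORECASE
    )


pattern = media_pattern_from(SUPPORTED_DOCUMENT_TYPES)
print(pattern.findall("MEDIA:/tmp/contacts.vcf"))

# Supporting a new type becomes a one-line change to the mapping.
extended = dict(SUPPORTED_DOCUMENT_TYPES, **{".ics": "text/calendar"})
print(media_pattern_from(extended).findall("MEDIA:/tmp/invite.ics"))
```

The design point is that the regex never needs hand-editing: the allowlist has a single source, so the extractors cannot drift apart.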

@teknium1

Contributor

Superseded by #34844, which consolidates this cluster.

This PR widens the extract_media extension allowlist, which is the right direction — but on its own it leaves the unconditional MEDIA:\s*\S+ strip in place, so a MEDIA: tag with any extension still outside the (now wider) list keeps getting deleted from the body before extract_local_files can pick up the bare path. #34844 fixes both halves: it unifies the two extractors onto a single shared extension set (MEDIA_DELIVERY_EXTS) AND replaces the loose strip with an extension-anchored one, so an unknown-extension path survives in the text instead of vanishing.
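The difference between the two strip behaviours can be sketched as follows. The loose pattern is the `MEDIA:\s*\S+` quoted above; the contents of `MEDIA_DELIVERY_EXTS` and the helper names `loose_strip` and `anchored_strip` are assumptions for illustration, not the code in #34844.

```python
import re

# Hypothetical contents; the real shared set lives in the gateway.
MEDIA_DELIVERY_EXTS = ("pdf", "png", "vcf", "json")

# Unconditional strip: removes any MEDIA: tag, whatever its extension.
_LOOSE = re.compile(r"MEDIA:\s*\S+")

# Extension-anchored strip: removes only tags the extractor would deliver.
_ANCHORED = re.compile(
    r"MEDIA:\s*\S+\.(?:" + "|".join(MEDIA_DELIVERY_EXTS) + r")(?=\s|$)",
    re.IGNORECASE,
)


def loose_strip(text):
    """Delete every MEDIA: tag from the body."""
    return _LOOSE.sub("", text).strip()


def anchored_strip(text):
    """Delete only MEDIA: tags whose extension is in the shared set."""
    return _ANCHORED.sub("", text).strip()


unknown = "Report attached MEDIA:/tmp/report.parquet"
known = "Card attached MEDIA:/tmp/card.vcf"

print(loose_strip(unknown))     # tag and path are gone
print(anchored_strip(unknown))  # text is left intact
print(anchored_strip(known))    # delivered tag is removed from the body
```

With the loose strip the `.parquet` path disappears before any later pass can see it; with the anchored strip it stays in the text.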

Closing as superseded — thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
