
feat(gateway): add document extensions to MEDIA: regex#19926

Closed
maxneo4 wants to merge 1 commit into NousResearch:main from maxneo4:feat/media-document-extensions


maxneo4 commented May 4, 2026

Description

Extend extract_media() in gateway/platforms/base.py to recognize document, data, GPS, and config file types so they are delivered natively as file attachments via send_document() instead of being silently ignored.

Added Extensions

| Category | Extensions | Use case |
|---|---|---|
| GPS/GIS | gpx, kml, geojson | Routes, waypoints, maps (Locus Map, Google Maps) |
| Data/code | json, xml, md, html, js, py, sh | APIs, configs, scripts, notes |
| Config | yaml, yml, toml | Configuration files |
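A minimal sketch of what widening the `MEDIA:` allowlist changes. The "before" extension list and the `build_media_re` helper are hypothetical (the actual pattern in `gateway/platforms/base.py` is not shown in this PR); the "after" list appends the extensions from the table above.

```python
import re

# Hypothetical "before" allowlist (image, video, audio only); the real
# pattern in gateway/platforms/base.py is not shown here and may differ.
OLD_EXTS = "png|jpe?g|gif|webp|mp4|mov|webm|mp3|ogg|wav|m4a"
# Extensions this PR proposes (GPS/GIS, data/code, config).
NEW_EXTS = "gpx|kml|geojson|json|xml|md|html|js|py|sh|yaml|yml|toml"


def build_media_re(exts: str) -> re.Pattern:
    # Capture the path after MEDIA: only if it ends in an allowed extension.
    # \b rejects partial extensions, so ".jsx" is not read as ".js".
    return re.compile(rf"MEDIA:\s*(\S+\.(?:{exts}))\b", re.IGNORECASE)


def extract_media(text: str, pattern: re.Pattern) -> list:
    # findall returns the captured path for every MEDIA: tag that matched.
    return pattern.findall(text)


reply = "Here is your route. MEDIA:/tmp/route.gpx"
old_re = build_media_re(OLD_EXTS)
new_re = build_media_re(f"{OLD_EXTS}|{NEW_EXTS}")

print(extract_media(reply, old_re))  # []
print(extract_media(reply, new_re))  # ['/tmp/route.gpx']
```

With the old list the `.gpx` tag simply never matches, which is the "silently ignored" behavior this PR describes.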

Why This Matters

The gateway already has send_document() implemented for platforms that support it (Telegram, Discord, etc.), but the regex in extract_media() only matched image, video, and audio extensions. When the model included MEDIA:/path/to/file.gpx, the tag was silently ignored and the user never received the file.
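The delivery side is already in place; the gap is upstream, in extraction. A sketch of an extension-based dispatch, assuming the adapter routes known media types to dedicated senders and falls back to `send_document()` for everything else. Only `send_document()` is named in this PR; the extension groups and the other sender names below are hypothetical.

```python
from pathlib import Path

# Hypothetical extension groups and sender names; only send_document()
# is named in this PR, the other senders are illustrative.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".webm"}
AUDIO_EXTS = {".mp3", ".ogg", ".wav", ".m4a"}


def pick_sender(path: str) -> str:
    """Name the adapter method an extracted MEDIA: path would be routed to."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "send_image"
    if ext in VIDEO_EXTS:
        return "send_video"
    if ext in AUDIO_EXTS:
        return "send_audio"
    # Anything else (.gpx, .json, .yaml, ...) falls back to a file attachment.
    return "send_document"


for p in ["/tmp/photo.JPG", "/tmp/route.gpx", "/tmp/config.yaml"]:
    print(p, "->", pick_sender(p))
```

A fallback like this only helps if the path reaches it, which is why the fix belongs in the extraction regex.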

This change enables:

  • Native GPX delivery — GPS routes sent as downloadable documents in Telegram, no workaround needed
  • PDF/CSV/ZIP support — Documents and data exports arrive as native file attachments
  • Config/code sharing — Scripts and configs delivered as downloadable files
  • Zero infrastructure changes: send_document() already handles non-media types

Testing

Verified that MEDIA:/path/to/file.gpx is now extracted from the response text and delivered as a native document attachment via the Telegram gateway.

Checklist

  • Code follows project conventions
  • Change is minimal (single regex line modification)
  • No breaking changes — existing media types still work

alt-glitch added the labels type/feature (new feature or request), P3 (low: cosmetic, nice to have), and comp/gateway (gateway runner, session dispatch, delivery) on May 4, 2026
alt-glitch (Collaborator) commented:

Likely duplicate of #14307 (merged) — same change: extending MEDIA: regex with document extensions in gateway/platforms/base.py. The extensions added here (json, xml, md, html, js, py, sh, yaml, yml, toml) may partially overlap with what #14307 already added. Please verify which extensions are actually missing before merging.
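One way to do the check the reviewer asks for is to diff the two extension alternations as sets. A small sketch: the `alternation_to_set` helper is hypothetical, and the "already merged" string is a placeholder, not the real #14307 list, so it must be replaced with the alternation actually present on `main`.

```python
def alternation_to_set(alternation: str) -> set:
    """Turn a regex alternation such as 'gpx|kml|json' into a set of extensions."""
    return {part.strip().lower() for part in alternation.split("|") if part.strip()}


# Extension list proposed by this PR.
proposed = alternation_to_set("gpx|kml|geojson|html|htm|js|py|sh|json|xml|yaml|yml|toml|md")

# PLACEHOLDER: paste the alternation that #14307 actually merged. The value
# below is illustrative only and is NOT the real #14307 list.
already_merged = alternation_to_set("json|xml|md")

missing = sorted(proposed - already_merged)
overlap = sorted(proposed & already_merged)
print("still missing:", missing)
print("already covered:", overlap)
```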


- Add .gpx, .kml, .geojson, .html, .htm, .js MIME types
- Extend MEDIA: regex: gpx|kml|geojson|html|htm|js|py|sh|json|xml|yaml|yml|toml|md
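The first bullet adds MIME types for .gpx, .kml, .geojson, .html, .htm, and .js. The gateway keeps its own MIME dict, whose name and shape are not shown here, so this sketch registers the same types with Python's standard `mimetypes` registry as a stand-in. The type strings are the commonly used ones (`application/geo+json` from RFC 7946, `text/javascript` from RFC 9239); `mime_for` is a hypothetical helper.

```python
import mimetypes

# MIME types for the added extensions; the gateway's own dict may differ.
EXTRA_MIME_TYPES = {
    ".gpx": "application/gpx+xml",
    ".kml": "application/vnd.google-earth.kml+xml",
    ".geojson": "application/geo+json",
    ".html": "text/html",
    ".htm": "text/html",
    ".js": "text/javascript",
}

for ext, mime in EXTRA_MIME_TYPES.items():
    # add_type overrides any platform default for the same extension.
    mimetypes.add_type(mime, ext)


def mime_for(path: str) -> str:
    guessed, _encoding = mimetypes.guess_type(path)
    # Fall back to a generic binary type for anything still unknown.
    return guessed or "application/octet-stream"


print(mime_for("/tmp/route.gpx"))  # application/gpx+xml
```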
maxneo4 force-pushed the feat/media-document-extensions branch from ad5be50 to 449f2e9 on May 27, 2026 02:38
maxneo4 (Author) commented May 27, 2026

This PR adds support for GIS and code/document extensions that are not covered by any of the similar open PRs:

New MIME types (not in #24049 or #24217):

  • .gpx, .kml, .geojson — GPS/GIS file formats that users can produce and send via
  • .html, .htm, .js — web artifacts that skills generate and should deliver as native documents

MEDIA: regex extended to recognize: gpx|kml|geojson|html|htm|js|py|sh|json|xml|yaml|yml|toml|md

Real-world use case: I use Hermes to generate GPX files from my travels. I ask it for points of interest in a city (restaurants, landmarks), then render the output as .gpx with waypoints to import into map apps such as HERE WeGo or Organic Maps. Without these extensions in both the MIME dict and the regex, the gateway couldn't recognize and deliver these files natively.
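For the GPS workflow described here, the file the model hands to the gateway is an ordinary GPX 1.1 document. A minimal sketch using only the standard library; the `write_gpx` helper, the `creator` value, and the sample points are illustrative placeholders, not part of Hermes.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

GPX_NS = "http://www.topografix.com/GPX/1/1"
ET.register_namespace("", GPX_NS)  # serialize GPX as the default namespace


def write_gpx(points: list, path: Path) -> Path:
    """Write (name, lat, lon) tuples as GPX 1.1 waypoints."""
    root = ET.Element(f"{{{GPX_NS}}}gpx", {"version": "1.1", "creator": "gpx-sketch"})
    for name, lat, lon in points:
        wpt = ET.SubElement(root, f"{{{GPX_NS}}}wpt", {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"})
        ET.SubElement(wpt, f"{{{GPX_NS}}}name").text = name
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


# Illustrative points of interest (placeholder names and coordinates).
pois = [("Cafe", 40.4138, -3.6921), ("Viewpoint", 40.4153, -3.7074)]
out = write_gpx(pois, Path(tempfile.gettempdir()) / "poi.gpx")
# The reply then references the file so extract_media() can pick it up.
print(f"MEDIA:{out}")
```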

PRs #24049 (Windows drive letters + GIS extensions by another author) and #24217 (spaced paths) don't overlap with the actual MIME type additions or the full extension list in the regex. This PR provides the complete set needed for the GPS/map workflow.

teknium1 (Contributor) commented:

Superseded by #34844, which consolidates this cluster.

This PR widens the extract_media extension allowlist, which is the right direction — but on its own it leaves the unconditional MEDIA:\s*\S+ strip in place, so a MEDIA: tag with any extension still outside the (now wider) list keeps getting deleted from the body before extract_local_files can pick up the bare path. #34844 fixes both halves: it unifies the two extractors onto a single shared extension set (MEDIA_DELIVERY_EXTS) AND replaces the loose strip with an extension-anchored one, so an unknown-extension path survives in the text instead of vanishing.
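A sketch of the two strip strategies contrasted above. `MEDIA_DELIVERY_EXTS` is the name given for the shared set in #34844, but its contents here are hypothetical, and both patterns are illustrations of the described behavior rather than the merged code.

```python
import re

# Hypothetical contents; the real MEDIA_DELIVERY_EXTS in #34844 may differ.
MEDIA_DELIVERY_EXTS = {"png", "jpg", "mp4", "mp3", "pdf", "gpx", "json"}
_ext_alt = "|".join(sorted(MEDIA_DELIVERY_EXTS))

# Loose strip: removes every MEDIA: tag, whatever it points at.
LOOSE_STRIP = re.compile(r"MEDIA:\s*\S+")
# Extension-anchored strip: removes only tags whose path ends in a
# deliverable extension, so unknown-extension paths stay in the text.
ANCHORED_STRIP = re.compile(rf"MEDIA:\s*\S+\.(?:{_ext_alt})\b", re.IGNORECASE)

reply = "Route: MEDIA:/tmp/route.gpx and notes: MEDIA:/tmp/notes.rst"
print(repr(LOOSE_STRIP.sub("", reply)))     # 'Route:  and notes: '
print(repr(ANCHORED_STRIP.sub("", reply)))  # 'Route:  and notes: MEDIA:/tmp/notes.rst'
```

With the loose strip the `.rst` path is gone before any later file scan runs; with the anchored strip it survives for a fallback such as extract_local_files to find.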

Closing as superseded — thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.

teknium1 closed this on May 29, 2026