
fix: add md, html, and htm to MEDIA: extractor and make extension matching case-insensitive #31822

Closed
Kailigithub wants to merge 1 commit into NousResearch:main from Kailigithub:fix/media-extraction-add-document-extensions


@Kailigithub

Contributor

fix: add md, html, and htm to MEDIA: extractor and make extension matching case-insensitive

Closes #31560

The BasePlatformAdapter.extract_media() regex did not recognize .md, .html, or .htm files in MEDIA: tags, causing generated markdown reports and HTML artifacts to be silently dropped instead of delivered as attachments. The extractor was also case-sensitive for all extensions.

Changes:

  • Add md, html? (matching .html and .htm) to the MEDIA: tag extension list
  • Make the regex case-insensitive so .MD, .HTML, .PNG, .JPG etc. are all recognized
  • Add tests covering .md, .html, .htm extraction and case-insensitive matching
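
A minimal sketch of the change described above, for illustration only. The pattern, the extension list, and the `extract_media_paths` helper are assumptions made for this example; the actual regex inside `BasePlatformAdapter.extract_media()` is not shown in this thread and may differ.

```python
import re

# Hypothetical extension alternation. "html?" matches both "html" and "htm".
_EXTS = r"png|jpe?g|gif|webp|mp4|mp3|wav|pdf|md|html?"

# re.IGNORECASE applies to the whole pattern, so .MD, .HTML, .PNG match
# (and, as a side effect, so does a lowercase "media:" prefix).
MEDIA_TAG = re.compile(rf"MEDIA:\s*(\S+\.(?:{_EXTS}))(?=\s|$)", re.IGNORECASE)


def extract_media_paths(text: str) -> list[str]:
    """Return the paths of MEDIA: tags whose extension is in the list."""
    return MEDIA_TAG.findall(text)


print(extract_media_paths("Report: MEDIA:/tmp/report.md and MEDIA: /tmp/page.HTM"))
# ['/tmp/report.md', '/tmp/page.HTM']
```

A tag with an extension outside the list, such as `MEDIA:/tmp/data.xyz`, yields no match in this sketch.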

@alt-glitch

Collaborator

Duplicate of #31561, which already adds md, html, and htm to the MEDIA: extractor regex with case-insensitive matching. #31754 was also closed as a duplicate of #31561.

@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), and duplicate (This issue or pull request already exists) labels on May 25, 2026
@teknium1

Contributor

Superseded by #34844, which consolidates this cluster.

This PR widens the extract_media extension allowlist, which is the right direction. On its own, though, it leaves the unconditional MEDIA:\s*\S+ strip in place, so a MEDIA: tag whose extension is still outside the (now wider) list keeps getting deleted from the body before extract_local_files can pick up the bare path. #34844 fixes both halves: it unifies the two extractors onto a single shared extension set (MEDIA_DELIVERY_EXTS) and replaces the loose strip with an extension-anchored one, so an unknown-extension path survives in the text instead of vanishing.
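
The difference between the two strips can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #34844: the contents of `MEDIA_DELIVERY_EXTS` here are a stand-in, and `strip_loose` / `strip_anchored` are hypothetical helper names. Only the loose pattern `MEDIA:\s*\S+` comes from the comment above.

```python
import re

# Stand-in for the shared set; its real contents are not shown in this thread.
MEDIA_DELIVERY_EXTS = ("md", "html", "htm", "png", "jpg", "pdf")

_EXT_ALT = "|".join(re.escape(ext) for ext in MEDIA_DELIVERY_EXTS)

# Loose strip: removes any MEDIA: tag, whatever its extension.
LOOSE_STRIP = re.compile(r"MEDIA:\s*\S+")

# Extension-anchored strip: removes a tag only if its extension is in the set.
ANCHORED_STRIP = re.compile(
    rf"MEDIA:\s*\S+\.(?:{_EXT_ALT})(?=\s|$)", re.IGNORECASE
)


def strip_loose(text: str) -> str:
    return LOOSE_STRIP.sub("", text)


def strip_anchored(text: str) -> str:
    return ANCHORED_STRIP.sub("", text)


body = "See MEDIA:/tmp/out.xyz for details"
print(repr(strip_loose(body)))     # 'See  for details' (path is gone)
print(repr(strip_anchored(body)))  # 'See MEDIA:/tmp/out.xyz for details'
```

With the anchored pattern, the unknown-extension path stays in the body, so a later pass over the text still has a chance to see it.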

Closing as superseded — thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.

@teknium1 teknium1 closed this May 29, 2026


Development

Successfully merging this pull request may close these issues.

Gateway MEDIA tags do not recognize markdown attachments

3 participants