fix(gateway): restore .md extension to MEDIA extraction allowlist #30192

Closed
Kailigithub wants to merge 1 commit into NousResearch:main from Kailigithub:fix/30186-add-md-to-media-extraction

Conversation

@Kailigithub

Contributor

Summary

Commit ea49b38 ("tighten MEDIA extraction regex + silent skip on file-not-found") replaced the permissive MEDIA:\S+ regex with an explicit extension allowlist in three places. The .md (Markdown) extension was inadvertently omitted from that allowlist.

Changes

  • gateway/platforms/base.py, extract_media(): add md to the extension alternation
  • gateway/run.py, _TOOL_MEDIA_RE (×2): add md to the extension alternation

All three patterns now include |md alongside |txt|csv|apk|ipa.
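For readers unfamiliar with the pattern, here is a minimal sketch of an extension-anchored MEDIA regex and of how a missing `md` alternative silently drops Markdown attachments. The pattern shape, the helper names `build_media_re` and `extract_media_paths`, and every extension other than txt, csv, apk, ipa, and md are illustrative assumptions; the actual literals in gateway/platforms/base.py and gateway/run.py are not reproduced here.

```python
import re

# Hypothetical allowlists; only txt, csv, apk, ipa and md are named in this PR.
EXTS_BEFORE = ("png", "jpg", "pdf", "txt", "csv", "apk", "ipa")
EXTS_AFTER = EXTS_BEFORE + ("md",)


def build_media_re(exts):
    """Build a MEDIA: pattern that only matches paths ending in an allowed extension."""
    alternation = "|".join(re.escape(e) for e in exts)
    # (?!\S) requires the extension to end the token, so "notes.mdx" is not read as ".md".
    return re.compile(rf"MEDIA:\s*(\S+\.(?:{alternation}))(?!\S)", re.IGNORECASE)


def extract_media_paths(text, exts):
    """Return every MEDIA: path in text whose extension is on the allowlist."""
    return build_media_re(exts).findall(text)


reply = "Report ready. MEDIA:/tmp/report.md MEDIA:/tmp/data.csv"
print(extract_media_paths(reply, EXTS_BEFORE))  # the .md path is skipped
print(extract_media_paths(reply, EXTS_AFTER))   # both paths are returned
```

With the narrower list the Markdown path is simply never matched, which is why the omission shows up as a missing attachment rather than an error.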

Verification

  • python3 -m py_compile passes on both modified files
  • All 15 TestExtractMedia tests pass (including a new regression test added for this fix)
  • No other extensions or behaviour are affected
  • The fix is a pure additive change to three regex literals; no logic changes

Closes #30186

alt-glitch added the labels type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), and P2 (Medium: degraded but workaround exists) on May 22, 2026
@alt-glitch

Collaborator

Duplicate of #29609 which dynamically derives the MEDIA extension allowlist from SUPPORTED_DOCUMENT_TYPES — the preferred approach per #30106 triage. Also duplicates #30193 (identical .md-only fix). See also #29710 (.html variant).
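As a rough illustration of the "derive it dynamically" approach mentioned above, the sketch below builds the alternation from a single mapping instead of hand-editing regex literals. The contents of `SUPPORTED_DOCUMENT_TYPES` shown here and the helper `media_re_from_types` are assumptions for illustration only; the real mapping and the code in #29609 may look different.

```python
import re

# Assumed shape: extension (with leading dot) -> MIME type. Entries are illustrative.
SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".csv": "text/csv",
    ".md": "text/markdown",
}


def media_re_from_types(types):
    """Compile a MEDIA: pattern whose allowlist is the keys of the given mapping."""
    exts = sorted((ext.lstrip(".") for ext in types), key=len, reverse=True)
    alternation = "|".join(re.escape(e) for e in exts)
    return re.compile(rf"MEDIA:\s*(\S+\.(?:{alternation}))(?!\S)", re.IGNORECASE)


MEDIA_RE = media_re_from_types(SUPPORTED_DOCUMENT_TYPES)
print(MEDIA_RE.findall("MEDIA:/tmp/notes.md"))

# Supporting a new type means adding one mapping entry; no regex literal is edited.
extended = {**SUPPORTED_DOCUMENT_TYPES, ".html": "text/html"}
print(media_re_from_types(extended).findall("MEDIA:/tmp/page.html"))
```

The appeal of this design is that the allowlist cannot drift out of sync with the set of document types the gateway can actually deliver.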

@teknium1

Contributor

Superseded by #34844, which consolidates this cluster.

This PR widens the extract_media extension allowlist, which is the right direction. On its own, though, it leaves the unconditional MEDIA:\s*\S+ strip in place, so a MEDIA: tag whose extension is still outside the (now wider) list keeps getting deleted from the body before extract_local_files can pick up the bare path. #34844 fixes both halves: it unifies the two extractors onto a single shared extension set (MEDIA_DELIVERY_EXTS) and replaces the loose strip with an extension-anchored one, so an unknown-extension path survives in the text instead of vanishing.
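The difference between the two strip behaviours can be sketched as follows. This is a simplified model under stated assumptions: the contents of `MEDIA_DELIVERY_EXTS`, the anchored pattern, and the helper `strip_media_tags` are hypothetical and are not taken from #34844; only the loose `MEDIA:\s*\S+` pattern comes from the comment above.

```python
import re

# Illustrative contents; the real MEDIA_DELIVERY_EXTS in #34844 may differ.
MEDIA_DELIVERY_EXTS = ("png", "pdf", "txt", "csv", "md")
_ALT = "|".join(re.escape(e) for e in MEDIA_DELIVERY_EXTS)

# Loose strip: removes any MEDIA: token, whatever its extension.
LOOSE_STRIP_RE = re.compile(r"MEDIA:\s*\S+")
# Extension-anchored strip (assumed form): removes only tags the extractor would accept.
ANCHORED_STRIP_RE = re.compile(rf"MEDIA:\s*\S+\.(?:{_ALT})(?!\S)", re.IGNORECASE)


def strip_media_tags(text, pattern):
    """Remove every tag matched by pattern, then collapse leftover runs of spaces."""
    return re.sub(r"[ \t]{2,}", " ", pattern.sub("", text)).strip()


body = "Here you go: MEDIA:/tmp/report.md and MEDIA:/tmp/page.html"
print(strip_media_tags(body, LOOSE_STRIP_RE))     # both tags are gone, .html path is lost
print(strip_media_tags(body, ANCHORED_STRIP_RE))  # the .html tag stays in the body
```

In the anchored case the unknown-extension path is still present in the text, so a later pass over the body has something left to find.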

Closing as superseded — thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.

teknium1 closed this on May 29, 2026
Linked issue: MEDIA extraction regex missing .md extension after ea49b3862