MEDIA: tag silently drops .md (and other) files due to regex whitelist mismatch #34517

## Related

Introduced by PR #28350 (diagnosable MEDIA rejections + canonical cache roots + null-path guard).

## Problem

`extract_media` uses a strict extension whitelist that does **not** include `.md` (nor `.json`, `.yaml`, `.xml`, `.tsv`, etc.), while the fallback `extract_local_files` does support them.

However, line 3709 then strips **all** `MEDIA:` tags from the response text with a loose regex (`MEDIA:\s*\S+`), including those that `extract_media` failed to match.

This creates a black hole for unsupported extensions:

1. `extract_media` (strict regex) → no match for `.md`
2. Cleanup regex `re.sub(r"MEDIA:\s*\S+", "", ...)` → removes the path from text
3. `extract_local_files` (broad extension list) → runs on already-cleaned text, path is gone

**Result:** The file is neither extracted as media nor detected as a bare path. The user receives nothing.

## Reproduction

```python
import re

# extract_media pattern (line 2524). The path sub-pattern is elided in this
# report; \S+? is a stand-in so the snippet compiles, not the real expression.
media_pattern = re.compile(
    r'''[`"']?MEDIA:\s*(?P<path>\S+?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa)(?=[\s\`"',;:)\]}]|$))[`"']?'''
)

# cleanup pattern (line 3709)
cleanup = re.compile(r'MEDIA:\s*\S+')

text = 'Here is your report: MEDIA:/tmp/paid_users_up_analysis.md'

assert media_pattern.search(text) is None        # not extracted
cleaned = cleanup.sub('', text).strip()
assert '/tmp/paid_users_up_analysis.md' not in cleaned  # path gone
```
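Step 3 can be sketched the same way. `local_file_pattern` below is a hypothetical, much-simplified stand-in for whatever `extract_local_files` actually uses; it only assumes that the fallback recognises absolute paths ending in extensions such as `.md`:

```python
import re

# Cleanup pattern as quoted in this report (line 3709).
cleanup = re.compile(r'MEDIA:\s*\S+')

# Hypothetical stand-in for the bare-path detection in extract_local_files:
# an absolute path ending in one of a few of the extensions it supports.
local_file_pattern = re.compile(r'/[\w./-]+\.(?:md|json|ya?ml|xml|tsv)\b')

text = 'Here is your report: MEDIA:/tmp/paid_users_up_analysis.md'
cleaned = cleanup.sub('', text).strip()

# On the original text the fallback would still see the path...
found_before = local_file_pattern.search(text)
# ...but it runs after the cleanup, when the path has already been removed.
found_after = local_file_pattern.search(cleaned)

assert found_before is not None
assert found_after is None
```

Under these assumptions the fallback finds the path only when it runs on the text before cleanup, which matches the ordering problem described above.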

## Fix

Align `extract_media`'s extension whitelist with the set supported by `extract_local_files`. Missing extensions include: `md`, `json`, `xml`, `ya?ml`, `tsv`, `odt`, `rtf`, `bmp`, `tiff`, `svg`, `tar`, `gz`, `tgz`, `bz2`, `xz`, `ods`, `odp`, `key`. (`xls` and `ppt` are already matched by the existing `xlsx?` and `pptx?` alternatives.)
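One way to keep the whitelist from drifting again is to build the pattern from a single extension list. The sketch below is illustrative only: `CURRENT_EXTS`, `ADDED_EXTS`, and `EXT_ALTERNATION` are hypothetical names, and `\S+?` again stands in for the real path sub-pattern, which is not shown in this report.

```python
import re

# Extensions already present in the extract_media whitelist quoted above.
CURRENT_EXTS = (
    "png", "jpe?g", "gif", "webp", "mp4", "mov", "avi", "mkv", "webm",
    "ogg", "opus", "mp3", "wav", "m4a", "flac", "epub", "pdf", "zip",
    "rar", "7z", "docx?", "xlsx?", "pptx?", "txt", "csv", "apk", "ipa",
)
# Extensions this report proposes adding. xls and ppt are omitted here
# because xlsx? and pptx? already match them.
ADDED_EXTS = (
    "md", "json", "xml", "ya?ml", "tsv", "odt", "rtf", "bmp", "tiff", "svg",
    "tar", "gz", "tgz", "bz2", "xz", "ods", "odp", "key",
)
EXT_ALTERNATION = "|".join(CURRENT_EXTS + ADDED_EXTS)

# Simplified MEDIA pattern (assumption: \S+? replaces the real path regex).
media_pattern = re.compile(
    r"MEDIA:\s*(?P<path>\S+?\.(?:" + EXT_ALTERNATION + r"))(?=[\s`\"',;:)\]}]|$)"
)

match = media_pattern.search(
    "Here is your report: MEDIA:/tmp/paid_users_up_analysis.md"
)
assert match is not None
assert match.group("path") == "/tmp/paid_users_up_analysis.md"
```

If `extract_local_files` were built from the same list, the two functions could not disagree about which extensions are deliverable.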

