Skip to content

fix(gateway): preserve spaced MEDIA paths#24217

Open
felix-windsor wants to merge 2 commits into
NousResearch:mainfrom
felix-windsor:codex/fix-media-spaced-paths
Open

fix(gateway): preserve spaced MEDIA paths#24217
felix-windsor wants to merge 2 commits into
NousResearch:mainfrom
felix-windsor:codex/fix-media-spaced-paths

Conversation

@felix-windsor

Copy link
Copy Markdown
Contributor

Summary

  • preserve full MEDIA paths that contain spaces, including Windows drive paths and UNC shares
  • add GIS / structured-data extensions such as kmz, kml, geojson, gpx, json, xml, and html
  • reuse the base platform MEDIA parser in gateway result collection and update MCP/TUI handling

Fixes #24032

Tests

  • scripts/run_tests.sh tests/gateway/test_platform_base.py::TestExtractMedia tests/gateway/test_stream_consumer.py::TestCleanForDisplay tests/gateway/test_send_image_file.py tests/test_mcp_serve.py::TestAttachmentExtraction tests/cron/test_scheduler.py::TestResolveDeliveryTarget
  • npm test -- --run src/__tests__/markdown.test.ts
  • ruff check gateway/platforms/base.py gateway/stream_consumer.py gateway/run.py mcp_serve.py tests/gateway/test_platform_base.py tests/gateway/test_stream_consumer.py tests/test_mcp_serve.py

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery comp/tui Terminal UI (ui-tui/ + tui_gateway/) tool/mcp MCP client and OAuth labels May 12, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related: This PR is a superset of #24132 and #24049 (both fix #24032). Also overlaps with #6742 (MCP spaced paths). This PR unifies the fix across gateway, MCP, stream consumer, and TUI.

@liuhao1024 liuhao1024 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extension list _MEDIA_EXTENSIONS_RE is duplicated verbatim in 4 places: gateway/platforms/base.py, gateway/stream_consumer.py, mcp_serve.py, and (partially) ui-tui/src/components/markdown.tsx. If someone adds a new extension, they'd need to find and update all four copies — an easy miss.

Since base.py already defines it as a class constant on BasePlatformAdapter, consider importing it from there in the other Python files:

# stream_consumer.py / mcp_serve.py
from gateway.platforms.base import BasePlatformAdapter
_MEDIA_EXTENSIONS_RE = BasePlatformAdapter._MEDIA_EXTENSIONS_RE

For the TypeScript side, a shared MEDIA_EXTENSIONS array or a comment cross-referencing the Python source would help keep them in sync.

@Bartok9

Bartok9 commented May 12, 2026

Copy link
Copy Markdown
Contributor

I rechecked this against current origin/main (99ad2d137).

tests/gateway/test_platform_base.py::TestExtractMedia and tests/gateway/test_stream_consumer.py::TestCleanForDisplay both pass cleanly on current main (24 tests). The spaced-path parsing gap is still present in main. This PR's fix for space-containing paths and the added GIS/structured-data extensions look correct and the CI baseline should be clean for merge.

@felix-windsor

Copy link
Copy Markdown
Contributor Author

The extension list _MEDIA_EXTENSIONS_RE is duplicated verbatim in 4 places: gateway/platforms/base.py, gateway/stream_consumer.py, mcp_serve.py, and (partially) ui-tui/src/components/markdown.tsx. If someone adds a new extension, they'd need to find and update all four copies — an easy miss.

Since base.py already defines it as a class constant on BasePlatformAdapter, consider importing it from there in the other Python files:

# stream_consumer.py / mcp_serve.py
from gateway.platforms.base import BasePlatformAdapter
_MEDIA_EXTENSIONS_RE = BasePlatformAdapter._MEDIA_EXTENSIONS_RE

For the TypeScript side, a shared MEDIA_EXTENSIONS array or a comment cross-referencing the Python source would help keep them in sync.

Addressed in 663120ff5.

  • Reused BasePlatformAdapter._MEDIA_TAG_RE for the stream consumer.
  • Reused BasePlatformAdapter.extract_media() in mcp_serve.py instead of keeping a separate extension regex.
  • Added a shared MEDIA_EXTENSIONS array in the TUI with a comment pointing back to the Python source of truth.

Verified with:

  • scripts/run_tests.sh tests/gateway/test_platform_base.py::TestExtractMedia tests/gateway/test_stream_consumer.py::TestCleanForDisplay tests/gateway/test_send_image_file.py tests/test_mcp_serve.py::TestAttachmentExtraction tests/cron/test_scheduler.py::TestResolveDeliveryTarget
  • npm run build --prefix packages/hermes-ink && npm test -- --run src/__tests__/markdown.test.ts
  • ruff check gateway/stream_consumer.py mcp_serve.py

@felix-windsor

Copy link
Copy Markdown
Contributor Author

Follow-up commit pushed for the extension-list deduplication feedback. This is ready for another look when convenient.

@Bartok9

Bartok9 commented May 14, 2026

Copy link
Copy Markdown
Contributor

Verified 663120ff5. Clean dedup work:

  • gateway/stream_consumer.py now uses BasePlatformAdapter._MEDIA_TAG_RE (single source of truth)
  • mcp_serve.py calls BasePlatformAdapter.extract_media() directly — eliminates the separate extension regex entirely
  • ✅ TUI MEDIA_EXTENSIONS array with cross-reference comment back to Python — best you can do across language boundaries
  • ✅ Test coverage confirmed across Python (gateway + mcp_serve + cron) and TUI (Vitest)
  • ruff check clean

The cross-language comment pattern is the right tradeoff. Future maintainers updating the Python regex will see the TUI comment when they grep for the extension list. Approving on the follow-up. 🎻

@felix-windsor

Copy link
Copy Markdown
Contributor Author

Thanks for rechecking and validating the follow-up. I’ll leave the PR unchanged unless maintainers spot anything else.

@felix-windsor

Copy link
Copy Markdown
Contributor Author

Rebased on latest main (as of 2026-05-18) and force-pushed.

Local verification:

  • ./scripts/run_tests.sh tests/gateway/test_platform_base.py::TestExtractMedia tests/gateway/test_stream_consumer.py::TestCleanForDisplay tests/test_mcp_serve.py::TestAttachmentExtraction tests/cron/test_scheduler.py::TestResolveDeliveryTarget

(Still seeing no CI statuses reported on the PR head; may need maintainer Actions approval for fork PRs.)

@felix-windsor felix-windsor force-pushed the codex/fix-media-spaced-paths branch from 9d12f80 to 34c4512 Compare May 18, 2026 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery comp/tui Terminal UI (ui-tui/ + tui_gateway/) P2 Medium — degraded but workaround exists tool/mcp MCP client and OAuth type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] extract_media regex truncates Windows spaced paths and rejects GIS extensions (.kmz/.kml/.geojson/.gpx)

4 participants