
fix(gateway): parse Windows media paths with spaces#26368

Closed
aqilaziz wants to merge 3 commits into
NousResearch:mainfrom
aqilaziz:fix/extract-media-unicode-paths

Conversation

@aqilaziz

Contributor

Summary

  • Preserve unquoted Windows MEDIA: paths that include spaces and Unicode characters, e.g. C:\Users\Иван Иванов\voice file.ogg.
  • Keep the existing quoted/backticked, POSIX, tilde, and fallback path parsing behavior intact.
  • Add regression coverage for the Cyrillic/space filename case called out as the novel Bug 1 in Telegram: TTS voice bubbles not delivered despite valid OGG Opus + #26355.
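The parsing behavior described above can be illustrated with a minimal sketch. This is not the regex from gateway/platforms/base.py; the names `_MEDIA_WIN_RE` and `extract_media_sketch` and the extension list are hypothetical, chosen only to show one way an unquoted drive-letter path with spaces and Unicode characters could be kept whole up to its file extension.

```python
import re

# Assumption: an illustrative list of deliverable extensions.
# The real list used by the gateway is not shown in this PR.
_EXTS = ("ogg", "mp3", "wav", "png", "jpg", "mp4")
_EXT_ALT = "|".join(_EXTS)

# Hypothetical pattern: MEDIA: followed by a drive letter (X:\ or X:/),
# then any run of characters (spaces and non-ASCII letters included),
# matched lazily up to the first known extension that is not followed
# by another word character or dot.
_MEDIA_WIN_RE = re.compile(
    r"MEDIA:\s*(?P<path>[A-Za-z]:[\\/][^\n<>|?*]*?\.(?:" + _EXT_ALT + r"))(?![\w.])",
    re.IGNORECASE,
)


def extract_media_sketch(text):
    """Return (paths, cleaned_text) for Windows-style MEDIA: tags in text."""
    paths = [m.group("path") for m in _MEDIA_WIN_RE.finditer(text)]
    cleaned = _MEDIA_WIN_RE.sub("", text).strip()
    return paths, cleaned
```

With this sketch, the input MEDIA:C:\Users\Иван Иванов\voice file.ogg yields the single path C:\Users\Иван Иванов\voice file.ogg and an empty cleaned text. Quoted, POSIX, and tilde paths are out of scope here and would need their own branches.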

Related Issue

Related to #26355 (Bug 1: Cyrillic filenames / path parsing in extract_media). I did not mark this PR as closing the whole issue, because the issue also covers auto-voice .mp3 / OGG delivery items that maintainers identified as duplicates or as covered by other PRs.

Test Plan

  • python -m pytest -o addopts= tests\gateway\test_platform_base.py -q --tb=short
  • python -m pytest -o addopts= tests\gateway\test_platform_base.py tests\run_agent\test_provider_parity.py::TestDeveloperRoleSwap::test_developer_role_via_nous_portal tests\run_agent\test_provider_parity.py::TestBuildApiKwargsNousPortal::test_includes_nous_product_tags tests\run_agent\test_provider_parity.py::TestBuildApiKwargsNousPortal::test_uses_chat_completions_format tests\gateway\test_discord_free_response.py::test_fetch_channel_context_returns_empty_when_channel_lacks_history tests\e2e\test_discord_adapter.py -q --tb=short
  • python -m ruff check gateway\platforms\base.py tests\gateway\test_platform_base.py tests\run_agent\test_provider_parity.py tests\gateway\test_discord_free_response.py tests\e2e\test_discord_adapter.py gateway\platforms\discord.py
  • git diff --check HEAD~3..HEAD

@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), and platform/discord (Discord bot adapter) labels on May 15, 2026
@alt-glitch

Collaborator

Related to #24032 (root issue). Multiple competing open PRs exist for the same extract_media regex bug: #24049, #24132, #24217, #24384. This PR adds Cyrillic/Unicode path handling per #26355 Bug 1.

@teknium1

Contributor

This looks implemented on current main now.

Automated hermes-sweeper review evidence:

  • gateway/platforms/base.py:1225 now accepts Windows drive-letter MEDIA: paths (X:\ / X:/) and preserves spaced path segments through the deliverable extension.
  • BasePlatformAdapter.extract_media() uses that shared regex for extraction at gateway/platforms/base.py:2916.
  • I verified the exact case from this PR against current main: MEDIA:C:\Users\Иван Иванов\voice file.ogg extracts as C:\Users\Иван Иванов\voice file.ogg and leaves empty cleaned text.
  • Windows MEDIA tag support landed in 51d165a8e71ca84112708af4a9add7a71e4ee424 (fix(gateway): support Windows absolute paths in MEDIA tag regex and extract_local_files (#34632)).

Thanks for calling out the Cyrillic/space filename case; the current extractor behavior covers it.

@teknium1 closed this on Jun 12, 2026
@teknium1 added the sweeper:implemented-on-main label (Sweeper: behavior already present on current main) on Jun 12, 2026

Labels

  • comp/gateway: Gateway runner, session dispatch, delivery
  • P2: Medium, degraded but workaround exists
  • platform/discord: Discord bot adapter
  • sweeper:implemented-on-main: Sweeper, behavior already present on current main
  • type/bug: Something isn't working

3 participants