fix(discord): transcribe native voice notes #28918

Closed
helix4u wants to merge 1 commit into NousResearch:main from helix4u:fix/discord-voice-note-transcription

@helix4u commented May 19, 2026

Contributor

What does this PR do?

Fixes Discord native voice-note handling so inbound voice notes are classified as MessageType.VOICE instead of generic MessageType.AUDIO.

Discord.py exposes native voice-note metadata through Attachment.is_voice_message(). Hermes was only checking content_type.startswith("audio/"), which routed Discord voice notes into the plain audio attachment path. The gateway intentionally skips automatic STT for MessageType.AUDIO, so native Discord voice notes were cached as files instead of being automatically transcribed.

This keeps ordinary audio uploads as MessageType.AUDIO and only marks native Discord voice-message attachments as MessageType.VOICE.
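The ordering described above can be sketched as follows. This is a minimal illustration, not the code in gateway/platforms/discord.py: the `MessageType` enum here is a stand-in with only the members needed, `classify_attachment` is a hypothetical helper name, and the `DOCUMENT` fallback is an assumption made to keep the example short. The only real API relied on is discord.py's `Attachment.is_voice_message()`, which is imitated with a test double so the sketch runs without a Discord connection.

```python
from enum import Enum
from types import SimpleNamespace


class MessageType(Enum):
    # Stand-in for the gateway's MessageType; only the members used here.
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"


def classify_attachment(attachment) -> MessageType:
    """Classify one attachment, checking the native voice-note flag first."""
    # Guard with getattr so objects lacking the method (for example, simple
    # test doubles) fall through to the MIME-type check instead of raising.
    is_voice = getattr(attachment, "is_voice_message", None)
    if callable(is_voice) and is_voice():
        return MessageType.VOICE

    # Ordinary audio uploads keep the generic audio classification.
    content_type = attachment.content_type or ""
    if content_type.startswith("audio/"):
        return MessageType.AUDIO

    # Assumed fallback for this sketch; a real adapter handles more types.
    return MessageType.DOCUMENT


# Test doubles imitating the two attributes the classifier reads.
voice_note = SimpleNamespace(content_type="audio/ogg", is_voice_message=lambda: True)
upload = SimpleNamespace(content_type="audio/mpeg", is_voice_message=lambda: False)

print(classify_attachment(voice_note))
print(classify_attachment(upload))
```

Both attachments carry an `audio/*` content type, so a check based only on `content_type` cannot tell them apart; the voice-note test has to run before the generic audio branch.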

Related Issue

Support report: Discord voice notes stopped auto-transcribing after update.

Fixes #

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/platforms/discord.py: detect native Discord voice-note attachments with attachment.is_voice_message() before the generic audio/* branch.
  • tests/gateway/test_discord_attachment_download.py: add regression coverage proving native voice notes become MessageType.VOICE while ordinary audio uploads remain MessageType.AUDIO.

How to Test

  1. Send a native Discord voice note to the Hermes Discord bot.
  2. Confirm the Discord adapter emits a MessageType.VOICE event for the attachment.
  3. Confirm the gateway auto-STT path handles the voice note instead of surfacing it only as a generic audio file attachment.
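To show why steps 2 and 3 depend on the message type, here is a rough sketch of the gating behavior described in this PR: voice events are transcribed automatically, while generic audio is surfaced as a file. Everything in it is illustrative. `MediaEvent`, `handle_media`, and the placeholder attachment text are hypothetical names, and the transcriber is a fake callable rather than the gateway's real STT path.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class MessageType(Enum):
    # Stand-in for the gateway's MessageType; only the members used here.
    VOICE = "voice"
    AUDIO = "audio"


@dataclass
class MediaEvent:
    # Hypothetical event shape: the classified type plus the cached file path.
    message_type: MessageType
    path: str


def handle_media(event: MediaEvent, transcribe: Callable[[str], str]) -> str:
    """Return the text an agent would receive for one inbound media event."""
    if event.message_type is MessageType.VOICE:
        # Voice notes go through automatic speech-to-text.
        return transcribe(event.path)
    # Generic audio is deliberately not transcribed; it is passed on as a file.
    return f"[audio file attached: {event.path}]"


def fake_stt(path: str) -> str:
    # Placeholder transcriber so the sketch runs without an STT backend.
    return f"transcript of {path}"


voice_result = handle_media(MediaEvent(MessageType.VOICE, "note.ogg"), fake_stt)
audio_result = handle_media(MediaEvent(MessageType.AUDIO, "song.mp3"), fake_stt)
print(voice_result)
print(audio_result)
```

Under this model, a voice note misclassified as `AUDIO` never reaches the transcriber, which matches the symptom in the support report.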

Targeted tests run locally:

  • python -m pytest tests/gateway/test_discord_attachment_download.py tests/gateway/test_telegram_audio_vs_voice.py -q — 19 passed
  • python -m pytest tests/gateway/test_stt_config.py -q — 6 passed

Full suite run locally:

  • scripts/run_tests.sh — 17 failed, 24561 passed, 54 skipped, 250 warnings in 646.20s

Full-suite failures observed:

  • tests/gateway/test_api_server.py::TestAdapterInit::test_default_config
  • tests/gateway/test_approve_deny_commands.py::TestBlockingApprovalE2E::test_blocking_approval_approve_once
  • tests/gateway/test_approve_deny_commands.py::TestBlockingApprovalE2E::test_blocking_approval_deny
  • tests/gateway/test_config.py::TestLoadGatewayConfig::test_bridges_quoted_false_platform_enabled_from_config_yaml
  • tests/gateway/test_discord_bot_filter.py::TestDiscordBotFilter::test_default_is_none
  • tests/gateway/test_restart_resume_pending.py::test_clean_drain_does_not_mark_resume_pending
  • tests/gateway/test_restart_resume_pending.py::test_drain_timeout_only_marks_still_running_sessions
  • tests/gateway/test_runner_startup_failures.py::test_start_gateway_replace_force_uses_terminate_pid
  • tests/hermes_cli/test_gateway_service.py::TestGatewaySystemServiceRouting::test_gateway_install_passes_system_flags
  • tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages::test_install_wsl_with_systemd_warns
  • tests/hermes_cli/test_update_gateway_restart.py::TestLaunchdPlistPath::test_plist_path_starts_with_venv_bin
  • tests/tools/test_file_operations.py::TestGitBaselineCheck::test_git_not_available_returns_none
  • tests/tools/test_file_operations.py::TestGitBaselineCheck::test_not_in_git_repo_returns_none
  • tests/tools/test_file_operations.py::TestGitBaselineCheck::test_clean_repo_returns_none
  • tests/tools/test_file_operations.py::TestGitBaselineCheck::test_dirty_repo_returns_warning
  • tests/tools/test_file_operations.py::TestGitBaselineCheck::test_write_file_includes_git_warning_when_dirty
  • tests/tools/test_tirith_security.py::TestDiskFailureMarker::test_cosign_missing_marker_clears_when_cosign_appears

The Discord voice-note regression tests added by this PR pass in the targeted run.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: WSL2 / Linux local test environment

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

N/A

Screenshots / Logs

Targeted test output:

  • tests/gateway/test_discord_attachment_download.py tests/gateway/test_telegram_audio_vs_voice.py: 19 passed
  • tests/gateway/test_stt_config.py: 6 passed

Full suite output summary:

  • scripts/run_tests.sh: 17 failed, 24561 passed, 54 skipped, 250 warnings in 646.20s

@alt-glitch added labels May 19, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), platform/discord (Discord bot adapter), tool/tts (Text-to-speech and transcription)
@helix4u marked this pull request as ready for review May 19, 2026 20:44
teknium1 added a commit that referenced this pull request May 20, 2026
Sibling fix to PR #28918 (Discord voice notes). DingTalk's rich-text
"voice" item type is its native voice-message format, but the adapter
was routing it to MessageType.AUDIO — which gateway/run.py:7605 skips
for STT. The docs claim every voice-capable platform auto-transcribes,
so this brings DingTalk in line.

Generic audio uploads (mapped to "file" by DINGTALK_TYPE_MAPPING) are
unchanged — they were already classified as DOCUMENT, not AUDIO.

Adds tests/gateway/test_dingtalk.py::TestExtractMedia covering both the
voice path and the audio-passthrough invariant.
@teknium1

Contributor

Merged via #28993 — your commit was cherry-picked onto current main with your authorship preserved (commit 448a3f9, rebase-merge). Thanks @helix4u!

Audit of the other gateway platforms turned up DingTalk with the same bug (rich-text voice items routed to MessageType.AUDIO), so that's fixed in the same PR as a follow-up commit. Every other voice-capable adapter (Telegram, WhatsApp, Signal, Slack, Matrix, Mattermost, BlueBubbles, WeCom, Weixin, Yuanbao, QQBot) was already emitting MessageType.VOICE correctly. Feishu's API has no separate voice-note type, so its AUDIO-only behavior is policy-correct.

EloquentBrush0x added a commit to EloquentBrush0x/hermes-agent that referenced this pull request May 21, 2026
Feishu's "audio" message type is exclusively for in-app voice
recordings, but _resolve_normalized_message_type was delegating to
_resolve_media_message_type which maps audio/* MIME types to
MessageType.AUDIO.  gateway/run.py:7605 skips STT for AUDIO, so every
voice note sent on Feishu was silently dropped instead of transcribed.

Generic audio file uploads in Feishu travel through message_type="file"
→ preferred_message_type="document", never through the "audio" branch,
so returning MessageType.VOICE here is unambiguous.

Sibling fix to PR NousResearch#28922 (DingTalk) and PR NousResearch#28918 (Discord) which
corrected the same AUDIO-vs-VOICE misclassification on those platforms.

Update the existing test to assert MessageType.VOICE and rename it to
reflect the invariant it actually guards.
Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
Gpapas pushed a commit to Gpapas/hermes-agent that referenced this pull request May 23, 2026
Mucky010 pushed a commit to Mucky010/hermes-agent that referenced this pull request May 24, 2026
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
3 participants