fix(discord): transcribe native voice notes by helix4u · Pull Request #28918 · NousResearch/hermes-agent

helix4u · 2026-05-19T20:29:00Z

What does this PR do?

Fixes Discord native voice-note handling so inbound voice notes are classified as MessageType.VOICE instead of generic MessageType.AUDIO.

Discord.py exposes native voice-note metadata through Attachment.is_voice_message(). Hermes was only checking content_type.startswith("audio/"), which routed Discord voice notes into the plain audio attachment path. The gateway intentionally skips automatic STT for MessageType.AUDIO, so native Discord voice notes were cached as files instead of being automatically transcribed.

This keeps ordinary audio uploads as MessageType.AUDIO and only marks native Discord voice-message attachments as MessageType.VOICE.
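
The ordering described above can be sketched as follows. This is a minimal illustration, not the code in gateway/platforms/discord.py: `FakeAttachment`, `classify_attachment`, and the reduced `MessageType` enum are hypothetical stand-ins so the example runs without discord.py installed. Only `is_voice_message()` and the `content_type.startswith("audio/")` check come from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MessageType(Enum):
    # Stand-in for Hermes' MessageType; only the members discussed here.
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"


@dataclass
class FakeAttachment:
    # Hypothetical stand-in for discord.Attachment.
    content_type: Optional[str]
    voice: bool = False

    def is_voice_message(self) -> bool:
        return self.voice


def classify_attachment(attachment) -> MessageType:
    # Check the native voice-note flag first, so voice notes never fall
    # through to the generic audio/* branch.
    is_voice = getattr(attachment, "is_voice_message", None)
    if callable(is_voice) and is_voice():
        return MessageType.VOICE
    content_type = attachment.content_type or ""
    if content_type.startswith("audio/"):
        return MessageType.AUDIO
    # Assumption for this sketch: everything else is treated as a document.
    return MessageType.DOCUMENT
```

The `getattr` guard is an assumption of this sketch: it lets the check degrade to the old behavior if the attachment object does not provide `is_voice_message()`.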

Related Issue

Support report: Discord voice notes stopped auto-transcribing after update.

Fixes #

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

gateway/platforms/discord.py: detect native Discord voice-note attachments with attachment.is_voice_message() before the generic audio/* branch.
tests/gateway/test_discord_attachment_download.py: add regression coverage proving native voice notes become MessageType.VOICE while ordinary audio uploads remain MessageType.AUDIO.

How to Test

Send a native Discord voice note to the Hermes Discord bot.
Confirm the Discord adapter emits a MessageType.VOICE event for the attachment.
Confirm the gateway auto-STT path handles the voice note instead of surfacing it only as a generic audio file attachment.
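
The behavior these manual steps check for can be pictured with a small sketch of the gateway-side decision. It assumes only what the description states, namely that automatic STT is skipped for MessageType.AUDIO and applied to MessageType.VOICE. `handle_audio_event` and `fake_transcribe` are hypothetical names, not functions from gateway/run.py.

```python
from enum import Enum
from typing import Callable, Optional


class MessageType(Enum):
    # Stand-in for Hermes' MessageType; only the members discussed here.
    VOICE = "voice"
    AUDIO = "audio"


def handle_audio_event(
    message_type: MessageType,
    cached_path: str,
    transcribe: Callable[[str], str],
) -> Optional[str]:
    """Return a transcript for voice notes, or None for plain audio files."""
    if message_type is MessageType.VOICE:
        return transcribe(cached_path)
    # Plain audio uploads stay as cached file attachments; no automatic STT.
    return None


def fake_transcribe(path: str) -> str:
    # Hypothetical stand-in for a real speech-to-text backend.
    return f"<transcript of {path}>"
```

With this shape, a misclassified voice note (AUDIO instead of VOICE) is never passed to the transcriber, which matches the reported symptom.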

Targeted tests run locally:

python -m pytest tests/gateway/test_discord_attachment_download.py tests/gateway/test_telegram_audio_vs_voice.py -q — 19 passed
python -m pytest tests/gateway/test_stt_config.py -q — 6 passed

Full suite run locally:

scripts/run_tests.sh — 17 failed, 24561 passed, 54 skipped, 250 warnings in 646.20s

Full-suite failures observed:

tests/gateway/test_api_server.py::TestAdapterInit::test_default_config
tests/gateway/test_approve_deny_commands.py::TestBlockingApprovalE2E::test_blocking_approval_approve_once
tests/gateway/test_approve_deny_commands.py::TestBlockingApprovalE2E::test_blocking_approval_deny
tests/gateway/test_config.py::TestLoadGatewayConfig::test_bridges_quoted_false_platform_enabled_from_config_yaml
tests/gateway/test_discord_bot_filter.py::TestDiscordBotFilter::test_default_is_none
tests/gateway/test_restart_resume_pending.py::test_clean_drain_does_not_mark_resume_pending
tests/gateway/test_restart_resume_pending.py::test_drain_timeout_only_marks_still_running_sessions
tests/gateway/test_runner_startup_failures.py::test_start_gateway_replace_force_uses_terminate_pid
tests/hermes_cli/test_gateway_service.py::TestGatewaySystemServiceRouting::test_gateway_install_passes_system_flags
tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages::test_install_wsl_with_systemd_warns
tests/hermes_cli/test_update_gateway_restart.py::TestLaunchdPlistPath::test_plist_path_starts_with_venv_bin
tests/tools/test_file_operations.py::TestGitBaselineCheck::test_git_not_available_returns_none
tests/tools/test_file_operations.py::TestGitBaselineCheck::test_not_in_git_repo_returns_none
tests/tools/test_file_operations.py::TestGitBaselineCheck::test_clean_repo_returns_none
tests/tools/test_file_operations.py::TestGitBaselineCheck::test_dirty_repo_returns_warning
tests/tools/test_file_operations.py::TestGitBaselineCheck::test_write_file_includes_git_warning_when_dirty
tests/tools/test_tirith_security.py::TestDiskFailureMarker::test_cosign_missing_marker_clears_when_cosign_appears

The Discord voice-note regression tests added by this PR pass in the targeted run.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: WSL2 / Linux local test environment

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

N/A

Screenshots / Logs

Targeted test output:

tests/gateway/test_discord_attachment_download.py tests/gateway/test_telegram_audio_vs_voice.py: 19 passed
tests/gateway/test_stt_config.py: 6 passed

Full suite output summary:

scripts/run_tests.sh: 17 failed, 24561 passed, 54 skipped, 250 warnings in 646.20s

Sibling fix to PR #28918 (Discord voice notes). DingTalk's rich-text "voice" item type is its native voice-message format, but the adapter was routing it to MessageType.AUDIO — which gateway/run.py:7605 skips for STT. The docs claim every voice-capable platform auto-transcribes, so this brings DingTalk in line. Generic audio uploads (mapped to "file" by DINGTALK_TYPE_MAPPING) are unchanged — they were already classified as DOCUMENT, not AUDIO. Adds tests/gateway/test_dingtalk.py::TestExtractMedia covering both the voice path and the audio-passthrough invariant.
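
A rough sketch of the DingTalk classification described in this commit message follows. The contents of `TYPE_MAPPING_SKETCH` are an assumption made for illustration; the real DINGTALK_TYPE_MAPPING may have different keys and values. `classify_rich_text_item` is likewise a hypothetical helper, not the adapter's actual function.

```python
from enum import Enum


class MessageType(Enum):
    # Stand-in for Hermes' MessageType; only the members discussed here.
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"


# Illustrative stand-in for DINGTALK_TYPE_MAPPING (contents assumed).
TYPE_MAPPING_SKETCH = {
    "voice": "voice",  # native voice-message item
    "audio": "file",   # generic audio upload, handled as a file
    "file": "file",
}


def classify_rich_text_item(item: dict) -> MessageType:
    # Unknown item types fall back to "file" in this sketch (assumption).
    kind = TYPE_MAPPING_SKETCH.get(item.get("type", ""), "file")
    if kind == "voice":
        # Native voice items must be VOICE so the gateway runs STT on them.
        return MessageType.VOICE
    return MessageType.DOCUMENT
```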

teknium1 · 2026-05-20T00:26:51Z

Merged via #28993 — your commit was cherry-picked onto current main with your authorship preserved (commit 448a3f9, rebase-merge). Thanks @helix4u!

Audit of the other gateway platforms turned up DingTalk with the same bug (rich-text voice items routed to MessageType.AUDIO), so that's fixed in the same PR as a follow-up commit. Every other voice-capable adapter (Telegram, WhatsApp, Signal, Slack, Matrix, Mattermost, BlueBubbles, WeCom, Weixin, Yuanbao, QQBot) was already emitting MessageType.VOICE correctly. Feishu's API has no separate voice-note type, so its AUDIO-only behavior is policy-correct.

#28993

Feishu's "audio" message type is exclusively for in-app voice recordings, but _resolve_normalized_message_type was delegating to _resolve_media_message_type which maps audio/* MIME types to MessageType.AUDIO. gateway/run.py:7605 skips STT for AUDIO, so every voice note sent on Feishu was silently dropped instead of transcribed. Generic audio file uploads in Feishu travel through message_type="file" → preferred_message_type="document", never through the "audio" branch, so returning MessageType.VOICE here is unambiguous. Sibling fix to PR NousResearch#28922 (DingTalk) and PR NousResearch#28918 (Discord) which corrected the same AUDIO-vs-VOICE misclassification on those platforms. Update the existing test to assert MessageType.VOICE and rename it to reflect the invariant it actually guards.
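
The Feishu change can be sketched along the same lines. `resolve_feishu_message_type` is a hypothetical, simplified stand-in for _resolve_normalized_message_type, whose real signature is not shown here. The only behavior taken from the commit message is that "audio" resolves to VOICE, "file" resolves to a document, and audio/* MIME types otherwise map to AUDIO.

```python
from enum import Enum


class MessageType(Enum):
    # Stand-in for Hermes' MessageType; only the members discussed here.
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"


def resolve_feishu_message_type(message_type: str, mime_type: str = "") -> MessageType:
    if message_type == "audio":
        # Feishu's "audio" type carries only in-app voice recordings.
        return MessageType.VOICE
    if message_type == "file":
        # Generic uploads, including audio files, arrive as "file".
        return MessageType.DOCUMENT
    if mime_type.startswith("audio/"):
        # MIME-based fallback, mirroring the generic media resolver.
        return MessageType.AUDIO
    # Assumption for this sketch: anything else is treated as a document.
    return MessageType.DOCUMENT
```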

fix(discord): transcribe native voice notes

36e3cfd

alt-glitch added labels on May 19, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), platform/discord (Discord bot adapter), tool/tts (Text-to-speech and transcription)

helix4u marked this pull request as ready for review May 19, 2026 20:44

teknium1 mentioned this pull request May 20, 2026

fix(gateway): transcribe native voice notes (Discord + DingTalk) #28993

Merged

teknium1 closed this in #28993 May 20, 2026

EloquentBrush0x mentioned this pull request May 20, 2026

fix(feishu): transcribe native voice notes #29295

Open


helix4u wants to merge 1 commit into NousResearch:main from helix4u:fix/discord-voice-note-transcription
