
fix(gateway): transcribe native voice notes (Discord + DingTalk) #28993

Merged
teknium1 merged 2 commits into main from hermes/hermes-91361c28 on May 20, 2026

@teknium1

Contributor

Salvages #28918 (helix4u/Gille) onto current main + extends the same fix to DingTalk after a parity audit.

Summary

Discord and DingTalk native voice notes were being classified as MessageType.AUDIO instead of MessageType.VOICE. Gateway STT routing at gateway/run.py:7605 intentionally skips AUDIO (file uploads: never auto-transcribe an mp3) and always transcribes VOICE (voice bubbles). Discord and DingTalk were the only two voice-capable platforms that never emitted VOICE, which contradicted the docs at website/docs/user-guide/features/tts.md:302.
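
For readers unfamiliar with the routing rule, here is a minimal sketch of the AUDIO/VOICE policy described above. It is not the code at gateway/run.py:7605: the `MessageType` enum below is a stand-in with only the members discussed here, and `should_auto_transcribe` is a hypothetical helper name.

```python
from enum import Enum


class MessageType(Enum):
    # Stand-in for the gateway's enum; the real one likely has more members.
    AUDIO = "audio"        # generic audio file upload (e.g. an mp3)
    VOICE = "voice"        # native voice bubble
    DOCUMENT = "document"  # generic file upload


def should_auto_transcribe(message_type: MessageType) -> bool:
    """Hypothetical helper: only native voice bubbles are sent to STT."""
    return message_type is MessageType.VOICE


print(should_auto_transcribe(MessageType.VOICE))  # True
print(should_auto_transcribe(MessageType.AUDIO))  # False
```

Under this policy, an adapter that labels a voice bubble as AUDIO silently opts it out of transcription, which is the failure mode this PR addresses.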

This wasn't a recent regression — git log -S 'MessageType.VOICE' -- gateway/platforms/discord.py returns zero results. Discord auto-STT never worked. DingTalk had a 'voice' -> 'audio' mapping in DINGTALK_TYPE_MAPPING that hinted at the intent but never reached VOICE.

Changes

  • gateway/platforms/discord.py — cherry-picked from fix(discord): transcribe native voice notes #28918. Use Attachment.is_voice_message() (with duration+waveform fallback for older discord.py) to split native voice notes from generic audio uploads. Authored by @helix4u.
  • gateway/platforms/dingtalk.py — when a rich-text item has type: voice, route to MessageType.VOICE instead of MessageType.AUDIO. Generic audio uploads (mapped to file by DINGTALK_TYPE_MAPPING) remain DOCUMENT as before.
  • tests/gateway/test_discord_attachment_download.py — regression tests from fix(discord): transcribe native voice notes #28918 covering both voice-note and plain-audio paths.
  • tests/gateway/test_dingtalk.py — new TestExtractMedia covering the same split.
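
The Discord split in the first bullet can be sketched roughly as follows. This is an illustration under assumptions, not the adapter code: `looks_like_voice_note` is a hypothetical name, and the attachments are duck-typed stand-ins so the example runs without discord.py installed. It relies on `Attachment.is_voice_message()` where available and otherwise checks that both `duration` and `waveform` are set.

```python
from types import SimpleNamespace
from typing import Any


def looks_like_voice_note(attachment: Any) -> bool:
    """Hypothetical helper: True if the attachment is a native voice note."""
    checker = getattr(attachment, "is_voice_message", None)
    if callable(checker):
        # Preferred path: ask the library directly.
        return bool(checker())
    # Fallback path: treat the attachment as a voice note only when both
    # voice-note metadata fields are present.
    duration = getattr(attachment, "duration", None)
    waveform = getattr(attachment, "waveform", None)
    return duration is not None and waveform is not None


# Stand-in attachments (assumed shapes, for illustration only).
native_note = SimpleNamespace(is_voice_message=lambda: True)
fallback_note = SimpleNamespace(duration=3.2, waveform=b"\x00\x01")
plain_upload = SimpleNamespace(filename="song.mp3", duration=None, waveform=None)

print(looks_like_voice_note(native_note))    # True
print(looks_like_voice_note(fallback_note))  # True
print(looks_like_voice_note(plain_upload))   # False
```

An adapter could then map a positive result to MessageType.VOICE and everything else audio-like to MessageType.AUDIO, keeping plain uploads out of auto-STT.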

Other platforms audited (telegram, whatsapp, signal, slack, matrix, mattermost, bluebubbles, wecom, weixin, yuanbao, qqbot) all already emit MessageType.VOICE correctly. Feishu has no separate voice-note semantic in its API — every audio comes in as type audio (file upload), so leaving it as AUDIO is policy-correct.

Validation

| Targeted suite | Result |
|---|---|
| test_discord_attachment_download.py | 14 tests (incl. 2 new) pass |
| test_dingtalk.py | 67 tests (incl. 2 new) pass |
| test_telegram_audio_vs_voice.py | parity check pass |
| test_stt_config.py | STT routing pass |

scripts/run_tests.sh tests/gateway/test_dingtalk.py tests/gateway/test_discord_attachment_download.py tests/gateway/test_telegram_audio_vs_voice.py tests/gateway/test_stt_config.py — 92 passed.

Closes #28918.

helix4u and others added 2 commits May 19, 2026 17:13
Sibling fix to PR #28918 (Discord voice notes). DingTalk's rich-text
"voice" item type is its native voice-message format, but the adapter
was routing it to MessageType.AUDIO — which gateway/run.py:7605 skips
for STT. The docs claim every voice-capable platform auto-transcribes,
so this brings DingTalk in line.

Generic audio uploads (mapped to "file" by DINGTALK_TYPE_MAPPING) are
unchanged — they were already classified as DOCUMENT, not AUDIO.

Adds tests/gateway/test_dingtalk.py::TestExtractMedia covering both the
voice path and the audio-passthrough invariant.
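
As a rough sketch of the DingTalk behavior this commit message describes (assuming rich-text items are dicts with a `type` key; the real `_extract_media` signature and item shape are not shown in this PR page, and `classify_rich_text_item` is a hypothetical name):

```python
from enum import Enum
from typing import Optional


class MessageType(Enum):
    # Stand-in for the gateway's enum; only the members discussed here.
    AUDIO = "audio"
    VOICE = "voice"
    DOCUMENT = "document"


def classify_rich_text_item(item: dict) -> Optional[MessageType]:
    """Hypothetical helper mirroring the voice/file split described above."""
    item_type = item.get("type")
    if item_type == "voice":
        # Native voice message: must reach STT, so emit VOICE, not AUDIO.
        return MessageType.VOICE
    if item_type == "file":
        # Generic uploads (including audio files) stay DOCUMENT.
        return MessageType.DOCUMENT
    # Other item types are outside the scope of this sketch.
    return None


print(classify_rich_text_item({"type": "voice"}))  # MessageType.VOICE
print(classify_rich_text_item({"type": "file"}))   # MessageType.DOCUMENT
```

The two cases correspond to the voice path and the audio-passthrough invariant that TestExtractMedia is said to cover.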
@github-actions

Contributor

🔎 Lint report: hermes/hermes-91361c28 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8991 on HEAD, 8985 on base (🆕 +6)

🆕 New issues (1):

| Rule | Count |
|---|---|
| invalid-argument-type | 1 |

First entries:
tests/gateway/test_dingtalk.py:588: [invalid-argument-type] invalid-argument-type: Argument to function `DingTalkAdapter._extract_media` is incorrect: Expected `DingTalkAdapter`, found `<class 'DingTalkAdapter'>`

✅ Fixed issues: none

Unchanged: 4736 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@teknium1 merged commit 93734c2 into main on May 20, 2026
20 of 21 checks passed
@teknium1 deleted the hermes/hermes-91361c28 branch on May 20, 2026 00:26
@alt-glitch added the following labels on May 20, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), platform/discord (Discord bot adapter), platform/dingtalk (DingTalk adapter), tool/tts (Text-to-speech and transcription)
