fix(gateway): transcribe native voice notes (Discord + DingTalk) #28993
Merged
Sibling fix to PR #28918 (Discord voice notes). DingTalk's rich-text "voice" item type is its native voice-message format, but the adapter was routing it to MessageType.AUDIO — which gateway/run.py:7605 skips for STT. The docs claim every voice-capable platform auto-transcribes, so this brings DingTalk in line. Generic audio uploads (mapped to "file" by DINGTALK_TYPE_MAPPING) are unchanged — they were already classified as DOCUMENT, not AUDIO. Adds tests/gateway/test_dingtalk.py::TestExtractMedia covering both the voice path and the audio-passthrough invariant.
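A minimal sketch of the DingTalk split described above, not the adapter's actual code. The `MessageType` members, the two-entry `TYPE_MAPPING` stand-in for `DINGTALK_TYPE_MAPPING`, the `classify_rich_text_item` helper, and the `TEXT` fallback are all assumptions made for illustration; only the `voice` and `file` routing comes from the PR text.

```python
from enum import Enum


class MessageType(Enum):
    # Hypothetical subset of the gateway's message types.
    TEXT = "text"
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"


# Stand-in for DINGTALK_TYPE_MAPPING: only the two entries the PR mentions.
TYPE_MAPPING = {"voice": "audio", "file": "file"}


def classify_rich_text_item(item: dict) -> MessageType:
    """Classify one rich-text item (hypothetical helper)."""
    item_type = item.get("type", "")
    # Native voice messages go straight to VOICE so the gateway runs STT on them.
    if item_type == "voice":
        return MessageType.VOICE
    mapped = TYPE_MAPPING.get(item_type, item_type)
    if mapped == "audio":
        return MessageType.AUDIO
    # Generic audio uploads arrive as "file" and stay DOCUMENT.
    if mapped == "file":
        return MessageType.DOCUMENT
    # Assumed fallback for anything this sketch does not model.
    return MessageType.TEXT


voice_result = classify_rich_text_item({"type": "voice"})
upload_result = classify_rich_text_item({"type": "file"})
```

Checking `voice` before consulting the mapping keeps the existing `'voice' -> 'audio'` entry intact for any other caller that might rely on it.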
🔎 Lint report:
| Rule | Count |
|---|---|
| invalid-argument-type | 1 |
First entries
tests/gateway/test_dingtalk.py:588: [invalid-argument-type] invalid-argument-type: Argument to function `DingTalkAdapter._extract_media` is incorrect: Expected `DingTalkAdapter`, found `<class 'DingTalkAdapter'>`
✅ Fixed issues: none
Unchanged: 4736 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
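The diagnostic above suggests the test passes the class itself where an instance is expected as `self`. The sketch below shows that pattern and one possible way to satisfy the checker; the `DingTalkAdapter` here is a hypothetical stand-in, since the real constructor and the body of `_extract_media` are not shown in this PR.

```python
class DingTalkAdapter:
    """Hypothetical stand-in for the real adapter."""

    def __init__(self, config: dict) -> None:
        self.config = config

    def _extract_media(self, item: dict) -> str:
        # Placeholder body: the real method's logic is not shown in the PR.
        return item.get("type", "")


# Flagged pattern: the class object is passed in the `self` position.
# It runs because this body never touches instance state, but a type
# checker expects an instance of DingTalkAdapter here.
flagged = DingTalkAdapter._extract_media(DingTalkAdapter, {"type": "voice"})

# One option: create a bare instance without running __init__,
# then call the method through it.
adapter = object.__new__(DingTalkAdapter)
clean = adapter._extract_media({"type": "voice"})
```

Skipping `__init__` only works while the method under test does not read attributes set by the constructor; otherwise a properly constructed or mocked adapter would be needed.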
Salvages #28918 (helix4u/Gille) onto current main + extends the same fix to DingTalk after a parity audit.
Summary
Discord and DingTalk native voice notes were being classified as `MessageType.AUDIO` instead of `MessageType.VOICE`. Gateway STT routing at `gateway/run.py:7605` intentionally skips `AUDIO` (file uploads: never auto-transcribe an mp3) and always STTs `VOICE` (voice bubbles: always transcribe). Discord and DingTalk are the only two voice-capable platforms that never emitted `VOICE`, contradicting the docs at `website/docs/user-guide/features/tts.md:302`.

This wasn't a recent regression: `git log -S 'MessageType.VOICE' -- gateway/platforms/discord.py` returns zero results. Discord auto-STT never worked. DingTalk had a `'voice' -> 'audio'` mapping in `DINGTALK_TYPE_MAPPING` that hinted at the intent but never reached `VOICE`.

Changes
- `gateway/platforms/discord.py`: cherry-picked from #28918 (fix(discord): transcribe native voice notes). Use `Attachment.is_voice_message()` (with duration+waveform fallback for older discord.py) to split native voice notes from generic audio uploads. Authored by @helix4u.
- `gateway/platforms/dingtalk.py`: when a rich-text item has `type: voice`, route to `MessageType.VOICE` instead of `MessageType.AUDIO`. Generic audio uploads (mapped to `file` by `DINGTALK_TYPE_MAPPING`) remain `DOCUMENT` as before.
- `tests/gateway/test_discord_attachment_download.py`: regression tests from #28918 covering both voice-note and plain-audio paths.
- `tests/gateway/test_dingtalk.py`: new `TestExtractMedia` covering the same split.

Other platforms audited (telegram, whatsapp, signal, slack, matrix, mattermost, bluebubbles, wecom, weixin, yuanbao, qqbot) all already emit `MessageType.VOICE` correctly. Feishu has no separate voice-note semantic in its API: every audio comes in as type `audio` (file upload), so leaving it as `AUDIO` is policy-correct.

Validation
- `test_discord_attachment_download.py`
- `test_dingtalk.py`
- `test_telegram_audio_vs_voice.py`
- `test_stt_config.py`

`scripts/run_tests.sh tests/gateway/test_dingtalk.py tests/gateway/test_discord_attachment_download.py tests/gateway/test_telegram_audio_vs_voice.py tests/gateway/test_stt_config.py`: 92 passed.

Closes #28918.
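The Discord split and the gateway routing rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the helper names (`is_native_voice_note`, `classify_audio_attachment`, `should_auto_transcribe`) and the two-member `MessageType` are hypothetical, and attachments are duck-typed stand-ins so the block runs without discord.py installed. Only `is_voice_message()`, `duration`, and `waveform` correspond to attributes the PR names.

```python
from enum import Enum
from types import SimpleNamespace


class MessageType(Enum):
    # Hypothetical subset of the gateway's message types.
    VOICE = "voice"
    AUDIO = "audio"


def is_native_voice_note(attachment) -> bool:
    """Prefer is_voice_message(); fall back to duration + waveform."""
    checker = getattr(attachment, "is_voice_message", None)
    if callable(checker):
        return bool(checker())
    # Fallback for attachment objects without is_voice_message():
    # treat the presence of both fields as the voice-note signal.
    return (
        getattr(attachment, "duration", None) is not None
        and getattr(attachment, "waveform", None) is not None
    )


def classify_audio_attachment(attachment) -> MessageType:
    if is_native_voice_note(attachment):
        return MessageType.VOICE
    return MessageType.AUDIO


def should_auto_transcribe(message_type: MessageType) -> bool:
    # Routing rule from the summary: VOICE is transcribed, AUDIO is skipped.
    return message_type is MessageType.VOICE


# Duck-typed stand-ins for attachments (field values are made up).
voice_new = SimpleNamespace(filename="voice-message.ogg", is_voice_message=lambda: True)
voice_old = SimpleNamespace(filename="voice-message.ogg", duration=3.2, waveform="AAAA")
upload = SimpleNamespace(filename="song.mp3")

results = [classify_audio_attachment(a) for a in (voice_new, voice_old, upload)]
```

Keeping the transcription decision keyed on the message type alone means each adapter only has to classify correctly; the gateway rule itself stays unchanged.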