fix(gateway): transcribe native voice notes (Discord + DingTalk) by teknium1 · Pull Request #28993 · NousResearch/hermes-agent

teknium1 · 2026-05-20T00:16:24Z

Salvages #28918 (helix4u/Gille) onto current main + extends the same fix to DingTalk after a parity audit.

Summary

Discord and DingTalk native voice notes were being classified as MessageType.AUDIO instead of MessageType.VOICE. Gateway STT routing at gateway/run.py:7605 intentionally skips AUDIO (file uploads: an mp3 should never be auto-transcribed) and always transcribes VOICE (voice bubbles). Discord and DingTalk were the only two voice-capable platforms that never emitted VOICE, which contradicted the docs at website/docs/user-guide/features/tts.md:302.

This wasn't a recent regression — git log -S 'MessageType.VOICE' -- gateway/platforms/discord.py returns zero results. Discord auto-STT never worked. DingTalk had a 'voice' -> 'audio' mapping in DINGTALK_TYPE_MAPPING that hinted at the intent but never reached VOICE.
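
The routing rule described in the summary can be sketched as follows. This is a minimal illustration, not the code at gateway/run.py: the MessageType enum below is a stand-in with only the members needed here, and should_auto_transcribe is a hypothetical helper name.

```python
from enum import Enum


class MessageType(Enum):
    # Stand-in for the gateway's MessageType (assumption: only these members matter here).
    TEXT = "text"
    AUDIO = "audio"        # generic audio file upload, e.g. an mp3
    VOICE = "voice"        # native voice note / voice bubble
    DOCUMENT = "document"


def should_auto_transcribe(message_type: MessageType) -> bool:
    """Hypothetical helper: only native voice notes are sent to speech-to-text."""
    return message_type is MessageType.VOICE


if __name__ == "__main__":
    for kind in MessageType:
        print(kind.name, should_auto_transcribe(kind))
```

Under this rule, an adapter that labels a voice bubble as AUDIO silently opts it out of transcription, which is the bug this PR addresses.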

Changes

gateway/platforms/discord.py: cherry-picked from #28918 (fix(discord): transcribe native voice notes). Uses Attachment.is_voice_message(), with a duration + waveform fallback for older discord.py, to split native voice notes from generic audio uploads. Authored by @helix4u.
gateway/platforms/dingtalk.py: when a rich-text item has type: voice, route it to MessageType.VOICE instead of MessageType.AUDIO. Generic audio uploads (mapped to file by DINGTALK_TYPE_MAPPING) remain DOCUMENT as before.
tests/gateway/test_discord_attachment_download.py: regression tests from #28918 covering both the voice-note and plain-audio paths.
tests/gateway/test_dingtalk.py: new TestExtractMedia covering the same split.
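
The Discord split described in the list above could look roughly like this. It is a sketch under stated assumptions, not the adapter's actual code: FakeAttachment and FakeModernAttachment are test doubles standing in for discord.py's Attachment, and looks_like_voice_note and classify_audio_attachment are hypothetical names. Only is_voice_message(), duration, and waveform come from the PR description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MessageType(Enum):
    # Stand-in for the gateway's MessageType.
    AUDIO = "audio"
    VOICE = "voice"


@dataclass
class FakeAttachment:
    """Test double for an attachment object without is_voice_message()."""
    content_type: str = "audio/ogg"
    duration: Optional[float] = None
    waveform: Optional[bytes] = None


@dataclass
class FakeModernAttachment(FakeAttachment):
    """Test double for an attachment object that exposes is_voice_message()."""
    voice_flag: bool = False

    def is_voice_message(self) -> bool:
        return self.voice_flag


def looks_like_voice_note(attachment) -> bool:
    """Prefer the library's own check; otherwise fall back to voice-note metadata."""
    checker = getattr(attachment, "is_voice_message", None)
    if callable(checker):
        return bool(checker())
    # Fallback: treat the presence of both duration and waveform as a voice note.
    has_duration = getattr(attachment, "duration", None) is not None
    has_waveform = getattr(attachment, "waveform", None) is not None
    return has_duration and has_waveform


def classify_audio_attachment(attachment) -> MessageType:
    """Native voice notes become VOICE; every other audio upload stays AUDIO."""
    if looks_like_voice_note(attachment):
        return MessageType.VOICE
    return MessageType.AUDIO
```

The getattr-based lookup keeps the function usable whether or not the installed library version provides the method, which is the reason for the fallback mentioned in the description.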

Other platforms audited (telegram, whatsapp, signal, slack, matrix, mattermost, bluebubbles, wecom, weixin, yuanbao, qqbot) all already emit MessageType.VOICE correctly. Feishu has no separate voice-note semantic in its API — every audio comes in as type audio (file upload), so leaving it as AUDIO is policy-correct.

Validation

| Targeted suite | Scope | Result |
| --- | --- | --- |
| `test_discord_attachment_download.py` | 14 tests (incl. 2 new) | pass |
| `test_dingtalk.py` | 67 tests (incl. 2 new) | pass |
| `test_telegram_audio_vs_voice.py` | parity check | pass |
| `test_stt_config.py` | STT routing | pass |

scripts/run_tests.sh tests/gateway/test_dingtalk.py tests/gateway/test_discord_attachment_download.py tests/gateway/test_telegram_audio_vs_voice.py tests/gateway/test_stt_config.py — 92 passed.

Closes #28918.

Sibling fix to PR #28918 (Discord voice notes). DingTalk's rich-text "voice" item type is its native voice-message format, but the adapter was routing it to MessageType.AUDIO — which gateway/run.py:7605 skips for STT. The docs claim every voice-capable platform auto-transcribes, so this brings DingTalk in line. Generic audio uploads (mapped to "file" by DINGTALK_TYPE_MAPPING) are unchanged — they were already classified as DOCUMENT, not AUDIO. Adds tests/gateway/test_dingtalk.py::TestExtractMedia covering both the voice path and the audio-passthrough invariant.
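
A rough sketch of the DingTalk rule in the commit message above. Everything except the "voice" to "audio" entry is an assumption: ASSUMED_TYPE_MAPPING is an illustrative subset rather than the real DINGTALK_TYPE_MAPPING, the "audio" key for generic uploads is a guess, and classify_rich_text_item is a hypothetical name.

```python
from enum import Enum


class MessageType(Enum):
    # Stand-in for the gateway's MessageType.
    AUDIO = "audio"
    VOICE = "voice"
    DOCUMENT = "document"


# Illustrative subset only. The "voice" -> "audio" entry is mentioned in the PR;
# the key used for generic audio uploads is an assumption.
ASSUMED_TYPE_MAPPING = {
    "voice": "audio",
    "audio": "file",
}

MAPPED_TO_MESSAGE_TYPE = {
    "audio": MessageType.AUDIO,
    "file": MessageType.DOCUMENT,
}


def classify_rich_text_item(item: dict) -> MessageType:
    """Route native voice items to VOICE before consulting the generic mapping."""
    raw_type = item.get("type")
    if raw_type == "voice":
        # Native voice message: must be VOICE so the gateway transcribes it.
        return MessageType.VOICE
    mapped = ASSUMED_TYPE_MAPPING.get(raw_type, "file")
    return MAPPED_TO_MESSAGE_TYPE.get(mapped, MessageType.DOCUMENT)
```

Checking the raw item type before the mapping is what keeps a voice item from collapsing into AUDIO, while generic uploads keep following the existing mapping and stay DOCUMENT.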

github-actions · 2026-05-20T00:16:59Z

🔎 Lint report: `hermes/hermes-91361c28` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8991 on HEAD, 8985 on base (🆕 +6)

🆕 New issues (1):

Rule	Count
`invalid-argument-type`	1

First entries

tests/gateway/test_dingtalk.py:588: [invalid-argument-type] invalid-argument-type: Argument to function `DingTalkAdapter._extract_media` is incorrect: Expected `DingTalkAdapter`, found `<class 'DingTalkAdapter'>`

✅ Fixed issues: none

Unchanged: 4736 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.
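
The test source is not shown on this page, so the following is only one plausible way a diagnostic like the one above can arise: calling an instance method through the class and passing the class itself as self. Adapter and its _extract_media body here are hypothetical stand-ins, not the DingTalk adapter.

```python
class Adapter:
    """Hypothetical stand-in for an adapter whose constructor is costly to run in tests."""

    def _extract_media(self, payload: dict) -> list:
        # Collect the declared type of each rich-text item; does not touch self.
        return [item["type"] for item in payload.get("richText", []) if "type" in item]


payload = {"richText": [{"type": "voice"}, {"text": "hello"}]}

# Works at runtime because self is unused, but a type checker expects an
# Adapter instance here and sees the class object instead.
via_class = Adapter._extract_media(Adapter, payload)

# One way to avoid the warning: create an instance without running __init__.
bare_instance = Adapter.__new__(Adapter)
via_instance = bare_instance._extract_media(payload)
```

Both calls return the same list at runtime, which is consistent with the check being reported as a warning rather than a test failure.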

helix4u and others added 2 commits May 19, 2026 17:13

fix(discord): transcribe native voice notes

281ca7a

teknium1 merged commit 93734c2 into main May 20, 2026
20 of 21 checks passed

teknium1 deleted the hermes/hermes-91361c28 branch May 20, 2026 00:26

alt-glitch added labels May 20, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), platform/discord (Discord bot adapter), platform/dingtalk (DingTalk adapter), tool/tts (Text-to-speech and transcription)

teknium1 mentioned this pull request May 20, 2026

fix(discord): transcribe native voice notes #28918

Closed

19 tasks

Haderach-Ram mentioned this pull request May 20, 2026

Ecosystem Digest — 2026-05-20 Haderach-Ram/openclaw-radar#13

Open

wuli666 mentioned this pull request May 20, 2026

fix(feishu): classify native voice messages as VOICE for auto-transcription #29235

Open

19 tasks

alt-glitch mentioned this pull request May 20, 2026

fix(feishu): transcribe native voice notes #29295

Open

6 tasks

liuhao1024 mentioned this pull request Jun 3, 2026

fix(dingtalk): prevent richText fallback from resetting voice to text #38225

Open

13 tasks
