feat(wecom): add voice message send support via media upload #12521

Closed
chqchshj wants to merge 4 commits into NousResearch:main from chqchshj:feat/wecom-voice-send

@chqchshj

Contributor

Summary

Add voice message send support to the WeCom (企业微信) callback adapter.

When the agent generates a TTS voice reply, the adapter now uploads the audio file to WeCom's temporary media store and sends it as a native voice message, instead of falling back to text.

Changes

  • **send_voice()**: Upload audio and send it as a WeCom voice message via /cgi-bin/message/send
  • **_upload_media()**: Upload files to WeCom's temporary media store via /cgi-bin/media/upload (respects the 2MB voice limit)
  • **_convert_to_amr()**: ffmpeg-based audio conversion to AMR format (the WeCom voice API requires AMR 8kHz mono)
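
The three changes above correspond to two WeCom HTTP requests plus one local conversion step. The following is a minimal sketch of those request shapes, not the PR's actual code. The helper names (`build_upload_url`, `fits_voice_limit`, `amr_ffmpeg_command`, `build_voice_payload`) are hypothetical, the API host and payload fields are assumptions based on WeCom's public API documentation, and the `12.2k` bitrate is an assumed AMR-NB mode.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumption: the public WeCom API host; the PR only names the endpoint paths.
WECOM_API_BASE = "https://qyapi.weixin.qq.com"
# The PR cites a 2MB limit for voice uploads.
VOICE_MAX_BYTES = 2 * 1024 * 1024


def build_upload_url(access_token: str, media_type: str = "voice") -> str:
    """URL for the temporary media upload (multipart POST, file field "media")."""
    query = urlencode({"access_token": access_token, "type": media_type})
    return f"{WECOM_API_BASE}/cgi-bin/media/upload?{query}"


def fits_voice_limit(size_bytes: int) -> bool:
    """True if a non-empty file is within the voice size limit."""
    return 0 < size_bytes <= VOICE_MAX_BYTES


def amr_ffmpeg_command(src: str, dst: str) -> list[str]:
    """ffmpeg arguments for AMR at 8 kHz mono, the format named in the PR.

    Assumption: the local ffmpeg build includes an AMR-NB encoder and
    accepts 12.2k as a bitrate for it.
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", "8000", "-ac", "1", "-b:a", "12.2k", dst]


def build_voice_payload(to_user: str, agent_id: int, media_id: str) -> dict:
    """JSON body for /cgi-bin/message/send with msgtype "voice"."""
    return {
        "touser": to_user,
        "msgtype": "voice",
        "agentid": agent_id,
        "voice": {"media_id": media_id},
    }
```

The `media_id` passed to `build_voice_payload` would come from the upload response, so the upload has to succeed before the send request can be built.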

Design decisions

  • Graceful fallback: if voice upload or AMR conversion fails, falls back to the parent class send_voice (text-based reply)
  • Uses existing _resolve_app_for_chat for multi-app support
  • Auto-cleanup of temporary converted files
  • Requires ffmpeg on PATH for non-AMR audio files (a warning is logged if it is missing)
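
The fallback and cleanup behaviour described above could be structured roughly as follows. This is a sketch under assumptions, not the adapter's real `send_voice`: the function name `send_voice_with_fallback` and its three callables are hypothetical stand-ins for the conversion step, the upload-and-send step, and the parent-class text fallback.

```python
import logging
import os
from typing import Callable, Optional

logger = logging.getLogger("wecom_voice_sketch")


def send_voice_with_fallback(
    audio_path: str,
    convert_to_amr: Callable[[str], Optional[str]],
    upload_and_send: Callable[[str], str],
    send_text_fallback: Callable[[str], str],
) -> str:
    """Try the native voice path; on any failure use the fallback.

    All three callables are hypothetical stand-ins for adapter methods.
    """
    converted: Optional[str] = None
    try:
        path = audio_path
        if not audio_path.lower().endswith(".amr"):
            # Assumed contract: returns the converted path, or None on failure.
            converted = convert_to_amr(audio_path)
            if converted is None:
                logger.warning("AMR conversion failed, falling back")
                return send_text_fallback(audio_path)
            path = converted
        return upload_and_send(path)
    except Exception as exc:  # broad on purpose: any failure should degrade, not crash
        logger.warning("Voice send failed (%s), falling back", exc)
        return send_text_fallback(audio_path)
    finally:
        # Remove only the temporary file created by the conversion step.
        if converted is not None and os.path.exists(converted):
            os.remove(converted)
```

Putting the cleanup in `finally` means the converted file is removed whether the send succeeds, fails, or falls back, and the caller's original file is never touched.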

Test plan

  • All existing WeCom tests pass (54 passed, 3 skipped)
  • All CLI/quick_commands tests pass (124 passed)
  • Python import check passes

chqchshj added 2 commits April 19, 2026 18:48
- Add send_voice() to upload audio and send as WeCom voice message
- Add _upload_media() for WeCom temporary media store upload
- Add _convert_to_amr() for ffmpeg-based audio conversion (WeCom requires AMR 8kHz mono)
- Graceful fallback to text when voice upload/convert fails
- Auto-cleanup of temporary converted files
- Handle incoming voice messages in _build_event (previously dropped)
- Download voice audio from WeCom media store via _download_voice_media
- Cache locally using cache_audio_from_bytes for STT transcription pipeline
- Set MessageType.VOICE + media_urls so gateway auto-transcribes
- Make _build_event async (requires media download)
- Update tests to use async/await
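
For the incoming direction, the commit notes above describe detecting a voice message, then downloading its audio from the media store. A minimal sketch of the detection and URL-building steps, assuming the callback body has already been decrypted to plain XML: the element names (`MsgType`, `MediaId`, `Format`, `FromUserName`), the `/cgi-bin/media/get` path, and the API host are assumptions based on WeCom's public documentation, and both function names are hypothetical rather than taken from the PR.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

WECOM_API_BASE = "https://qyapi.weixin.qq.com"  # assumption: public WeCom API host


def parse_voice_callback(xml_text: str) -> Optional[dict]:
    """Return media_id/format/sender for a voice message, else None."""
    root = ET.fromstring(xml_text)
    if (root.findtext("MsgType") or "").strip() != "voice":
        return None
    media_id = (root.findtext("MediaId") or "").strip()
    if not media_id:
        return None
    return {
        "media_id": media_id,
        # Assumed default: the PR states WeCom voice messages arrive as AMR.
        "format": (root.findtext("Format") or "amr").strip(),
        "from_user": (root.findtext("FromUserName") or "").strip(),
    }


def build_media_download_url(access_token: str, media_id: str) -> str:
    """URL for fetching a temporary media file by its media_id."""
    query = urlencode({"access_token": access_token, "media_id": media_id})
    return f"{WECOM_API_BASE}/cgi-bin/media/get?{query}"
```

The download itself is a network call, which is consistent with the commit note that `_build_event` had to become async.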
chqchshj added 2 commits April 19, 2026 19:55
WeCom voice messages are in AMR format, which most STT engines don't support.
Convert to WAV (16kHz mono) via ffmpeg before caching for transcription.
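
The AMR-to-WAV step described above might look like the sketch below. It is an illustration under assumptions, not the code in the PR: the function names and the 30-second timeout are hypothetical, and decoding AMR assumes the local ffmpeg build can read that format.

```python
import os
import shutil
import subprocess


def wav_ffmpeg_command(src: str, dst: str, ffmpeg: str = "ffmpeg") -> list:
    """ffmpeg arguments for 16 kHz mono output; format follows dst's extension."""
    return [ffmpeg, "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def convert_amr_to_wav(src: str, dst: str, ffmpeg: str = "ffmpeg", timeout: float = 30.0) -> bool:
    """Return True only if ffmpeg is available, exits cleanly, and wrote dst."""
    resolved = shutil.which(ffmpeg)
    if resolved is None:
        # ffmpeg is not on PATH; the caller can keep the original AMR bytes.
        return False
    try:
        result = subprocess.run(
            wav_ffmpeg_command(src, dst, resolved),
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and os.path.exists(dst)
```

Returning a boolean instead of raising lets the caller decide whether to cache the unconverted audio or skip transcription when conversion is unavailable.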
@alt-glitch added the type/feature, P2, platform/wecom, and comp/gateway labels on Apr 23, 2026
@alt-glitch

Collaborator

Overlaps with #8312 which also adds voice message support for the WeCom callback adapter. Maintainers should decide which approach to merge.

@chqchshj

Contributor Author

Closing this out for now from my side to reduce stale/open Hermes Agent PRs. Thanks for the review and context.

@chqchshj chqchshj closed this May 16, 2026
