feat(gateway): make auto-TTS markdown strip overridable via prepare_tts_text() hook#26134
Closed
francip wants to merge 1 commit into
…) hook

Refactor the inlined `re.sub(...)[:4000].strip()` cleanup at the auto-TTS site in `_process_message_background` into an overridable method `BasePlatformAdapter.prepare_tts_text(text: str) -> str`.

The default implementation is byte-identical to the previous inline expression (strip `` * _ ` # [ ] ( ) `` and truncate to 4000 chars), so every existing adapter (Telegram, Discord, Slack, Matrix, IRC, etc.) gets exactly the same behaviour as before. Zero behaviour change for any consumer that doesn't override the method.

Why add the hook: voice-first platform adapters need stricter cleanup than text-bubble platforms. The default strips a handful of markdown sigils, which is fine when the output goes into a Discord embed or a Telegram message bubble. Read aloud by a TTS engine, however, URLs (`https://example.com/foo`), fenced code blocks, file paths (`/Users/x/foo.py`), and `MEDIA:` tags turn into long sequences of unintelligible characters. With this hook an adapter can drop those spans before TTS while leaving the data-channel transcript intact for visual rendering.

Without the hook, voice adapters have to either

- duplicate the auto-TTS flow inside their own `handle_response` pipeline, which means re-implementing the entire `extract_media`, `extract_images`, `extract_local_files`, attachment routing and error-handling sequence in `_process_message_background`, or
- live with TTS speaking URLs character-by-character.

Both are worse than a 7-line method addition.

Example consumer: https://github.com/kortexa-ai/hermes-livekit, a LiveKit WebRTC voice gateway plugin. Its `LiveKitAdapter.prepare_tts_text()` additionally strips fenced code blocks, inline code, URLs, file paths, and `MEDIA:` tags before TTS synthesis, while the full response still reaches connected clients via the data channel. Drop-in installable via `pip install git+https://github.com/kortexa-ai/hermes-livekit.git`.
Carved out of NousResearch#3894 (LiveKit WebRTC gateway PR) so the generic hook can land independently of the LiveKit platform itself.
teknium1 added a commit that referenced this pull request on May 17, 2026

…tors

Adds release-note attribution mappings for 9 contributors from group 4:

- @EloquentBrush0x (PR #26657)
- @subtract0 (PR #25658)
- @zwolniony (PR #26961)
- @that-ambuj (PR #26582)
- @zccyman (PR #25294)
- @lidge-jun (PR #26814)
- @phoenixshen (PR #26768)
- @AhmetArif0 (PR #26635)
- (francip already mapped from prior PR #26134 attribution)

#27147 dropped from this batch; already landed on main as 4b17c24.
Contributor comment: Merged via PR #27308; your commit was cherry-picked onto current
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request on Jun 2, 2026

…tors

Adds release-note attribution mappings for 9 contributors from group 4:

- @EloquentBrush0x (PR NousResearch#26657)
- @subtract0 (PR NousResearch#25658)
- @zwolniony (PR NousResearch#26961)
- @that-ambuj (PR NousResearch#26582)
- @zccyman (PR NousResearch#25294)
- @lidge-jun (PR NousResearch#26814)
- @phoenixshen (PR NousResearch#26768)
- @AhmetArif0 (PR NousResearch#26635)
- (francip already mapped from prior PR NousResearch#26134 attribution)

NousResearch#27147 dropped from this batch; already landed on main as 4b17c24.
What
Extracts the inlined markdown-strip-and-truncate at the auto-TTS site in `BasePlatformAdapter._process_message_background` into an overridable method, `prepare_tts_text()`, and replaces the inline expression at the call site with a call to that method.
That's the entire change. 8 lines added, 1 removed, one file.
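As a rough illustration of the shape of this change, here is a minimal sketch. The character class is an assumption reconstructed from the description ("strip `` * _ ` # [ ] ( ) `` and truncate to 4000 chars"); the actual pattern in `base.py` is not reproduced on this page, and `_tts_input_for` is a hypothetical stand-in for the auto-TTS branch of `_process_message_background`.

```python
import re

# ASSUMPTION: character class inferred from the PR description, not copied
# from base.py. It removes the sigils * _ ` # [ ] ( ).
_TTS_STRIP = re.compile(r"[*_`#\[\]()]")


class BasePlatformAdapter:
    def prepare_tts_text(self, text: str) -> str:
        """Default TTS cleanup: drop markdown sigils, cap at 4000 chars, trim."""
        return _TTS_STRIP.sub("", text)[:4000].strip()

    def _tts_input_for(self, response: str) -> str:
        # HYPOTHETICAL stand-in for the call site: where the inline
        # re.sub(...)[:4000].strip() used to be, the hook is called instead.
        return self.prepare_tts_text(response)


adapter = BasePlatformAdapter()
print(adapter._tts_input_for(
    "Run `pip install foo` then visit https://docs.example.com/install#step-2"
))
# prints: Run pip install foo then visit https://docs.example.com/installstep-2
```

Under this assumed pattern the URL survives the default cleanup almost intact, which is harmless in a chat bubble but is exactly the text a TTS engine would then read aloud.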
Why
The default implementation is byte-identical to the previous inline expression, so this is a zero-behaviour-change refactor for every existing adapter (Telegram, Discord, Slack, Matrix, IRC, Mattermost, BlueBubbles, Feishu, DingTalk, WeCom, WhatsApp, Signal, etc.). They all continue to get exactly the same TTS text they got before.
The hook unblocks voice-first platform adapters. What works for text-bubble platforms is wrong for read-aloud platforms:
- Text bubble: ``Run `pip install foo` then visit https://docs.example.com/install#step-2`` looks fine in the chat bubble.
- Read aloud: "Run backtick pip install foo backtick then visit h-t-t-p-s colon slash slash docs dot example dot com slash install hash step dash two." Markdown sigils get stripped, but URLs, file paths, fenced code blocks, and `MEDIA:` tags are spoken character-by-character.

A voice-output adapter wants to drop those spans before TTS synthesis, while keeping the full text in the data-channel transcript for clients that render visually.
Today, without this hook, voice adapters have two options, both worse:
- Duplicate the auto-TTS flow in a `handle_response` override, which means re-implementing `extract_media`, `extract_images`, `extract_local_files`, attachment routing, the auto-TTS gate, error handling, and the auto-TTS playback ordering from `_process_message_background`. That is ~80 lines of fragile copy-paste that goes stale every time someone touches the base flow.
- Live with TTS speaking URLs character-by-character.

A 7-line overridable method removes the dilemma without affecting any existing adapter.
Example consumer
kortexa-ai/hermes-livekit is a LiveKit WebRTC voice gateway plugin for hermes-agent, installable via pip into an existing hermes install:

`pip install git+https://github.com/kortexa-ai/hermes-livekit.git`
`hermes plugins enable livekit`

It registers as a platform plugin via the `hermes_agent.plugins` entry-point group (zero core edits required to load it) and overrides `prepare_tts_text()` to additionally strip:

- fenced code blocks (`` ``` ... ``` ``)
- inline code (`` `...` ``)
- URLs (`https?://...`)
- file paths (`/x/y`, `~/x`, `C:\x`)
- `MEDIA:<path>` tags

…before TTS synthesis. The full original response still reaches connected LiveKit clients via the data channel for visual rendering.
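An override along these lines might look like the sketch below. It is not the code from hermes-livekit: the regular expressions, the `VoiceAdapter` class name, and the stand-in base class (including its sigil pattern) are all assumptions made for illustration.

```python
import re


class BasePlatformAdapter:
    # Minimal stand-in for the real base class; the sigil pattern is ASSUMED.
    def prepare_tts_text(self, text: str) -> str:
        return re.sub(r"[*_`#\[\]()]", "", text)[:4000].strip()


# ASSUMED patterns, one per span type named in the list above.
_FENCED = re.compile(r"`{3}.*?`{3}", re.DOTALL)   # fenced code blocks
_INLINE = re.compile(r"`[^`\n]*`")                # inline code
_MEDIA = re.compile(r"MEDIA:\S+")                 # MEDIA:<path> tags
_URL = re.compile(r"https?://\S+")                # URLs
_PATH = re.compile(r"(?<![\w/])(?:~/|/|[A-Za-z]:\\)\S+")  # /x/y, ~/x, C:\x


class VoiceAdapter(BasePlatformAdapter):  # hypothetical voice-first adapter
    def prepare_tts_text(self, text: str) -> str:
        # Order matters: MEDIA tags and URLs contain slashes, so they are
        # removed before the generic path pattern runs.
        for pattern in (_FENCED, _INLINE, _MEDIA, _URL, _PATH):
            text = pattern.sub(" ", text)
        text = re.sub(r"\s+", " ", text)
        # Reuse the default sigil strip, 4000-char cap, and trim.
        return super().prepare_tts_text(text)


response = "Run `pip install foo` then visit https://docs.example.com/install#step-2"
print(VoiceAdapter().prepare_tts_text(response))  # prints: Run then visit
print(response)  # the original string is untouched and can go to the data channel
```

Delegating to `super()` at the end keeps the length cap and sigil handling in one place, so the override only has to describe what is extra for voice output.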
That override is the only piece of the LiveKit platform implementation that can't be done entirely from within a plugin against the existing `register_platform()` hook surface. Every other LiveKit-specific behaviour (env-driven auto-enable, connected-status check, cron home-channel, allowed-users gate, platform prompt hint, interactive setup) maps cleanly onto an existing kwarg on `register_platform()`. This one hook is the missing piece.

Relation to #3894
#3894 (feat(gateway): add LiveKit WebRTC voice platform support) bundles this same hook change together with the full LiveKit platform adapter. This PR carves the generic hook out so it can be reviewed and merged independently — the hook is useful to any future voice-output platform adapter (LiveKit, Twilio Voice, telephony bridges, etc.), and shouldn't be coupled to LiveKit-specific review concerns.
Risk
Approximately zero. The default returns exactly the same string as the previous inline expression. Existing tests for the auto-TTS path do not need updates (verified manually — the call site is unreachable from anywhere except the auto-TTS branch, and behaviour at that branch is unchanged for non-overriding adapters).
Testing
Verified end-to-end against the consumer plugin: real LiveKit server → STT (qwen3-asr) → agent loop → TTS (qwen3-tts) → audio published back to the room → captured WAV transcribes to the agent's reply word-for-word.
The override in `hermes_livekit/adapter.py` activates only when running against a hermes-agent build that has this hook in `base.py`; on stock upstream `main` today, the override is dead code and TTS falls back to the inline default, so TTS still works but speaks URLs aloud. This PR makes the override live for any installation that wants voice-friendly TTS prep.
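One way a plugin could notice that it is running on a build without the hook is a simple attribute check on the base class, sketched below. The function name, logger name, and stand-in classes are hypothetical; nothing here is taken from hermes-livekit or hermes-agent.

```python
import logging

log = logging.getLogger("voice_plugin")  # hypothetical logger name


def tts_hook_available(base_cls: type) -> bool:
    """Return True if base_cls (or an ancestor) defines prepare_tts_text.

    The check must target the *base* class: the plugin's own adapter always
    defines the method, whether or not the gateway ever calls it.
    """
    return callable(getattr(base_cls, "prepare_tts_text", None))


class OldBase:
    """Stand-in for a base adapter that predates the hook."""


class NewBase:
    """Stand-in for a base adapter that includes the hook."""

    def prepare_tts_text(self, text: str) -> str:
        return text.strip()


for base in (OldBase, NewBase):
    if not tts_hook_available(base):
        log.warning("%s has no prepare_tts_text(); the override will be ignored",
                    base.__name__)
```

A check like this only changes what gets logged; on an older build the override stays unused either way.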