feat(gateway): make auto-TTS markdown strip overridable via prepare_tts_text() hook by francip · Pull Request #26134 · NousResearch/hermes-agent

francip · 2026-05-15T05:38:33Z

What

Extracts the inlined markdown-strip-and-truncate at the auto-TTS site in BasePlatformAdapter._process_message_background into an overridable method:

def prepare_tts_text(self, text: str) -> str:
    """Prepare text for TTS. Override to filter tool output, code, etc.

    Default strips markdown formatting and truncates to 4000 chars.
    """
    return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()

…and replaces the call site:

-speech_text = re.sub(r'[*_`#\[\]()]', '', text_content)[:4000].strip()
+speech_text = self.prepare_tts_text(text_content)

That's the entire change. 8 lines added, 1 removed, one file.
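As a rough illustration of how the call site dispatches through the hook, here is a minimal sketch. The class below is a stand-in, not the real `BasePlatformAdapter`; `speech_text_for` and `ShoutingAdapter` are hypothetical names invented for this example, and only the body of `prepare_tts_text` is taken from the PR.

```python
import re


class BasePlatformAdapter:
    """Stand-in for the real base class; only the TTS hook is modelled."""

    def prepare_tts_text(self, text: str) -> str:
        # Default from the PR: strip markdown sigils, cap at 4000 chars.
        return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()

    def speech_text_for(self, text_content: str) -> str:
        # Hypothetical helper standing in for the auto-TTS branch of
        # _process_message_background; it only shows the dispatch.
        return self.prepare_tts_text(text_content)


class ShoutingAdapter(BasePlatformAdapter):
    """Toy override, purely to show that the base flow picks it up."""

    def prepare_tts_text(self, text: str) -> str:
        return super().prepare_tts_text(text).upper()


reply = "**Run** `pip install foo` (see [docs](https://docs.example.com))"
default_speech = BasePlatformAdapter().speech_text_for(reply)
custom_speech = ShoutingAdapter().speech_text_for(reply)
```

In this sketch the default removes only the sigil characters, so the URL text survives into `default_speech`; a subclass changes the result without touching the surrounding flow.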

Why

The default implementation is byte-identical to the previous inline expression, so this is a zero-behaviour-change refactor for every existing adapter (Telegram, Discord, Slack, Matrix, IRC, Mattermost, BlueBubbles, Feishu, DingTalk, WeCom, WhatsApp, Signal, etc.). They all continue to get exactly the same TTS text they got before.

The hook unblocks voice-first platform adapters. What works for text-bubble platforms is wrong for read-aloud platforms:

A Telegram bot reply containing "Run `pip install foo` then visit https://docs.example.com/install#step-2" looks fine in the chat bubble.
A LiveKit voice agent speaking that same reply unfiltered would have to read it out as "Run backtick pip install foo backtick then visit h-t-t-p-s colon slash slash docs dot example dot com slash install hash step dash two". The default strips the markdown sigils, but URLs, file paths, fenced code blocks, and MEDIA: tags are still spoken character-by-character.

A voice-output adapter wants to drop those spans before TTS synthesis, while keeping the full text in the data-channel transcript for clients that render visually.

Today, without this hook, voice adapters have two options, both worse:

Duplicate the auto-TTS pipeline inside their own handle_response override — which means re-implementing extract_media, extract_images, extract_local_files, attachment routing, the auto-TTS gate, error handling, and the auto-TTS playback ordering from _process_message_background. ~80 lines of fragile copy-paste that goes stale every time someone touches the base flow.
Live with TTS reading URLs and code blocks aloud.

A 7-line overridable method removes the dilemma without affecting any existing adapter.

Example consumer

kortexa-ai/hermes-livekit — a LiveKit WebRTC voice gateway plugin for hermes-agent, installable via pip into an existing hermes install:

pip install git+https://github.com/kortexa-ai/hermes-livekit.git
hermes plugins enable livekit

It registers as a platform plugin via the hermes_agent.plugins entry-point group (zero core edits required to load it) and overrides prepare_tts_text() to additionally strip:

Fenced code blocks (``` ... ```)
Inline code (`...`)
URLs (https?://...)
File paths (/x/y, ~/x, C:\x)
MEDIA:<path> tags
Repeated whitespace (collapsed to a single space)

…before TTS synthesis. The full original response still reaches connected LiveKit clients via the data channel for visual rendering.
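The list above could be approximated with a handful of regular expressions. The following is a sketch under stated assumptions, not the actual hermes-livekit code: `VoiceAdapter` is a hypothetical name, the base class is a stand-in that reproduces only the default hook, and the patterns are deliberately simple heuristics that will miss some paths and URLs.

```python
import re


class BasePlatformAdapter:
    """Stand-in for the real base class; only the hook is reproduced."""

    def prepare_tts_text(self, text: str) -> str:
        return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()


class VoiceAdapter(BasePlatformAdapter):
    """Hypothetical voice-output adapter (not the hermes-livekit code)."""

    # Order matters: fenced blocks before inline code, and MEDIA: tags
    # and URLs before bare file paths, since they contain path-like text.
    _SPANS = [
        re.compile(r"```.*?```", re.DOTALL),                  # fenced code
        re.compile(r"`[^`\n]*`"),                             # inline code
        re.compile(r"MEDIA:\S+"),                             # MEDIA:<path>
        re.compile(r"https?://\S+"),                          # URLs
        re.compile(r"(?<![\w/:])(?:~/|/|[A-Za-z]:\\)\S+"),    # file paths
    ]

    def prepare_tts_text(self, text: str) -> str:
        for pattern in self._SPANS:
            text = pattern.sub(" ", text)
        # Collapse the gaps left behind, then reuse the default for the
        # markdown sigils and the 4000-char cap.
        text = re.sub(r"\s+", " ", text)
        return super().prepare_tts_text(text)


reply = (
    "Run `pip install foo` then visit "
    "https://docs.example.com/install#step-2 or open ~/notes/todo.md "
    "MEDIA:/tmp/out.wav done"
)
spoken = VoiceAdapter().prepare_tts_text(reply)
```

Delegating to `super()` at the end keeps the truncation limit in one place, so a later change to the default cap would carry over to the override.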

That override is the only piece of the LiveKit platform implementation that can't be done entirely from within a plugin against the existing register_platform() hook surface. Every other LiveKit-specific behaviour (env-driven auto-enable, connected-status check, cron home-channel, allowed-users gate, platform prompt hint, interactive setup) maps cleanly onto an existing kwarg on register_platform(). This one hook is the missing piece.

Relation to #3894

#3894 (feat(gateway): add LiveKit WebRTC voice platform support) bundles this same hook change together with the full LiveKit platform adapter. This PR carves the generic hook out so it can be reviewed and merged independently — the hook is useful to any future voice-output platform adapter (LiveKit, Twilio Voice, telephony bridges, etc.), and shouldn't be coupled to LiveKit-specific review concerns.

Risk

Approximately zero. The default returns exactly the same string as the previous inline expression. Existing tests for the auto-TTS path do not need updates (verified manually — the call site is unreachable from anywhere except the auto-TTS branch, and behaviour at that branch is unchanged for non-overriding adapters).
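The equivalence claim can also be spot-checked mechanically. The snippet below is a hypothetical check, not a test from the repository: it compares the old inline expression against a stand-in copy of the default hook over a few hand-picked inputs, including one that exercises the truncate-then-strip ordering.

```python
import re


def inline_expression(text_content: str) -> str:
    # The expression that previously lived at the call site.
    return re.sub(r'[*_`#\[\]()]', '', text_content)[:4000].strip()


class BasePlatformAdapter:
    """Stand-in carrying only the default hook from the PR."""

    def prepare_tts_text(self, text: str) -> str:
        return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()


samples = [
    "",
    "  **bold**  ",
    "# Title\n`code`",
    "[a](b) _c_",
    "x" * 5000,
    " " * 4100 + "tail",  # truncation happens before strip()
]
adapter = BasePlatformAdapter()
mismatches = [
    s for s in samples
    if inline_expression(s) != adapter.prepare_tts_text(s)
]
```

An empty `mismatches` list is what the refactor promises; the last sample shows that text beyond the 4000-character cut is dropped even when the kept prefix is all whitespace.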

Testing

Verified end-to-end against the consumer plugin: real LiveKit server → STT (qwen3-asr) → agent loop → TTS (qwen3-tts) → audio published back to the room → captured WAV transcribes to the agent's reply word-for-word.

The override in hermes_livekit/adapter.py activates only when running against a hermes-agent build that has this hook in base.py; on stock upstream main today, the override is dead code and TTS falls back to the inline default — TTS still works, just speaks URLs aloud. This PR makes the override live for any installation that wants voice-friendly TTS prep.
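A plugin that wants to report whether its override is live could probe the base class at startup. This is only a sketch: `hook_available`, `OldBase`, and `NewBase` are hypothetical names, and the two classes stand in for builds of `base.py` without and with the hook.

```python
def hook_available(base_cls: type) -> bool:
    """Return True if the base class exposes a callable prepare_tts_text."""
    return callable(getattr(base_cls, "prepare_tts_text", None))


class OldBase:
    """Stand-in for a base class from a build without the hook."""


class NewBase:
    """Stand-in for a base class from a build that has the hook."""

    def prepare_tts_text(self, text: str) -> str:
        return text


status = {
    "old": hook_available(OldBase),
    "new": hook_available(NewBase),
}
```

On a build without the hook the subclass method is simply never called, so a check like this is informational rather than required.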

…) hook

Refactor the inlined `re.sub(...)[:4000].strip()` cleanup at the auto-TTS site in `_process_message_background` into an overridable method `BasePlatformAdapter.prepare_tts_text(text: str) -> str`.

The default implementation is byte-identical to the previous inline expression (strip `* _ \` # [ ] ( )` and truncate to 4000 chars), so every existing adapter (Telegram, Discord, Slack, Matrix, IRC, etc.) gets exactly the same behaviour as before. Zero behaviour change for any consumer that doesn't override the method.

Why add the hook: voice-first platform adapters need stricter cleanup than text-bubble platforms. The default strips a handful of markdown sigils, which is fine when the output goes into a Discord embed or a Telegram message bubble, but read aloud by a TTS engine, URLs (`https://example.com/foo`), fenced code blocks, file paths (`/Users/x/foo.py`), and `MEDIA:` tags turn into long sequences of unintelligible characters. With this hook an adapter can drop those spans before TTS while leaving the data-channel transcript intact for visual rendering.

Without the hook, voice adapters have to either

- duplicate the auto-TTS flow inside their own `handle_response` pipeline, which means re-implementing the entire `extract_media`, `extract_images`, `extract_local_files`, attachment routing and error-handling sequence in `_process_message_background`, or
- live with TTS speaking URLs character-by-character.

Both are worse than a 7-line method addition.

Example consumer: https://github.com/kortexa-ai/hermes-livekit, a LiveKit WebRTC voice gateway plugin. Its `LiveKitAdapter.prepare_tts_text()` additionally strips fenced code blocks, inline code, URLs, file paths, and `MEDIA:` tags before TTS synthesis, while the full response still reaches connected clients via the data channel. Drop-in installable via `pip install git+https://github.com/kortexa-ai/hermes-livekit.git`.

Carved out of NousResearch#3894 (LiveKit WebRTC gateway PR) so the generic hook can land independently of the LiveKit platform itself.

@EloquentBrush0x

…tors

Adds release-note attribution mappings for 9 contributors from group 4:

- @EloquentBrush0x (PR #26657)
- @subtract0 (PR #25658)
- @zwolniony (PR #26961)
- @that-ambuj (PR #26582)
- @zccyman (PR #25294)
- @lidge-jun (PR #26814)
- @phoenixshen (PR #26768)
- @AhmetArif0 (PR #26635)
- (francip already mapped from prior PR #26134 attribution)

#27147 dropped from this batch; already landed on main as 4b17c24.

teknium1 · 2026-05-17T06:12:06Z

Merged via PR #27308 — your commit was cherry-picked onto current main as part of a batch salvage of low-risk new-contributor PRs. Authorship preserved (feat(gateway): extract auto-TTS markdown strip into prepare_tts_text() hook). Thanks for the contribution.


alt-glitch added labels on May 15, 2026: type/refactor (Code restructuring, no behavior change), comp/gateway (Gateway runner, session dispatch, delivery), tool/tts (Text-to-speech and transcription), P3 (Low: cosmetic, nice to have)

francip mentioned this pull request May 15, 2026

Voice/read-aloud platform adapters can't customize auto-TTS text prep without forking the auto-TTS pipeline #26176

Closed


teknium1 mentioned this pull request May 17, 2026

Batch salvage group 4: 9 low-risk new-contributor PRs (proxy-env/gateway-fixes/security-headers/custom-providers) #27308

Merged

teknium1 closed this May 17, 2026

alt-glitch mentioned this pull request May 25, 2026

feat: generalized Slack adapter extension points (on_slack_app_init, transform_tts_text, plugin slash priority) #31848

Draft

francip wants to merge 1 commit into NousResearch:main from kortexa-ai:kortexa/prepare-tts-text-hook

4 participants
