feat(gateway): make auto-TTS markdown strip overridable via prepare_tts_text() hook#26134

Closed
francip wants to merge 1 commit into NousResearch:main from kortexa-ai:kortexa/prepare-tts-text-hook

@francip commented May 15, 2026

What

Extracts the inlined markdown-strip-and-truncate at the auto-TTS site in BasePlatformAdapter._process_message_background into an overridable method:

def prepare_tts_text(self, text: str) -> str:
    """Prepare text for TTS. Override to filter tool output, code, etc.

    Default strips markdown formatting and truncates to 4000 chars.
    """
    return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()

…and replaces the call site:

-speech_text = re.sub(r'[*_`#\[\]()]', '', text_content)[:4000].strip()
+speech_text = self.prepare_tts_text(text_content)

That's the entire change. 8 lines added, 1 removed, one file.

Why

The default implementation is byte-identical to the previous inline expression, so this is a zero-behaviour-change refactor for every existing adapter (Telegram, Discord, Slack, Matrix, IRC, Mattermost, BlueBubbles, Feishu, DingTalk, WeCom, WhatsApp, Signal, etc.). They all continue to get exactly the same TTS text they got before.

The hook unblocks voice-first platform adapters. What works for text-bubble platforms is wrong for read-aloud platforms:

  • A Telegram bot reply containing "Run `pip install foo` then visit https://docs.example.com/install#step-2" looks fine in the chat bubble.
  • A LiveKit voice agent speaking that same reply unfiltered has to read it out as "Run backtick pip install foo backtick then visit h-t-t-p-s colon slash slash docs dot example dot com slash install hash step dash two". The default strip removes the markdown sigils, but URLs, file paths, fenced code blocks, and MEDIA: tags are still spoken character-by-character.

A voice-output adapter wants to drop those spans before TTS synthesis, while keeping the full text in the data-channel transcript for clients that render visually.
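To make the gap concrete, here is a small sketch that applies the default expression from this PR to the example reply above. The function name `default_prepare_tts_text` is a stand-in for illustration only; the regex and the 4000-character cap are copied from the PR.

```python
import re


def default_prepare_tts_text(text: str) -> str:
    # Same expression as the default hook body shown in this PR.
    return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()


reply = "Run `pip install foo` then visit https://docs.example.com/install#step-2"
spoken = default_prepare_tts_text(reply)

# The backticks and the '#' are gone, but the URL itself survives and
# would still be handed to the TTS engine.
print(spoken)
# Run pip install foo then visit https://docs.example.com/installstep-2
```

The sigils are stripped as intended, yet the URL reaches synthesis essentially intact, which is the behaviour a voice adapter would want to override.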

Today, without this hook, voice adapters have two options, both worse:

  1. Duplicate the auto-TTS pipeline inside their own handle_response override — which means re-implementing extract_media, extract_images, extract_local_files, attachment routing, the auto-TTS gate, error handling, and the auto-TTS playback ordering from _process_message_background. ~80 lines of fragile copy-paste that goes stale every time someone touches the base flow.
  2. Live with TTS reading URLs and code blocks aloud.

A 7-line overridable method removes the dilemma without affecting any existing adapter.

Example consumer

kortexa-ai/hermes-livekit — a LiveKit WebRTC voice gateway plugin for hermes-agent, installable via pip into an existing hermes install:

pip install git+https://github.com/kortexa-ai/hermes-livekit.git
hermes plugins enable livekit

It registers as a platform plugin via the hermes_agent.plugins entry-point group (zero core edits required to load it) and overrides prepare_tts_text() to additionally strip:

  • Fenced code blocks (``` ... ```)
  • Inline code (`...`)
  • URLs (https?://...)
  • File paths (/x/y, ~/x, C:\x)
  • MEDIA:<path> tags
  • Collapses repeated whitespace

…before TTS synthesis. The full original response still reaches connected LiveKit clients via the data channel for visual rendering.
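As a rough illustration of what such an override could look like, the sketch below subclasses a stand-in `BasePlatformAdapter` that contains only the hook from this PR. This is not the hermes-livekit implementation: the class name `VoiceAdapter` and every regex here are assumptions chosen to cover the span types listed above.

```python
import re


class BasePlatformAdapter:
    """Stand-in for the real base class; only the hook from this PR is reproduced."""

    def prepare_tts_text(self, text: str) -> str:
        return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()


# Hypothetical patterns, one per span type a voice adapter might drop.
_FENCED = re.compile(r"```.*?```", re.DOTALL)          # fenced code blocks
_INLINE = re.compile(r"`[^`\n]*`")                     # inline code
_URL = re.compile(r"https?://\S+")                     # URLs
_MEDIA = re.compile(r"MEDIA:\S+")                      # MEDIA:<path> tags
_PATH = re.compile(r"(?<!\S)(?:~?/|[A-Za-z]:\\)\S+")   # /x/y, ~/x, C:\x
_WS = re.compile(r"\s+")


class VoiceAdapter(BasePlatformAdapter):
    """Hypothetical voice-first adapter overriding the hook."""

    def prepare_tts_text(self, text: str) -> str:
        # Drop unreadable spans first, while backticks still mark code.
        for pattern in (_FENCED, _INLINE, _URL, _MEDIA, _PATH):
            text = pattern.sub(" ", text)
        # Reuse the default sigil strip and 4000-char cap, then tidy whitespace.
        text = super().prepare_tts_text(text)
        return _WS.sub(" ", text).strip()


sample = (
    "See `foo()` at https://example.com/a#b or open ~/notes/todo.md.\n"
    "```py\nprint(1)\n```\n"
    "MEDIA:/tmp/out.wav **Done**"
)
print(VoiceAdapter().prepare_tts_text(sample))
```

Calling `super().prepare_tts_text()` after the span removal keeps the default strip and truncation in one place, so a later change to the base default would carry over to the override.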

That override is the only piece of the LiveKit platform implementation that can't be done entirely from within a plugin against the existing register_platform() hook surface. Every other LiveKit-specific behaviour (env-driven auto-enable, connected-status check, cron home-channel, allowed-users gate, platform prompt hint, interactive setup) maps cleanly onto an existing kwarg on register_platform(). This one hook is the missing piece.

Relation to #3894

#3894 (feat(gateway): add LiveKit WebRTC voice platform support) bundles this same hook change together with the full LiveKit platform adapter. This PR carves the generic hook out so it can be reviewed and merged independently — the hook is useful to any future voice-output platform adapter (LiveKit, Twilio Voice, telephony bridges, etc.), and shouldn't be coupled to LiveKit-specific review concerns.

Risk

Approximately zero. The default returns exactly the same string as the previous inline expression. Existing tests for the auto-TTS path do not need updates (verified manually — the call site is unreachable from anywhere except the auto-TTS branch, and behaviour at that branch is unchanged for non-overriding adapters).
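The equivalence claim could also be pinned down with a small regression check. The sketch below is an assumption about how such a check might look, not a test from the repository: `inline_expression` reproduces the removed call-site expression, and the `BasePlatformAdapter` here is a stand-in holding only the new hook.

```python
import re


def inline_expression(text_content: str) -> str:
    # The expression that was previously inlined at the auto-TTS call site.
    return re.sub(r'[*_`#\[\]()]', '', text_content)[:4000].strip()


class BasePlatformAdapter:
    """Stand-in containing only the hook added by this PR."""

    def prepare_tts_text(self, text: str) -> str:
        return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()


# Inputs covering empty text, markdown sigils, links, and over-length text.
samples = [
    "",
    "  **bold** _italic_  ",
    "[link](https://example.com)",
    "#" * 10 + "x" * 5000,
    "`code` " * 1000,
]

adapter = BasePlatformAdapter()
mismatches = [s for s in samples if adapter.prepare_tts_text(s) != inline_expression(s)]
print(mismatches)  # an empty list means the default matches the old expression
```

A check of this shape would fail if someone later changed the default hook body, which is the case the "zero behaviour change" claim depends on.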

Testing

Verified end-to-end against the consumer plugin: real LiveKit server → STT (qwen3-asr) → agent loop → TTS (qwen3-tts) → audio published back to the room → captured WAV transcribes to the agent's reply word-for-word.

The override in hermes_livekit/adapter.py activates only when running against a hermes-agent build that has this hook in base.py; on stock upstream main today, the override is dead code and TTS falls back to the inline default — TTS still works, just speaks URLs aloud. This PR makes the override live for any installation that wants voice-friendly TTS prep.
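A plugin could detect at startup which of the two situations it is in. The sketch below is one possible approach under stated assumptions: `hook_is_live`, `StockBase`, and `HookedBase` are hypothetical names, and the two classes merely stand in for a base class without and with the hook.

```python
def hook_is_live(base_cls: type, hook_name: str = "prepare_tts_text") -> bool:
    """Return True if the given base class exposes a callable TTS hook."""
    return callable(getattr(base_cls, hook_name, None))


class StockBase:
    """Stand-in for a base class that predates the hook."""


class HookedBase:
    """Stand-in for a base class that includes the hook."""

    def prepare_tts_text(self, text: str) -> str:
        return text


for base in (StockBase, HookedBase):
    status = "override is live" if hook_is_live(base) else "override is dead code"
    print(f"{base.__name__}: {status}")
```

With a check like this, a plugin could log a one-time warning when its override will never be called instead of silently falling back to the inline default.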

…) hook

Refactor the inlined `re.sub(...)[:4000].strip()` cleanup at the
auto-TTS site in `_process_message_background` into an overridable
method `BasePlatformAdapter.prepare_tts_text(text: str) -> str`.

The default implementation is byte-identical to the previous inline
expression — strip `* _ \` # [ ] ( )` and truncate to 4000 chars — so
every existing adapter (Telegram, Discord, Slack, Matrix, IRC, etc.)
gets exactly the same behaviour as before. Zero behaviour change for
any consumer that doesn't override the method.

Why add the hook: voice-first platform adapters need stricter
cleanup than text-bubble platforms. The default strips a handful of
markdown sigils, which is fine when the output goes into a Discord
embed or a Telegram message bubble — but read aloud by a TTS engine,
URLs (`https://example.com/foo`), fenced code blocks, file paths
(`/Users/x/foo.py`), and `MEDIA:` tags turn into long sequences of
unintelligible characters. With this hook an adapter can drop those
spans before TTS while leaving the data-channel transcript intact
for visual rendering.

Without the hook, voice adapters have to either
  - duplicate the auto-TTS flow inside their own `handle_response`
    pipeline, which means re-implementing the entire `extract_media`,
    `extract_images`, `extract_local_files`, attachment routing and
    error-handling sequence in `_process_message_background`, or
  - live with TTS speaking URLs character-by-character.

Both are worse than a 7-line method addition.

Example consumer:
  https://github.com/kortexa-ai/hermes-livekit — LiveKit WebRTC voice
  gateway plugin. Its `LiveKitAdapter.prepare_tts_text()` additionally
  strips fenced code blocks, inline code, URLs, file paths, and
  `MEDIA:` tags before TTS synthesis, while the full response still
  reaches connected clients via the data channel. Drop-in installable
  via `pip install git+https://github.com/kortexa-ai/hermes-livekit.git`.

Carved out of NousResearch#3894 (LiveKit WebRTC gateway PR) so the generic hook
can land independently of the LiveKit platform itself.
@alt-glitch added the labels type/refactor (Code restructuring, no behavior change), comp/gateway (Gateway runner, session dispatch, delivery), tool/tts (Text-to-speech and transcription), and P3 (Low: cosmetic, nice to have) on May 15, 2026
NishantEC

This comment was marked as outdated.

teknium1 added a commit that referenced this pull request May 17, 2026
…tors

Adds release-note attribution mappings for 9 contributors from group 4:
- @EloquentBrush0x (PR #26657)
- @subtract0 (PR #25658)
- @zwolniony (PR #26961)
- @that-ambuj (PR #26582)
- @zccyman (PR #25294)
- @lidge-jun (PR #26814)
- @phoenixshen (PR #26768)
- @AhmetArif0 (PR #26635)
- (francip already mapped from prior PR #26134 attribution)

#27147 dropped from this batch — already landed on main as 4b17c24.
@teknium1

Merged via PR #27308 — your commit was cherry-picked onto current main as part of a batch salvage of low-risk new-contributor PRs. Authorship preserved (feat(gateway): extract auto-TTS markdown strip into prepare_tts_text() hook). Thanks for the contribution.

@teknium1 teknium1 closed this May 17, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026