Skip to content

fix(tts): restore Fish Audio implementation reverted in 19098b06 (voice regression) + closes toryx-private#797#8

Merged
lmsanch merged 1 commit into
mainfrom
fix/restore-fish-audio-tts
Apr 23, 2026
Merged

fix(tts): restore Fish Audio implementation reverted in 19098b06 (voice regression) + closes toryx-private#797#8
lmsanch merged 1 commit into
mainfrom
fix/restore-fish-audio-tts

Conversation

@lmsanch

@lmsanch lmsanch commented Apr 23, 2026

Copy link
Copy Markdown
Owner

Cherry-pick of 00280b0 from feat/consult-colleague-stopgap branch (never merged). Revert in 19098b0 dropped 218 lines including _generate_fish_audio + the voice-only-reply suppression logic in base.py. Ray (new 6th MD deployed today) loaded the broken main fresh at 19:32 EDT and has been generating replies via the Edge TTS else-branch fallback — Microsoft generic voice instead of his Fish clone.

Also restores:

  • Voice-only reply (suppress text when TTS audio was sent)
  • SMTP port 465 support in email adapter
  • Groq reasoning guard in run.py

Permanence

TODO in follow-up PR: add CI test asserting provider == 'fish' branch exists in tts_tool.py, and remove the silent Edge fallback (fail-loud when provider doesn't match any known branch). This prevents future reverts from silently regressing this class again.

Closes lmsanch/toryx-private#797

…Groq reasoning guard

This is the patch that WAS in place and working as of Apr 17 but got
wiped during today's fork rebase (it was sitting as an uncommitted
stash on Spark). Restoring it and committing so it's permanent.

Five concerns, all bundled because they were developed together and
are interdependent for the MD-bot voice UX:

1. tools/tts_tool.py — add "fish" provider
   - _generate_fish_audio(): POST https://api.fish.audio/v1/tts with
     reference_id = voice_id from tts_config.fish.voice_id (or
     FISH_AUDIO_VOICE_ID env). Format: opus when Telegram session,
     mp3 otherwise. Zero new dependencies — stdlib urllib only.
   - dispatch stanza wires provider=="fish" to the helper.
   - .ogg extension + voice_compatible detection include "fish".
   - HERMES_SESSION_PLATFORM fallback to os.environ for contexts
     where gateway.session_context hasn't set the session var yet.

2. gateway/platforms/base.py — voice input gets voice-ONLY reply
   - `if text_content and not _tts_path:` — when the auto-TTS path
     generated audio for a voice-inbound turn, the text send is
     skipped. Before this change the gateway always sent BOTH voice
     and text for voice-in, which is the wrong UX.

3. gateway/run.py — brevity hint for voice input
   - When MessageType.VOICE, inject a system note into context:
     "your reply will be read aloud — 2-3 sentences max, no markdown,
     no lists". Necessary because long markdown text becomes painful
     spoken output.
   - Export HERMES_SESSION_PLATFORM to os.environ so tts_tool can
     read it from anywhere in the call stack.

4. gateway/platforms/email.py — SMTP 465 SSL + per-recipient From
   - _smtp_connect() helper picks SMTP_SSL for port 465 (Zoho), plain
     SMTP+STARTTLS otherwise. Before this the gateway hardcoded
     STARTTLS and failed against Zoho 465, which is what the
     spark/errors.log "SMTP Connection unexpectedly closed" loop was.
   - Captures Delivered-To / To on inbound mail and replies From that
     address so multi-persona inboxes don't leak the wrong identity.

5. run_agent.py — _supports_reasoning_content() provider guard
   - Groq / Cerebras / SambaNova reject `reasoning_content` in the
     messages payload. When the current provider matches one of them,
     omit the field. Prevents 400 errors on TS turns that sampled a
     non-reasoning provider.

How this was lost + how to prevent regression
The patch was uncommitted on Spark's ~/.hermes/hermes-agent/ tree as
stash@{0} "pre-fork-repin 2026-04-18". When today's rebase repointed
origin and I ran `git reset --hard` to fast-forward Spark, the stash
survived but nothing re-applied it. For ~4 hours the 5 live gateways
ran without Fish Audio (edge-tts fallback, producing MP3 with a
generic voice and ALSO text — i.e. the exact symptoms Luis flagged).

Committing here on the fork branch keeps the patch with the code
that depends on it. `git stash pop` was already done on Spark so
those gateways are already fixed; DNN now matches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant