Skip to content

feat(telegram): skip-STT audio path + 2GB cap via local Bot API server#28541

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-6063e704
May 19, 2026
Merged

feat(telegram): skip-STT audio path + 2GB cap via local Bot API server#28541
teknium1 merged 2 commits into
mainfrom
hermes/hermes-6063e704

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Salvage of #25280 (@alber70g). Two coordinated changes for downstream audio pipelines on Telegram attachments larger than the public Bot API's 20MB getFile ceiling:

  1. stt.enabled: false surfaces audio file paths: instead of a no-op note, the gateway caches the voice/audio attachment, probes duration (wave → mutagen → ffprobe), and surfaces [The user sent a voice message: <path> (duration: …)] so skills/tools can consume the audio directly.
  2. 2GB document cap when local Bot API server configured: extra.base_url opt-in raises the document size cap from 20MB → 2GB to match the locally-hosted telegram-bot-api server's limit.

Authorship preserved via cherry-pick. 41/41 documents + 3/3 max-doc-bytes tests passing.

alber70g and others added 2 commits May 18, 2026 22:59
Two coordinated changes that unblock downstream audio pipelines
(diarization, custom transcription, archival) on attachments larger
than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic
  "transcription disabled" note. The gateway probes the cached file's
  duration (wave → mutagen → ffprobe ladder) and surfaces
  `[The user sent a voice message: <abs path> (duration: M:SS)]` to
  the agent so a skill or tool can pick up the raw file. The previous
  placeholder is replaced rather than appended when present.

- `platforms.telegram.extra.base_url` set → adapter auto-lifts its
  document size cap from 20MB to 2GB (the local telegram-bot-api
  `--local` ceiling) and the "too large" reply reports the active
  limit dynamically. No new config knob; presence of `base_url` is the
  opt-in.

- `platforms.telegram.extra.local_mode: true` wires
  `Application.builder().local_mode(True)` on the python-telegram-bot
  builder. PTB then reads files from disk instead of HTTP, which is
  required when telegram-bot-api runs in `--local` mode (the server
  returns absolute filesystem paths, not `/file/bot...` URLs).

- gateway/run.py: rewrites the `stt.enabled: false` branch of
  `_enrich_message_with_transcription`. New `_format_duration` +
  `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute
  derived from `extra.base_url`; `local_mode` builder wiring;
  dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and
  without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB
  without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT"
  subsection under Voice Messages and a full "Large Files (>20MB) via
  Local Bot API Server" walkthrough (api_id/api_hash, docker-compose,
  one-time `logOut` migration, `platforms.telegram.extra` config, the
  `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the
  `stt.enabled` knob in the config reference.

- `pytest tests/gateway/test_telegram_max_doc_bytes.py
  tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows
  `Using custom Telegram base_url: http://...` and
  `Using Telegram local_mode (read files from disk)` on startup;
  voice messages above 20MB cache to disk and surface their path to
  the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@teknium1 teknium1 merged commit 69b1d31 into main May 19, 2026
@teknium1 teknium1 deleted the hermes/hermes-6063e704 branch May 19, 2026 05:59
@github-actions

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-6063e704 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8952 on HEAD, 8951 on base (🆕 +1)

🆕 New issues (1):

Rule Count
unresolved-import 1
First entries
gateway/run.py:952: [unresolved-import] unresolved-import: Cannot resolve imported module `mutagen.oggopus`

✅ Fixed issues: none

Unchanged: 4701 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/feature New feature or request P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/telegram Telegram bot adapter tool/tts Text-to-speech and transcription labels May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/telegram Telegram bot adapter tool/tts Text-to-speech and transcription type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants