fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half) by kshitijk4poor · Pull Request #41984 · NousResearch/hermes-agent

kshitijk4poor · 2026-06-08T09:37:56Z

Summary

Salvage of #6600 (by @kristianvast), re-scoped to the voice half only and rebased onto current main.

While preparing this, the cascading-interrupt-hang half of the original PR (Problem 2) landed independently in dd0d1222a (fix(agent): don't retry interrupt-induced transport errors) — same _request_cancelled request-local token approach, in agent/chat_completion_helpers.py, with its own test tests/agent/test_cascading_interrupt_6600.py. So this PR now carries only Problem 1 (voice-interrupt transcription), which is still absent from main.

Problem 1 — voice messages during active runs silently drop

When a voice/audio message arrives while the agent is busy on the same session, it hit the interrupt path with empty text because STT only ran after the running-agent guard — the voice was effectively lost (empty interrupt, nothing echoed back).

Fix:

Transcribe audio media before signalling the agent (in the interrupt path) and on the fresh-message path.
Echo the raw transcript back to the user (🎙️ "..."), so voice interrupts feel identical to fresh voice messages.
_enrich_message_with_transcription now returns (text, transcripts) so callers can echo.
New _dequeue_pending_with_transcription drives the post-agent drain the same way.

Re-applied onto _prepare_inbound_message_text (inbound enrichment was extracted from the inline dispatch block since the original PR was opened) rather than replaying the stale inline diff.

Verification

Local venv (py3.13), addopts overridden (repo default uses xdist):

tests/gateway/test_stt_config.py — pass (tuple-unpacking updated for the new (text, transcripts) return).
Coexistence + regression: tests/agent/test_cascading_interrupt_6600.py (the independently-landed Problem-2 suite), tests/run_agent/test_stream_interrupt_retry.py, test_interrupt_propagation.py, tests/tools/test_interrupt.py — 26 passed total, deterministic.
Audited that fix(gateway): use FIFO queue for busy_input_mode pending messages #33817's FIFO queue code and the Problem-2 cancellation fix both survive untouched in gateway/run.py (voice hunks sit in disjoint regions).

Note

The 🎙️ echo logic appears in a few sites (fresh-message path, interrupt path, dequeue helper) — could be factored into one _transcribe_and_echo helper in a follow-up; left faithful to the original intent here.

Closes #6600 (Problem 2 already landed via dd0d1222a; this completes Problem 1).

Co-authored-by: Kristian Vastveit kristian@agrointel.no

@kristianvast

Salvaged from NousResearch#6600 (@kristianvast) — re-scoped to the voice half only and rebased onto current main. The cascading-interrupt hang half of the original PR landed independently in dd0d122, so this carries ONLY Problem 1. When a voice/audio message arrives while the agent is busy on the same session, it hit the interrupt path with empty text because STT only ran after the running-agent guard — the voice was effectively lost. Now we transcribe audio BEFORE signaling the agent (and on the fresh-message path), echo the raw transcript back to the user (🎙️), and _enrich_message_with_transcription returns (text, transcripts) so callers can echo. A new _dequeue_pending_with_transcription drives the post-agent drain the same way. Reapplied onto _prepare_inbound_message_text (inbound enrichment was extracted from the inline dispatch block since the original PR). Co-authored-by: Kristian Vastveit <kristian@agrointel.no>

…ption (#42090) ## What does this PR do? The voice-during-active-run feature (#41984) changed `_enrich_message_with_transcription` so that it returns a `(enriched_text, successful_transcripts)` tuple instead of a bare string, which lets callers echo the raw transcript back to the user. The signature and every other return path were updated to match, but one branch was missed: when a successfully transcribed clip arrives with the Discord "empty content" placeholder as its caption, the method still returned the prefix string on its own. All four call sites unpack the result with `text, transcripts = await self._enrich_message_with_transcription(...)`, so that path raised `ValueError: too many values to unpack (expected 2)` and the inbound voice message was dropped instead of reaching the agent. This is a real user-facing path rather than a corner case: a Discord voice note sent without a caption is delivered as exactly that placeholder, so a captionless voice message that transcribed correctly would crash the handler precisely when transcription had worked. The fix returns the proper tuple from that branch so the placeholder is still stripped while the transcripts continue to flow back to the caller for the echo. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) - [ ] ✨ New feature (non-breaking change that adds functionality) - [ ] 🔒 Security fix - [ ] 📝 Documentation update - [ ] ✅ Tests (adding or improving test coverage) - [ ] ♻️ Refactor (no behavior change) - [ ] 🎯 New skill (bundled or hub) ## Changes Made - `gateway/run.py`: in `_enrich_message_with_transcription`, return `(prefix, successful_transcripts)` instead of a bare `prefix` from the empty-content-placeholder branch, so the contract matches the signature and the other return paths. - `tests/gateway/test_stt_config.py`: add `test_enrich_message_with_transcription_returns_tuple_for_empty_content_placeholder`, which drives a successful transcription with the placeholder caption and asserts the placeholder is stripped while the transcript is still returned. ## How to Test 1. Check out `main` and run the new test — it fails with `ValueError: too many values to unpack (expected 2)`, reproducing the crash a captionless Discord voice note would trigger. 2. Apply this change and re-run `pytest tests/gateway/test_stt_config.py -q` — all tests pass. 3. `ruff check gateway/run.py tests/gateway/test_stt_config.py` and `python scripts/check-windows-footguns.py gateway/run.py tests/gateway/test_stt_config.py` both pass. ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains **only** changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/ -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

…ption (NousResearch#42090) ## What does this PR do? The voice-during-active-run feature (NousResearch#41984) changed `_enrich_message_with_transcription` so that it returns a `(enriched_text, successful_transcripts)` tuple instead of a bare string, which lets callers echo the raw transcript back to the user. The signature and every other return path were updated to match, but one branch was missed: when a successfully transcribed clip arrives with the Discord "empty content" placeholder as its caption, the method still returned the prefix string on its own. All four call sites unpack the result with `text, transcripts = await self._enrich_message_with_transcription(...)`, so that path raised `ValueError: too many values to unpack (expected 2)` and the inbound voice message was dropped instead of reaching the agent. This is a real user-facing path rather than a corner case: a Discord voice note sent without a caption is delivered as exactly that placeholder, so a captionless voice message that transcribed correctly would crash the handler precisely when transcription had worked. The fix returns the proper tuple from that branch so the placeholder is still stripped while the transcripts continue to flow back to the caller for the echo. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) - [ ] ✨ New feature (non-breaking change that adds functionality) - [ ] 🔒 Security fix - [ ] 📝 Documentation update - [ ] ✅ Tests (adding or improving test coverage) - [ ] ♻️ Refactor (no behavior change) - [ ] 🎯 New skill (bundled or hub) ## Changes Made - `gateway/run.py`: in `_enrich_message_with_transcription`, return `(prefix, successful_transcripts)` instead of a bare `prefix` from the empty-content-placeholder branch, so the contract matches the signature and the other return paths. - `tests/gateway/test_stt_config.py`: add `test_enrich_message_with_transcription_returns_tuple_for_empty_content_placeholder`, which drives a successful transcription with the placeholder caption and asserts the placeholder is stripped while the transcript is still returned. ## How to Test 1. Check out `main` and run the new test — it fails with `ValueError: too many values to unpack (expected 2)`, reproducing the crash a captionless Discord voice note would trigger. 2. Apply this change and re-run `pytest tests/gateway/test_stt_config.py -q` — all tests pass. 3. `ruff check gateway/run.py tests/gateway/test_stt_config.py` and `python scripts/check-windows-footguns.py gateway/run.py tests/gateway/test_stt_config.py` both pass. ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains **only** changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/ -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

…ming-worker fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half)

…ption (#42090) ## What does this PR do? The voice-during-active-run feature (#41984) changed `_enrich_message_with_transcription` so that it returns a `(enriched_text, successful_transcripts)` tuple instead of a bare string, which lets callers echo the raw transcript back to the user. The signature and every other return path were updated to match, but one branch was missed: when a successfully transcribed clip arrives with the Discord "empty content" placeholder as its caption, the method still returned the prefix string on its own. All four call sites unpack the result with `text, transcripts = await self._enrich_message_with_transcription(...)`, so that path raised `ValueError: too many values to unpack (expected 2)` and the inbound voice message was dropped instead of reaching the agent. This is a real user-facing path rather than a corner case: a Discord voice note sent without a caption is delivered as exactly that placeholder, so a captionless voice message that transcribed correctly would crash the handler precisely when transcription had worked. The fix returns the proper tuple from that branch so the placeholder is still stripped while the transcripts continue to flow back to the caller for the echo. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) - [ ] ✨ New feature (non-breaking change that adds functionality) - [ ] 🔒 Security fix - [ ] 📝 Documentation update - [ ] ✅ Tests (adding or improving test coverage) - [ ] ♻️ Refactor (no behavior change) - [ ] 🎯 New skill (bundled or hub) ## Changes Made - `gateway/run.py`: in `_enrich_message_with_transcription`, return `(prefix, successful_transcripts)` instead of a bare `prefix` from the empty-content-placeholder branch, so the contract matches the signature and the other return paths. - `tests/gateway/test_stt_config.py`: add `test_enrich_message_with_transcription_returns_tuple_for_empty_content_placeholder`, which drives a successful transcription with the placeholder caption and asserts the placeholder is stripped while the transcript is still returned. ## How to Test 1. Check out `main` and run the new test — it fails with `ValueError: too many values to unpack (expected 2)`, reproducing the crash a captionless Discord voice note would trigger. 2. Apply this change and re-run `pytest tests/gateway/test_stt_config.py -q` — all tests pass. 3. `ruff check gateway/run.py tests/gateway/test_stt_config.py` and `python scripts/check-windows-footguns.py gateway/run.py tests/gateway/test_stt_config.py` both pass. ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains **only** changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/ -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

kristianvast and others added 2 commits June 8, 2026 15:16

chore: add kristianvast to AUTHOR_MAP

f96eb85

kshitijk4poor force-pushed the salvage/6600-stale-streaming-worker branch from d609e18 to f96eb85 Compare June 8, 2026 09:46

kshitijk4poor changed the title ~~fix(agent,gateway): voice-interrupt transcription + cascading-interrupt hang (salvage #6600)~~ fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half) Jun 8, 2026

kshitijk4poor enabled auto-merge June 8, 2026 09:48

kshitijk4poor merged commit c3055d6 into NousResearch:main Jun 8, 2026
22 checks passed

synapsesx mentioned this pull request Jun 8, 2026

fix(gateway): return tuple from voice transcription on placeholder caption #42090

Merged

19 tasks

alt-glitch pushed a commit that referenced this pull request Jun 14, 2026

Merge pull request #41984 from kshitijk4poor/salvage/6600-stale-strea…

38fadb0

…ming-worker fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half)#41984

fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half)#41984
kshitijk4poor merged 2 commits into
NousResearch:mainfrom
kshitijk4poor:salvage/6600-stale-streaming-worker

kshitijk4poor commented Jun 8, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kshitijk4poor commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem 1 — voice messages during active runs silently drop

Verification

Note

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kshitijk4poor commented Jun 8, 2026 •

edited

Loading