fix(gateway): return tuple from voice transcription on placeholder caption #42090

Merged: kshitijk4poor merged 1 commit into NousResearch:main from synapsesx:fix/voice-transcription-placeholder-tuple on Jun 10, 2026
@synapsesx

Contributor

What does this PR do?

The voice-during-active-run feature (#41984) changed
_enrich_message_with_transcription so that it returns a
(enriched_text, successful_transcripts) tuple instead of a bare string,
which lets callers echo the raw transcript back to the user. The signature
and every other return path were updated to match, but one branch was
missed: when a successfully transcribed clip arrives with the Discord
"empty content" placeholder as its caption, the method still returned the
prefix string on its own. All four call sites unpack the result with
text, transcripts = await self._enrich_message_with_transcription(...),
so that path raised ValueError: too many values to unpack (expected 2)
and the inbound voice message was dropped instead of reaching the agent.
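The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the gateway code: `unpack_pair` and the prefix text are hypothetical stand-ins for a call site and for whatever string the method builds. It only illustrates why a bare string breaks two-target unpacking while a tuple does not.

```python
def unpack_pair(result):
    """Mimic a call site: unpack a result into (text, transcripts)."""
    text, transcripts = result
    return text, transcripts


# Hypothetical prefix text; the real string is built inside the gateway.
bare_prefix = '[voice transcript: "hello"]'

# A str is iterated character by character, so any string longer than
# two characters cannot be unpacked into exactly two names.
try:
    unpack_pair(bare_prefix)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# The tuple form unpacks cleanly into the two expected values.
text, transcripts = unpack_pair((bare_prefix, ["hello"]))
```

Here `error_message` carries the "too many values to unpack" text quoted in the description, while the second call succeeds.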

This is a real user-facing path rather than a corner case: a Discord voice
note sent without a caption is delivered as exactly that placeholder, so a
captionless voice message that transcribed correctly would crash the
handler precisely when transcription had worked. The fix returns the
proper tuple from that branch so the placeholder is still stripped while
the transcripts continue to flow back to the caller for the echo.
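As a rough illustration of the corrected contract, the sketch below models the branch in a simplified, synchronous function. Everything in it is an assumption made for illustration: the name `enrich_with_transcripts`, the placeholder text, and the prefix format are not taken from `gateway/run.py`. Only the shape of the return value, `(prefix, successful_transcripts)`, follows the description.

```python
from typing import List, Tuple

# Hypothetical stand-in; the real placeholder string is defined in the gateway.
EMPTY_CONTENT_PLACEHOLDER = "(empty content)"


def enrich_with_transcripts(
    user_text: str, successful_transcripts: List[str]
) -> Tuple[str, List[str]]:
    """Return (enriched_text, successful_transcripts) on every path."""
    if not successful_transcripts:
        return user_text, []

    # Hypothetical prefix format, one line per transcript.
    prefix = "\n".join(f'[voice transcript: "{t}"]' for t in successful_transcripts)

    if user_text.strip() == EMPTY_CONTENT_PLACEHOLDER:
        # The pre-fix code returned the bare prefix here. Returning the
        # tuple strips the placeholder and still surfaces the transcripts.
        return prefix, successful_transcripts

    return f"{prefix}\n\n{user_text}", successful_transcripts
```

With this shape, a caller can always write `text, transcripts = enrich_with_transcripts(...)` regardless of which branch ran.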

Related Issue

N/A

Type of Change

  • [x] 🐛 Bug fix (non-breaking change that fixes an issue)
  • [ ] ✨ New feature (non-breaking change that adds functionality)
  • [ ] 🔒 Security fix
  • [ ] 📝 Documentation update
  • [ ] ✅ Tests (adding or improving test coverage)
  • [ ] ♻️ Refactor (no behavior change)
  • [ ] 🎯 New skill (bundled or hub)

Changes Made

  • gateway/run.py: in _enrich_message_with_transcription, return
    (prefix, successful_transcripts) instead of a bare prefix from the
    empty-content-placeholder branch, so the contract matches the signature
    and the other return paths.
  • tests/gateway/test_stt_config.py: add
    test_enrich_message_with_transcription_returns_tuple_for_empty_content_placeholder,
    which drives a successful transcription with the placeholder caption and
    asserts the placeholder is stripped while the transcript is still returned.
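The shape of such a regression test might look like the sketch below. It does not reproduce the test added in `tests/gateway/test_stt_config.py`: `FakeRunner`, its `_transcribe` stub, the method's parameter list, and the placeholder text are all hypothetical, chosen so the example runs on its own. It shows the two assertions the description names: the placeholder is gone from the text, and the transcript is still returned.

```python
import asyncio
from typing import List, Tuple

PLACEHOLDER = "(empty content)"  # hypothetical placeholder text


class FakeRunner:
    """Hypothetical stand-in for the gateway runner, with STT stubbed out."""

    async def _transcribe(self, path: str) -> str:
        # Pretend the speech-to-text backend succeeded.
        return "hello from voice"

    async def _enrich_message_with_transcription(
        self, user_text: str, audio_paths: List[str]
    ) -> Tuple[str, List[str]]:
        # Assumed parameter list; the real signature may differ.
        transcripts = [await self._transcribe(p) for p in audio_paths]
        prefix = "\n".join(f'[transcript: "{t}"]' for t in transcripts)
        if user_text.strip() == PLACEHOLDER:
            return prefix, transcripts
        return f"{prefix}\n\n{user_text}", transcripts


def test_returns_tuple_for_empty_content_placeholder() -> None:
    runner = FakeRunner()
    # Unpacking into two names is itself part of the check: a bare
    # string return would raise ValueError on this line.
    text, transcripts = asyncio.run(
        runner._enrich_message_with_transcription(PLACEHOLDER, ["voice.ogg"])
    )
    assert PLACEHOLDER not in text
    assert "hello from voice" in text
    assert transcripts == ["hello from voice"]
```

Unpacking the result inside the test, rather than inspecting its type, keeps the test aligned with how the call sites consume the value.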

How to Test

  1. Check out main and run the new test — it fails with
    ValueError: too many values to unpack (expected 2), reproducing the
    crash a captionless Discord voice note would trigger.
  2. Apply this change and re-run
    pytest tests/gateway/test_stt_config.py -q — all tests pass.
  3. ruff check gateway/run.py tests/gateway/test_stt_config.py and
    python scripts/check-windows-footguns.py gateway/run.py tests/gateway/test_stt_config.py both pass.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15 (Darwin 25.5)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

@alt-glitch added the type/bug, P2, comp/gateway, platform/discord, and tool/tts labels on Jun 8, 2026
@liuhao1024

Contributor

Positive Verification

This is a clean one-line fix that addresses a real ValueError crash.

Root cause: _enrich_message_with_transcription returned a bare str when the caption matched the empty-content placeholder, but every caller unpacks the result as text, transcripts = .... The bare string caused ValueError: too many values to unpack, silently dropping the voice message.

Fix: Return the expected (prefix, successful_transcripts) tuple. Correct and minimal.

Test coverage: The new test exercises exactly the failing path — a successful transcription whose caption is the placeholder. Good assertion that the placeholder is stripped and transcripts are still surfaced.

@kshitijk4poor merged commit 9ca9697 into NousResearch:main on Jun 10, 2026
23 checks passed
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
alt-glitch pushed a commit that referenced this pull request Jun 14, 2026
Labels

  • comp/gateway: Gateway runner, session dispatch, delivery
  • P2: Medium — degraded but workaround exists
  • platform/discord: Discord bot adapter
  • tool/tts: Text-to-speech and transcription
  • type/bug: Something isn't working

4 participants