
fix(gateway): queue voice/audio messages instead of interrupting with empty text#8434

Open
chaizijun1 wants to merge 1 commit into NousResearch:main from chaizijun1:fix/queue-voice-messages-on-interrupt

Conversation

@chaizijun1


Summary

  • When a voice/audio message arrives while an agent is already running, the gateway interrupts the agent with `event.text`. Voice messages have not been through STT transcription at that point, so `event.text` is empty, and the empty-text interrupt causes the agent to hang.
  • Photos already have dedicated queueing logic that avoids interrupting the running agent. This PR applies the same `merge_pending_message_event` pattern to `MessageType.VOICE` and `MessageType.AUDIO`.

Reproducer

  1. Send a voice message to the Telegram bot
  2. Before the agent finishes responding, send a second voice message
  3. Expected: second voice is queued and processed after the first completes
  4. Actual: agent hangs indefinitely (interrupted with empty text)

Changes

  • gateway/run.py: Add voice/audio check between the existing photo queueing block and the _AGENT_PENDING_SENTINEL check (+12 lines)
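
As a rough illustration of the check described above (not the actual diff), the queue-or-interrupt decision could be sketched as follows. Everything here is a simplified stand-in written for this example: `MessageEvent`, `FakeAgent`, the `pending` dict, `handle_while_agent_running`, and the signature given to `merge_pending_message_event` are assumptions, and the real code in `gateway/run.py` may be structured differently (for instance, photos are handled in their own block there).

```python
from dataclasses import dataclass
from enum import Enum, auto


class MessageType(Enum):  # stand-in for the gateway's enum
    TEXT = auto()
    PHOTO = auto()
    VOICE = auto()
    AUDIO = auto()


@dataclass
class MessageEvent:  # hypothetical, simplified event
    message_type: MessageType
    text: str = ""


class FakeAgent:
    """Hypothetical running agent that only records interrupts."""

    def __init__(self):
        self.interrupts = []

    def interrupt(self, text):
        self.interrupts.append(text)


# Message types whose text is not available yet when they arrive.
QUEUE_ONLY_TYPES = {MessageType.PHOTO, MessageType.VOICE, MessageType.AUDIO}


def merge_pending_message_event(pending, session_key, event):
    """Stand-in: append the event to the session's pending list."""
    pending.setdefault(session_key, []).append(event)


def handle_while_agent_running(pending, session_key, event, running_agent):
    """Queue media events; interrupt only with events that carry text."""
    if event.message_type in QUEUE_ONLY_TYPES:
        merge_pending_message_event(pending, session_key, event)
        return "queued"
    running_agent.interrupt(event.text)
    return "interrupted"
```

The point of the sketch is only the ordering: the media check runs before any call to `interrupt`, so an event with empty text never reaches the running agent.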

Test plan

  • Send two voice messages in quick succession: the second should be queued and processed after the first completes
  • Send a voice message while the agent is processing a text message: it should be queued without interrupting the agent
  • Send a text message while the agent is processing: it should still interrupt normally (existing behavior preserved)
  • Verify photo queueing still works as before
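
The ordering expected by the first test case can be sketched in isolation. This is a toy model under stated assumptions, not the gateway's code: `fake_transcribe` and `run_turns` are hypothetical helpers, events are plain dicts, and the only behavior modeled is that queued voice events are transcribed and processed in arrival order once the current turn ends.

```python
from collections import deque


def fake_transcribe(audio_ref):
    """Hypothetical STT stand-in: pretends to transcribe an audio reference."""
    return f"transcript of {audio_ref}"


def run_turns(first_event, queued_events):
    """Process the first event, then drain queued events in arrival order."""
    processed = []
    pending = deque(queued_events)
    current = first_event
    while current is not None:
        # Voice/audio events carry no text until transcription runs here.
        text = current.get("text") or fake_transcribe(current["audio"])
        processed.append(text)
        current = pending.popleft() if pending else None
    return processed
```

For example, `run_turns({"audio": "a.ogg"}, [{"audio": "b.ogg"}])` processes the first voice message to completion before the queued one is transcribed.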

🤖 Generated with Claude Code

… empty text

When a voice or audio message arrives while an agent is already running,
the gateway calls `running_agent.interrupt(event.text)`. However,
`event.text` is empty at this point because STT transcription only
happens later inside `_handle_message_with_agent`. The empty-text
interrupt causes the agent to hang while waiting for a model response.

Photos already have dedicated queueing logic that avoids this problem.
Apply the same pattern to voice/audio messages: queue them via
`merge_pending_message_event` so they are processed with full STT
transcription after the current agent turn completes.

Reproducer:
1. Send a voice message to the Telegram bot
2. Before the agent finishes responding, send a second voice message
3. The agent hangs indefinitely

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chaizijun1

Author

Hi team, gentle bump on this one. This is a bug fix for voice/audio messages causing agent hangs when sent in quick succession. The fix mirrors the existing photo queueing pattern. Happy to adjust anything if needed!

@alt-glitch added the type/bug, P1, comp/gateway, and tool/tts labels Apr 28, 2026

Labels

  • comp/gateway: Gateway runner, session dispatch, delivery
  • P1: High (major feature broken, no workaround)
  • tool/tts: Text-to-speech and transcription
  • type/bug: Something isn't working


2 participants