Split voice agent stream processors #1351
Closed
jonastemplestein wants to merge 15 commits into
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit ed2beef. Configure here.
Consolidated this work into #1349, which has been rebuilt on top of current origin/main with the new stream processor host abstraction.

Summary
`messageAgent` realtime tools for Gemini Live, OpenAI Realtime, and Grok Realtime that append `events.iterate.com/agent/input-added`

Verification
Local realtime e2e
Ran the original audio in/out smoke against local OS before the message-agent bridge:
Ran direct Gemini Live tool probe against Doppler config dev_localhost:
Observed Gemini `toolCall.functionCalls[0].name === "messageAgent"` using the docs-shaped function declaration.

Ran text-mode message-agent bridge smokes against local OS at http://127.0.0.1:5176 with Doppler config dev_localhost:

    doppler run --project os --config dev_localhost -- tsx scripts/voice-agent-e2e.ts \
      --base-url http://127.0.0.1:5176 \
      --provider gemini-live \
      --input-mode text \
      --expect-message-agent \
      --prompt "Call the messageAgent function now. Use message: Fetch example.com and tell me what it says. Do not answer in natural language before the function call." \
      --timeout-ms 180000

Gemini result:
ok: true, stream /voice-agents/e2e-mpd08lmh, messageAgentInputAdded: true, agentOutputAdded: true, codeAgentVoiceTextAdded: true, codeAgentVoiceText: "Example.com says: Example Domain. This domain is for use in documentation examples without needing permission. Avoid use in operations."

    doppler run --project os --config dev_localhost -- tsx scripts/voice-agent-e2e.ts \
      --base-url http://127.0.0.1:5176 \
      --provider openai-realtime \
      --input-mode text \
      --expect-message-agent \
      --prompt "Call the messageAgent function now. Use message: Fetch example.com and tell me what it says. Do not answer in natural language before the function call." \
      --timeout-ms 180000

OpenAI result:
ok: true, stream /voice-agents/e2e-mpd0dax0, messageAgentInputAdded: true, agentOutputAdded: true, codeAgentVoiceTextAdded: true, codeAgentVoiceText: "Example.com says: Example Domain. This domain is for use in documentation examples without needing permission. Avoid use in operations."

    doppler run --project os --config dev_localhost -- tsx scripts/voice-agent-e2e.ts \
      --base-url http://127.0.0.1:5176 \
      --provider grok-realtime \
      --input-mode text \
      --expect-message-agent \
      --prompt "Call the messageAgent function now. Use message: Fetch example.com and tell me what it says. Do not answer in natural language before the function call." \
      --timeout-ms 180000

Grok regression result:
ok: true, stream /voice-agents/e2e-mpd0du05, messageAgentInputAdded: true, agentOutputAdded: true, codeAgentVoiceTextAdded: true, codeAgentVoiceText: "Example.com says: Example Domain. This domain is for use in documentation examples without needing permission. Avoid use in operations."

Prompt-handoff regression coverage now verifies that if the background agent appends a caller-facing question such as "What occupation should I put on your profile?", all three provider adapters send it with instructions to ask the human they are speaking to rather than answer it themselves.

Note: the original audio CLI returned ok=true based on output audio, but turnCompleted=false for the first three provider runs. The audio in/out proof works; the script should not currently treat provider turn-complete events as a reliable completion condition.
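The note above suggests judging a run by the bridge events it observed rather than by a provider turn-complete signal. A minimal sketch of such a check follows. The `VoiceAgentE2eResult` shape and both function names are hypothetical, modeled only on the fields printed in the results above; the real script may structure its output differently.

```typescript
// Hypothetical result shape, modeled on the fields printed by the e2e runs above.
interface VoiceAgentE2eResult {
  ok: boolean;
  stream: string;
  turnCompleted?: boolean; // reported by the audio CLI; deliberately not relied on here
  messageAgentInputAdded?: boolean;
  agentOutputAdded?: boolean;
  codeAgentVoiceTextAdded?: boolean;
  codeAgentVoiceText?: string;
}

// Returns the names of the bridge steps that were NOT observed in the run.
function missingBridgeSteps(result: VoiceAgentE2eResult): string[] {
  const required = [
    "messageAgentInputAdded",
    "agentOutputAdded",
    "codeAgentVoiceTextAdded",
  ] as const;
  return required.filter((key) => result[key] !== true);
}

// A run counts as successful when it reports ok and every bridge step was seen.
// turnCompleted is ignored, since providers did not emit it reliably.
function bridgeSucceeded(result: VoiceAgentE2eResult): boolean {
  return result.ok && missingBridgeSteps(result).length === 0;
}
```

With this shape, a run that reports `turnCompleted: false` still passes as long as all three bridge flags are true, and a failing run can report exactly which step was missing.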
Environment Config Lease
Lease: `preview-6`
Doppler config: `preview_6`
Type: `environment-config-lease`
Leased until: 2026-05-19T23:05:00.080Z
OS
Status: deployed
Commit: `637cd1a`
Preview: https://os.iterate-preview-6.com
Workflow run
Updated: 2026-05-19T22:07:36.660Z
Note
High Risk
High risk because it refactors durable-object bindings/subscriptions and realtime voice processing/tool-calling paths, which can break live voice sessions or agent handoff if misconfigured.
Overview
Stream processor refactor: Replaces the voice-agent-specific Durable Object binding (`VOICE_AGENT`) with a generic `STREAM_PROCESSOR` Durable Object that selects a processor via `processorSlug`, and updates OS runtime/context/exports accordingly.

Voice agent pipeline split: New voice-agent streams now subscribe to two processors: a canonical `voice-agent` protocol processor plus a provider-specific adapter (`voice-agent/gemini-live`, `voice-agent/openai-realtime`, `voice-agent/grok-realtime`), with new slugs/helpers to generate subscription names.

Text + tool-call bridge: Introduces canonical `voice-agent` input/output text events, forwards input text to providers, exposes a `messageAgent` tool for Gemini/OpenAI/Grok that appends `events.iterate.com/agent/input-added`, and wraps `code-agent` text so providers relay it to the human (not answer/acknowledge).

OS/UI/scripts updates: Voice-agent routes and tooling move to `/agents/voice/*` (while listing supports legacy `/voice-agents/*`), the stream console displays the new output-text events, and new/updated scripts (`voice-agent-e2e` text mode + expectations, `gemini-live-tool-probe`) validate provider tool-calling. Also updates the Alchemy patch to cap tags and include `artifacts` bindings in metadata.

Reviewed by Cursor Bugbot for commit 637cd1a. Bugbot is set up for automated code reviews on this repo.
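To illustrate the pipeline split described in the overview, here is a rough sketch of how a stream could derive its two processor slugs and how a generic host could select a processor by `processorSlug`. Only the slug strings come from the overview; `processorSlugsFor`, `selectProcessor`, the registry, and the event shape are assumptions for illustration and are not the PR's actual code or the Durable Object API.

```typescript
// Provider names taken from the adapter slugs listed in the overview.
type VoiceProvider = "gemini-live" | "openai-realtime" | "grok-realtime";

const CANONICAL_SLUG = "voice-agent";

// Hypothetical helper: the two processor slugs a new voice-agent stream subscribes to,
// the canonical protocol processor plus one provider-specific adapter.
function processorSlugsFor(provider: VoiceProvider): [string, string] {
  return [CANONICAL_SLUG, `${CANONICAL_SLUG}/${provider}`];
}

// Hypothetical event and processor shapes, used only to make the sketch runnable.
type StreamEvent = { type: string; payload?: unknown };
type Processor = (event: StreamEvent) => string;

// Hypothetical registry keyed by processorSlug.
const processors = new Map<string, Processor>();
processors.set(CANONICAL_SLUG, (event) => `canonical handled ${event.type}`);

const providers: VoiceProvider[] = ["gemini-live", "openai-realtime", "grok-realtime"];
for (const provider of providers) {
  processors.set(
    `${CANONICAL_SLUG}/${provider}`,
    (event) => `${provider} adapter handled ${event.type}`,
  );
}

// A generic host looks up the processor by slug instead of hard-coding one binding.
function selectProcessor(processorSlug: string): Processor {
  const processor = processors.get(processorSlug);
  if (!processor) {
    throw new Error(`Unknown processorSlug: ${processorSlug}`);
  }
  return processor;
}
```

Under these assumptions, `processorSlugsFor("gemini-live")` yields `voice-agent` and `voice-agent/gemini-live`, and an unregistered slug fails loudly rather than silently falling back to a default processor.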