Split voice agent stream processors by jonastemplestein · Pull Request #1351 · iterate/iterate

jonastemplestein · 2026-05-19T15:29:18Z

Summary

replace the voice-agent-specific Durable Object binding with a generic STREAM_PROCESSOR runner and registry
subscribe both the passive voice-agent protocol processor and the selected provider adapter for new voice-agent streams
add canonical input/output text events and forward input text to Gemini Live, OpenAI Realtime, and Grok Realtime adapters
wire new voice-agent streams to the existing AGENT Durable Object from the passive voice-agent processor side effect
expose messageAgent realtime tools for Gemini Live, OpenAI Realtime, and Grok Realtime that append events.iterate.com/agent/input-added
wrap code-agent text before sending it to voice providers so the realtime model relays it to the human it is speaking to, including clarifying questions, rather than answering it itself
add local smoke modes and a direct Gemini Live tool probe for proving provider tool-call wiring outside the full stream path

Verification

pnpm --filter os typecheck
pnpm --dir packages/shared test:stream-processors
pnpm format:check packages/shared/src/stream-processors/voice-agent/contract.ts packages/shared/src/stream-processors/voice-agent/implementation.ts packages/shared/src/stream-processors/voice-agent/implementation.test.ts apps/os/scripts/voice-agent-e2e.ts apps/os/scripts/gemini-live-tool-probe.ts apps/os/src/domains/voice-agents/voice-agent-code-agent.ts
pnpm lint packages/shared/src/stream-processors/voice-agent/contract.ts packages/shared/src/stream-processors/voice-agent/implementation.ts packages/shared/src/stream-processors/voice-agent/implementation.test.ts apps/os/scripts/voice-agent-e2e.ts apps/os/scripts/gemini-live-tool-probe.ts apps/os/src/domains/voice-agents/voice-agent-code-agent.ts
focused prompt-handoff checks after follow-ups: packages/shared/src/stream-processors/voice-agent/implementation.ts, packages/shared/src/stream-processors/voice-agent/implementation.test.ts, apps/os/src/domains/voice-agents/voice-agent-code-agent.ts

Local realtime e2e

Ran the original audio in/out smoke against local OS before the message-agent bridge:

Gemini Live: passed, provider connected, setup completed, output audio/text returned, outputBytes=10084
OpenAI Realtime: passed, provider connected, setup completed, output audio/text returned, outputBytes=19200
Grok Realtime: passed, provider connected, setup completed, output audio/text returned, outputBytes=26880

Ran direct Gemini Live tool probe against Doppler config dev_localhost:

doppler run --project os --config dev_localhost -- tsx scripts/gemini-live-tool-probe.ts --timeout-ms 12000

Observed Gemini toolCall.functionCalls[0].name === "messageAgent" using the docs-shaped function declaration.
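For readers unfamiliar with the shape being probed, here is a minimal sketch of a function declaration and a matching tool-call check. The description strings, the single `message` parameter (inferred from the smoke prompts' "Use message: ..."), and the `findMessageAgentCall` helper are assumptions for illustration, not the declaration the probe script actually sends.

```typescript
// Hypothetical declaration shape: name, description, JSON-schema-like parameters.
type FunctionDeclaration = {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
};

// Illustrative values only; the real declaration lives in the PR's code.
const messageAgentDeclaration: FunctionDeclaration = {
  name: "messageAgent",
  description: "Send a message to the background code agent.",
  parameters: {
    type: "object",
    properties: {
      message: { type: "string", description: "Text to hand to the agent." },
    },
    required: ["message"],
  },
};

// Assumed shape of a provider message carrying tool calls, based on the
// toolCall.functionCalls[0].name path observed in the probe.
type ToolCallMessage = {
  toolCall?: {
    functionCalls?: { id?: string; name: string; args?: Record<string, unknown> }[];
  };
};

// Returns the message argument of the first messageAgent call, if any.
function findMessageAgentCall(msg: ToolCallMessage): string | undefined {
  const call = msg.toolCall?.functionCalls?.find(
    (c) => c.name === messageAgentDeclaration.name,
  );
  const message = call?.args?.message;
  return typeof message === "string" ? message : undefined;
}
```

A probe along these lines would pass once `findMessageAgentCall` returns a string for any message received before the timeout.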

Ran text-mode message-agent bridge smokes against local OS at http://127.0.0.1:5176 with Doppler config dev_localhost:

doppler run --project os --config dev_localhost -- tsx scripts/voice-agent-e2e.ts \
  --base-url http://127.0.0.1:5176 \
  --provider gemini-live \
  --input-mode text \
  --expect-message-agent \
  --prompt "Call the messageAgent function now. Use message: Fetch example.com and tell me what it says. Do not answer in natural language before the function call." \
  --timeout-ms 180000

Gemini result: ok: true, stream /voice-agents/e2e-mpd08lmh, messageAgentInputAdded: true, agentOutputAdded: true, codeAgentVoiceTextAdded: true, codeAgentVoiceText: "Example.com says: Example Domain. This domain is for use in documentation examples without needing permission. Avoid use in operations."

doppler run --project os --config dev_localhost -- tsx scripts/voice-agent-e2e.ts \
  --base-url http://127.0.0.1:5176 \
  --provider openai-realtime \
  --input-mode text \
  --expect-message-agent \
  --prompt "Call the messageAgent function now. Use message: Fetch example.com and tell me what it says. Do not answer in natural language before the function call." \
  --timeout-ms 180000

OpenAI result: ok: true, stream /voice-agents/e2e-mpd0dax0, messageAgentInputAdded: true, agentOutputAdded: true, codeAgentVoiceTextAdded: true, codeAgentVoiceText: "Example.com says: Example Domain. This domain is for use in documentation examples without needing permission. Avoid use in operations."

doppler run --project os --config dev_localhost -- tsx scripts/voice-agent-e2e.ts \
  --base-url http://127.0.0.1:5176 \
  --provider grok-realtime \
  --input-mode text \
  --expect-message-agent \
  --prompt "Call the messageAgent function now. Use message: Fetch example.com and tell me what it says. Do not answer in natural language before the function call." \
  --timeout-ms 180000

Grok regression result: ok: true, stream /voice-agents/e2e-mpd0du05, messageAgentInputAdded: true, agentOutputAdded: true, codeAgentVoiceTextAdded: true, codeAgentVoiceText: "Example.com says: Example Domain. This domain is for use in documentation examples without needing permission. Avoid use in operations."

Prompt-handoff regression coverage now verifies that if the background agent appends a caller-facing question such as "What occupation should I put on your profile?", all three provider adapters send it with instructions to ask the human they are speaking to rather than answer it themselves.

Note: the original audio CLI returned ok=true based on output audio, but reported turnCompleted=false for the first three provider runs (the audio in/out smokes). The audio in/out proof works, but the script should not yet treat provider turn-complete events as a reliable completion condition.
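One way to express that completion rule is a predicate that keys on connection, setup, and output bytes while ignoring the turn-complete flag. This is a sketch only: the `SmokeObservation` type and `smokePassed` function are hypothetical names, and the field names merely mirror the values reported above.

```typescript
// Hypothetical record of what a smoke run observed.
type SmokeObservation = {
  providerConnected: boolean;
  setupCompleted: boolean;
  outputBytes: number;
  turnCompleted: boolean; // recorded for diagnostics, not used as a pass condition
};

// Passes when the provider connected, finished setup, and returned output,
// regardless of whether a turn-complete event ever arrived.
function smokePassed(o: SmokeObservation): boolean {
  return o.providerConnected && o.setupCompleted && o.outputBytes > 0;
}
```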

Environment Config Lease

Lease: preview-6
Doppler config: preview_6
Type: environment-config-lease
Leased until: 2026-05-19T23:05:00.080Z

OS

Status: deployed
Commit: 637cd1a
Preview: https://os.iterate-preview-6.com
Workflow run
Updated: 2026-05-19T22:07:36.660Z

Note

High Risk
High risk because it refactors durable-object bindings/subscriptions and realtime voice processing/tool-calling paths, which can break live voice sessions or agent handoff if misconfigured.

Overview
Stream processor refactor: Replaces the voice-agent-specific Durable Object binding (VOICE_AGENT) with a generic STREAM_PROCESSOR Durable Object that selects a processor via processorSlug, and updates OS runtime/context/exports accordingly.
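The slug-based selection described here can be pictured as a small registry lookup. The sketch below is an assumption about the general pattern, not the PR's implementation: `StreamEvent`, `StreamProcessor`, `ProcessorRegistry`, and `runProcessor` are hypothetical names, and the real runner is a Durable Object rather than a plain class.

```typescript
// Hypothetical event and processor shapes.
type StreamEvent = { type: string; payload?: unknown };
type StreamProcessor = {
  slug: string;
  process(event: StreamEvent): StreamEvent[];
};

// Maps processor slugs to implementations so one generic runner can host any of them.
class ProcessorRegistry {
  private readonly bySlug = new Map<string, StreamProcessor>();

  register(processor: StreamProcessor): void {
    if (this.bySlug.has(processor.slug)) {
      throw new Error(`duplicate processor slug: ${processor.slug}`);
    }
    this.bySlug.set(processor.slug, processor);
  }

  resolve(slug: string): StreamProcessor {
    const processor = this.bySlug.get(slug);
    if (!processor) throw new Error(`unknown processor slug: ${slug}`);
    return processor;
  }
}

// The generic runner only needs the slug to pick the processor for an event.
function runProcessor(
  registry: ProcessorRegistry,
  processorSlug: string,
  event: StreamEvent,
): StreamEvent[] {
  return registry.resolve(processorSlug).process(event);
}
```

Failing loudly on an unknown slug keeps a misconfigured subscription from silently dropping events, which matters given the risk called out above.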

Voice agent pipeline split: New voice-agent streams now subscribe to two processors: a canonical voice-agent protocol processor plus a provider-specific adapter (voice-agent/gemini-live, voice-agent/openai-realtime, voice-agent/grok-realtime), with new slugs/helpers to generate subscription names.
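As an illustration of the two-subscription setup, the sketch below derives both subscriptions for a given provider. The adapter slugs come from the text above; the protocol slug `voice-agent`, the `stream-processor:` name prefix, and the helper names are assumptions, since the actual helpers are not shown here.

```typescript
// Provider identifiers as used by the --provider flag in the smoke commands.
const PROVIDERS = ["gemini-live", "openai-realtime", "grok-realtime"] as const;
type Provider = (typeof PROVIDERS)[number];

// Assumed slug for the canonical protocol processor.
const PROTOCOL_SLUG = "voice-agent";

// Adapter slugs follow the voice-agent/<provider> pattern listed above.
function adapterSlug(provider: Provider): string {
  return `voice-agent/${provider}`;
}

// Assumed naming scheme; the real helper may format names differently.
function subscriptionName(processorSlug: string): string {
  return `stream-processor:${processorSlug}`;
}

// Every new voice-agent stream gets the protocol processor plus one adapter.
function subscriptionsFor(provider: Provider): { processorSlug: string; name: string }[] {
  return [PROTOCOL_SLUG, adapterSlug(provider)].map((processorSlug) => ({
    processorSlug,
    name: subscriptionName(processorSlug),
  }));
}
```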

Text + tool-call bridge: Introduces canonical voice-agent input/output text events, forwards input text to providers, exposes a messageAgent tool for Gemini/OpenAI/Grok that appends events.iterate.com/agent/input-added, and wraps code-agent text so providers relay it to the human (not answer/acknowledge).
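The relay wrapping can be sketched as a plain string transform applied before text is forwarded to a provider. The instruction wording, the `<agent-message>` delimiters, and the `wrapForRelay` name below are all hypothetical; the PR's actual prompt text is not reproduced here.

```typescript
// Wraps code-agent text with relay instructions so the realtime model passes it
// on to the human instead of treating it as a prompt addressed to itself.
function wrapForRelay(codeAgentText: string): string {
  return [
    "The background agent sent the message below for the human you are speaking to.",
    "Relay it to them in your own voice. If it is a question, ask them the question;",
    "do not answer or acknowledge it yourself.",
    "",
    "<agent-message>",
    codeAgentText.trim(),
    "</agent-message>",
  ].join("\n");
}
```

Delimiting the agent text keeps a caller-facing question such as "What occupation should I put on your profile?" visibly separate from the instructions that surround it.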

OS/UI/scripts updates: Voice-agent routes and tooling move to /agents/voice/* (while listing supports legacy /voice-agents/*), the stream console displays the new output-text events, and new/updated scripts (voice-agent-e2e text mode + expectations, gemini-live-tool-probe) validate provider tool-calling. Also updates the Alchemy patch to cap tags and include artifacts bindings in metadata.

Reviewed by Cursor Bugbot for commit 637cd1a. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ed2beef.

jonastemplestein · 2026-06-10T10:58:19Z

Consolidated this work into #1349, which has been rebuilt on top of current origin/main with the new stream processor host abstraction.

jonastemplestein added 3 commits May 19, 2026 14:52

Add voice agent processor POC

ffa1c6a

Split voice agent stream processors

81487f1

Add Grok ask-agent voice bridge

42c1f92

jonastemplestein marked this pull request as ready for review May 19, 2026 15:57

cursor Bot reviewed May 19, 2026

Comment thread apps/os/src/domains/agents/durable-objects/agent-durable-object.ts Outdated

Smoke test Grok ask-agent bridge

dd5e22c

cursor Bot reviewed May 19, 2026

Comment thread apps/os/src/routes/_app/orgs/$organizationSlug/projects/$projectSlug/voice-agents/index.tsx

Add Gemini and OpenAI ask-agent tool bridge

14636d8

cursor Bot reviewed May 19, 2026

Comment thread packages/shared/src/stream-processors/voice-agent/implementation.ts Outdated

jonastemplestein added 4 commits May 19, 2026 20:42

Clarify code agent voice responses

4cb9926

Tighten voice agent handoff prompts

7cd2c30

Rename voice tool to messageAgent

446b125

Clarify voice agent relay handoff

9ea5270

cursor Bot reviewed May 19, 2026

Comment thread apps/os/src/components/voice-agent-stream-console.tsx

Comment thread packages/shared/src/stream-processors/voice-agent/implementation.ts

Clarify caller terminology for voice handoff

0f09a9f

cursor Bot reviewed May 19, 2026

Comment thread packages/shared/src/stream-processors/voice-agent/implementation.ts

Comment thread apps/os/src/domains/agents/durable-objects/agent-durable-object.ts

Comment thread apps/os/src/domains/voice-agents/voice-agent-subscription.ts

cursor Bot reviewed May 19, 2026

Comment thread apps/os/src/domains/agents/durable-objects/agent-durable-object.ts Outdated

Cap Alchemy worker tags for OS preview deploys

9be9263

jonastemplestein force-pushed the voice-agent-processor-split branch from 27f0a4b to 9be9263 Compare May 19, 2026 21:08

jonastemplestein added 3 commits May 19, 2026 22:19

Move voice agents under agents stream path

1895d66

Route voice agent chat responses to voice input

fb48aa9

Honor Gemini required messageAgent tool choice

ed2beef

cursor Bot reviewed May 19, 2026

Comment thread apps/os/src/routes/_app/orgs/$organizationSlug/projects/$projectSlug/voice-agents/index.tsx Outdated

Show legacy voice agent streams

637cd1a

jonastemplestein force-pushed the electric-artichoke branch from ffa1c6a to 6323aed Compare June 10, 2026 10:58

jonastemplestein closed this Jun 10, 2026

jonastemplestein mentioned this pull request Jun 10, 2026

Rebuild voice agent processor on stream host #1349

Open

jonastemplestein wants to merge 15 commits into electric-artichoke from voice-agent-processor-split
