feat(android): streaming TTS via ElevenLabs WebSocket for voice screen by gregmousseau · Pull Request #29521 · openclaw/openclaw

gregmousseau · 2026-02-28T06:40:57Z

Adds real-time ElevenLabs streaming TTS to the Android voice screen, so the assistant speaks responses as they stream in rather than waiting for the full reply.

What changed

ElevenLabsStreamingTts (new class)
Streams text chunks to the ElevenLabs WebSocket API and plays audio via AudioTrack (PCM 24kHz). Text is sent incrementally as agent delta events arrive; EOS is sent when the response finalizes. Handles WebSocket connect timing: chunks are queued before onOpen fires, and finish() defers EOS if called before the socket is ready.

TalkModeManager wired into NodeRuntime
TalkModeManager runs in TTS-only mode (ttsOnAllResponses = true) alongside MicCaptureManager, which continues to own STT and chat.send. Barge-in (mic tap stops active TTS), voice screen lifecycle (TTS stops on tab switch or backgrounding), and session key wiring are all handled.

ChatController fix
final/aborted/error run events now refresh chat history regardless of whether the runId is tracked in pendingRuns. Previously, voice-initiated runs were silently dropped because they weren't registered — responses would play via TTS but never appear in the chat UI.
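
The gating rule described above can be sketched as a small pure function. This is only an illustration: `RunState`, `RunEvent`, `Action`, and `decide` are hypothetical names, not the actual ChatController types, which are not shown on this page.

```kotlin
// Hypothetical event model; the real ChatController types may differ.
enum class RunState { DELTA, FINAL, ABORTED, ERROR }

enum class Action { IGNORE, APPLY_DELTA, REFRESH_HISTORY }

data class RunEvent(val runId: String, val state: RunState)

fun decide(event: RunEvent, pendingRuns: Set<String>): Action = when (event.state) {
    // Streaming deltas only make sense for runs this client is tracking.
    RunState.DELTA ->
        if (event.runId in pendingRuns) Action.APPLY_DELTA else Action.IGNORE
    // Terminal events always refresh history, so untracked
    // (for example voice-initiated) runs still appear in the chat UI.
    RunState.FINAL, RunState.ABORTED, RunState.ERROR -> Action.REFRESH_HISTORY
}
```

With this shape, a `FINAL` event for an unknown runId still yields `REFRESH_HISTORY`, while a `DELTA` for the same runId is ignored.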

MicCaptureManager improvements
Don't auto-send on silence — accumulate segments and send when mic toggles off, with a 2s drain window to catch buffered audio. Multi-segment transcripts are joined with sentence-ending punctuation.
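
A minimal sketch of the segment-joining step, assuming segments arrive as plain strings. `joinSegments` is a hypothetical helper; whether the real code appends punctuation to the last segment is not stated here, so this version only adds a period between segments that do not already end a sentence.

```kotlin
// Hypothetical helper: joins recognizer segments into one message.
fun joinSegments(segments: List<String>): String {
    val cleaned = segments.map { it.trim() }.filter { it.isNotEmpty() }
    return cleaned.mapIndexed { index, seg ->
        val isLast = index == cleaned.lastIndex
        // Add a period only between segments, and only when the
        // segment does not already end with sentence punctuation.
        if (isLast || seg.last() in ".!?") seg else "$seg."
    }.joinToString(" ")
}
```

For example, the segments "turn on the lights" and "what time is it" would be sent as "turn on the lights. what time is it" rather than as one run-on sentence.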

Testing

Tested on OnePlus CPH2581 (Android 15). Voice → STT → response → TTS confirmed working for sequential messages.

Known limitations (follow-up)

Mic rapid-toggle → duplicate TTS: the 2s drain coroutine in MicCaptureManager is not cancelled if the mic is re-enabled within the window, so the deferred stop() fires into a live session and the previous response replays.
STT first-word cutoff: the 300ms recognizer restart delay after onResults means words spoken immediately after the window are lost.
sendText thread-safety: sentFullText/sentTextLength are accessed from multiple threads without synchronization; a rare false "text diverged" restart is possible under concurrent delta events.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c4cacd258

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

apps/android/app/src/main/java/ai/openclaw/android/voice/MicCaptureManager.kt

apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt

greptile-apps · 2026-02-28T06:51:10Z

Greptile Summary

Adds real-time streaming TTS to the Android voice screen via ElevenLabs WebSocket API. The implementation introduces ElevenLabsStreamingTts to stream text chunks and play audio as agent responses arrive, rather than waiting for complete responses.

Key changes:

ElevenLabsStreamingTts handles WebSocket streaming with PCM audio playback, queuing text before connection and deferring EOS until ready
TalkModeManager wired into NodeRuntime to run TTS-only mode (ttsOnAllResponses = true) alongside mic capture
ChatController fix ensures voice-initiated runs refresh chat history (previously dropped because not in pendingRuns)
MicCaptureManager improvements: accumulate transcript segments with 2s drain window, join with punctuation

Known limitations (documented by author for follow-up):

Mic rapid-toggle can cause duplicate TTS due to uncancelled drain coroutine
STT first-word cutoff from 300ms recognizer restart delay
Thread-safety issue in sendText where sentFullText/sentTextLength lack synchronization

The implementation handles WebSocket timing correctly and includes proper lifecycle management (barge-in, tab switching, backgrounding). Testing on OnePlus Android 15 confirms the feature works for sequential voice interactions.

Confidence Score: 4/5

This PR is safe to merge with minor known limitations documented for follow-up
Solid implementation of streaming TTS with proper WebSocket handling and lifecycle management. The author has transparently documented three known limitations (mic rapid-toggle duplicate TTS, STT first-word cutoff, sendText thread-safety) that should be addressed in follow-up work. The core functionality is sound: WebSocket text queuing works correctly, chat history fix resolves voice-initiated run visibility, and integration with NodeRuntime properly coordinates mic/TTS. Tested on real device confirms sequential voice interactions work. Score reflects working feature with documented edge cases rather than critical blocking issues.
Files flagged for follow-up: apps/android/app/src/main/java/ai/openclaw/android/voice/MicCaptureManager.kt (drain coroutine should be tracked and cancelled) and apps/android/app/src/main/java/ai/openclaw/android/voice/ElevenLabsStreamingTts.kt (sendText needs synchronization).

Last reviewed commit: 9c4cacd

gregmousseau · 2026-02-28T06:57:37Z

Hey @obviyus, voice screen TTS is working well for the core flow. Three known limitations are documented (mic rapid-toggle race, STT first-word cutoff, sendText thread-safety). Happy to fix those before merge if you'd prefer, or we can track them separately.

obviyus · 2026-02-28T09:52:23Z

@gregmousseau thanks for the PR! I'll manually test it and merge it if everything seems good.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70bf53770f

apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64749866e2

apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt

apps/android/app/src/main/java/ai/openclaw/android/NodeRuntime.kt

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0b310125a

apps/android/app/src/main/java/ai/openclaw/android/NodeRuntime.kt

apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt

gregmousseau · 2026-02-28T18:33:02Z

@obviyus
Merge conflicts resolved and Codex comments addressed. TL;DR:

- Merged with main (adopted the playbackToken cancellation pattern)
- System fallback voice works when the ElevenLabs key is invalid or empty (bad key, no audio, network error)
- Gated both agent and chat TTS on sessionKey (privacy fix)
- Wired up the MP3 fallback that was computed but never called
- Prevented double-speaking when both TTS pipelines handle the same reply
- Speaker mute is now effective in the talk-mode path
- Audio focus is released on stop

Happy to split this into smaller PRs, but that would create a dependency chain.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b50a89cdfb

apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt

apps/android/app/src/main/java/ai/openclaw/android/voice/ElevenLabsStreamingTts.kt

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43f4ed1156

apps/android/app/src/main/java/ai/openclaw/android/chat/ChatController.kt

apps/android/app/src/main/java/ai/openclaw/android/voice/TalkModeManager.kt

…PCM playback

Streams text to the ElevenLabs WebSocket API and plays audio in real time via AudioTrack (PCM 24kHz). Key design points:

- sendText(fullText) takes the full accumulated text and only transmits the new suffix, detecting divergence for restart
- Chunks are queued if the WebSocket isn't yet connected; flushed in onOpen
- finish() sends EOS to ElevenLabs; deferred if called before onOpen fires
- sendText returns true (not false) when finished=true to avoid treating a normal end-of-stream as a diverge restart
- finishStreamingTts coroutine uses identity check before nulling streamingTts to prevent a mid-drain restart from orphaning a live TTS session
- eleven_v3 does NOT support WebSocket streaming; use eleven_flash_v2_5

… TTS

TalkModeManager is instantiated lazily in NodeRuntime and drives ElevenLabs streaming TTS for all assistant responses when the voice screen is active. MicCaptureManager continues to own STT and chat.send; TalkModeManager is TTS-only (ttsOnAllResponses = true, setEnabled never called).

- talkMode.ttsOnAllResponses = true when mic is enabled or voice screen active
- Barge-in: tapping the mic button calls stopTts() before re-enabling mic
- Lifecycle: PostOnboardingTabs LaunchedEffect + VoiceTabScreen onDispose both call setVoiceScreenActive(false) so TTS stops cleanly on tab switch or app backgrounding
- applyMainSessionKey wires the session key into TalkModeManager so it subscribes to the correct chat session for TTS

…oice

ChatController:

- final/aborted/error run events now trigger a history refresh regardless of whether the runId is in pendingRuns; only delta events require the run to be tracked (prevents voice-initiated responses from being silently dropped)

MicCaptureManager:

- Don't auto-send on onResults silence detection; accumulate transcript segments and send when mic is toggled off, giving the recognizer time to finish processing buffered audio
- Capture any partial live transcript if no final segments arrived (2s drain window before stop)
- Join multi-segment transcripts with sentence-ending punctuation to avoid run-on text sent to the gateway

@synchronized

…cooldown

Bug fixes:

- @Synchronized on ElevenLabsStreamingTts.sendText/finish to prevent sentFullText/sentTextLength races across OkHttp and caller threads
- Pre-set pendingRunId via onRunIdKnown callback before chat.send to eliminate race where gateway events arrive before runId is stored
- Track drain coroutine as Job; cancel prior on rapid mic toggle to prevent duplicate TTS and stale transcript sends
- Mic button disabled during 2s drain cooldown (micCooldown StateFlow)

Codex review fixes:

- Gate agent streaming TTS on sessionKey to prevent cross-session audio leaks (P1)
- Clear ElevenLabs credentials when talk.provider is not elevenlabs; gate streaming TTS on activeProviderIsElevenLabs (P2)

System TTS fallback fixes:

- Null streamingTts immediately in finishStreamingTts so next response gets a fresh TTS instance
- Add hasReceivedAudio flag to ElevenLabsStreamingTts to detect when WebSocket connects but returns no audio (invalid key, network error)
- Fall back to playTtsForText when streaming TTS produced no audio
- Track ttsJob to cleanly cancel prior playTtsForText on new response
- Re-throw CancellationException instead of cascading into fallback attempts that also get cancelled
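
The last fallback fix (re-throwing CancellationException rather than cascading into further fallbacks) can be illustrated without coroutines. This is a sketch under assumptions: `firstSuccessful` is a hypothetical helper, and it uses `java.util.concurrent.CancellationException`, which is the class the coroutine CancellationException aliases on the JVM.

```kotlin
import java.util.concurrent.CancellationException

// Hypothetical helper: tries each playback strategy in order.
fun <T> firstSuccessful(attempts: List<() -> T>): T {
    var lastError: Exception? = null
    for (attempt in attempts) {
        try {
            return attempt()
        } catch (e: CancellationException) {
            // Cancelled on purpose (for example barge-in): do not fall
            // through to the next strategy, which would be cancelled too.
            throw e
        } catch (e: Exception) {
            // Ordinary failure: remember it and try the next fallback.
            lastError = e
        }
    }
    throw lastError ?: IllegalStateException("no attempts supplied")
}
```

The order of the two catch blocks matters: CancellationException is itself an Exception, so it has to be caught first or it would be treated as a normal failure.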

- Codex P1: streamAndPlayMp3 was computed but never called after PCM failure. Now properly invoked as fallback.
- Codex P2: MicCaptureManager.speakAssistantReply now skipped when TalkModeManager.ttsOnAllResponses is active, preventing both pipelines from speaking the same assistant reply.

- Codex P1: setSpeakerEnabled now syncs talkMode.setPlaybackEnabled so muting the speaker works when ttsOnAllResponses is active.
- Codex P2: abandonAudioFocus() called in stopSpeaking to prevent audio focus leak after TTS completes or is interrupted.

Agent events arrive on multiple threads concurrently. A stale event with shorter accumulated text was falsely triggering 'text diverged', causing the streaming TTS to restart with a new WebSocket — resulting in multiple simultaneous ElevenLabs connections (2-3 voices) and eventual system TTS fallback when hasReceivedAudio was false. Fix: if sentFullText.startsWith(fullText), the event is stale (we already have this text), not diverged. Accept and ignore it.
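
The three cases (new suffix, stale snapshot, real divergence) can be sketched as a pure function. `Delta` and `classify` are hypothetical names used for illustration; the actual sendText is stateful and synchronized, which this sketch leaves out.

```kotlin
// Hypothetical result type for comparing a new snapshot to what was already sent.
sealed class Delta {
    data class Append(val suffix: String) : Delta() // new text to transmit
    object Stale : Delta()                           // older snapshot; accept and ignore
    object Diverged : Delta()                        // text was rewritten; restart needed
}

fun classify(sentFullText: String, fullText: String): Delta = when {
    // Normal case: the new snapshot extends what was sent, so send only the tail.
    fullText.startsWith(sentFullText) ->
        Delta.Append(fullText.substring(sentFullText.length))
    // Out-of-order event: we already sent everything it contains.
    sentFullText.startsWith(fullText) -> Delta.Stale
    // Neither is a prefix of the other.
    else -> Delta.Diverged
}
```

Before the fix, the second case fell into the third, which is what triggered the extra WebSocket connections.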

…ailure

- Codex P2: drain coroutine now only clears drainingTts if it's the same instance (=== check), preventing a newer drain from being unreachable by stopTts.
- Codex P2: set stopped=true on WebSocket onFailure so subsequent sendText calls are rejected and stale state doesn't persist.
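
A small sketch of the "clear only if it is still the same instance" idea. The commit describes a `===` check; this version swaps in `AtomicReference.compareAndSet`, which also compares by reference and makes the check-and-clear atomic. `TtsSession` and `DrainSlot` are hypothetical names.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for a streaming TTS instance.
class TtsSession(val label: String)

class DrainSlot {
    private val current = AtomicReference<TtsSession?>(null)

    fun set(session: TtsSession) = current.set(session)

    fun get(): TtsSession? = current.get()

    // Clears the slot only if it still holds this exact instance,
    // so an older drain cannot wipe out a newer one.
    fun clearIfSame(session: TtsSession): Boolean =
        current.compareAndSet(session, null)
}
```

If an older drain finishes after a newer session has been stored, `clearIfSame(old)` returns false and the newer session stays reachable for a later stop.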

obviyus · 2026-03-01T14:34:01Z

Landed via temp rebase onto main.

Gate: ./gradlew :app:compileDebugKotlin --no-daemon
Land head: 1763daa
Merge commit: 59fd394

Changes added in this landing pass:

Added CHANGELOG.md entry for feat(android): streaming TTS via ElevenLabs WebSocket for voice screen #29521 (thanks @gregmousseau).
Kept the PR follow-up fixes in final code: speaker-mute now stops active streaming TTS, and the sticky interrupt-stop flag removal remains.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1763daa9dc

chatgpt-codex-connector · 2026-03-01T14:41:34Z

apps/android/app/src/main/java/ai/openclaw/android/voice/ElevenLabsStreamingTts.kt

+          })
+        }
+        webSocket.send(config.toString())
+        wsReady = true


Flush queued text before setting WebSocket ready state

onOpen flips wsReady to true before draining pendingText, so a concurrent sendText() call can send a newer chunk directly while older queued chunks are still waiting in the flush loop. When assistant deltas arrive quickly during socket startup, this reorders text sent to ElevenLabs and can produce garbled/truncated speech for the turn.
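
One way to avoid this reordering is to drain the queue and flip the ready flag under the same lock that sendText takes. The sketch below is an assumption-laden illustration, not the PR's code: `OrderedSender` and `transmit` are hypothetical, and `EOS` is a placeholder string rather than the real end-of-stream frame.

```kotlin
// Hypothetical sender; `transmit` stands in for webSocket.send.
class OrderedSender(private val transmit: (String) -> Unit) {
    private val pending = mutableListOf<String>()
    private var ready = false
    private var finishRequested = false

    @Synchronized
    fun sendText(chunk: String) {
        // Before the socket opens, chunks are queued in arrival order.
        if (ready) transmit(chunk) else pending.add(chunk)
    }

    @Synchronized
    fun finish() {
        // EOS is deferred if the socket is not open yet.
        if (ready) transmit(EOS) else finishRequested = true
    }

    @Synchronized
    fun onOpen() {
        // Flush first, then mark ready, all while holding the lock,
        // so a concurrent sendText cannot jump ahead of queued chunks.
        for (chunk in pending) transmit(chunk)
        pending.clear()
        ready = true
        if (finishRequested) transmit(EOS)
    }

    companion object {
        const val EOS = "<eos>" // placeholder marker, not the real frame
    }
}
```

Because all three methods synchronize on the same instance, a chunk sent during socket startup either lands in the queue before the flush or is transmitted after it, never in between.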

chatgpt-codex-connector · 2026-03-01T14:41:34Z

apps/android/app/src/main/java/ai/openclaw/android/voice/MicCaptureManager.kt

+        delay(2000L)
+        stop()


Bypass drain delay when turning mic off from voice screen

Mic disable now always waits 2 seconds before calling stop(), and setVoiceScreenActive(false) uses this same path when the user leaves the voice tab. That means the recognizer can keep listening after the screen is closed/backgrounded and still flush/send transcript text, which is an unexpected capture window for users who just exited voice mode.
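
A possible shape for this, sketched without coroutines so it runs standalone: the pending drain is tracked so it can be cancelled, and the caller can skip the delay when leaving the voice screen. `Scheduler`, `Cancellable`, `MicDrain`, and the `immediate` flag are all hypothetical; the real code uses a coroutine with delay(2000L).

```kotlin
// Hypothetical scheduler abstraction standing in for a coroutine Job.
fun interface Cancellable { fun cancel() }
fun interface Scheduler { fun schedule(delayMs: Long, task: () -> Unit): Cancellable }

class MicDrain(private val scheduler: Scheduler, private val stop: () -> Unit) {
    private var pending: Cancellable? = null

    @Synchronized
    fun micOff(immediate: Boolean) {
        // Never leave more than one drain outstanding.
        pending?.cancel()
        pending = null
        if (immediate) {
            // Leaving the voice screen: stop capturing right away.
            stop()
        } else {
            // Manual mic toggle: give the recognizer 2s to flush buffered audio.
            pending = scheduler.schedule(2000L) { stop() }
        }
    }

    @Synchronized
    fun micOn() {
        // Re-enabling within the window cancels the deferred stop.
        pending?.cancel()
        pending = null
    }
}
```

Under this sketch, setVoiceScreenActive(false) would call `micOff(immediate = true)`, while the mic button would keep the 2s drain.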

openclaw-barnacle bot added app: android App: android size: L labels Feb 28, 2026

openclaw-barnacle bot added size: XL and removed size: L labels Feb 28, 2026

obviyus self-assigned this Mar 1, 2026

obviyus force-pushed the feat/talk-mode-pr3 branch from 6447032 to 96a4117 Compare March 1, 2026 14:24

gregmousseau and others added 9 commits March 1, 2026 19:59

docs(changelog): add openclaw#29521 voice tts entry (thanks @gregmous…

1763daa

…seau)

obviyus force-pushed the feat/talk-mode-pr3 branch from 96a4117 to 1763daa Compare March 1, 2026 14:33

obviyus merged commit 59fd394 into openclaw:main Mar 1, 2026
