
fix(ios): prefetch incremental talk tts audio #22833

Merged
ngutman merged 3 commits into main from fix/ios-talk-tts-prefetch-pauses on Feb 21, 2026

ngutman (Contributor) commented Feb 21, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: Talk mode synthesized incremental TTS segments serially, so the ElevenLabs request for each sentence started only after the previous sentence had finished playing.
  • Why it matters: Users heard inter-sentence dead air (commonly a few hundred ms) that made responses feel laggy.
  • What changed: Added best-effort next-segment prefetch in TalkModeManager so upcoming segment audio is requested while current segment is playing, then reused for immediate next playback.
  • What changed: Added prefetch cancellation wiring to existing interrupt/reset/cancel paths, and ensured prefetch uses a compatible output format (mp3_44100) when PCM is restricted by plan.
  • What did NOT change (scope boundary): Incremental sentence splitting/buffering behavior and ElevenLabsKit package code.
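
For readers who have not opened the diff, the prefetch idea above can be sketched roughly as follows. This is not the code in TalkModeManager.swift: `speakSegments`, `Synthesize`, and the `play` closure are hypothetical names, and the synthesizer is passed in as a closure so the sketch stays independent of ElevenLabsKit.

```swift
import Foundation

// Hypothetical stand-in for a TTS request; the real code calls ElevenLabs.
typealias Synthesize = @Sendable (String) async throws -> Data

/// Plays segments in order while requesting the next segment's audio in the background.
func speakSegments(
    _ segments: [String],
    synthesize: @escaping Synthesize,
    play: (Data) async -> Void
) async throws {
    var prefetch: Task<Data, Error>? = nil
    // Cancel any in-flight prefetch if we exit early (error or cancellation).
    defer { prefetch?.cancel() }

    for (index, segment) in segments.enumerated() {
        try Task.checkCancellation()

        // Best effort: use the prefetched audio if it arrived, otherwise synthesize now.
        let audio: Data
        if let pending = prefetch, let ready = try? await pending.value {
            audio = ready
        } else {
            audio = try await synthesize(segment)
        }

        // Kick off the next request before playback starts so it overlaps with the audio.
        let nextIndex = index + 1
        if nextIndex < segments.count {
            let next = segments[nextIndex]
            prefetch = Task { try await synthesize(next) }
        } else {
            prefetch = nil
        }

        await play(audio)
    }
}
```

The first segment has nothing to reuse and is synthesized directly; every later segment tries the prefetched result first and falls back to a fresh request if the prefetch failed.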

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-visible / Behavior Changes

  • Talk mode should have reduced pauses between incremental spoken sentences.
  • Existing barge-in/interrupt behavior remains intact while cancelling in-flight prefetch work.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No) (same ElevenLabs endpoint; call timing changed via prefetch)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS + iOS device
  • Runtime/container: iOS app (Debug)
  • Model/provider: ElevenLabs TTS
  • Integration/channel (if any): Talk mode (voice)
  • Relevant config (redacted): Existing Talk configuration

Steps

  1. Trigger Talk mode with a prompt that produces a multi-sentence response.
  2. Observe incremental TTS logs and segment transitions.
  3. Confirm prefetch readiness/consumption and listen for reduced sentence transition delay.

Expected

  • Next segment audio is already prefetched during current playback, so transitions are tighter.

Actual

  • Device logs show prefetch ready followed by prefetch hit / prefetched=true on subsequent segments.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • xcodebuild Debug build succeeds for iOS after changes.
    • Installed and launched on physical iPhone.
    • Captured app logs from device container; observed prefetch success (prefetch ready + prefetch hit) and removal of prior 403 prefetch failures in the updated path.
  • Edge cases checked:
    • First segment (no prefetch available) still plays.
    • Follow-up segments consume prefetched audio.
    • Prefetch tasks are cancelled on stop/reset flows.
  • What you did not verify:
    • Automated unit/integration test coverage for this specific flow.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
    • Revert this PR/commit.
  • Files/config to restore:
    • apps/ios/Sources/Voice/TalkModeManager.swift
    • apps/ios/Sources/Gateway/GatewayConnectionController.swift
  • Known bad symptoms reviewers should watch for:
    • Missing or repeated incremental segments.
    • Interrupted speech not cancelling promptly.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: Prefetched audio can become stale if queue/context shifts.
    • Mitigation: Prefetch entries are keyed by segment/context and are discarded/replaced when mismatched.
  • Risk: In-flight prefetch could leak work on interruption.
    • Mitigation: Cancellation now explicitly covers prefetch tasks and monitor task during stop/reset/teardown.
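
Both mitigations can be illustrated with a small sketch. The types below (`PrefetchKey`, `PrefetchSlot`) are hypothetical and are not the PR's IncrementalSpeechPrefetchState; the key fields are assumptions, and the class is assumed to be used from a single actor, since it does no locking of its own.

```swift
import Foundation

// Hypothetical key: the PR keys entries by segment and context,
// but these concrete fields are assumptions for illustration.
struct PrefetchKey: Hashable {
    let segment: String
    let contextID: Int
}

// Hypothetical holder for at most one in-flight or completed prefetch.
// Not thread-safe; assumed to be confined to one actor.
final class PrefetchSlot {
    private var key: PrefetchKey?
    private var task: Task<Data, Error>?

    var isEmpty: Bool { task == nil }

    /// Starts a prefetch; an existing entry with a different key is cancelled and replaced.
    func start(for newKey: PrefetchKey, work: @escaping @Sendable () async throws -> Data) {
        if key == newKey, task != nil { return }
        cancel()
        key = newKey
        task = Task { try await work() }
    }

    /// Hands over the task only when the key matches; a mismatch discards the stale entry.
    func take(matching wanted: PrefetchKey) -> Task<Data, Error>? {
        guard key == wanted, let match = task else {
            cancel()
            return nil
        }
        key = nil
        task = nil
        return match
    }

    /// Intended to be called from stop/reset/teardown paths.
    func cancel() {
        task?.cancel()
        key = nil
        task = nil
    }
}
```

A caller that gets `nil` from `take(matching:)` would fall back to a normal synthesis request for that segment.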

Greptile Summary

Adds best-effort prefetch for upcoming incremental TTS segments in Talk mode on iOS. While the current segment plays, the next segment's audio is requested from ElevenLabs in the background and buffered. When playback advances, prefetched audio is consumed immediately, eliminating the inter-sentence dead air caused by serial synthesis requests.

  • Introduces IncrementalSpeechPrefetchState and IncrementalPrefetchedAudio to manage prefetch lifecycle and buffered audio data.
  • A polling monitor (startIncrementalPrefetchMonitor) watches for the next queued segment and triggers prefetch during playback.
  • Format-aware prefetch: when the context's output format is PCM (restricted by plan), the prefetch uses mp3_44100 instead, and the playback path correctly routes to the mp3 player.
  • Cancellation wiring added to resetIncrementalSpeech, cancelIncrementalSpeech, and the speech task's defer block to clean up in-flight prefetch on interrupt/reset/teardown.
  • Extracts makeIncrementalTTSRequest to eliminate duplicated request construction logic.
  • Minor incidental change in GatewayConnectionController.swift: replaces ntohl() with Swift-native UInt32(bigEndian:) for IPv4 loopback detection (functionally equivalent).
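
To make the last bullet concrete, here is a minimal sketch of why the two conversions are equivalent. The exact loopback check in GatewayConnectionController.swift is not shown in this conversation; the function name and the 127.0.0.0/8 test below are assumptions.

```swift
/// Returns true when an IPv4 address held in network byte order
/// (as in `in_addr.s_addr`) falls inside 127.0.0.0/8.
/// Hypothetical helper; the real check in the PR may differ.
func isIPv4Loopback(networkOrder address: UInt32) -> Bool {
    // UInt32(bigEndian:) reads the stored value as big-endian and returns it
    // in host byte order, which is the same conversion ntohl() performs.
    let host = UInt32(bigEndian: address)
    return (host >> 24) == 127
}
```

In the other direction, `UInt32(0x7F00_0001).bigEndian` produces the network-order value for 127.0.0.1, playing the role of htonl().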

Confidence Score: 4/5

  • This PR is safe to merge with low risk — it adds an additive prefetch optimization with proper cancellation wiring and no changes to existing control flow semantics.
  • The prefetch logic is well-structured: keyed by segment+context for staleness detection, cancellation is wired into all interrupt/reset paths, and format resolution correctly handles the PCM-to-mp3 fallback. The main speech loop's existing behavior is preserved; prefetch is purely additive. The change from ntohl() to UInt32(bigEndian:) is functionally equivalent. No security or data-access changes. Score docked from 5 only because the prefetch polling monitor and async task await patterns add concurrency complexity that would benefit from automated test coverage.
  • No files require special attention. TalkModeManager.swift carries the bulk of the change but the logic is sound.
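
The PCM-to-mp3 fallback mentioned above might look roughly like this. Only the mp3_44100 name comes from the PR description; the function names, the `pcmAllowed` flag, and the prefix-based matching are assumptions made for this sketch.

```swift
// Hypothetical player routing; the real code's types are not shown in this PR page.
enum PlayerKind { case pcm, mp3 }

/// Picks the output format for a prefetch request.
/// Assumption: PCM formats start with "pcm" and plan limits are known as a Bool.
func resolvePrefetchFormat(requested: String, pcmAllowed: Bool) -> String {
    if requested.hasPrefix("pcm") && !pcmAllowed {
        return "mp3_44100" // fallback format named in the PR description
    }
    return requested
}

/// Routes audio to the player that matches the format actually fetched,
/// not the format originally requested by the context.
func playerKind(for format: String) -> PlayerKind {
    format.hasPrefix("pcm") ? .pcm : .mp3
}
```

The point of the second function is that playback must follow the resolved format, so prefetched mp3 data never reaches the PCM player.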

Last reviewed commit: b45f4f8

openclaw-barnacle bot added the app: ios, size: M, and maintainer labels on Feb 21, 2026
ngutman force-pushed the fix/ios-talk-tts-prefetch-pauses branch from b45f4f8 to 741c537 on February 21, 2026 18:51
ngutman merged commit 3ed71d6 into main on Feb 21, 2026
10 checks passed
ngutman deleted the fix/ios-talk-tts-prefetch-pauses branch on February 21, 2026 18:52
ngutman (Contributor, Author) commented Feb 21, 2026

Landed via temp rebase onto main.

  • Gate: pnpm check && pnpm build && pnpm test
  • Land commit: 741c537
  • Merge commit: 3ed71d6

Thanks @ngutman!

obviyus pushed a commit to guirguispierre/openclaw that referenced this pull request Feb 22, 2026
00xglitch pushed a commit to 00xglitch/openclaw that referenced this pull request Feb 22, 2026
00xglitch pushed a commit to 00xglitch/openclaw that referenced this pull request Feb 23, 2026
mreedr pushed a commit to mreedr/openclaw-custom that referenced this pull request Feb 24, 2026
00xglitch pushed a commit to 00xglitch/openclaw that referenced this pull request Feb 24, 2026
00xglitch pushed a commit to 00xglitch/openclaw that referenced this pull request Feb 24, 2026
clawd-xsl pushed a commit to clawd-xsl/openclaw that referenced this pull request Feb 26, 2026
hughdidit pushed a commit to hughdidit/DAISy-Agency that referenced this pull request Mar 1, 2026
…nks @ngutman)

(cherry picked from commit 3ed71d6)

# Conflicts:
#	CHANGELOG.md
hughdidit pushed a commit to hughdidit/DAISy-Agency that referenced this pull request Mar 3, 2026
…nks @ngutman)

(cherry picked from commit 3ed71d6)

# Conflicts:
#	CHANGELOG.md
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026