refactor(voice): catalog voice models through providers #87794

Merged
steipete merged 14 commits into main from feat/voice-model-provider on May 29, 2026
Conversation

@vincentkoc

@vincentkoc vincentkoc commented May 28, 2026

Member

Summary

  • catalog speech, realtime transcription, and realtime voice provider models as unified kind: "voice" model entries
  • route TTS, realtime voice, and realtime transcription model selection through provider-owned voice model metadata without letting incompatible voice refs hide explicit providers
  • rename speaker selection config to speakerVoice / speakerVoiceId, including schema, protocol, Discord, TTS, talk, and doctor migration coverage
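
As a rough illustration of the first two bullets, a unified catalog entry could look something like the sketch below. Only `kind: "voice"` comes from the summary; the type names (`VoiceCapability`, `VoiceModelEntry`), the other fields, and the provider/model ids are assumptions for illustration and are not the PR's actual types.

```typescript
// Hypothetical shapes; only `kind: "voice"` is taken from the PR summary.
type VoiceCapability = "speech" | "realtime-transcription" | "realtime-voice";

interface VoiceModelEntry {
  kind: "voice";
  provider: string;
  id: string;
  capabilities: VoiceCapability[];
}

// Illustrative catalog; provider and model ids are made up.
const catalog: VoiceModelEntry[] = [
  { kind: "voice", provider: "example-a", id: "tts-1", capabilities: ["speech"] },
  {
    kind: "voice",
    provider: "example-a",
    id: "rt-1",
    capabilities: ["realtime-voice", "realtime-transcription"],
  },
  { kind: "voice", provider: "example-b", id: "stt-1", capabilities: ["realtime-transcription"] },
];

// Return the catalog entries that advertise a given capability.
function modelsFor(capability: VoiceCapability, entries: VoiceModelEntry[]): VoiceModelEntry[] {
  return entries.filter((entry) => entry.capabilities.some((c) => c === capability));
}

const refs = modelsFor("realtime-transcription", catalog).map((m) => `${m.provider}/${m.id}`);
console.log(refs);
```

With one entry type, a model picker can filter by capability instead of consulting three separate provider lists.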

Verification

  • node scripts/run-vitest.mjs src/commands/doctor/shared/legacy-config-migrate.provider-shapes.test.ts extensions/speech-core/src/tts.test.ts src/plugins/capability-provider-runtime.test.ts src/gateway/server-methods/talk.test.ts extensions/discord/src/config-schema.test.ts src/gateway/protocol/index.test.ts - 8 files / 258 tests passed
  • git diff --check - passed
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main - clean, no accepted/actionable findings
  • node scripts/crabbox-wrapper.mjs run --provider aws --target linux --idle-timeout 90m --ttl 240m --timing-json --stop-after always --shell -- "git fetch --deepen=200 origin main || git fetch --unshallow origin main || true; corepack pnpm check:changed" - AWS Crabbox run run_78ebe3e450ed, lease cbx_1530356b8522, exited 0

Real behavior proof

Behavior addressed: Voice-capable provider models are exposed through the model catalog as voice models; TTS/STT/realtime voice model selection follows provider capability metadata; speaker selection uses speakerVoice / speakerVoiceId with migration coverage for legacy voice / voiceName / voiceId config.

Real environment tested: AWS Crabbox Linux c7a.8xlarge, run run_78ebe3e450ed, lease cbx_1530356b8522.

Exact steps or command run after this patch: corepack pnpm check:changed through node scripts/crabbox-wrapper.mjs run --provider aws --target linux --idle-timeout 90m --ttl 240m --timing-json --stop-after always --shell -- "git fetch --deepen=200 origin main || git fetch --unshallow origin main || true; corepack pnpm check:changed".

Evidence after fix: Crabbox run run_78ebe3e450ed exited 0 after changed-surface typecheck, core/extension lint, import-cycle, media/runtime sidecar, webhook, and pairing guard checks.

Observed result after fix: Focused local tests passed 8 files / 258 tests, autoreview reported no accepted/actionable findings, and remote check:changed passed on branch tip 517a7c23f5.

What was not tested: Live paid speech/realtime provider calls were not run; this was validated through provider catalog/config/runtime unit coverage and the repo changed gate.

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation channel: discord Channel integration: discord channel: voice-call Channel integration: voice-call gateway Gateway runtime commands Command implementations agents Agent runtime and tooling extensions: openai extensions: minimax extensions: xiaomi plugin: google-meet extensions: tts-local-cli extensions: inworld Extension: inworld plugin: azure-speech Azure Speech plugin extensions: elevenlabs extensions: google size: XL maintainer Maintainer-authored PR labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 11:34 PM ET / 03:34 UTC.

Summary
The PR catalogs TTS, realtime transcription, and realtime voice models as provider-owned voice models, routes selection through that metadata, renames speaker voice config fields with migration coverage, and moves speech-core to packages/speech-core.

Reproducibility: not applicable. This is a refactor/feature PR rather than a bug report. The relevant verification is the PR's focused tests, live speech matrix, Crabbox check:changed, and current diff inspection.

Review metrics: 2 noteworthy metrics.

  • Config/default surfaces: 1 added family, 1 renamed family. agents.defaults.voiceModel is new and speaker selection moves from voice/voiceName/voiceId to speakerVoice/speakerVoiceId, which matters for upgrades and operator docs.
  • Workspace dependency graph: 1 importer removed, 1 importer added. speech-core moves from extensions/speech-core to packages/speech-core with a workspace dependency on openclaw, so the dependency approval must match the final head.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster ✨ media proof bonus
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
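
The "weaker of the two" rule can be sketched as a minimum over an ordered rank list. The ordering follows the crustacean legend in this review; the function and constant names are hypothetical, and this is not ClawSweeper's actual implementation.

```typescript
// Ranks ordered weakest to strongest, following the legend in this review.
// "off-meta tidepool" (rating does not apply) is left out of the ordering.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Hypothetical helper: the overall rank is whichever input sits lower in RANKS.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

const overall = overallRank("diamond lobster", "platinum hermit");
console.log(overall); // platinum hermit
```

This matches the ratings reported above: diamond lobster proof with platinum hermit patch quality gives platinum hermit overall.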

Rank-up moves:

  • Maintainer acceptance of the compatibility and auth-provider routing risk at the final head.
  • Confirm the dependency graph approval still covers the exact head that lands.

Risk before merge

  • [P1] The PR renames speaker selection config to speakerVoice/speakerVoiceId and adds agents.defaults.voiceModel; doctor migration and compatibility aliases reduce upgrade risk, but this still changes persisted config/default surfaces.
  • [P1] Voice model refs now influence provider activation and default model selection across TTS, realtime transcription, and realtime voice, so auth-backed provider loading and explicit provider/model precedence need maintainer acceptance.
  • [P1] Moving speech-core into packages/speech-core changes package exports and the workspace lockfile graph; comments say the delta is intentional workspace wiring, but maintainers still need to accept that packaging boundary before merge.
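
To make the routing risk concrete, here is a minimal sketch of one way a voice model ref and an explicit provider could interact. The only property taken from the PR is that an incompatible voice ref must not hide an explicit provider. The ordering (a compatible ref wins), the types, and the function name are assumptions, not the PR's code.

```typescript
// Hypothetical types; none of these names are taken from the PR diff.
type VoiceCapability = "speech" | "realtime-transcription" | "realtime-voice";

interface VoiceModelEntry {
  provider: string;
  id: string;
  capabilities: VoiceCapability[];
}

interface VoiceSelection {
  explicitProvider?: string; // provider configured directly for this surface
  voiceModelRef?: string; // "provider/model" reference, e.g. a configured default
}

function resolveVoiceProvider(
  capability: VoiceCapability,
  selection: VoiceSelection,
  catalog: VoiceModelEntry[],
): string | undefined {
  const compatible = catalog.find(
    (entry) =>
      `${entry.provider}/${entry.id}` === selection.voiceModelRef &&
      entry.capabilities.some((c) => c === capability),
  );
  // Assumed ordering: a compatible ref selects its provider; an incompatible
  // or unknown ref is ignored so the explicit provider stays in effect.
  return compatible ? compatible.provider : selection.explicitProvider;
}

const demoCatalog: VoiceModelEntry[] = [
  { provider: "example-a", id: "tts-1", capabilities: ["speech"] },
];
const selection: VoiceSelection = {
  explicitProvider: "example-b",
  voiceModelRef: "example-a/tts-1",
};

// The ref only supports speech, so transcription keeps the explicit provider.
const transcriptionProvider = resolveVoiceProvider("realtime-transcription", selection, demoCatalog);
const speechProvider = resolveVoiceProvider("speech", selection, demoCatalog);
console.log(transcriptionProvider, speechProvider);
```

Whichever precedence actually lands, the case worth covering in tests is the first call: a speech-only ref combined with an explicitly configured transcription provider.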

Maintainer options:

  1. Accept with final-head maintainer gate (recommended)
    A maintainer can land the PR after confirming the final head still matches the reviewed config migration, provider routing, and workspace dependency intent.
  2. Request narrower upgrade proof
    Maintainers can ask for one more focused upgrade proof if they want direct before/after evidence for existing voice/voiceId configs beyond the tests and doctor coverage already reported.
  3. Split package move from routing
    If the combined compatibility and dependency graph change is too broad, pause this branch and split the speech-core package move from voice model routing.

Next step before merge

  • [P2] The remaining action is human maintainer acceptance of protected compatibility, auth-provider routing, and dependency graph changes, not an automated repair.

Security
Cleared: The diff changes lockfile/workspace package wiring but does not introduce a new third-party package source, lifecycle hook, permission expansion, or unresolved secret-handling concern.

Review details

Best possible solution:

Land this only after a maintainer accepts the compatibility and auth-provider routing changes at the final head, keeping the migration/legacy alias coverage and dependency-graph approval intact.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this is a refactor/feature PR rather than a bug report. The relevant verification is the PR's focused tests, live speech matrix, Crabbox check:changed, and current diff inspection.

Is this the best way to solve the issue?

Yes, with maintainer acceptance: routing voice-capable models through provider-owned catalog metadata is a clean owner-boundary direction, but the config rename and auth-provider selection behavior are compatibility-sensitive.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a6472718da9.

Label changes

Label justifications:

  • P2: The PR is a normal-priority but broad voice provider/config refactor with bounded blast radius and strong validation, not an emergency regression.
  • merge-risk: 🚨 compatibility: The diff changes persisted config/default surfaces, migration behavior, package exports, and workspace dependency wiring that can affect upgrades.
  • merge-risk: 🚨 auth-provider: The diff changes how voice model refs select and activate speech, realtime transcription, and realtime voice providers, which can affect credential-backed provider routing.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • feature: ✨ showcase: ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. Unifying speech, transcription, and realtime voice models under provider-owned catalog metadata is a strategically useful capability for voice workflows and model pickers.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (linked_artifact): The PR includes after-change Crabbox changed-gate proof and maintainer-reported live speech provider tests, full CI, build, type, lint, and import proof for the changed behavior.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR includes after-change Crabbox changed-gate proof and maintainer-reported live speech provider tests, full CI, build, type, lint, and import proof for the changed behavior.
Evidence reviewed

What I checked:

  • Protected live PR state: Public GitHub API shows the PR is open at head 736a9e9, authored by a MEMBER, and labeled maintainer, proof: sufficient, dependencies-changed, merge-risk: compatibility, and merge-risk: auth-provider. (736a9e915a7e)
  • Repository policy applied: AGENTS.md marks plugin APIs, provider routing, persisted preferences, config/default additions, migrations, setup, startup checks, and fallback behavior as compatibility-sensitive merge risk, which directly applies to this diff. (AGENTS.md:26, 5a6472718da9)
  • Speaker compatibility path: The diff adds speaker compatibility helpers that copy canonical speakerVoice/speakerVoiceId and legacy voice/voiceName/voiceId fields, supporting the rename while preserving fallback behavior. (packages/speech-core/speaker.ts:1, 736a9e915a7e)
  • Voice provider routing change: The diff makes Talk realtime and transcription defaults resolve through agents.defaults.voiceModel and provider-owned model metadata, which is the core auth-provider routing surface that needs maintainer acceptance. (src/gateway/server-methods/talk-shared.ts:122, 736a9e915a7e)
  • Plugin API metadata surface: The diff expands provider plugin types with defaultModel/models fields for speech, realtime transcription, and realtime voice providers, making this a plugin API compatibility surface rather than a local-only refactor. (src/plugins/types.ts:1823, 736a9e915a7e)
  • Dependency graph wiring: The lockfile removes the extensions/speech-core importer and adds packages/speech-core with a workspace dependency on openclaw; comments state this was intentionally allowed as workspace graph wiring for the current head. (pnpm-lock.yaml:1455, 736a9e915a7e)
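
For readers unfamiliar with the speaker rename, the compatibility behavior described above can be approximated as follows. The field names come from the PR description; the mapping (`voice` and `voiceName` to `speakerVoice`, `voiceId` to `speakerVoiceId`), the fallback order, and the helper name `resolveSpeakerSelection` are assumptions and do not reproduce the helpers in packages/speech-core.

```typescript
// Field names come from the PR description; the mapping between them is assumed.
interface SpeakerConfig {
  speakerVoice?: string;
  speakerVoiceId?: string;
  voice?: string; // legacy
  voiceName?: string; // legacy
  voiceId?: string; // legacy
}

interface SpeakerSelection {
  speakerVoice?: string;
  speakerVoiceId?: string;
}

// Hypothetical helper: prefer canonical fields, fall back to legacy ones.
function resolveSpeakerSelection(config: SpeakerConfig): SpeakerSelection {
  const speakerVoice = config.speakerVoice ?? config.voice ?? config.voiceName;
  const speakerVoiceId = config.speakerVoiceId ?? config.voiceId;
  const selection: SpeakerSelection = {};
  if (speakerVoice !== undefined) selection.speakerVoice = speakerVoice;
  if (speakerVoiceId !== undefined) selection.speakerVoiceId = speakerVoiceId;
  return selection;
}

// Illustrative values only.
const fromLegacy = resolveSpeakerSelection({ voiceName: "narrator", voiceId: "v-123" });
const canonicalWins = resolveSpeakerSelection({ speakerVoice: "host", voice: "narrator" });
console.log(fromLegacy, canonicalWins);
```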

Likely related people:

  • steipete: Recent current-main history touches plugin catalog/gateway/package-loader surfaces, and the PR discussion shows follow-up fixes, dependency approval comments, and live verification from this handle. (role: recent area contributor and reviewer; confidence: high; commits: ee3efc0152dd, 51b5f75b92f7, d7aa36877632; files: src/plugins/model-catalog-registration.ts, src/plugins/registry.ts, src/gateway/server-methods/talk-shared.ts)
  • vincentkoc: The branch author has recent merged history in plugin and gateway test/runtime paths, including bundled runtime dependency localization and public-surface contract work adjacent to this refactor. (role: feature contributor and adjacent plugin/gateway contributor; confidence: medium; commits: c727388f937f, 141c7f8eaa7a, ca4baaeb922e; files: src/plugins/registry.ts, src/plugins/types.ts, src/gateway/server-methods/talk-shared.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 28, 2026
@vincentkoc vincentkoc force-pushed the feat/voice-model-provider branch from 7d7efdb to c726915 Compare May 28, 2026 22:42
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 29, 2026
@github-actions

Contributor

Dependency Changes Detected

This PR changes dependency-related files. Maintainers should confirm these changes are intentional.

Changed files:

  • extensions/speech-core/package.json
  • packages/speech-core/package.json
  • pnpm-lock.yaml

Maintainer follow-up:

  • Review whether the dependency changes are intentional.
  • Inspect resolved package deltas when lockfile, shrinkwrap, or workspace dependency policy changes are present.
  • Treat package-lock.json and npm-shrinkwrap.json diffs as security-review surfaces.
  • Run pnpm deps:changes:report -- --base-ref origin/main --markdown /tmp/dependency-changes.md --json /tmp/dependency-changes.json locally for detailed release-style evidence.

@github-actions

github-actions Bot commented May 29, 2026

Contributor

Dependency graph change authorized

This PR includes dependency graph changes. A member of @openclaw/openclaw-secops authorized this exact head SHA with /allow-dependencies-change.

  • Approved SHA: 736a9e915a7eb0499b20f9aac16de0733c3437e1
  • Approved by: @steipete
  • Reason: speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

A later push changes the PR head SHA and requires a fresh security approval.
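
The head-pinning rule can be sketched as an exact SHA comparison. The record shape and function name below are hypothetical and this is not the repository's dependency guard; the approved SHA is the one quoted in this comment, and the second SHA is a placeholder.

```typescript
// Hypothetical record shape for an approval pinned to one commit.
interface DependencyApproval {
  approvedSha: string;
  approvedBy: string;
}

const FULL_SHA = /^[0-9a-f]{40}$/;

// An approval only covers the exact head it was issued for.
function approvalCoversHead(approval: DependencyApproval, headSha: string): boolean {
  const approved = approval.approvedSha.toLowerCase();
  const head = headSha.toLowerCase();
  return FULL_SHA.test(approved) && approved === head;
}

const approval: DependencyApproval = {
  approvedSha: "736a9e915a7eb0499b20f9aac16de0733c3437e1",
  approvedBy: "steipete",
};

const sameHead = approvalCoversHead(approval, "736a9e915a7eb0499b20f9aac16de0733c3437e1");
// Placeholder SHA standing in for the head after a later push.
const afterPush = approvalCoversHead(approval, "0000000000000000000000000000000000000000");
console.log(sameHead, afterPush);
```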

@steipete

Contributor

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

@steipete

Contributor

Verification before merge:

Behavior addressed: Speech core is now internalized as @openclaw/speech-core; plugin/provider code keeps provider-specific behavior while shared TTS/speaker/voice-model routing lives in core package code. Added fixes from autoreview for migrated speaker status and explicit model alias precedence.

Real environment tested: local macOS checkout plus GitHub Actions CI on 7363968d0a17ae9c7383d4a54250cc9aacdbbb53; live provider keys were injected from 1Password Molty into the child process only.

Exact steps or command run after this patch:

  • pnpm format:check packages/speech-core/src/tts.ts packages/speech-core/src/tts.test.ts
  • node scripts/run-oxlint.mjs packages/speech-core/src/tts.ts packages/speech-core/src/tts.test.ts
  • pnpm test packages/speech-core/src/tts.test.ts -- --reporter=verbose
  • pnpm check:test-types
  • pnpm build
  • pnpm deadcode:dependencies
  • node --input-type=module -e "const m = await import('@openclaw/speech-core'); const s = await import('@openclaw/speech-core/speaker'); console.log(typeof m.synthesizeSpeech, typeof s.withSpeakerSelectionFallbackCompat);" from packages/speech-core
  • pnpm test packages/speech-core/src/tts.test.ts src/gateway/server-methods/talk.test.ts src/tts/status-config.test.ts -- --reporter=verbose
  • OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 pnpm test:live -- extensions/openai/openai-tts.live.test.ts extensions/elevenlabs/elevenlabs.live.test.ts extensions/minimax/minimax.live.test.ts extensions/google/google.live.test.ts extensions/vydra/vydra.live.test.ts extensions/xai/xai.live.test.ts extensions/xiaomi/xiaomi.live.test.ts --reporter=verbose
  • /Users/steipete/Projects/agent-skills/skills/autoreview/scripts/autoreview --mode branch --base origin/main
  • GitHub manual full CI with Android included: https://github.com/openclaw/openclaw/actions/runs/26615094476

Evidence after fix: autoreview was clean with no accepted/actionable findings. Full CI run 26615094476 passed 66/66 jobs. Required PR checks pass, including the dependency guard after the secops override. The live speech matrix passed across 7 files: 22 tests passed and 2 Vydra video tests were skipped because OPENCLAW_LIVE_VYDRA_VIDEO was intentionally not enabled.

Observed result after fix: TTS synthesis, STT, realtime STT, voice-note transcode paths, speaker compatibility, and voice-model routing all pass focused and live coverage; no CI failures remain.

What was not tested: Vydra video generation modes were not enabled for this speech refactor validation.

@steipete

Contributor

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

@steipete

Contributor

Behavior addressed: Speech core is now an internal package surface (packages/speech-core) instead of a manifest plugin, with shared TTS/speaker/voice-model routing used by core talk/gateway paths and provider plugins reduced to provider-specific implementation/config.

Real environment tested: Local macOS checkout plus GitHub Actions on SHA 736a9e9. Live provider speech matrix ran with Molty 1Password keys for OpenAI, ElevenLabs, MiniMax, Google, xAI, Xiaomi, and Vydra; key values were not printed.

Exact steps or command run after this patch:

  • pnpm build
  • pnpm deadcode:dependencies
  • pnpm check:test-types
  • pnpm test packages/speech-core/src/tts.test.ts src/gateway/server-methods/talk.test.ts src/tts/status-config.test.ts -- --reporter=verbose
  • pnpm lint -- src/commands/doctor-config-flow.test.ts src/agents/models-config.uses-first-github-copilot-profile-env-tokens.test.ts
  • pnpm test src/commands/doctor-config-flow.test.ts -- -t "preserves commitments config on repair" --reporter=verbose
  • OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 pnpm test:live -- extensions/openai/openai-tts.live.test.ts extensions/elevenlabs/elevenlabs.live.test.ts extensions/minimax/minimax.live.test.ts extensions/google/google.live.test.ts extensions/vydra/vydra.live.test.ts extensions/xai/xai.live.test.ts extensions/xiaomi/xiaomi.live.test.ts --reporter=verbose
  • /Users/steipete/Projects/agent-skills/skills/autoreview/scripts/autoreview --mode branch --base origin/main
  • Manual CI: https://github.com/openclaw/openclaw/actions/runs/26616159812
  • Replacement CodeQL network-runtime check for stuck PR checkout job: https://github.com/openclaw/openclaw/actions/runs/26616543271

Evidence after fix: Build, dependency scan, test typecheck, focused speech/gateway/status tests, doctor timeout regression test, live speech provider matrix, manual CI, dependency guard rerun, replacement CodeQL network-runtime check, and autoreview are green.

Observed result after fix: Live speech matrix passed 7 files with 22 tests passed and 2 Vydra video tests skipped; manual CI passed 66/66 jobs; autoreview reported no accepted/actionable findings.

What was not tested: Vydra video live permutations were attempted separately before the speech-only live pass and failed due to provider timeout/concurrency limits (HTTP 429 CONCURRENT_LIMIT), so they were excluded from the speech refactor proof.

Labels

  • agents - Agent runtime and tooling
  • app: web-ui - App: web-ui
  • channel: discord - Channel integration: discord
  • channel: voice-call - Channel integration: voice-call
  • commands - Command implementations
  • dependencies-changed - PR changes dependency-related files
  • docs - Improvements or additions to documentation
  • extensions: elevenlabs
  • extensions: google
  • extensions: inworld - Extension: inworld
  • extensions: minimax
  • extensions: openai
  • extensions: tts-local-cli
  • extensions: xiaomi
  • feature: ✨ showcase - ClawSweeper spotlight: unusually compelling feature idea for maintainer attention.
  • gateway - Gateway runtime
  • maintainer - Maintainer-authored PR
  • merge-risk: 🚨 auth-provider - 🚨 May break OAuth, tokens, provider routing, model choice, or credentials.
  • merge-risk: 🚨 compatibility - 🚨 May break existing users, config, migrations, defaults, or upgrade paths.
  • P2 - Normal backlog priority with limited blast radius.
  • plugin: azure-speech - Azure Speech plugin
  • plugin: google-meet
  • proof: sufficient - ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit - Good normal PR readiness with ordinary maintainer review expected.
  • scripts - Repository scripts
  • size: XL
  • status: 👀 ready for maintainer look - ClawSweeper has no concrete contributor-facing blocker left for this PR.

3 participants