refactor(voice): catalog voice models through providers by vincentkoc · Pull Request #87794 · openclaw/openclaw

vincentkoc · 2026-05-28T22:09:11Z

Summary

catalog speech, realtime transcription, and realtime voice provider models as unified kind: "voice" model entries
route TTS, realtime voice, and realtime transcription model selection through provider-owned voice model metadata without letting incompatible voice refs hide explicit providers
rename speaker selection config to speakerVoice / speakerVoiceId, including schema, protocol, Discord, TTS, talk, and doctor migration coverage
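
For readers unfamiliar with the catalog shape, here is a minimal sketch of what a unified kind: "voice" entry and a capability filter could look like. The type names, the capability strings, the `isDefault` flag, and the `selectVoiceModels` helper are all assumptions made for illustration; the PR's actual types live in src/plugins/types.ts and may differ.

```typescript
// Hypothetical shapes for illustration only; not the PR's real types.
type VoiceCapability = "speech" | "realtime-transcription" | "realtime-voice";

interface VoiceModelEntry {
  kind: "voice";
  provider: string;
  id: string;
  capabilities: VoiceCapability[];
  isDefault?: boolean;
}

// Return one provider's models that support the requested capability,
// with default-marked models sorted first.
function selectVoiceModels(
  catalog: VoiceModelEntry[],
  provider: string,
  capability: VoiceCapability,
): VoiceModelEntry[] {
  return catalog
    .filter((m) => m.provider === provider && m.capabilities.some((c) => c === capability))
    .sort((a, b) => Number(b.isDefault ?? false) - Number(a.isDefault ?? false));
}

const catalog: VoiceModelEntry[] = [
  { kind: "voice", provider: "example", id: "tts-basic", capabilities: ["speech"] },
  { kind: "voice", provider: "example", id: "tts-main", capabilities: ["speech"], isDefault: true },
  { kind: "voice", provider: "example", id: "live-stt", capabilities: ["realtime-transcription"] },
];

const speechModels = selectVoiceModels(catalog, "example", "speech");
console.log(speechModels.map((m) => m.id));
```

Keeping the capability list on the entry is what would let one catalog serve TTS, realtime transcription, and realtime voice pickers without three separate registries.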

Verification

node scripts/run-vitest.mjs src/commands/doctor/shared/legacy-config-migrate.provider-shapes.test.ts extensions/speech-core/src/tts.test.ts src/plugins/capability-provider-runtime.test.ts src/gateway/server-methods/talk.test.ts extensions/discord/src/config-schema.test.ts src/gateway/protocol/index.test.ts - 8 files / 258 tests passed
git diff --check - passed
.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main - clean, no accepted/actionable findings
node scripts/crabbox-wrapper.mjs run --provider aws --target linux --idle-timeout 90m --ttl 240m --timing-json --stop-after always --shell -- "git fetch --deepen=200 origin main || git fetch --unshallow origin main || true; corepack pnpm check:changed" - AWS Crabbox run run_78ebe3e450ed, lease cbx_1530356b8522, exited 0

Real behavior proof

Behavior addressed: Voice-capable provider models are exposed through the model catalog as voice models; TTS/STT/realtime voice model selection follows provider capability metadata; speaker selection uses speakerVoice / speakerVoiceId with migration coverage for legacy voice / voiceName / voiceId config.
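
The legacy-to-canonical rename can be sketched as below. This is an illustration, not the doctor migration itself: the function name is hypothetical, and the mapping (voiceName and voice feed speakerVoice, voiceId feeds speakerVoiceId, canonical fields win over legacy ones) is an assumption inferred from the field names.

```typescript
// Hypothetical migration sketch; the mapping and precedence are assumptions.
interface LegacySpeakerConfig {
  voice?: string;
  voiceName?: string;
  voiceId?: string;
}

interface SpeakerConfig {
  speakerVoice?: string;
  speakerVoiceId?: string;
}

// Prefer canonical fields when present, then fall back to legacy ones.
function migrateSpeakerConfig(input: LegacySpeakerConfig & SpeakerConfig): SpeakerConfig {
  const speakerVoice = input.speakerVoice ?? input.voiceName ?? input.voice;
  const speakerVoiceId = input.speakerVoiceId ?? input.voiceId;
  const out: SpeakerConfig = {};
  if (speakerVoice !== undefined) out.speakerVoice = speakerVoice;
  if (speakerVoiceId !== undefined) out.speakerVoiceId = speakerVoiceId;
  return out;
}

const migratedLegacy = migrateSpeakerConfig({ voice: "narrator", voiceId: "v-123" });
const migratedMixed = migrateSpeakerConfig({ speakerVoice: "host", voiceName: "old-host" });
console.log(migratedLegacy, migratedMixed);
```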

Real environment tested: AWS Crabbox Linux c7a.8xlarge, run run_78ebe3e450ed, lease cbx_1530356b8522.

Exact steps or command run after this patch: corepack pnpm check:changed through node scripts/crabbox-wrapper.mjs run --provider aws --target linux --idle-timeout 90m --ttl 240m --timing-json --stop-after always --shell -- "git fetch --deepen=200 origin main || git fetch --unshallow origin main || true; corepack pnpm check:changed".

Evidence after fix: Crabbox run run_78ebe3e450ed exited 0 after changed-surface typecheck, core/extension lint, import-cycle, media/runtime sidecar, webhook, and pairing guard checks.

Observed result after fix: Focused local tests passed 8 files / 258 tests, autoreview reported no accepted/actionable findings, and remote check:changed passed on branch tip 517a7c23f5.

What was not tested: Live paid speech/realtime provider calls were not run; this was validated through provider catalog/config/runtime unit coverage and the repo changed gate.

clawsweeper · 2026-05-28T22:10:38Z

Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 11:34 PM ET / 03:34 UTC.

Summary
The PR catalogs TTS, realtime transcription, and realtime voice models as provider-owned voice models, routes selection through that metadata, renames speaker voice config fields with migration coverage, and moves speech-core to packages/speech-core.

Reproducibility: not applicable; this is a refactor/feature PR rather than a bug report. The relevant verification is the PR's focused tests, live speech matrix, Crabbox check:changed, and current diff inspection.

Review metrics: 2 noteworthy metrics.

Config/default surfaces: 1 added family, 1 renamed family. agents.defaults.voiceModel is new and speaker selection moves from voice/voiceName/voiceId to speakerVoice/speakerVoiceId, which matters for upgrades and operator docs.
Workspace dependency graph: 1 importer removed, 1 importer added. speech-core moves from extensions/speech-core to packages/speech-core with a workspace dependency on openclaw, so the dependency approval must match the final head.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster ✨ media proof bonus
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
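
The "weaker of the two" rule can be sketched as a minimum over an ordered rank list. The ordering below follows the rank legend later in this comment; the function and constant names are hypothetical, and the "off-meta tidepool" rank is left out because it means the rating does not apply.

```typescript
// Ranks ordered weakest to strongest, per the legend in this review comment.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is capped by whichever input ranks lower.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

const overall = overallRank("diamond lobster", "platinum hermit");
console.log(overall); // prints "platinum hermit"
```

With proof at diamond lobster and patch quality at platinum hermit, the result matches the overall rating reported above.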

Rank-up moves:

Maintainer acceptance of the compatibility and auth-provider routing risk at the final head.
Confirm the dependency graph approval still covers the exact head that lands.

Risk before merge

[P1] The PR renames speaker selection config to speakerVoice/speakerVoiceId and adds agents.defaults.voiceModel; doctor migration and compatibility aliases reduce upgrade risk, but this still changes persisted config/default surfaces.
[P1] Voice model refs now influence provider activation and default model selection across TTS, realtime transcription, and realtime voice, so auth-backed provider loading and explicit provider/model precedence need maintainer acceptance.
[P1] Moving speech-core into packages/speech-core changes package exports and the workspace lockfile graph; comments say the delta is intentional workspace wiring, but maintainers still need to accept that packaging boundary before merge.
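
To make the precedence concern in the second risk concrete, here is a minimal sketch of one way explicit providers could be kept from being hidden by an incompatible voice model ref. Everything here is assumed for illustration: the "provider/model" ref format, the `supports` callback, and the function names are not taken from the diff.

```typescript
// Hypothetical resolver; names and the "provider/model" ref format are assumptions.
interface VoiceSelection {
  provider: string;
  model?: string;
}

interface ResolveOptions {
  explicitProvider?: string;
  voiceModelRef?: string;
  fallbackProvider: string;
  // Reports whether the provider's catalog lists the model for the needed capability.
  supports: (provider: string, model: string) => boolean;
}

function parseRef(ref: string | undefined): { provider: string; model: string } | undefined {
  if (!ref) return undefined;
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) return undefined;
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

function resolveVoiceSelection(opts: ResolveOptions): VoiceSelection {
  const parsed = parseRef(opts.voiceModelRef);
  const compatible =
    parsed !== undefined && opts.supports(parsed.provider, parsed.model) ? parsed : undefined;

  if (opts.explicitProvider !== undefined) {
    // An explicit provider always wins; the ref only contributes a model
    // when it names that same provider and is compatible.
    return compatible !== undefined && compatible.provider === opts.explicitProvider
      ? compatible
      : { provider: opts.explicitProvider };
  }
  // Without an explicit provider, a compatible ref wins over the fallback.
  return compatible ?? { provider: opts.fallbackProvider };
}

const supports = (provider: string, model: string): boolean =>
  provider === "alpha" && model === "tts-1";

const explicitWins = resolveVoiceSelection({
  explicitProvider: "beta",
  voiceModelRef: "alpha/tts-1",
  fallbackProvider: "gamma",
  supports,
});
const refWins = resolveVoiceSelection({ voiceModelRef: "alpha/tts-1", fallbackProvider: "gamma", supports });
const fallbackWins = resolveVoiceSelection({ voiceModelRef: "alpha/unknown", fallbackProvider: "gamma", supports });
console.log(explicitWins, refWins, fallbackWins);
```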

Maintainer options:

Accept with final-head maintainer gate (recommended)
A maintainer can land the PR after confirming the final head still matches the reviewed config migration, provider routing, and workspace dependency intent.
Request narrower upgrade proof
Maintainers can ask for one more focused upgrade proof if they want direct before/after evidence for existing voice/voiceId configs beyond the tests and doctor coverage already reported.
Split package move from routing
If the combined compatibility and dependency graph change is too broad, pause this branch and split the speech-core package move from voice model routing.

Next step before merge

[P2] The remaining action is human maintainer acceptance of protected compatibility, auth-provider routing, and dependency graph changes, not an automated repair.

Security
Cleared: The diff changes lockfile/workspace package wiring but does not introduce a new third-party package source, lifecycle hook, permission expansion, or unresolved secret-handling concern.

Review details

Best possible solution:

Land this only after a maintainer accepts the compatibility and auth-provider routing changes at the final head, keeping the migration/legacy alias coverage and dependency-graph approval intact.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this is a refactor/feature PR rather than a bug report. The relevant verification is the PR's focused tests, live speech matrix, Crabbox check:changed, and current diff inspection.

Is this the best way to solve the issue?

Yes, with maintainer acceptance: routing voice-capable models through provider-owned catalog metadata is a clean owner-boundary direction, but the config rename and auth-provider selection behavior are compatibility-sensitive.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a6472718da9.

Label changes

Label justifications:

P2: The PR is a normal-priority but broad voice provider/config refactor with bounded blast radius and strong validation, not an emergency regression.
merge-risk: 🚨 compatibility: The diff changes persisted config/default surfaces, migration behavior, package exports, and workspace dependency wiring that can affect upgrades.
merge-risk: 🚨 auth-provider: The diff changes how voice model refs select and activate speech, realtime transcription, and realtime voice providers, which can affect credential-backed provider routing.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
feature: ✨ showcase: ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. Unifying speech, transcription, and realtime voice models under provider-owned catalog metadata is a strategically useful capability for voice workflows and model pickers.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (linked_artifact): The PR includes after-change Crabbox changed-gate proof and maintainer-reported live speech provider tests, full CI, build, type, lint, and import proof for the changed behavior.
proof: sufficient: Contributor real behavior proof is sufficient. The PR includes after-change Crabbox changed-gate proof and maintainer-reported live speech provider tests, full CI, build, type, lint, and import proof for the changed behavior.

Evidence reviewed

What I checked:

Protected live PR state: Public GitHub API shows the PR is open at head 736a9e9, authored by a MEMBER, and labeled maintainer, proof: sufficient, dependencies-changed, merge-risk: compatibility, and merge-risk: auth-provider. (736a9e915a7e)
Repository policy applied: AGENTS.md marks plugin APIs, provider routing, persisted preferences, config/default additions, migrations, setup, startup checks, and fallback behavior as compatibility-sensitive merge risk, which directly applies to this diff. (AGENTS.md:26, 5a6472718da9)
Speaker compatibility path: The diff adds speaker compatibility helpers that copy canonical speakerVoice/speakerVoiceId and legacy voice/voiceName/voiceId fields, supporting the rename while preserving fallback behavior. (packages/speech-core/speaker.ts:1, 736a9e915a7e)
Voice provider routing change: The diff makes Talk realtime and transcription defaults resolve through agents.defaults.voiceModel and provider-owned model metadata, which is the core auth-provider routing surface that needs maintainer acceptance. (src/gateway/server-methods/talk-shared.ts:122, 736a9e915a7e)
Plugin API metadata surface: The diff expands provider plugin types with defaultModel/models fields for speech, realtime transcription, and realtime voice providers, making this a plugin API compatibility surface rather than a local-only refactor. (src/plugins/types.ts:1823, 736a9e915a7e)
Dependency graph wiring: The lockfile removes the extensions/speech-core importer and adds packages/speech-core with a workspace dependency on openclaw; comments state this was intentionally allowed as workspace graph wiring for the current head. (pnpm-lock.yaml:1455, 736a9e915a7e)

Likely related people:

steipete: Recent current-main history touches plugin catalog/gateway/package-loader surfaces, and the PR discussion shows follow-up fixes, dependency approval comments, and live verification from this handle. (role: recent area contributor and reviewer; confidence: high; commits: ee3efc0152dd, 51b5f75b92f7, d7aa36877632; files: src/plugins/model-catalog-registration.ts, src/plugins/registry.ts, src/gateway/server-methods/talk-shared.ts)
vincentkoc: The branch author has recent merged history in plugin and gateway test/runtime paths, including bundled runtime dependency localization and public-surface contract work adjacent to this refactor. (role: feature contributor and adjacent plugin/gateway contributor; confidence: medium; commits: c727388f937f, 141c7f8eaa7a, ca4baaeb922e; files: src/plugins/registry.ts, src/plugins/types.ts, src/gateway/server-methods/talk-shared.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

github-actions · 2026-05-29T02:20:13Z

Dependency Changes Detected

This PR changes dependency-related files. Maintainers should confirm these changes are intentional.

Changed files:

extensions/speech-core/package.json
packages/speech-core/package.json
pnpm-lock.yaml

Maintainer follow-up:

Review whether the dependency changes are intentional.
Inspect resolved package deltas when lockfile, shrinkwrap, or workspace dependency policy changes are present.
Treat package-lock.json and npm-shrinkwrap.json diffs as security-review surfaces.
Run pnpm deps:changes:report -- --base-ref origin/main --markdown /tmp/dependency-changes.md --json /tmp/dependency-changes.json locally for detailed release-style evidence.

github-actions · 2026-05-29T02:20:14Z

Dependency graph change authorized

This PR includes dependency graph changes. A member of @openclaw/openclaw-secops authorized this exact head SHA with /allow-dependencies-change.

Approved SHA: 736a9e915a7eb0499b20f9aac16de0733c3437e1
Approved by: @steipete
Reason: speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

A later push changes the PR head SHA and requires a fresh security approval.
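
The head-pinned approval rule above can be sketched as a simple equality check. The type and function names are hypothetical, and comparing full 40-character SHAs is an assumption about how the guard matches; the SHAs are the ones quoted in this thread.

```typescript
// Hypothetical guard sketch: an approval only covers the exact head it names.
interface DependencyApproval {
  approvedSha: string;
  approvedBy: string;
}

function approvalCoversHead(approval: DependencyApproval | undefined, headSha: string): boolean {
  return approval !== undefined && approval.approvedSha === headSha;
}

const approval: DependencyApproval = {
  approvedSha: "736a9e915a7eb0499b20f9aac16de0733c3437e1",
  approvedBy: "steipete",
};

// Same head: still covered.
const coveredAtApprovedHead = approvalCoversHead(approval, "736a9e915a7eb0499b20f9aac16de0733c3437e1");
// Any other head (here an earlier one mentioned in this thread): a fresh approval is needed.
const coveredAtOtherHead = approvalCoversHead(approval, "7363968d0a17ae9c7383d4a54250cc9aacdbbb53");
console.log(coveredAtApprovedHead, coveredAtOtherHead); // prints "true false"
```

This is why the same /allow-dependencies-change comment recurs in the thread: each push moves the head, so the previous approval stops matching.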

steipete · 2026-05-29T02:21:24Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T02:30:15Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T02:42:17Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T02:48:34Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T02:53:22Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T03:00:33Z

Verification before merge:

Behavior addressed: Speech core is now internalized as @openclaw/speech-core; plugin/provider code keeps provider-specific behavior while shared TTS/speaker/voice-model routing lives in core package code. Added fixes from autoreview for migrated speaker status and explicit model alias precedence.

Real environment tested: local macOS checkout plus GitHub Actions CI on 7363968d0a17ae9c7383d4a54250cc9aacdbbb53; live provider keys were injected from 1Password Molty into the child process only.

Exact steps or command run after this patch:

pnpm format:check packages/speech-core/src/tts.ts packages/speech-core/src/tts.test.ts
node scripts/run-oxlint.mjs packages/speech-core/src/tts.ts packages/speech-core/src/tts.test.ts
pnpm test packages/speech-core/src/tts.test.ts -- --reporter=verbose
pnpm check:test-types
pnpm build
pnpm deadcode:dependencies
node --input-type=module -e "const m = await import('@openclaw/speech-core'); const s = await import('@openclaw/speech-core/speaker'); console.log(typeof m.synthesizeSpeech, typeof s.withSpeakerSelectionFallbackCompat);" from packages/speech-core
pnpm test packages/speech-core/src/tts.test.ts src/gateway/server-methods/talk.test.ts src/tts/status-config.test.ts -- --reporter=verbose
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 pnpm test:live -- extensions/openai/openai-tts.live.test.ts extensions/elevenlabs/elevenlabs.live.test.ts extensions/minimax/minimax.live.test.ts extensions/google/google.live.test.ts extensions/vydra/vydra.live.test.ts extensions/xai/xai.live.test.ts extensions/xiaomi/xiaomi.live.test.ts --reporter=verbose
/Users/steipete/Projects/agent-skills/skills/autoreview/scripts/autoreview --mode branch --base origin/main
GitHub manual full CI with Android included: https://github.com/openclaw/openclaw/actions/runs/26615094476

Evidence after fix: autoreview clean with no accepted/actionable findings. Full CI run 26615094476 passed 66/66 jobs. PR required checks pass, including dependency guard after secops override. Live speech matrix passed 7 files, 22 tests passed, 2 Vydra video tests skipped because OPENCLAW_LIVE_VYDRA_VIDEO was intentionally not enabled.

Observed result after fix: TTS synthesis, STT, realtime STT, voice-note transcode paths, speaker compatibility, and voice-model routing all pass focused and live coverage; no CI failures remain.

What was not tested: Vydra video generation modes were not enabled for this speech refactor validation.

steipete · 2026-05-29T03:02:59Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T03:08:38Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T03:25:57Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T03:28:29Z

/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.

steipete · 2026-05-29T03:46:39Z

Behavior addressed: Speech core is now an internal package surface (packages/speech-core) instead of a manifest plugin, with shared TTS/speaker/voice-model routing used by core talk/gateway paths and provider plugins reduced to provider-specific implementation/config.

Real environment tested: Local macOS checkout plus GitHub Actions on SHA 736a9e9. Live provider speech matrix ran with Molty 1Password keys for OpenAI, ElevenLabs, MiniMax, Google, xAI, Xiaomi, and Vydra; key values were not printed.

Exact steps or command run after this patch:

pnpm build
pnpm deadcode:dependencies
pnpm check:test-types
pnpm test packages/speech-core/src/tts.test.ts src/gateway/server-methods/talk.test.ts src/tts/status-config.test.ts -- --reporter=verbose
pnpm lint -- src/commands/doctor-config-flow.test.ts src/agents/models-config.uses-first-github-copilot-profile-env-tokens.test.ts
pnpm test src/commands/doctor-config-flow.test.ts -- -t "preserves commitments config on repair" --reporter=verbose
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 pnpm test:live -- extensions/openai/openai-tts.live.test.ts extensions/elevenlabs/elevenlabs.live.test.ts extensions/minimax/minimax.live.test.ts extensions/google/google.live.test.ts extensions/vydra/vydra.live.test.ts extensions/xai/xai.live.test.ts extensions/xiaomi/xiaomi.live.test.ts --reporter=verbose
/Users/steipete/Projects/agent-skills/skills/autoreview/scripts/autoreview --mode branch --base origin/main
Manual CI: https://github.com/openclaw/openclaw/actions/runs/26616159812
Replacement CodeQL network-runtime check for stuck PR checkout job: https://github.com/openclaw/openclaw/actions/runs/26616543271

Evidence after fix: Build, dependency scan, test typecheck, focused speech/gateway/status tests, doctor timeout regression test, live speech provider matrix, manual CI, dependency guard rerun, replacement CodeQL network-runtime check, and autoreview are green.

Observed result after fix: Live speech matrix passed 7 files with 22 tests passed and 2 Vydra video tests skipped; manual CI passed 66/66 jobs; autoreview reported no accepted/actionable findings.

What was not tested: Vydra video live permutations were attempted separately before the speech-only live pass and failed due to provider timeout/concurrency (HTTP 429 CONCURRENT_LIMIT), so they were excluded from the speech refactor proof.

vincentkoc force-pushed the feat/voice-model-provider branch from 7d7efdb to c726915 · May 28, 2026 22:42

vincentkoc and others added 14 commits May 29, 2026 04:26

refactor(providers): catalog voice models

ca4baae

feat(tts): route speech through voice models

b4aee7c

refactor(tts): rename speaker selection fields

40f2dc0

refactor(tts): mark default speech models

bec2c24

test(tts): type migrated speaker config assertions

9682ae1

refactor(providers): avoid catalog merge map spread

50d34ed

fix(tts): honor voice model fallbacks

530869d

refactor(tts): move speech core into package

329556d

chore(tts): register speech core knip workspace

d75d860

fix(tts): show migrated speaker voice in status

b7cffc1

fix(tts): satisfy speech core lint

fa69c33

fix(tts): preserve explicit model aliases

1809c51

test(tts): narrow provider config assertion

7454c9a

test(doctor): allow slow commitments repair check

736a9e9
