refactor(voice): catalog voice models through providers #87794
Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 11:34 PM ET / 03:34 UTC.

Summary

Reproducibility: not applicable. This is a refactor/feature PR rather than a bug report. The relevant verification is the PR's focused tests, live speech matrix, Crabbox check:changed, and current diff inspection.

Review metrics: 2 noteworthy metrics.
Merge readiness

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details

Best possible solution: Land this only after a maintainer accepts the compatibility and auth-provider routing changes at the final head, keeping the migration/legacy alias coverage and dependency-graph approval intact.

Do we have a high-confidence way to reproduce the issue? Not applicable; this is a refactor/feature PR rather than a bug report. The relevant verification is the PR's focused tests, live speech matrix, Crabbox check:changed, and current diff inspection.

Is this the best way to solve the issue? Yes, with maintainer acceptance: routing voice-capable models through provider-owned catalog metadata is a clean owner-boundary direction, but the config rename and auth-provider selection behavior are compatibility-sensitive.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a6472718da9.
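To make the "provider-owned catalog metadata" direction concrete, here is a minimal sketch of how voice models might be cataloged and selected by capability. Every name in it (`CatalogModel`, `VoiceProvider`, `buildCatalog`, `selectVoiceModel`, the capability strings, and the example model ids) is a hypothetical stand-in for illustration, not the repository's actual API; only the `kind: "voice"` marker is taken from the PR description.

```typescript
// Hypothetical shapes for illustration only; not the repository's real types.
type VoiceCapability = "tts" | "stt" | "realtime";

interface CatalogModel {
  provider: string;
  id: string;
  kind: "chat" | "voice";
  // Declared by the provider; only meaningful for voice models.
  capabilities?: VoiceCapability[];
}

interface VoiceProvider {
  id: string;
  // Each provider owns and reports its own model metadata.
  listModels(): CatalogModel[];
}

// Flatten every provider's models into a single catalog.
function buildCatalog(providers: VoiceProvider[]): CatalogModel[] {
  return providers.flatMap((p) => p.listModels());
}

// Return the first voice model that declares the requested capability,
// optionally restricted to a single provider.
function selectVoiceModel(
  catalog: CatalogModel[],
  capability: VoiceCapability,
  providerId?: string,
): CatalogModel | undefined {
  return catalog.find(
    (m) =>
      m.kind === "voice" &&
      (providerId === undefined || m.provider === providerId) &&
      (m.capabilities ?? []).includes(capability),
  );
}
```

Under these assumptions the core never hard-codes which model handles TTS, STT, or realtime audio; it only filters on what each provider declares, which is the owner boundary the review describes.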
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
7d7efdb to c726915
Dependency Changes Detected

This PR changes dependency-related files. Maintainers should confirm these changes are intentional.
Dependency graph change authorized

This PR includes dependency graph changes. A member of

A later push changes the PR head SHA and requires a fresh security approval.
/allow-dependencies-change speech-core is now a private workspace package and needs the root openclaw workspace package for its emitted plugin-sdk imports; lockfile delta is only workspace graph wiring.
Verification before merge:

Behavior addressed: Speech core is now internalized as

Real environment tested: local macOS checkout plus GitHub Actions CI on

Exact steps or command run after this patch:

Evidence after fix: autoreview clean with no accepted/actionable findings. Full CI run

Observed result after fix: TTS synthesis, STT, realtime STT, voice-note transcode paths, speaker compatibility, and voice-model routing all pass focused and live coverage; no CI failures remain.

What was not tested: Vydra video generation modes were not enabled for this speech refactor validation.
Behavior addressed: Speech core is now an internal package surface (

Real environment tested: Local macOS checkout plus GitHub Actions on SHA 736a9e9. Live provider speech matrix ran with Molty 1Password keys for OpenAI, ElevenLabs, MiniMax, Google, xAI, Xiaomi, and Vydra; key values were not printed.

Exact steps or command run after this patch:

Evidence after fix: Build, dependency scan, test typecheck, focused speech/gateway/status tests, doctor timeout regression test, live speech provider matrix, manual CI, dependency guard rerun, replacement CodeQL network-runtime check, and autoreview are green.

Observed result after fix: Live speech matrix passed 7 files with 22 tests passed and 2 Vydra video tests skipped; manual CI passed 66/66 jobs; autoreview reported no accepted/actionable findings.

What was not tested: Vydra video live permutations were attempted separately before the speech-only live pass and failed due to a provider timeout/concurrency limit (HTTP 429 CONCURRENT_LIMIT), so they were excluded from the speech refactor proof.
Summary

- `kind: "voice"` model entries
- `speakerVoice`/`speakerVoiceId`, including schema, protocol, Discord, TTS, talk, and doctor migration coverage

Verification

- `node scripts/run-vitest.mjs src/commands/doctor/shared/legacy-config-migrate.provider-shapes.test.ts extensions/speech-core/src/tts.test.ts src/plugins/capability-provider-runtime.test.ts src/gateway/server-methods/talk.test.ts extensions/discord/src/config-schema.test.ts src/gateway/protocol/index.test.ts` - 8 files / 258 tests passed
- `git diff --check` - passed
- `.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main` - clean, no accepted/actionable findings
- `node scripts/crabbox-wrapper.mjs run --provider aws --target linux --idle-timeout 90m --ttl 240m --timing-json --stop-after always --shell -- "git fetch --deepen=200 origin main || git fetch --unshallow origin main || true; corepack pnpm check:changed"` - AWS Crabbox run `run_78ebe3e450ed`, lease `cbx_1530356b8522`, exited 0

Real behavior proof

Behavior addressed: Voice-capable provider models are exposed through the model catalog as voice models; TTS/STT/realtime voice model selection follows provider capability metadata; speaker selection uses `speakerVoice`/`speakerVoiceId` with migration coverage for legacy `voice`/`voiceName`/`voiceId` config.

Real environment tested: AWS Crabbox Linux c7a.8xlarge, run `run_78ebe3e450ed`, lease `cbx_1530356b8522`.

Exact steps or command run after this patch: `corepack pnpm check:changed` through `node scripts/crabbox-wrapper.mjs run --provider aws --target linux --idle-timeout 90m --ttl 240m --timing-json --stop-after always --shell -- "git fetch --deepen=200 origin main || git fetch --unshallow origin main || true; corepack pnpm check:changed"`.

Evidence after fix: Crabbox run `run_78ebe3e450ed` exited 0 after changed-surface typecheck, core/extension lint, import-cycle, media/runtime sidecar, webhook, and pairing guard checks.

Observed result after fix: Focused local tests passed 8 files / 258 tests, autoreview reported no accepted/actionable findings, and remote `check:changed` passed on branch tip `517a7c23f5`.

What was not tested: Live paid speech/realtime provider calls were not run; this was validated through provider catalog/config/runtime unit coverage and the repo changed gate.
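For readers unfamiliar with this kind of config rename, the legacy-key migration described above could look roughly like the sketch below. The key names come from the PR description, but the mapping itself (`voice` and `voiceName` to `speakerVoice`, `voiceId` to `speakerVoiceId`), the precedence rule, and the `migrateSpeakerKeys` helper are assumptions made for illustration; the actual doctor migration may differ.

```typescript
// Hypothetical result shape; not the repository's actual migration API.
interface MigrationResult {
  config: Record<string, unknown>;
  changes: string[];
}

// Rename legacy speaker keys, never overwriting a value the user
// already set under the new key (assumed precedence rule).
function migrateSpeakerKeys(input: Record<string, unknown>): MigrationResult {
  const { voice, voiceName, voiceId, ...rest } = input;
  const config: Record<string, unknown> = { ...rest };
  const changes: string[] = [];

  const move = (from: string, value: unknown, to: string): void => {
    if (value === undefined) return;
    if (config[to] === undefined) {
      config[to] = value;
      changes.push(`${from} -> ${to}`);
    } else {
      changes.push(`${from} dropped (${to} already set)`);
    }
  };

  // Assumed mapping: name-like keys become speakerVoice, id becomes speakerVoiceId.
  move("voice", voice, "speakerVoice");
  move("voiceName", voiceName, "speakerVoice");
  move("voiceId", voiceId, "speakerVoiceId");

  return { config, changes };
}
```

Returning a list of changes alongside the new config lets a doctor-style command report what it rewrote, and keeping the new key authoritative when both spellings are present avoids silently changing behavior for configs that were already updated.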