You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+1Lines changed: 1 addition & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -11,6 +11,7 @@ Docs: https://docs.openclaw.ai
11
11
- Diagnostics/OTEL: support `OPENCLAW_OTEL_PRELOADED=1` so the plugin can reuse an already-registered OpenTelemetry SDK while keeping OpenClaw diagnostic listeners wired. (#71450) Thanks @vincentkoc and @jlapenna.
12
12
- Control UI: refine the agent Tool Access panel with compact live-tool chips, collapsible tool groups, direct per-tool toggles, and clearer runtime/source provenance. (#71405) Thanks @BunsDev.
13
13
- Memory-core/hybrid search: expose raw `vectorScore` and `textScore` alongside the combined `score` on hybrid memory search results, so callers can inspect vector-versus-text retrieval contribution before temporal decay or MMR reordering. Fixes #68166. (#68286) Thanks @ajfonthemove.
14
+
- Providers/Xiaomi: add MiMo TTS as a bundled speech provider with MP3/WAV output and voice-note Opus transcoding. Fixes #52376. (#55614) Thanks @zoujiejun.
-[xAI Text to Speech](https://docs.x.ai/developers/rest-api-reference/inference/voice#text-to-speech-rest)
@@ -231,6 +234,34 @@ Resolution order is `messages.tts.providers.xai.apiKey` -> `XAI_API_KEY`.
231
234
Current live voices are `ara`, `eve`, `leo`, `rex`, `sal`, and `una`; `eve` is
232
235
the default. `language` accepts a BCP-47 tag or `auto`.
233
236
237
+
### Xiaomi MiMo primary
238
+
239
+
```json5
240
+
{
241
+
messages: {
242
+
tts: {
243
+
auto:"always",
244
+
provider:"xiaomi",
245
+
providers: {
246
+
xiaomi: {
247
+
apiKey:"xiaomi_api_key",
248
+
baseUrl:"https://api.xiaomimimo.com/v1",
249
+
model:"mimo-v2.5-tts",
250
+
voice:"mimo_default",
251
+
format:"mp3",
252
+
style:"Bright, natural, conversational tone.",
253
+
},
254
+
},
255
+
},
256
+
},
257
+
}
258
+
```
259
+
260
+
Xiaomi MiMo TTS uses the same `XIAOMI_API_KEY` path as the bundled Xiaomi model
261
+
provider. The speech provider id is `xiaomi`; `mimo` is accepted as an alias.
262
+
The target text is sent as the assistant message, matching Xiaomi's TTS
263
+
contract. Optional `style` is sent as a user instruction and is not spoken.
264
+
234
265
### OpenRouter primary
235
266
236
267
```json5
@@ -345,7 +376,7 @@ Then run:
345
376
-`tagged` only sends audio when the reply includes `[[tts:key=value]]` directives or a `[[tts:text]]...[[/tts:text]]` block.
346
377
-`enabled`: legacy toggle (doctor migrates this to `auto`).
347
378
-`mode`: `"final"` (default) or `"all"` (includes tool/block replies).
348
-
-`provider`: speech provider id such as `"elevenlabs"`, `"google"`, `"gradium"`, `"microsoft"`, `"minimax"`, `"openai"`, `"vydra"`, or `"xai"` (fallback is automatic).
379
+
-`provider`: speech provider id such as `"elevenlabs"`, `"google"`, `"gradium"`, `"microsoft"`, `"minimax"`, `"openai"`, `"vydra"`, `"xai"`, or `"xiaomi"` (fallback is automatic).
349
380
- If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
350
381
- Legacy `provider: "edge"` config is repaired by `openclaw doctor --fix` and
351
382
rewritten to `provider: "microsoft"`.
@@ -359,7 +390,7 @@ Then run:
359
390
-`maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
360
391
-`timeoutMs`: request timeout (ms).
361
392
-`prefsPath`: override the local prefs JSON path (provider/limit/summary).
362
-
-`apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `GEMINI_API_KEY`/`GOOGLE_API_KEY`, `GRADIUM_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`, `VYDRA_API_KEY`, `XAI_API_KEY`).
393
+
-`apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `GEMINI_API_KEY`/`GOOGLE_API_KEY`, `GRADIUM_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`, `VYDRA_API_KEY`, `XAI_API_KEY`, `XIAOMI_API_KEY`).
363
394
-`providers.elevenlabs.baseUrl`: override ElevenLabs API base URL.
364
395
-`providers.openai.baseUrl`: override the OpenAI TTS endpoint.
-`providers.xiaomi.apiKey`: Xiaomi MiMo API key (env: `XIAOMI_API_KEY`).
426
+
-`providers.xiaomi.baseUrl`: override the Xiaomi MiMo API base URL (default `https://api.xiaomimimo.com/v1`, env: `XIAOMI_BASE_URL`).
427
+
-`providers.xiaomi.model`: TTS model (default `mimo-v2.5-tts`, env: `XIAOMI_TTS_MODEL`; `mimo-v2-tts` is also supported).
428
+
-`providers.xiaomi.voice`: MiMo voice id (default `mimo_default`, env: `XIAOMI_TTS_VOICE`).
429
+
-`providers.xiaomi.format`: `mp3` or `wav` (default `mp3`, env: `XIAOMI_TTS_FORMAT`).
430
+
-`providers.xiaomi.style`: optional natural-language style instruction sent as the user message; it is not spoken.
394
431
-`providers.openrouter.apiKey`: OpenRouter API key (env: `OPENROUTER_API_KEY`; can reuse `models.providers.openrouter.apiKey`).
395
432
-`providers.openrouter.baseUrl`: override the OpenRouter TTS base URL (default `https://openrouter.ai/api/v1`; legacy `https://openrouter.ai/v1` is normalized).
396
433
-`providers.openrouter.model`: OpenRouter TTS model id (default `hexgrad/kokoro-82m`; `modelId` is also accepted).
@@ -432,9 +469,9 @@ Here you go.
432
469
433
470
Available directive keys (when enabled):
434
471
435
-
-`provider` (registered speech provider id, for example `openai`, `elevenlabs`, `google`, `gradium`, `minimax`, `microsoft`, `vydra`, or `xai`; requires `allowProvider: true`)
-`pitch` (MiniMax integer pitch, -12 to 12; fractional values are truncated before the MiniMax request)
@@ -498,6 +535,7 @@ These override `messages.tts.*` for that host.
498
535
-**Other channels**: MP3 (`mp3_44100_128` from ElevenLabs, `mp3` from OpenAI).
499
536
- 44.1kHz / 128kbps is the default balance for speech clarity.
500
537
-**MiniMax**: MP3 (`speech-2.8-hd` model, 32kHz sample rate) for normal audio attachments. For voice-note targets such as Feishu and Telegram, OpenClaw transcodes the MiniMax MP3 to 48kHz Opus with `ffmpeg` before delivery.
538
+
-**Xiaomi MiMo**: MP3 by default, or WAV when configured. For voice-note targets such as Feishu and Telegram, OpenClaw transcodes Xiaomi output to 48kHz Opus with `ffmpeg` before delivery.
501
539
-**Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments and returns PCM directly for Talk/telephony. Native Opus voice-note format is not supported by this path.
502
540
-**Gradium**: WAV for audio attachments, Opus for voice-note targets, and `ulaw_8000` at 8 kHz for telephony.
503
541
-**xAI**: MP3 by default; `responseFormat` may be `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. OpenClaw uses xAI's batch REST TTS endpoint and returns a complete audio attachment; xAI's streaming TTS WebSocket is not used by this provider path. Native Opus voice-note format is not supported by this path.
0 commit comments