docs/gateway/doctor.md
+2 -1 (2 additions & 1 deletion)
@@ -166,7 +166,7 @@ That stages grounded durable candidates into the short-term dreaming store while
 <Accordion title="1. Config normalization">
 If the config contains legacy value shapes (for example `messages.ackReaction` without a channel-specific override), doctor normalizes them into the current schema.

-That includes legacy Talk flat fields. Current public Talk config is `talk.provider` + `talk.providers.<provider>`. Doctor rewrites old `talk.voiceId` / `talk.voiceAliases` / `talk.modelId` / `talk.outputFormat` / `talk.apiKey` shapes into the provider map.
+That includes legacy Talk flat fields. Current public Talk speech config is `talk.provider` + `talk.providers.<provider>`, and realtime voice config is `talk.realtime.*`. Doctor rewrites old `talk.voiceId` / `talk.voiceAliases` / `talk.modelId` / `talk.outputFormat` / `talk.apiKey` shapes into the provider map, and rewrites legacy top-level realtime selectors (`talk.mode`, `talk.transport`, `talk.brain`, `talk.model`, `talk.voice`) into `talk.realtime`.

 Doctor also warns when `plugins.allow` is non-empty and tool policy uses
 wildcard or plugin-owned tool entries. `tools.allow: ["*"]` only matches tools
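The Talk normalization described in this hunk could look roughly like the sketch below. It is not the doctor implementation, which this diff does not show. The `normalizeTalk` helper, the `Dict` type, and the `elevenlabs` fallback provider are hypothetical, and copying the realtime selectors under the same key names inside `talk.realtime` is an assumption.

```typescript
// Hypothetical shapes: the real doctor schema is not part of this diff.
type Dict = Record<string, unknown>;

const LEGACY_SPEECH_KEYS = ["voiceId", "voiceAliases", "modelId", "outputFormat", "apiKey"] as const;
const LEGACY_REALTIME_KEYS = ["mode", "transport", "brain", "model", "voice"] as const;

function isDict(value: unknown): value is Dict {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumption: flat speech fields belong to `talk.provider`, or to
// `fallbackProvider` when no provider is configured.
function normalizeTalk(talk: Dict, fallbackProvider = "elevenlabs"): Dict {
  const out: Dict = { ...talk };
  const providerId = typeof out.provider === "string" ? out.provider : fallbackProvider;

  // Move legacy flat speech fields into talk.providers.<provider>.
  const rawProviders = out.providers;
  const providers: Dict = isDict(rawProviders) ? { ...rawProviders } : {};
  const rawEntry = providers[providerId];
  const entry: Dict = isDict(rawEntry) ? { ...rawEntry } : {};
  let movedSpeech = false;
  for (const key of LEGACY_SPEECH_KEYS) {
    if (key in out) {
      if (!(key in entry)) entry[key] = out[key]; // explicit provider config wins
      delete out[key];
      movedSpeech = true;
    }
  }
  if (movedSpeech) {
    providers[providerId] = entry;
    out.providers = providers;
    out.provider = providerId;
  }

  // Move legacy top-level realtime selectors into talk.realtime.
  const rawRealtime = out.realtime;
  const realtime: Dict = isDict(rawRealtime) ? { ...rawRealtime } : {};
  let movedRealtime = false;
  for (const key of LEGACY_REALTIME_KEYS) {
    if (key in out) {
      if (!(key in realtime)) realtime[key] = out[key];
      delete out[key];
      movedRealtime = true;
    }
  }
  if (movedRealtime) out.realtime = realtime;

  return out;
}
```

In this sketch, values already present in the new locations take precedence over legacy ones, so rerunning the normalization on an already migrated config changes nothing.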
@@ -199,6 +199,7 @@ That stages grounded durable candidates into the short-term dreaming store while
@@ -361,8 +362,17 @@ enumeration of `src/gateway/server-methods/*.ts`.
 </Accordion>

 <Accordion title="Talk and TTS">
+- `talk.catalog` returns the read-only Talk provider catalog for speech, streaming transcription, and realtime voice. It includes provider ids, labels, configured state, exposed model/voice ids, canonical modes, transports, brain strategies, and realtime audio/capability flags without returning provider secrets or mutating global config.
+- `talk.handoff.create` creates an expiring managed-room handoff for an existing session key. The result contains a room id, room URL, bearer token, optional session-scoped provider/model/voice selection, mode, transport, brain strategy, and expiry for a first-party walkie-talkie client. `brain: "direct-tools"` requires `operator.admin`.
+- `talk.handoff.join` validates a handoff id plus bearer token, emits `session.ready` or `session.replaced` room events as needed, and returns room/session metadata plus recent Talk events without the plaintext token or stored token hash.
+- `talk.handoff.turnStart`, `talk.handoff.turnEnd`, and `talk.handoff.turnCancel` let a first-party managed-room client drive the room turn lifecycle with `turn.started`, `turn.ended`, and `turn.cancelled` Talk events.
+- `talk.handoff.revoke` invalidates an unexpired handoff, emits `session.closed`, and makes later joins fail.
 - `talk.mode` sets/broadcasts the current Talk mode state for WebChat/Control UI clients.
+- `talk.realtime.session` creates a browser realtime session using canonical transports (`webrtc`, `provider-websocket`, or `gateway-relay`). It accepts optional `mode`, `transport`, and `brain` selectors, but currently only public browser `mode: "realtime"` plus `brain: "agent-consult"` is supported; `managed-room` remains reserved for handoff clients until the browser owns a real room client.
+- `talk.realtime.relayAudio`, `talk.realtime.relayCancel`, `talk.realtime.relayMark`, `talk.realtime.relayStop`, and `talk.realtime.relayToolResult` control Gateway-owned realtime relay sessions. Relay cancellation clears provider output and aborts any linked agent consult run.
+- `talk.realtime.toolCall` lets browser-owned realtime transports forward provider tool calls to Gateway policy. The first supported tool is `openclaw_agent_consult`; clients receive a run id and wait for normal chat lifecycle events before submitting the provider-specific tool result. Gateway relay clients include `relaySessionId` so turn cancellation can abort the consult.
+- `talk.transcription.session` creates a transcription-only Gateway relay over the configured streaming STT provider. Clients send PCM frames through `talk.transcription.relayAudio`, cancel an active turn with `talk.transcription.relayCancel`, receive `talk.transcription.relay` events with common Talk envelopes, and close with `talk.transcription.relayStop`.
 - `talk.speak` synthesizes speech through the active Talk speech provider.
 - `tts.status` returns TTS enabled state, active provider, fallback providers, and provider config state.
 - `tts.providers` returns the visible TTS provider inventory.
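The selector rules stated for `talk.realtime.session` in this hunk can be illustrated with a small client-side pre-check. This is a sketch only: `checkRealtimeSessionRequest`, the request shape, and the result type are hypothetical names, and treating an omitted selector as "use the Gateway default" is an assumption. The Gateway remains the authority on what it accepts.

```typescript
// Canonical transports named in the method description.
const CANONICAL_TRANSPORTS: readonly string[] = ["webrtc", "provider-websocket", "gateway-relay"];

// Hypothetical request shape: all three selectors are optional.
interface RealtimeSessionRequest {
  mode?: string;
  transport?: string;
  brain?: string;
}

type CheckResult = { ok: true } | { ok: false; reason: string };

function checkRealtimeSessionRequest(req: RealtimeSessionRequest): CheckResult {
  // Only the public browser realtime mode is currently supported.
  if (req.mode !== undefined && req.mode !== "realtime") {
    return { ok: false, reason: `unsupported mode: ${req.mode}` };
  }
  // Only agent-consult is currently supported for browser sessions.
  if (req.brain !== undefined && req.brain !== "agent-consult") {
    return { ok: false, reason: `unsupported brain: ${req.brain}` };
  }
  // Anything outside the canonical list, including managed-room, is rejected.
  if (req.transport !== undefined && CANONICAL_TRANSPORTS.indexOf(req.transport) === -1) {
    return { ok: false, reason: `unsupported transport: ${req.transport}` };
  }
  return { ok: true };
}
```

A pre-check like this only saves a round trip for obviously invalid requests. It does not replace the catalog data returned by `talk.catalog`, which describes what each provider supports.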
docs/nodes/talk.md
+34 -4 (34 additions & 4 deletions)
@@ -1,18 +1,28 @@
 ---
-summary: "Talk mode: continuous speech conversations with configured TTS providers"
+summary: "Talk mode: continuous speech conversations across local STT/TTS and realtime voice"
 read_when:
 - Implementing Talk mode on macOS/iOS/Android
 - Changing voice/TTS/interrupt behavior
 title: "Talk mode"
 ---

-Talk mode is a continuous voice conversation loop:
+Talk mode has two runtime shapes:
+
+- Native macOS/iOS/Android Talk uses local speech recognition, Gateway chat, and `talk.speak` TTS. Nodes advertise the `talk` capability and declare the `talk.*` commands they support.
+- Browser Talk uses `talk.realtime.session` with canonical transports: `webrtc`, `provider-websocket`, or `gateway-relay`. `managed-room` is reserved for Gateway handoff rooms.
+- Transcription-only clients use `talk.transcription.session` plus `talk.transcription.relayAudio`, `talk.transcription.relayCancel`, and `talk.transcription.relayStop` when they need captions or dictation without an assistant voice response.
+
+Native Talk is a continuous voice conversation loop:

 1. Listen for speech
-2. Send transcript to the model (main session, chat.send)
+2. Send transcript to the model through the active session
 3. Wait for the response
 4. Speak it via the configured Talk provider (`talk.speak`)

+Browser realtime Talk forwards provider tool calls through `talk.realtime.toolCall`; browser clients do not call `chat.send` directly for realtime consults.
+
+Transcription-only Talk emits the same common Talk event envelope as realtime and STT/TTS sessions, but uses `mode: "transcription"` and `brain: "none"`. It is for captions, dictation, and observe-only speech capture; one-shot uploaded voice notes still use the media/audio path.
+
 ## Behavior (macOS)

 - **Always-on overlay** while Talk mode is enabled.
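The four-step native Talk loop in this hunk can be pictured as a small state machine. The sketch below is one possible model, not the macOS/iOS/Android implementation: the phase names, the event names, and the `nextPhase` function are all hypothetical, and interruption handling is left out.

```typescript
// One phase per step of the native loop: listen, send, wait, speak.
type Phase = "listening" | "sending" | "waiting" | "speaking";

// Hypothetical events that move the loop forward.
type TalkEvent =
  | "transcript-ready"   // speech recognition produced a transcript
  | "transcript-sent"    // transcript was handed to the active session
  | "response-received"  // the model response arrived
  | "playback-finished"; // talk.speak playback completed

const TRANSITIONS: Record<Phase, Partial<Record<TalkEvent, Phase>>> = {
  listening: { "transcript-ready": "sending" },
  sending: { "transcript-sent": "waiting" },
  waiting: { "response-received": "speaking" },
  speaking: { "playback-finished": "listening" },
};

// Events that do not apply to the current phase leave it unchanged.
function nextPhase(phase: Phase, event: TalkEvent): Phase {
  return TRANSITIONS[phase][event] ?? phase;
}

function runEvents(start: Phase, events: TalkEvent[]): Phase {
  let phase = start;
  for (const event of events) {
    phase = nextPhase(phase, event);
  }
  return phase;
}
```

Feeding the four events in order from `listening` returns the loop to `listening`, which is what makes the conversation continuous.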
@@ -66,6 +76,19 @@ Supported keys:
     speechLocale: "ru-RU",
     silenceTimeoutMs: 1500,
     interruptOnSpeech: true,
+    realtime: {
+      provider: "openai",
+      providers: {
+        openai: {
+          apiKey: "openai_api_key",
+          model: "gpt-realtime",
+          voice: "alloy",
+        },
+      },
+      mode: "realtime",
+      transport: "webrtc",
+      brain: "agent-consult",
+    },
   },
 }
 ```
@@ -79,6 +102,11 @@ Defaults:
 - `providers.elevenlabs.modelId`: defaults to `eleven_v3` when unset.
 - `providers.mlx.modelId`: defaults to `mlx-community/Soprano-80M-bf16` when unset.
 - `providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
+- `realtime.provider`: selects the active browser/server realtime voice provider. Use `openai` for WebRTC, `google` for provider WebSocket, or a bridge-only provider through Gateway relay.
+- `realtime.providers.<provider>` stores provider-owned realtime config. The browser receives only ephemeral or constrained session credentials, never a standard API key.
+- `realtime.brain`: `agent-consult` routes realtime tool calls through Gateway policy; `direct-tools` is owner-only compatibility behavior; `none` is for transcription or external orchestration.
+- `talk.catalog` exposes each provider's valid modes, transports, brain strategies, realtime audio formats, and capability flags so first-party Talk clients can avoid unsupported combinations.
+- Streaming transcription providers are discovered through `talk.catalog.transcription`. The current Gateway relay uses the Voice Call streaming provider config until the dedicated Talk transcription config surface is added.
 - `speechLocale`: optional BCP 47 locale id for on-device Talk speech recognition on iOS/macOS. Leave unset to use the device default.
 - `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)

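The `talk.catalog` entry in this hunk says clients can use the catalog to avoid unsupported combinations. A minimal sketch of that check follows. The `CatalogProvider` and `TalkSelection` shapes and the `findUnsupported` helper are hypothetical, since the actual catalog payload is not shown in this diff, and the sample catalog is illustrative data that only mirrors the provider/transport pairing mentioned for `realtime.provider`.

```typescript
// Hypothetical, simplified view of one catalog entry.
interface CatalogProvider {
  id: string;
  modes: string[];
  transports: string[];
  brains: string[];
}

// What a client intends to request.
interface TalkSelection {
  provider: string;
  mode: string;
  transport: string;
  brain: string;
}

// Returns a list of problems; an empty list means the selection is supported.
function findUnsupported(catalog: CatalogProvider[], sel: TalkSelection): string[] {
  let entry: CatalogProvider | undefined;
  for (const candidate of catalog) {
    if (candidate.id === sel.provider) entry = candidate;
  }
  if (!entry) return [`unknown provider: ${sel.provider}`];

  const problems: string[] = [];
  if (entry.modes.indexOf(sel.mode) === -1) problems.push(`mode: ${sel.mode}`);
  if (entry.transports.indexOf(sel.transport) === -1) problems.push(`transport: ${sel.transport}`);
  if (entry.brains.indexOf(sel.brain) === -1) problems.push(`brain: ${sel.brain}`);
  return problems;
}

// Illustrative data only, not real catalog output.
const sampleCatalog: CatalogProvider[] = [
  { id: "openai", modes: ["realtime"], transports: ["webrtc"], brains: ["agent-consult"] },
  { id: "google", modes: ["realtime"], transports: ["provider-websocket"], brains: ["agent-consult"] },
];
```

Returning every mismatch at once, rather than failing on the first, lets a settings UI flag all invalid selectors in a single pass.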
@@ -103,7 +131,9 @@ Defaults:
 ## Notes

 - Requires Speech + Microphone permissions.
-- Uses `chat.send` against session key `main`.
+- Native Talk uses the active Gateway session and only falls back to history polling when response events are unavailable.
+- Browser realtime Talk uses `talk.realtime.toolCall` for `openclaw_agent_consult` instead of exposing `chat.send` to provider-owned browser sessions.
+- Transcription-only Talk uses `talk.transcription.session`, `talk.transcription.relayAudio`, `talk.transcription.relayCancel`, and `talk.transcription.relayStop`; clients subscribe to `talk.transcription.relay` events for partial/final transcript updates.
 - The gateway resolves Talk playback through `talk.speak` using the active Talk provider. Android falls back to local system TTS only when that RPC is unavailable.
 - macOS local MLX playback uses the bundled `openclaw-mlx-tts` helper when present, or an executable on `PATH`. Set `OPENCLAW_MLX_TTS_BIN` to point at a custom helper binary during development.
 - `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
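For the transcription-only note in this hunk, a captions client needs to fold partial and final transcript updates into display text. The sketch below shows one way to do that. The `TranscriptUpdate` payload shape is an assumption, since the `talk.transcription.relay` event envelope is not shown in this diff, and `applyUpdate` and `renderCaption` are hypothetical helpers.

```typescript
// Assumed, simplified payload of a transcript update event.
interface TranscriptUpdate {
  kind: "partial" | "final";
  text: string;
}

interface CaptionState {
  committed: string[]; // finalized segments, in order
  pending: string;     // latest partial text for the segment in progress
}

const emptyCaption: CaptionState = { committed: [], pending: "" };

function applyUpdate(state: CaptionState, update: TranscriptUpdate): CaptionState {
  if (update.kind === "final") {
    // A final update replaces the in-progress partial and is kept permanently.
    return { committed: [...state.committed, update.text], pending: "" };
  }
  // A partial update overwrites the previous partial for the same segment.
  return { committed: state.committed, pending: update.text };
}

function renderCaption(state: CaptionState): string {
  const parts = [...state.committed, state.pending].filter((part) => part.length > 0);
  return parts.join(" ");
}
```

Keeping the state immutable makes it easy to drop into a UI store: each incoming event produces a new state, and the caption is re-rendered from it.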