Commit 11c600c

fix: split google meet realtime providers
1 parent 51fea38 commit 11c600c

11 files changed

Lines changed: 338 additions & 21 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ Docs: https://docs.openclaw.ai
 - Diagnostics: keep webhook/message OTEL attributes and Prometheus delivery labels low-cardinality and omit raw chat/message IDs from spans, so progress-draft and message-tool modes do not leak high-cardinality messaging identifiers.
 - Google Meet: stop advertising legacy `mode: "realtime"` to agents and config UIs, while keeping it as a hidden compatibility alias for `mode: "agent"`, so new joins use the STT -> OpenClaw agent -> TTS path instead of selecting the direct realtime voice fallback.
 - Google Meet: add `chrome.audioBufferBytes` for generated command-pair SoX audio commands and lower the default buffer from SoX's 8192 bytes to 4096 bytes to reduce Chrome talk-back latency.
+- Google Meet: split realtime provider config into agent-mode transcription and bidi-mode voice providers, and migrate legacy Gemini Live bidi configs with `doctor --fix`, so Gemini Live can back direct bidi fallback without breaking the default OpenClaw agent talk-back path.
 - Telegram: render shared interactive reply buttons in reply delivery so plugin approval messages show inline keyboards. (#76238) Thanks @keshavbotagent.
 - Agents/cli-runner: drop a saved `claude-cli` resume sessionId at preparation time when its on-disk transcript no longer exists in `~/.claude/projects/`, so a stale binding from a half-installed `update.run` cannot trap follow-up runs (auto-reply / Telegram direct) in a `claude --resume` timeout loop; the run starts fresh and the new sessionId is written back through the existing post-run flow. (#77030; refs #77011) Thanks @openperf.
 - Release validation: install the cross-OS TypeScript harness through Windows-safe Node/npm shims so native Windows package checks reach the OpenClaw smoke suites instead of exiting before artifact capture. Thanks @vincentkoc.

docs/plugins/google-meet.md

Lines changed: 17 additions & 9 deletions
@@ -31,13 +31,13 @@ Google Meet participant support for OpenClaw — the plugin is explicit by desig
 
 Install the local audio dependencies and configure a realtime transcription
 provider plus regular OpenClaw TTS. OpenAI is the default transcription
-provider; Google Gemini Live also works with `realtime.provider: "google"` for
-`bidi` mode:
+provider; Google Gemini Live also works as a separate `bidi` voice fallback with
+`realtime.voiceProvider: "google"`:
 
 ```bash
 brew install blackhole-2ch sox
 export OPENAI_API_KEY=sk-...
-# or
+# only needed when realtime.voiceProvider is "google" for bidi mode
 export GEMINI_API_KEY=...
 ```
 
@@ -973,8 +973,9 @@ Workspace Developer Preview Program for Meet media APIs.
 
 The common Chrome agent path only needs the plugin enabled, BlackHole, SoX, a
 realtime transcription provider key, and a configured OpenClaw TTS provider.
-OpenAI is the default transcription provider; set `realtime.provider: "google"`
-to use Google Gemini Live for `bidi` mode:
+OpenAI is the default transcription provider; set `realtime.voiceProvider` to
+`"google"` and `realtime.model` to use Google Gemini Live for `bidi` mode
+without changing the default agent-mode transcription provider:
 
 ```bash
 brew install blackhole-2ch sox
@@ -1042,8 +1043,13 @@ Defaults:
   realtime voice provider answers participant speech directly and may call
   `openclaw_agent_consult` for deeper/tool-backed answers.
 - `mode: "transcribe"`: observe-only mode without the talk-back bridge.
-- `realtime.provider: "openai"`: provider id used by `agent` mode for realtime
-  transcription and by `bidi` mode for realtime voice.
+- `realtime.provider: "openai"`: compatibility fallback used when the scoped
+  provider fields below are unset.
+- `realtime.transcriptionProvider: "openai"`: provider id used by `agent` mode
+  for realtime transcription.
+- `realtime.voiceProvider`: provider id used by `bidi` mode for direct realtime
+  voice. Set this to `"google"` to use Gemini Live while keeping agent-mode
+  transcription on OpenAI.
 - `realtime.toolPolicy: "safe-read-only"`
 - `realtime.instructions`: brief spoken replies, with
   `openclaw_agent_consult` for deeper answers
@@ -1089,13 +1095,15 @@ Optional overrides:
       },
       defaultMode: "agent",
       realtime: {
-        provider: "google",
+        provider: "openai",
+        transcriptionProvider: "openai",
+        voiceProvider: "google",
+        model: "gemini-2.5-flash-native-audio-preview-12-2025",
         agentId: "jay",
         toolPolicy: "owner",
         introMessage: "Say exactly: I'm here.",
         providers: {
           google: {
-            model: "gemini-2.5-flash-native-audio-preview-12-2025",
             voice: "Kore",
           },
         },
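
Reviewer note: the documented fallback between the scoped provider fields and the legacy `realtime.provider` can be read as the rule sketched below. This is a minimal illustration, not the plugin's resolver. The type names and `resolveScopedProviders` are hypothetical, and the fallback order (scoped field, then `provider`, then `"openai"`) is an assumption drawn from the help text in this commit.

```typescript
// Hypothetical shape: only the three field names come from the plugin docs.
interface RealtimeProviderFields {
  provider?: string;
  transcriptionProvider?: string;
  voiceProvider?: string;
}

interface ScopedProviders {
  transcription: string; // provider id used by agent mode
  voice: string; // provider id used by bidi mode
}

// Assumed order: scoped field first, then legacy `provider`, then "openai".
function resolveScopedProviders(realtime: RealtimeProviderFields = {}): ScopedProviders {
  const fallback = realtime.provider ?? "openai";
  return {
    transcription: realtime.transcriptionProvider ?? fallback,
    voice: realtime.voiceProvider ?? fallback,
  };
}

// Usage: Gemini Live backs bidi voice while agent-mode transcription stays on OpenAI.
const scoped = resolveScopedProviders({ provider: "openai", voiceProvider: "google" });
console.log(scoped.transcription, scoped.voice);
```

Under this reading, a config that sets only `voiceProvider` leaves agent-mode transcription untouched, which is the behavior the docs change describes.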
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+export { legacyConfigRules, normalizeCompatibilityConfig } from "./src/config-compat.js";

extensions/google-meet/index.test.ts

Lines changed: 84 additions & 0 deletions
@@ -29,6 +29,8 @@ import {
   convertGoogleMeetTtsAudioForBridge,
   extendGoogleMeetOutputEchoSuppression,
   isGoogleMeetLikelyAssistantEchoTranscript,
+  resolveGoogleMeetRealtimeProvider,
+  resolveGoogleMeetRealtimeTranscriptionProvider,
   startCommandAgentAudioBridge,
   startCommandRealtimeAudioBridge,
 } from "./src/realtime.js";
@@ -385,6 +387,7 @@ describe("google-meet plugin", () => {
       realtime: {
         strategy: "agent",
         provider: "openai",
+        transcriptionProvider: "openai",
         introMessage: "Say exactly: I'm here and listening.",
         toolPolicy: "safe-read-only",
       },
@@ -395,6 +398,87 @@ describe("google-meet plugin", () => {
     expect(resolveGoogleMeetConfig({}).realtime.instructions).toContain("openclaw_agent_consult");
   });
 
+  it("resolves separate realtime providers for agent transcription and bidi voice", () => {
+    expect(
+      resolveGoogleMeetConfig({
+        realtime: {
+          provider: "openai",
+          transcriptionProvider: "openai",
+          voiceProvider: "google",
+          model: "gemini-2.5-flash-native-audio-preview-12-2025",
+        },
+      }).realtime,
+    ).toMatchObject({
+      provider: "openai",
+      transcriptionProvider: "openai",
+      voiceProvider: "google",
+      model: "gemini-2.5-flash-native-audio-preview-12-2025",
+    });
+  });
+
+  it("uses voiceProvider for bidi and transcriptionProvider for agent mode resolution", () => {
+    const voiceProviders: RealtimeVoiceProviderPlugin[] = [
+      {
+        id: "openai",
+        label: "OpenAI",
+        autoSelectOrder: 1,
+        isConfigured: () => true,
+        createBridge: () => {
+          throw new Error("unused");
+        },
+      },
+      {
+        id: "google",
+        label: "Google",
+        autoSelectOrder: 2,
+        resolveConfig: ({ rawConfig }) => rawConfig,
+        isConfigured: () => true,
+        createBridge: () => {
+          throw new Error("unused");
+        },
+      },
+    ];
+    const transcriptionProviders: RealtimeTranscriptionProviderPlugin[] = [
+      {
+        id: "openai",
+        label: "OpenAI",
+        autoSelectOrder: 1,
+        isConfigured: () => true,
+        createSession: () => {
+          throw new Error("unused");
+        },
+      },
+    ];
+    const config = resolveGoogleMeetConfig({
+      realtime: {
+        provider: "openai",
+        transcriptionProvider: "openai",
+        voiceProvider: "google",
+        model: "gemini-2.5-flash-native-audio-preview-12-2025",
+      },
+    });
+
+    expect(
+      resolveGoogleMeetRealtimeProvider({
+        config,
+        fullConfig: {} as never,
+        providers: voiceProviders,
+      }),
+    ).toMatchObject({
+      provider: { id: "google" },
+      providerConfig: { model: "gemini-2.5-flash-native-audio-preview-12-2025" },
+    });
+    expect(
+      resolveGoogleMeetRealtimeTranscriptionProvider({
+        config,
+        fullConfig: {} as never,
+        providers: transcriptionProviders,
+      }),
+    ).toMatchObject({
+      provider: { id: "openai" },
+    });
+  });
+
   it("declares barge-in config metadata in the plugin entry and manifest", () => {
     const manifest = JSON.parse(
       readFileSync(new URL("./openclaw.plugin.json", import.meta.url), "utf8"),

extensions/google-meet/index.ts

Lines changed: 9 additions & 1 deletion
@@ -161,7 +161,15 @@ const googleMeetConfigSchema = {
     },
     "realtime.provider": {
       label: "Speech Provider",
-      help: "Agent mode uses this for realtime transcription. Bidi mode uses it as the realtime voice provider.",
+      help: "Compatibility fallback for both realtime transcription and bidi voice. Prefer realtime.transcriptionProvider and realtime.voiceProvider for new configs.",
+    },
+    "realtime.transcriptionProvider": {
+      label: "Realtime Transcription Provider",
+      help: "Agent mode uses this provider to transcribe meeting audio before regular OpenClaw TTS answers.",
+    },
+    "realtime.voiceProvider": {
+      label: "Bidi Voice Provider",
+      help: "Bidi mode uses this realtime voice provider. Falls back to realtime.provider when unset.",
     },
     "realtime.model": {
       label: "Bidi Realtime Model",

extensions/google-meet/openclaw.plugin.json

Lines changed: 19 additions & 1 deletion
@@ -154,7 +154,15 @@
     },
     "realtime.provider": {
       "label": "Speech Provider",
-      "help": "Agent mode uses this for realtime transcription. Bidi mode uses it as the realtime voice provider."
+      "help": "Compatibility fallback for both realtime transcription and bidi voice. Prefer realtime.transcriptionProvider and realtime.voiceProvider for new configs."
+    },
+    "realtime.transcriptionProvider": {
+      "label": "Realtime Transcription Provider",
+      "help": "Agent mode uses this provider to transcribe meeting audio before regular OpenClaw TTS answers."
+    },
+    "realtime.voiceProvider": {
+      "label": "Bidi Voice Provider",
+      "help": "Bidi mode uses this realtime voice provider. Falls back to realtime.provider when unset."
     },
     "realtime.model": {
       "label": "Bidi Realtime Model",
@@ -431,6 +439,13 @@
               "type": "string",
               "default": "openai"
             },
+            "transcriptionProvider": {
+              "type": "string",
+              "default": "openai"
+            },
+            "voiceProvider": {
+              "type": "string"
+            },
             "model": {
               "type": "string"
             },
@@ -501,5 +516,8 @@
         }
       }
     }
+  },
+  "configContracts": {
+    "compatibilityMigrationPaths": ["plugins.entries.google-meet.config.realtime.provider"]
   }
 }
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+import type { OpenClawConfig } from "openclaw/plugin-sdk/config-types";
+import { describe, expect, it } from "vitest";
+import {
+  legacyConfigRules,
+  migrateGoogleMeetLegacyRealtimeProvider,
+  normalizeCompatibilityConfig,
+} from "./config-compat.js";
+
+describe("google-meet config compatibility", () => {
+  it("detects legacy Google realtime provider config", () => {
+    expect(
+      legacyConfigRules[0]?.match({
+        provider: "google",
+        model: "gemini-2.5-flash-native-audio-preview-12-2025",
+      }),
+    ).toBe(true);
+  });
+
+  it("migrates legacy Google bidi provider intent to scoped realtime providers", () => {
+    const config = {
+      plugins: {
+        entries: {
+          "google-meet": {
+            enabled: true,
+            config: {
+              defaultMode: "agent",
+              realtime: {
+                provider: "google",
+                model: "gemini-2.5-flash-native-audio-preview-12-2025",
+                providers: {
+                  google: {
+                    voice: "Kore",
+                  },
+                },
+              },
+            },
+          },
+        },
+      },
+    } as OpenClawConfig;
+
+    const migration = migrateGoogleMeetLegacyRealtimeProvider(config);
+
+    expect(migration?.changes).toEqual([
+      'Moved Google Meet legacy realtime.provider="google" intent to realtime.voiceProvider="google" and realtime.transcriptionProvider="openai".',
+    ]);
+    expect(
+      (
+        migration?.config.plugins?.entries?.["google-meet"] as {
+          config?: { realtime?: Record<string, unknown> };
+        }
+      ).config?.realtime,
+    ).toEqual({
+      provider: "openai",
+      transcriptionProvider: "openai",
+      voiceProvider: "google",
+      model: "gemini-2.5-flash-native-audio-preview-12-2025",
+      providers: {
+        google: {
+          voice: "Kore",
+        },
+      },
+    });
+  });
+
+  it("leaves fully scoped provider configs alone", () => {
+    const config = {
+      plugins: {
+        entries: {
+          "google-meet": {
+            config: {
+              realtime: {
+                provider: "google",
+                transcriptionProvider: "custom-stt",
+                voiceProvider: "custom-voice",
+              },
+            },
+          },
+        },
+      },
+    } as OpenClawConfig;
+
+    const migration = normalizeCompatibilityConfig({ cfg: config });
+
+    expect(migration.changes).toEqual([]);
+    expect(
+      (
+        migration.config.plugins?.entries?.["google-meet"] as {
+          config?: { realtime?: Record<string, unknown> };
+        }
+      ).config?.realtime,
+    ).toEqual({
+      provider: "google",
+      transcriptionProvider: "custom-stt",
+      voiceProvider: "custom-voice",
+    });
+  });
+});
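
Reviewer note: the tests above pin down the observable behavior of the migration without showing `config-compat.ts` itself. The sketch below is one way that behavior could be implemented on the `realtime` object alone; it is not the code from this commit. `isLegacyGoogleRealtime` and `migrateLegacyRealtime` are hypothetical names. The rule that a config counts as legacy only when neither scoped field is set is an assumption, since the tests cover only the fully unscoped and fully scoped cases.

```typescript
type RealtimeConfig = Record<string, unknown>;

interface MigrationResult {
  realtime: RealtimeConfig;
  changes: string[];
}

// Assumption: legacy means provider "google" with neither scoped field present.
function isLegacyGoogleRealtime(realtime: RealtimeConfig): boolean {
  return (
    realtime.provider === "google" &&
    realtime.transcriptionProvider === undefined &&
    realtime.voiceProvider === undefined
  );
}

// Returns the input unchanged (and no change messages) for non-legacy configs.
function migrateLegacyRealtime(realtime: RealtimeConfig): MigrationResult {
  if (!isLegacyGoogleRealtime(realtime)) {
    return { realtime, changes: [] };
  }
  return {
    realtime: {
      ...realtime, // keeps model, providers, and any other existing keys
      provider: "openai",
      transcriptionProvider: "openai",
      voiceProvider: "google",
    },
    changes: [
      'Moved Google Meet legacy realtime.provider="google" intent to realtime.voiceProvider="google" and realtime.transcriptionProvider="openai".',
    ],
  };
}

// Usage: a legacy Gemini Live config gains scoped providers and keeps its model.
const migrated = migrateLegacyRealtime({
  provider: "google",
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
});
console.log(migrated.realtime, migrated.changes.length);
```

Spreading the original object first keeps unrelated keys such as `model` and `providers` intact, which matches the `toEqual` expectation in the second test.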
