Commit 1634f91

fix: improve google meet twilio join sequencing
1 parent 59fb9e5 commit 1634f91

7 files changed

Lines changed: 203 additions & 24 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -50,6 +50,8 @@ Docs: https://docs.openclaw.ai
 - Gateway/pricing: abort in-flight model pricing catalog fetches when Gateway shutdown stops the refresh loop, and avoid post-stop cache writes or refresh timers. Fixes #72208. Thanks @rzcq.
 - Codex/app-server: make startup retry cleanup ownership-aware so concurrent Codex lanes cannot close another lane's freshly restarted shared app-server client. Thanks @vincentkoc.
 - Google Meet/Twilio: report missing dial-in details during setup and explain that Twilio cannot join Meet URLs without a phone dial plan.
+- Google Meet/Twilio: start the phone leg before sending Meet PIN DTMF, delay intro speech until after the post-connect dial sequence, and log each stage so operators can tell Twilio-leg audio from Meet-room audio.
+- Voice Call: accept provider call IDs for gateway speak/continue requests and report ended-call state from history instead of returning a generic "Call not found" for stale calls.
 - Control UI/Talk: allow the OpenAI Realtime WebRTC offer endpoint through the Control UI CSP, configure browser sessions with explicit VAD/transcription input settings, and surface OpenAI realtime error/lifecycle events instead of leaving Talk stuck as live with no diagnostic. Fixes #73427.
 - Plugins: clarify config-selected duplicate plugin override diagnostics and document manifest schema updates for bundled-plugin forks. Fixes #8582. Thanks @sachah.
 - CLI backends/Claude: make live-session JSONL turn caps bounded and configurable via `reliability.outputLimits`, raising the default guard for tool-heavy Claude CLI turns while preserving memory limits. Fixes #75838. Thanks @hcordoba840.

docs/plugins/google-meet.md

Lines changed: 10 additions & 8 deletions
@@ -1548,19 +1548,21 @@ participant:
 - Run `openclaw voicecall tail` and check that Twilio webhooks are arriving at
   the Gateway.
 - Run `openclaw logs --follow` and look for the Twilio Meet sequence: Google
-  Meet delegates the join, Voice Call stores pre-connect DTMF TwiML, serves
-  that initial TwiML, then serves realtime TwiML and starts the realtime bridge
-  with `initialGreeting=queued`.
+  Meet delegates the join, Voice Call starts the phone leg, Google Meet waits
+  `voiceCall.dtmfDelayMs`, sends DTMF with `voicecall.dtmf`, waits
+  `voiceCall.postDtmfSpeechDelayMs`, then requests intro speech with
+  `voicecall.speak`.
 - Re-run `openclaw googlemeet setup --transport twilio`; a green setup check is
   required but does not prove the meeting PIN sequence is correct.
 - Confirm the dial-in number belongs to the same Meet invitation and region as
   the PIN.
-- Increase the leading pauses in `--dtmf-sequence` if Meet answers slowly, for
-  example `wwww123456#`.
+- Increase `voiceCall.dtmfDelayMs` if Meet answers slowly or the call transcript
+  still shows the prompt asking for a PIN after DTMF was sent.
 - If the participant joins but you do not hear the greeting, check
-  `openclaw logs --follow` for realtime TwiML, realtime bridge startup, and
-  `initialGreeting=queued`. The greeting is generated from the initial
-  `voicecall.start` message after the realtime bridge connects.
+  `openclaw logs --follow` for the post-DTMF `voicecall.speak` request and
+  either media-stream TTS playback or the Twilio `<Say>` fallback. If the call
+  transcript still contains "enter the meeting PIN", the phone leg has not joined
+  the Meet room yet, so meeting participants will not hear speech.
 
 If webhooks do not arrive, debug the Voice Call plugin first: the provider must
 reach `plugins.entries.voice-call.config.publicUrl` or the configured tunnel.
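
Reviewer note: the order of operations that the updated troubleshooting text describes can be sketched as a small pure function. This is only an illustration under stated assumptions. `planMeetJoin`, `JoinStep`, `JoinPlanInput`, and `describeStep` are hypothetical names that do not exist in the plugin, and the delay values in the usage line are made up rather than the plugin's defaults. The gateway method names and the two config keys are the ones named in the docs change above.

```typescript
// Hypothetical sketch of the join order described in the docs hunk above.
// None of these names exist in the plugin; they only model the sequence.
type JoinStep =
  | { kind: "request"; method: "voicecall.start" | "voicecall.dtmf" | "voicecall.speak" }
  | { kind: "wait"; ms: number; configKey: "voiceCall.dtmfDelayMs" | "voiceCall.postDtmfSpeechDelayMs" };

interface JoinPlanInput {
  dtmfSequence?: string; // Meet PIN digits, if any
  message?: string; // intro speech, if any
  dtmfDelayMs: number;
  postDtmfSpeechDelayMs: number;
}

function planMeetJoin(input: JoinPlanInput): JoinStep[] {
  // The phone leg always starts first, without DTMF or intro attached.
  const steps: JoinStep[] = [{ kind: "request", method: "voicecall.start" }];

  if (input.dtmfSequence) {
    // Give Meet time to answer before sending the PIN.
    steps.push({ kind: "wait", ms: input.dtmfDelayMs, configKey: "voiceCall.dtmfDelayMs" });
    steps.push({ kind: "request", method: "voicecall.dtmf" });
  }

  if (input.message) {
    // The post-DTMF pause only applies when DTMF was part of the plan.
    const delayMs = input.dtmfSequence ? input.postDtmfSpeechDelayMs : 0;
    if (delayMs > 0) {
      steps.push({ kind: "wait", ms: delayMs, configKey: "voiceCall.postDtmfSpeechDelayMs" });
    }
    steps.push({ kind: "request", method: "voicecall.speak" });
  }

  return steps;
}

function describeStep(step: JoinStep): string {
  return step.kind === "request" ? step.method : `wait ${step.ms}ms`;
}

// Illustrative values only; these are not the plugin's defaults.
const plan = planMeetJoin({
  dtmfSequence: "123456#",
  message: "I'm here and listening.",
  dtmfDelayMs: 12000,
  postDtmfSpeechDelayMs: 3000,
});
console.log(plan.map(describeStep).join(" -> "));
```

Reading the gateway log against a plan like this makes it easier to see which stage stalled: a missing `voicecall.dtmf` request points at the phone leg, while a missing `voicecall.speak` request points at the post-DTMF stage.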

extensions/google-meet/src/runtime.ts

Lines changed: 1 addition & 1 deletion
@@ -491,7 +491,7 @@ export class GoogleMeetRuntime {
     session.notes.push(
       this.params.config.voiceCall.enabled
         ? dtmfSequence
-          ? "Twilio transport delegated the call to the voice-call plugin and queued configured DTMF."
+          ? "Twilio transport delegated the phone leg to the voice-call plugin, then sent configured DTMF after connect before speaking."
           : "Twilio transport delegated the call to the voice-call plugin without configured DTMF."
         : "Twilio transport is an explicit dial plan; voice-call delegation is disabled.",
     );

extensions/google-meet/src/voice-call-gateway.test.ts

Lines changed: 24 additions & 4 deletions
@@ -21,39 +21,59 @@ vi.mock("openclaw/plugin-sdk/gateway-runtime", () => ({
 
 describe("Google Meet voice-call gateway", () => {
   beforeEach(() => {
+    vi.useRealTimers();
     gatewayMocks.request.mockReset();
     gatewayMocks.request.mockResolvedValue({ callId: "call-1" });
     gatewayMocks.stopAndWait.mockClear();
     gatewayMocks.startGatewayClientWhenEventLoopReady.mockClear();
   });
 
-  it("starts Twilio Meet calls with pre-connect DTMF and intro metadata", async () => {
+  it("starts Twilio Meet calls, sends delayed DTMF, then speaks the intro", async () => {
     const config = resolveGoogleMeetConfig({
       voiceCall: {
         gatewayUrl: "ws://127.0.0.1:18789",
         dtmfDelayMs: 1,
+        postDtmfSpeechDelayMs: 2,
       },
       realtime: { introMessage: "Say exactly: I'm here and listening." },
     });
 
-    await joinMeetViaVoiceCallGateway({
+    const join = joinMeetViaVoiceCallGateway({
       config,
       dialInNumber: "+15551234567",
       dtmfSequence: "123456#",
       message: "Say exactly: I'm here and listening.",
     });
 
+    await join;
+
     expect(gatewayMocks.request).toHaveBeenNthCalledWith(
       1,
       "voicecall.start",
       {
         to: "+15551234567",
         mode: "conversation",
+      },
+      { timeoutMs: 30_000 },
+    );
+    expect(gatewayMocks.request).toHaveBeenNthCalledWith(
+      2,
+      "voicecall.dtmf",
+      {
+        callId: "call-1",
+        digits: "123456#",
+      },
+      { timeoutMs: 30_000 },
+    );
+    expect(gatewayMocks.request).toHaveBeenNthCalledWith(
+      3,
+      "voicecall.speak",
+      {
+        callId: "call-1",
         message: "Say exactly: I'm here and listening.",
-        dtmfSequence: "123456#",
       },
       { timeoutMs: 30_000 },
     );
-    expect(gatewayMocks.request).toHaveBeenCalledTimes(1);
+    expect(gatewayMocks.request).toHaveBeenCalledTimes(3);
   });
 });

extensions/google-meet/src/voice-call-gateway.ts

Lines changed: 64 additions & 6 deletions
@@ -18,12 +18,24 @@ type VoiceCallSpeakResult = {
   error?: string;
 };
 
+type VoiceCallDtmfResult = {
+  success?: boolean;
+  error?: string;
+};
+
 type VoiceCallMeetJoinResult = {
   callId: string;
   dtmfSent: boolean;
   introSent: boolean;
 };
 
+function sleep(ms: number): Promise<void> {
+  if (ms <= 0) {
+    return Promise.resolve();
+  }
+  return new Promise((resolve) => setTimeout(resolve, ms));
+}
+
 async function createConnectedGatewayClient(
   config: GoogleMeetConfig,
 ): Promise<VoiceCallGatewayClient> {
@@ -81,28 +93,74 @@ export async function joinMeetViaVoiceCallGateway(params: {
   try {
     client = await createConnectedGatewayClient(params.config);
     params.logger?.info(
-      `[google-meet] Delegating Twilio join to Voice Call (dtmf=${params.dtmfSequence ? "yes" : "no"}, intro=${params.message ? "yes" : "no"})`,
+      `[google-meet] Delegating Twilio join to Voice Call (dtmf=${params.dtmfSequence ? "post-connect" : "none"}, intro=${params.message ? "delayed" : "none"})`,
     );
     const start = (await client.request(
       "voicecall.start",
       {
         to: params.dialInNumber,
         mode: "conversation",
-        ...(params.message ? { message: params.message } : {}),
-        ...(params.dtmfSequence ? { dtmfSequence: params.dtmfSequence } : {}),
       },
       { timeoutMs: params.config.voiceCall.requestTimeoutMs },
     )) as VoiceCallStartResult;
     if (!start.callId) {
       throw new Error(start.error || "voicecall.start did not return callId");
     }
     params.logger?.info(
-      `[google-meet] Voice Call Twilio join started: callId=${start.callId} dtmf=${params.dtmfSequence ? "yes" : "no"} intro=${params.message ? "yes" : "no"}`,
+      `[google-meet] Voice Call Twilio phone leg started: callId=${start.callId}`,
     );
+    let dtmfSent = false;
+    if (params.dtmfSequence) {
+      const delayMs = params.config.voiceCall.dtmfDelayMs;
+      params.logger?.info(
+        `[google-meet] Waiting ${delayMs}ms before sending Meet DTMF for callId=${start.callId}`,
+      );
+      await sleep(delayMs);
+      const dtmf = (await client.request(
+        "voicecall.dtmf",
+        {
+          callId: start.callId,
+          digits: params.dtmfSequence,
+        },
+        { timeoutMs: params.config.voiceCall.requestTimeoutMs },
+      )) as VoiceCallDtmfResult;
+      if (dtmf.success === false) {
+        throw new Error(dtmf.error || "voicecall.dtmf failed");
+      }
+      dtmfSent = true;
+      params.logger?.info(
+        `[google-meet] Meet DTMF sent after phone leg connected: callId=${start.callId} digits=${params.dtmfSequence.length}`,
+      );
+    }
+    let introSent = false;
+    if (params.message) {
+      const delayMs = params.dtmfSequence ? params.config.voiceCall.postDtmfSpeechDelayMs : 0;
+      if (delayMs > 0) {
+        params.logger?.info(
+          `[google-meet] Waiting ${delayMs}ms after Meet DTMF before speaking intro for callId=${start.callId}`,
+        );
+        await sleep(delayMs);
+      }
+      const spoken = (await client.request(
+        "voicecall.speak",
+        {
+          callId: start.callId,
+          message: params.message,
+        },
+        { timeoutMs: params.config.voiceCall.requestTimeoutMs },
+      )) as VoiceCallSpeakResult;
+      if (spoken.success === false) {
+        throw new Error(spoken.error || "voicecall.speak failed");
+      }
+      introSent = true;
+      params.logger?.info(
+        `[google-meet] Intro speech requested after Meet dial sequence: callId=${start.callId}`,
+      );
+    }
     return {
       callId: start.callId,
-      dtmfSent: Boolean(params.dtmfSequence),
-      introSent: Boolean(params.message),
+      dtmfSent,
+      introSent,
     };
   } finally {
     await client?.stopAndWait({ timeoutMs: 1_000 });

extensions/voice-call/index.test.ts

Lines changed: 81 additions & 4 deletions
@@ -6,6 +6,7 @@ import { createTestPluginApi } from "openclaw/plugin-sdk/plugin-test-api";
 import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
 import type { OpenClawPluginApi } from "./api.js";
 import type { VoiceCallRuntime } from "./runtime-entry.js";
+import type { CallRecord } from "./src/types.js";
 
 let runtimeStub: VoiceCallRuntime;
 
@@ -52,8 +53,12 @@ function captureStdout() {
 }
 
 function createRuntimeStub(callId = "call-1"): VoiceCallRuntime {
+  const call = createCallRecord({ callId });
   return {
-    config: { toNumber: "+15550001234" } as VoiceCallRuntime["config"],
+    config: {
+      toNumber: "+15550001234",
+      realtime: { enabled: false },
+    } as VoiceCallRuntime["config"],
     provider: {} as VoiceCallRuntime["provider"],
     manager: {
       initiateCall: vi.fn(async () => ({ callId, success: true })),
@@ -64,17 +69,35 @@ function createRuntimeStub(callId = "call-1"): VoiceCallRuntime {
       speak: vi.fn(async () => ({ success: true })),
       sendDtmf: vi.fn(async () => ({ success: true })),
       endCall: vi.fn(async () => ({ success: true })),
-      getCall: vi.fn((id: string) => (id === callId ? { callId } : undefined)),
+      getCall: vi.fn((id: string) => (id === callId ? call : undefined)),
       getCallByProviderCallId: vi.fn(() => undefined),
-      getActiveCalls: vi.fn(() => [{ callId }]),
+      getActiveCalls: vi.fn(() => [call]),
+      getCallHistory: vi.fn(async () => []),
     } as unknown as VoiceCallRuntime["manager"],
-    webhookServer: {} as VoiceCallRuntime["webhookServer"],
+    webhookServer: {
+      speakRealtime: vi.fn(() => ({ success: false, error: "No active realtime bridge for call" })),
+    } as unknown as VoiceCallRuntime["webhookServer"],
     webhookUrl: "http://127.0.0.1:3334/voice/webhook",
     publicUrl: null,
     stop: vi.fn(async () => {}),
   };
 }
 
+function createCallRecord(overrides: Partial<CallRecord> = {}): CallRecord {
+  return {
+    callId: "call-1",
+    provider: "mock",
+    direction: "outbound",
+    state: "active",
+    from: "+15550001111",
+    to: "+15550001234",
+    startedAt: Date.UTC(2026, 4, 2, 9, 0, 0),
+    transcript: [],
+    processedEventIds: [],
+    ...overrides,
+  };
+}
+
 function createServiceContext(): Parameters<NonNullable<Registered["service"]>["start"]>[0] {
   return {
     config: {},
@@ -397,6 +420,60 @@ describe("voice-call plugin", () => {
     expect(respond.mock.calls[0]).toEqual([true, { success: true }]);
   });
 
+  it("normalizes provider call ids before speaking", async () => {
+    runtimeStub.manager.getCall = vi.fn(() => undefined);
+    runtimeStub.manager.getCallByProviderCallId = vi.fn(() =>
+      createCallRecord({
+        callId: "call-1",
+        providerCallId: "CA123",
+      }),
+    );
+    const { methods } = setup({ provider: "mock" });
+    const handler = methods.get("voicecall.speak") as
+      | ((ctx: {
+          params: Record<string, unknown>;
+          respond: ReturnType<typeof vi.fn>;
+        }) => Promise<void>)
+      | undefined;
+    const respond = vi.fn();
+
+    await handler?.({ params: { callId: "CA123", message: "hello" }, respond });
+
+    expect(runtimeStub.manager.speak).toHaveBeenCalledWith("call-1", "hello");
+    expect(respond.mock.calls[0]).toEqual([true, { success: true }]);
+  });
+
+  it("reports ended call history when speaking to a stale call", async () => {
+    runtimeStub.manager.getCall = vi.fn(() => undefined);
+    runtimeStub.manager.getCallByProviderCallId = vi.fn(() => undefined);
+    runtimeStub.manager.getCallHistory = vi.fn(async () => [
+      createCallRecord({
+        callId: "call-1",
+        providerCallId: "CA123",
+        state: "completed",
+        endReason: "completed",
+        endedAt: Date.UTC(2026, 4, 2, 9, 18, 23),
+      }),
+    ]);
+    const { methods } = setup({ provider: "mock" });
+    const handler = methods.get("voicecall.speak") as
+      | ((ctx: {
+          params: Record<string, unknown>;
+          respond: ReturnType<typeof vi.fn>;
+        }) => Promise<void>)
+      | undefined;
+    const respond = vi.fn();
+
+    await handler?.({ params: { callId: "CA123", message: "hello" }, respond });
+
+    const [ok, , error] = respond.mock.calls[0] ?? [];
+    expect(ok).toBe(false);
+    expect(error.message).toContain("call is not active");
+    expect(error.message).toContain("last state=completed");
+    expect(error.message).toContain("endReason=completed");
+    expect(runtimeStub.manager.speak).not.toHaveBeenCalled();
+  });
+
   it("normalizes legacy config through runtime creation and warns to run doctor", async () => {
     const { methods } = setup({
       enabled: true,

extensions/voice-call/index.ts

Lines changed: 21 additions & 1 deletion
@@ -302,14 +302,34 @@ export default definePluginEntry({
       respondError(respond, formatErrorMessage(err));
     };
 
+    const describeHistoricalCall = async (rt: VoiceCallRuntime, callId: string) => {
+      const history = await rt.manager.getCallHistory(100);
+      const call = history
+        .toReversed()
+        .find((candidate) => candidate.callId === callId || candidate.providerCallId === callId);
+      if (!call) {
+        return undefined;
+      }
+      const details = [
+        `last state=${call.state}`,
+        call.endReason ? `endReason=${call.endReason}` : undefined,
+        call.endedAt ? `endedAt=${new Date(call.endedAt).toISOString()}` : undefined,
+      ].filter(Boolean);
+      return `call is not active (${details.join(", ")})`;
+    };
+
     const resolveCallMessageRequest = async (params: GatewayRequestHandlerOptions["params"]) => {
       const callId = normalizeOptionalString(params?.callId) ?? "";
       const message = normalizeOptionalString(params?.message) ?? "";
       if (!callId || !message) {
         return { error: "callId and message required" } as const;
       }
       const rt = await ensureRuntime();
-      return { rt, callId, message } as const;
+      const activeCall = rt.manager.getCall(callId) ?? rt.manager.getCallByProviderCallId(callId);
+      if (activeCall) {
+        return { rt, callId: activeCall.callId, message } as const;
+      }
+      return { error: (await describeHistoricalCall(rt, callId)) ?? "Call not found" } as const;
     };
 
     const initiateCallAndRespond = async (params: {
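
Reviewer note: the stale-call error text that the new test asserts on can be reproduced with a standalone sketch. This is a simplified model, not the plugin's code. `CallSummary`, `findLatestCall`, and `describeEndedCall` are hypothetical names, the record shape keeps only the fields the message uses, and the backwards scan assumes history is ordered oldest first so that the most recent match wins.

```typescript
// Hypothetical, trimmed-down stand-in for the plugin's call record type.
interface CallSummary {
  callId: string;
  providerCallId?: string;
  state: string;
  endReason?: string;
  endedAt?: number; // epoch milliseconds
}

// Scan from the end so the latest matching entry wins
// (assumes history is ordered oldest first).
function findLatestCall(history: CallSummary[], id: string): CallSummary | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const candidate = history[i];
    if (candidate && (candidate.callId === id || candidate.providerCallId === id)) {
      return candidate;
    }
  }
  return undefined;
}

// Build the "call is not active (...)" message, skipping absent fields.
function describeEndedCall(call: CallSummary): string {
  const details = [
    `last state=${call.state}`,
    call.endReason ? `endReason=${call.endReason}` : undefined,
    call.endedAt ? `endedAt=${new Date(call.endedAt).toISOString()}` : undefined,
  ].filter((detail): detail is string => detail !== undefined);
  return `call is not active (${details.join(", ")})`;
}

// Same fixture values as the stale-call test in index.test.ts.
const history: CallSummary[] = [
  {
    callId: "call-1",
    providerCallId: "CA123",
    state: "completed",
    endReason: "completed",
    endedAt: Date.UTC(2026, 4, 2, 9, 18, 23),
  },
];

const stale = findLatestCall(history, "CA123");
console.log(stale ? describeEndedCall(stale) : "Call not found");
// prints "call is not active (last state=completed, endReason=completed, endedAt=2026-05-02T09:18:23.000Z)"
```

The sketch uses a plain reverse loop where the diff uses `toReversed()`; that is a choice made here so the example runs on older runtimes, and it says nothing about which form the plugin should use.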
