Commit 17dcdea

fix: gate discord realtime voice by wake name (#85915)
1 parent c074d09 commit 17dcdea

11 files changed

Lines changed: 296 additions & 29 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Docs: https://docs.openclaw.ai
 - Gateway/perf: cache stable install-record, channel-catalog, bundled-channel, and Telegram session-store metadata during process-local hot paths to reduce repeated JSON and manifest reads.
 - Gateway/perf: reuse immutable plugin metadata snapshots across startup, config, model, channel, setup, and secret metadata readers so hot paths avoid repeated plugin file stats and manifest registry reloads.
 - Talk/realtime: let WebUI and Discord voice callers ask for active OpenClaw run status, cancel, steer, or queue follow-up work while a consult is still running. (#84231) Thanks @Solvely-Colin.
+- Discord/voice: add realtime wake-name gating with agent-name defaults and raise profile bootstrap context budget for longer `USER.md`/`SOUL.md` files.
 - Gateway/perf: lazy-load startup-idle plugin work, core gateway method handlers, and the embedded ACPX runtime so Gateway health and ready signals no longer wait on unused handler trees or ACPX probes.
 - Gateway/perf: cache plugin SDK public-surface alias maps and skip irrelevant macOS Linuxbrew PATH probes so Gateway startup avoids repeated filesystem walks and slow missing-directory stats.
 - Image tool: add adaptive model-aware image compression with an `agents.defaults.imageQuality` preference for choosing token-efficient, balanced, or high-detail media handling.

docs/channels/discord.md

Lines changed: 1 addition & 0 deletions
@@ -1234,6 +1234,7 @@ Notes:
 - In `stt-tts` mode, STT uses `tools.media.audio`; `voice.model` does not affect transcription.
 - In realtime modes, `voice.realtime.provider`, `voice.realtime.model`, and `voice.realtime.voice` configure the realtime audio session. For OpenAI Realtime 2 plus the Codex brain, use `voice.realtime.model: "gpt-realtime-2"` and `voice.model: "openai-codex/gpt-5.5"`.
 - Realtime voice modes include small `IDENTITY.md`, `USER.md`, and `SOUL.md` profile files in the realtime provider instructions by default so fast direct turns keep the same identity, user grounding, and persona as the routed OpenClaw agent. Set `voice.realtime.bootstrapContextFiles` to a subset to customize this, or `[]` to disable it. The supported realtime bootstrap files are limited to those profile files; `AGENTS.md` stays in the normal agent context. The injected profile context does not replace `openclaw_agent_consult` for workspace work, current facts, memory lookup, or tool-backed actions.
+- In OpenAI `agent-proxy` realtime mode, set `voice.realtime.requireWakeName: true` to keep Discord realtime voice silent until a transcript contains a wake name. If `voice.realtime.wakeNames` is unset, OpenClaw uses the routed agent `name`, falling back to the agent id. Wake-name gating disables realtime provider auto-response and routes accepted turns through the OpenClaw agent consult path.
 - The OpenAI realtime provider accepts current Realtime 2 event names and legacy Codex-compatible aliases for output audio and transcript events, so compatible provider snapshots can drift without dropping assistant audio.
 - `voice.realtime.bargeIn` controls whether Discord speaker-start events interrupt active realtime playback. If unset, it follows the realtime provider's input-audio interruption setting.
 - `voice.realtime.minBargeInAudioEndMs` controls the minimum assistant playback duration before an OpenAI realtime barge-in truncates audio. Default: `250`. Set `0` for immediate interruption in low-echo rooms, or raise it for echo-heavy speaker setups.
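
The wake-name fallback described in the added docs line (configured `wakeNames`, else the routed agent name, else the agent id) could be sketched roughly as follows. This is only an illustration: `RealtimeVoiceConfig`, `RoutedAgent`, and `resolveWakeNames` are hypothetical names, not the commit's actual code, and treating an empty `wakeNames` array the same as an unset one is an assumption.

```typescript
// Hypothetical shapes; the real OpenClaw config types are not shown in this commit.
interface RealtimeVoiceConfig {
  requireWakeName?: boolean;
  wakeNames?: string[];
}

interface RoutedAgent {
  id: string;
  name?: string;
}

// Returns the names that may wake the bot, or [] when gating is off.
function resolveWakeNames(realtime: RealtimeVoiceConfig, agent: RoutedAgent): string[] {
  if (!realtime.requireWakeName) return [];

  // Assumption: blank entries are ignored and an empty list counts as "unset".
  const configured = (realtime.wakeNames ?? [])
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  if (configured.length > 0) return configured;

  // Fall back to the routed agent name, then to the agent id.
  const agentName = agent.name?.trim();
  return [agentName ? agentName : agent.id];
}

// Usage: gating on with no explicit wake names falls back to the agent name.
const exampleRealtime: RealtimeVoiceConfig = { requireWakeName: true };
console.log(resolveWakeNames(exampleRealtime, { id: "agent-1", name: "Molty" })); // [ 'Molty' ]
```

Keeping the fallback in one small function means the gate and any UI hint text can share a single source for the effective wake names.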

extensions/discord/src/config-schema.test.ts

Lines changed: 5 additions & 0 deletions
@@ -202,6 +202,8 @@ describe("discord config schema", () => {
           voice: "cedar",
           toolPolicy: "safe-read-only",
           consultPolicy: "always",
+          requireWakeName: true,
+          wakeNames: ["Molty"],
           bootstrapContextFiles: ["IDENTITY.md", "USER.md", "SOUL.md"],
           bargeIn: true,
           minBargeInAudioEndMs: 500,
@@ -224,6 +226,8 @@ describe("discord config schema", () => {
     expect(cfg.voice?.realtime?.voice).toBe("cedar");
     expect(cfg.voice?.realtime?.toolPolicy).toBe("safe-read-only");
     expect(cfg.voice?.realtime?.consultPolicy).toBe("always");
+    expect(cfg.voice?.realtime?.requireWakeName).toBe(true);
+    expect(cfg.voice?.realtime?.wakeNames).toEqual(["Molty"]);
     expect(cfg.voice?.realtime?.bootstrapContextFiles).toEqual([
       "IDENTITY.md",
       "USER.md",
@@ -240,6 +244,7 @@ describe("discord config schema", () => {
       { mode: "bidi", realtime: { toolPolicy: "dangerous" } },
       { mode: "agent-proxy", realtime: { consultPolicy: "substantive" } },
       { mode: "bidi", realtime: { bootstrapContextFiles: ["AGENTS.md"] } },
+      { mode: "agent-proxy", realtime: { wakeNames: [""] } },
       { mode: "agent-proxy", realtime: { debounceMs: 10_001 } },
       { mode: "agent-proxy", realtime: { minBargeInAudioEndMs: -1 } },
       { mode: "agent-proxy", realtime: { minBargeInAudioEndMs: 10_001 } },
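
The last hunk adds `wakeNames: [""]` to what appears to be a list of configs the schema should reject. A minimal standalone check with that behavior might look like the sketch below. `validateWakeNames` is a hypothetical helper and not the schema code from this commit (which is not shown here); rejecting whitespace-only names is an additional assumption.

```typescript
// Hypothetical validator: returns a list of error messages, empty when valid.
function validateWakeNames(value: unknown): string[] {
  // An unset value is valid; a fallback name would be used instead.
  if (value === undefined) return [];
  if (!Array.isArray(value)) return ["wakeNames must be an array of strings"];

  const errors: string[] = [];
  value.forEach((entry: unknown, index: number) => {
    // Assumption: whitespace-only names are rejected along with empty ones.
    if (typeof entry !== "string" || entry.trim().length === 0) {
      errors.push(`wakeNames[${index}] must be a non-empty string`);
    }
  });
  return errors;
}

console.log(validateWakeNames(["Molty"])); // []
console.log(validateWakeNames([""])); // [ 'wakeNames[0] must be a non-empty string' ]
```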

extensions/discord/src/config-ui-hints.ts

Lines changed: 8 additions & 0 deletions
@@ -233,6 +233,14 @@ export const discordChannelConfigUiHints = {
     label: "Discord Realtime Consult Policy",
     help: "Use always to strongly prefer the OpenClaw agent brain for substantive realtime turns. agent-proxy defaults to always.",
   },
+  "voice.realtime.requireWakeName": {
+    label: "Discord Realtime Require Wake Name",
+    help: "Require a configured wake name before OpenAI agent-proxy Discord realtime voice responds. If wakeNames is unset, the routed agent name is used, falling back to the agent id.",
+  },
+  "voice.realtime.wakeNames": {
+    label: "Discord Realtime Wake Names",
+    help: "Names that allow OpenAI agent-proxy Discord realtime voice to respond when requireWakeName is enabled.",
+  },
   "voice.realtime.bootstrapContextFiles": {
     label: "Discord Realtime Bootstrap Context Files",
     help: "Agent profile bootstrap files included in realtime provider instructions for direct voice identity/persona grounding. Defaults to IDENTITY.md, USER.md, and SOUL.md; set [] to disable.",
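
The help text scopes wake-name gating to OpenAI `agent-proxy` sessions, and the e2e tests in this commit assert that the bridge's `autoRespondToAudio` and `interruptResponseOnInputAudio` flags are `false` for a gated OpenAI session and `true` for a Google session with the same setting. A rough sketch of that decision, under stated assumptions: `resolveBridgeFlags` and its option names are hypothetical, and defaulting both flags to `true` when gating does not apply is a simplification (the docs say interruption otherwise follows other settings such as `voice.realtime.bargeIn`).

```typescript
interface BridgeFlags {
  autoRespondToAudio: boolean;
  interruptResponseOnInputAudio: boolean;
}

// Hypothetical helper: decides whether the realtime provider may answer on its own.
function resolveBridgeFlags(opts: {
  mode: string;
  providerId: string;
  requireWakeName?: boolean;
}): BridgeFlags {
  // Gating only applies to OpenAI agent-proxy sessions with requireWakeName enabled.
  const gated =
    opts.mode === "agent-proxy" && opts.providerId === "openai" && opts.requireWakeName === true;

  // Assumption: both flags default to true when gating does not apply.
  return {
    autoRespondToAudio: !gated,
    interruptResponseOnInputAudio: !gated,
  };
}

console.log(resolveBridgeFlags({ mode: "agent-proxy", providerId: "openai", requireWakeName: true }));
console.log(resolveBridgeFlags({ mode: "agent-proxy", providerId: "google", requireWakeName: true }));
```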

extensions/discord/src/voice/manager.e2e.test.ts

Lines changed: 144 additions & 0 deletions
@@ -2431,6 +2431,149 @@ describe("DiscordVoiceManager", () => {
     expectUserMessageIncludes("normal answer");
   });
 
+  it("requires the agent wake name before realtime agent-proxy consults", async () => {
+    agentCommandMock.mockResolvedValueOnce({ payloads: [{ text: "wake answer" }] });
+    const manager = createManager(
+      {
+        groupPolicy: "open",
+        voice: {
+          enabled: true,
+          mode: "agent-proxy",
+          realtime: { provider: "openai", consultPolicy: "auto", requireWakeName: true },
+        },
+      },
+      undefined,
+      {
+        agents: {
+          list: [{ id: "agent-1", identity: { name: "Molty" } }],
+        },
+      },
+    );
+
+    await manager.join({ guildId: "g1", channelId: "1001" });
+    const entry = getSessionEntry(manager) as {
+      realtime?: {
+        beginSpeakerTurn: (
+          context: { extraSystemPrompt?: string; senderIsOwner: boolean; speakerLabel: string },
+          userId: string,
+        ) => { close: () => void; sendInputAudio: (audio: Buffer) => void };
+      };
+    };
+    const bridgeParams = lastRealtimeBridgeParams() as
+      | {
+          audioSink?: { sendAudio: (audio: Buffer) => void };
+          autoRespondToAudio?: boolean;
+          interruptResponseOnInputAudio?: boolean;
+          onTranscript?: (role: "user" | "assistant", text: string, isFinal: boolean) => void;
+        }
+      | undefined;
+
+    expect(bridgeParams?.autoRespondToAudio).toBe(false);
+    expect(bridgeParams?.interruptResponseOnInputAudio).toBe(false);
+    bridgeParams?.audioSink?.sendAudio(Buffer.alloc(48_000));
+
+    const guestTurn = entry.realtime?.beginSpeakerTurn(
+      { extraSystemPrompt: undefined, senderIsOwner: false, speakerLabel: "Guest" },
+      "u-guest",
+    );
+    guestTurn?.sendInputAudio(Buffer.alloc(8));
+    bridgeParams?.onTranscript?.("user", "agent-1 status of PR 123", true);
+    await new Promise((resolve) => setTimeout(resolve, 260));
+
+    expect(controlRealtimeVoiceAgentRunMock).not.toHaveBeenCalled();
+    expect(agentCommandMock).not.toHaveBeenCalled();
+    expect(realtimeSessionMock.handleBargeIn).not.toHaveBeenCalled();
+
+    const ownerTurn = entry.realtime?.beginSpeakerTurn(
+      { extraSystemPrompt: undefined, senderIsOwner: true, speakerLabel: "Owner" },
+      "u-owner",
+    );
+    ownerTurn?.sendInputAudio(Buffer.alloc(8));
+    bridgeParams?.onTranscript?.("user", "Hey, Molty, status of PR 123", true);
+    await new Promise((resolve) => setTimeout(resolve, 260));
+
+    expect(controlRealtimeVoiceAgentRunMock).toHaveBeenCalledWith({
+      sessionKey: "discord:g1:c1",
+      text: "status of PR 123",
+    });
+    expect(lastAgentCommandArgs().message).toContain("status of PR 123");
+    expect(lastAgentCommandArgs().message).not.toContain("Molty");
+    expect(lastAgentCommandArgs().message).not.toContain("Hey");
+    expect(lastAgentCommandArgs().userId).toBe("u-owner");
+    expectUserMessageIncludes("wake answer");
+  });
+
+  it("leaves non-OpenAI agent-proxy realtime auto-response enabled when wake names are requested", async () => {
+    resolveConfiguredRealtimeVoiceProviderMock.mockReturnValueOnce({
+      provider: { id: "google" },
+      providerConfig: { model: "gemini-live", voice: "default" },
+    });
+    const manager = createManager({
+      groupPolicy: "open",
+      voice: {
+        enabled: true,
+        mode: "agent-proxy",
+        realtime: { provider: "google", consultPolicy: "auto", requireWakeName: true },
+      },
+    });
+
+    await manager.join({ guildId: "g1", channelId: "1001" });
+    const bridgeParams = lastRealtimeBridgeParams() as
+      | {
+          autoRespondToAudio?: boolean;
+          interruptResponseOnInputAudio?: boolean;
+        }
+      | undefined;
+
+    expect(bridgeParams?.autoRespondToAudio).toBe(true);
+    expect(bridgeParams?.interruptResponseOnInputAudio).toBe(true);
+  });
+
+  it("uses configured wake names before realtime agent-proxy consults", async () => {
+    agentCommandMock.mockResolvedValueOnce({ payloads: [{ text: "configured wake answer" }] });
+    const manager = createManager({
+      groupPolicy: "open",
+      voice: {
+        enabled: true,
+        mode: "agent-proxy",
+        realtime: {
+          provider: "openai",
+          consultPolicy: "auto",
+          requireWakeName: true,
+          wakeNames: ["Claw", "Claw Bot"],
+        },
+      },
+    });
+
+    await manager.join({ guildId: "g1", channelId: "1001" });
+    const entry = getSessionEntry(manager) as {
+      realtime?: {
+        beginSpeakerTurn: (
+          context: { extraSystemPrompt?: string; senderIsOwner: boolean; speakerLabel: string },
+          userId: string,
+        ) => { close: () => void; sendInputAudio: (audio: Buffer) => void };
+      };
+    };
+    const turn = entry.realtime?.beginSpeakerTurn(
+      { extraSystemPrompt: undefined, senderIsOwner: true, speakerLabel: "Owner" },
+      "u-owner",
+    );
+    turn?.sendInputAudio(Buffer.alloc(8));
+    const bridgeParams = lastRealtimeBridgeParams() as
+      | {
+          onTranscript?: (role: "user" | "assistant", text: string, isFinal: boolean) => void;
+        }
+      | undefined;
+
+    bridgeParams?.onTranscript?.("user", "Claw Bot, ship it", true);
+    await new Promise((resolve) => setTimeout(resolve, 260));
+
+    expect(lastAgentCommandArgs().message).toContain("ship it");
+    expect(lastAgentCommandArgs().message).not.toContain("Claw");
+    expect(lastAgentCommandArgs().message).not.toContain("Bot");
+    expectUserMessageIncludes("configured wake answer");
+  });
+
   it("lets status questions fall back to normal realtime handling when no run is active", async () => {
     agentCommandMock.mockResolvedValueOnce({ payloads: [{ text: "status answer" }] });
     controlRealtimeVoiceAgentRunMock.mockResolvedValueOnce({
@@ -3319,6 +3462,7 @@ describe("DiscordVoiceManager", () => {
             voice: "cedar",
             toolPolicy: "safe-read-only",
             consultPolicy: "always",
+            requireWakeName: true,
             providers: {
               openai: {
                 interruptResponseOnInputAudio: false,
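
The new e2e tests expect an accepted transcript to reach the agent without the wake name or a leading greeting: "Hey, Molty, status of PR 123" becomes "status of PR 123", and "Claw Bot, ship it" becomes "ship it". One way to get that behavior is sketched below. `stripWakeName`, the greeting list, and the leading-position-only match are all assumptions made for illustration, since the commit's actual matcher is not shown on this page (the docs describe a transcript that "contains" a wake name, which may be broader).

```typescript
// Escape regex metacharacters so wake names are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical gate: returns the transcript without the leading wake name,
// or null when no wake name leads the transcript (the turn stays silent).
function stripWakeName(transcript: string, wakeNames: string[]): string | null {
  // Longest names first so "Claw Bot" wins over "Claw".
  const names = wakeNames
    .map((name) => name.trim())
    .filter((name) => name.length > 0)
    .sort((a, b) => b.length - a.length);
  if (names.length === 0) return null;

  const alternation = names.map(escapeRegExp).join("|");
  // Optional greeting (assumed list), the wake name as a whole word, then separators.
  const pattern = new RegExp(
    `^\\s*(?:(?:hey|hi|ok|okay)[\\s,]+)?(?:${alternation})(?![\\p{L}\\p{N}])[\\s,.:;!?-]*`,
    "iu",
  );
  const match = pattern.exec(transcript);
  if (!match) return null;
  return transcript.slice(match[0].length).trim();
}

console.log(stripWakeName("Hey, Molty, status of PR 123", ["Molty"])); // status of PR 123
console.log(stripWakeName("Claw Bot, ship it", ["Claw", "Claw Bot"])); // ship it
console.log(stripWakeName("agent-1 status of PR 123", ["Molty"])); // null
```

Sorting by length before building the alternation is what keeps "Bot" out of the forwarded text when both "Claw" and "Claw Bot" are configured.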
