
Commit befb0f3

feat(discord): follow configured users in voice
Summary:
- Adds Discord voice followUsers/followUsersEnabled config, metadata, docs, and changelog coverage.
- Makes Discord voice follow configured users across joins, moves, disconnects, admin moves, handoff, bounded reconciliation, transient REST failures, destroy cleanup, and DAVE recovery.
- Adds focused Discord voice/config regression tests and refreshes generated config docs metadata.

Verification:
- node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts extensions/discord/src/voice/manager.e2e.test.ts
- node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-discord.config.ts extensions/discord/src/config-schema.test.ts
- pnpm config:channels:check
- pnpm config:docs:check
- pnpm config:schema:check
- pnpm exec oxfmt --check --threads=1 docs/channels/discord.md extensions/discord/src/voice/manager.ts extensions/discord/src/voice/manager.e2e.test.ts src/config/bundled-channel-config-metadata.generated.ts CHANGELOG.md
- git diff --check
- pnpm build
- pnpm check:test-types
- Mac Studio config validate + gateway:watch proof on cf67023; Discord provider started and gateway ready
- Autoreview passed after two actionable findings were fixed

CI notes:
- PR-specific proof is green: check-docs, config-boundary, real behavior proof, check-test-types, OpenGrep, CodeQL, no-tabs, security-fast.
- Remaining broad CI reds match current main failures/noise on unrelated fs-safe Python helper, Windows ACL locale, managed media staging, and dependency guardrail surfaces.

Co-authored-by: FullerStackDev <263060202+fuller-stack-dev@users.noreply.github.com>
1 parent d147036 commit befb0f3

10 files changed

Lines changed: 1135 additions & 36 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Docs: https://docs.openclaw.ai

 ### Changes

+- Discord: let voice sessions follow configured Discord users into voice channels, with allowed-channel checks, multi-user handoff, bounded reconciliation, and DAVE recovery preservation. (#84264) Thanks @fuller-stack-dev.
 - Dependencies: bump the bundled Codex harness to `@openai/codex` `0.132.0` and refresh the app-server model-list docs for the new catalog.
 - CLI/policy: add the bundled Policy plugin for policy-backed channel conformance checks, doctor lint findings, and opt-in workspace repair. (#80407) Thanks @giodl73-repo.
 - Agents/config: allow `agents.list[].experimental.localModelLean` so lean local-model mode can be enabled for one configured agent instead of globally.
Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-8c611014328719fc3a7faf4f6e1eb8e053c2e920aa8988d616de3ae555d40bad config-baseline.json
+19aca41ca61bf3b94ce98c7cd63e915f231fca52d4ee708584dde30bc310f5f7 config-baseline.json
 58496af646ee58bce473dda064d43e1e383ef7c6e726d12d837c1fef12a303b0 config-baseline.core.json
-e068db276fdff1727939d4f3a8001376e550c444bdff3e3443ab26812e2f8c5d config-baseline.channel.json
-a87fc4c9bc6499c5fb9d9343b8c1c4f0c3381a6afbdb0a676dc8ba9e03ff5755 config-baseline.plugin.json
+05e2ef29d9d18d57f9e1c2a9af375cc725e8a01a7497abc98f8da358f653acb6 config-baseline.channel.json
+d455f53b424976f99990330503692728121cbbeff04014fb50b5eed23aae59d4 config-baseline.plugin.json

docs/channels/discord.md

Lines changed: 44 additions & 0 deletions
@@ -1226,6 +1226,7 @@ Notes:
 - `voice.mode` controls the conversation path. The default is `agent-proxy`: a realtime voice front end handles turn timing, interruption, and playback, delegates substantive work to the routed OpenClaw agent through `openclaw_agent_consult`, and treats the result like a typed Discord prompt from that speaker. `stt-tts` keeps the older batch STT plus TTS flow. `bidi` lets the realtime model converse directly while exposing `openclaw_agent_consult` for the OpenClaw brain.
 - `voice.agentSession` controls which OpenClaw conversation receives voice turns. Leave it unset for the voice channel's own session, or set `{ mode: "target", target: "channel:<text-channel-id>" }` to make the voice channel act as the microphone/speaker extension of an existing Discord text channel session such as `#maintainers`.
 - `voice.model` overrides the OpenClaw agent brain for Discord voice responses and realtime consults. Leave it unset to inherit the routed agent model. It is separate from `voice.realtime.model`.
+- `voice.followUsers` lets the bot join, move, and leave Discord voice with selected users. See [Follow users in voice](#follow-users-in-voice) for behavior rules and examples.
 - `agent-proxy` routes speech through `discord-voice`, which preserves normal owner/tool authorization for the speaker and target session but hides the agent `tts` tool because Discord voice owns playback. By default, `agent-proxy` gives the consult full owner-equivalent tool access for owner speakers (`voice.realtime.toolPolicy: "owner"`) and strongly prefers consulting the OpenClaw agent before substantive answers (`voice.realtime.consultPolicy: "always"`). In that default `always` mode, the realtime layer does not auto-speak filler before the consult answer; it captures and transcribes speech, then speaks the routed OpenClaw answer. If multiple forced consult answers finish while Discord is still playing the first answer, later exact-speech answers are queued until playback idles instead of replacing speech mid-sentence.
 - In `stt-tts` mode, STT uses `tools.media.audio`; `voice.model` does not affect transcription.
 - In realtime modes, `voice.realtime.provider`, `voice.realtime.model`, and `voice.realtime.voice` configure the realtime audio session. For OpenAI Realtime 2 plus the Codex brain, use `voice.realtime.model: "gpt-realtime-2"` and `voice.model: "openai-codex/gpt-5.5"`.
@@ -1254,6 +1255,47 @@ Notes:
 - Verbose Discord voice logs include a bounded one-line STT transcript preview for each accepted speaker segment, so debugging shows both the user side and the agent reply side without dumping unbounded transcript text.
 - In `agent-proxy` mode, forced consult fallback skips likely incomplete transcript fragments such as text ending in `...` or a trailing connector like `and`, plus obvious non-actionable closings like “be right back” or “bye”. Logs show `forced agent consult skipped reason=...` when this prevents a stale queued answer.

+### Follow users in voice
+
+Use `voice.followUsers` when you want the Discord voice bot to stay with one or more known Discord users instead of joining a fixed channel at startup or waiting for `/vc join`.
+
+```json5
+{
+  channels: {
+    discord: {
+      voice: {
+        enabled: true,
+        followUsersEnabled: true,
+        followUsers: ["discord:123456789012345678"],
+        allowedChannels: [
+          {
+            guildId: "123456789012345678",
+            channelId: "234567890123456789",
+          },
+        ],
+      },
+    },
+  },
+}
+```
+
+Behavior:
+
+- `followUsers` accepts raw Discord user IDs and `discord:<id>` values. OpenClaw normalizes both forms before matching voice-state events.
+- `followUsersEnabled` defaults to `true` when `followUsers` is configured. Set it to `false` to keep the saved list but stop automatic voice following.
+- When a followed user joins an allowed voice channel, OpenClaw joins that channel. When the user moves, OpenClaw moves with them. When the active followed user disconnects, OpenClaw leaves.
+- If multiple followed users are in the same guild and the active followed user leaves, OpenClaw moves to another tracked followed user's channel before leaving the guild. If several followed users move at once, the latest observed voice-state event wins.
+- `allowedChannels` still applies. A followed user in a disallowed channel is ignored, and a follow-owned session moves to another followed user or leaves.
+- OpenClaw reconciles missed voice-state events on startup and at a bounded interval. Reconciliation samples configured guilds and caps REST lookups per run, so very large `followUsers` lists may take more than one interval to converge.
+- If Discord or an admin moves the bot while it is following a user, OpenClaw rebuilds the voice session and preserves follow ownership when the destination is allowed. If the bot is moved outside `allowedChannels`, OpenClaw leaves and rejoins the configured target when one exists.
+- DAVE receive recovery may leave and rejoin the same channel after repeated decrypt failures. Follow-owned sessions keep their follow ownership through that recovery path, so a later followed-user disconnect still leaves the channel.
+
+Choose between the join modes:
+
+- Use `followUsers` for personal or operator setups where the bot should automatically be in voice when you are.
+- Use `autoJoin` for fixed-room bots that should be present even when no tracked user is in voice.
+- Use `/vc join` for one-off joins or rooms where automatic voice presence would be surprising.
+
 Native opus setup for source checkouts:

 ```bash
@@ -1288,6 +1330,8 @@ Default agent-proxy voice-channel session example:
 voice: {
   enabled: true,
   model: "openai-codex/gpt-5.5",
+  followUsersEnabled: true,
+  followUsers: ["123456789012345678"],
   realtime: {
     provider: "openai",
     model: "gpt-realtime-2",
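
The handoff rule in the docs change above (stay with the active followed user while their channel is allowed, otherwise move to another tracked followed user, otherwise leave) can be sketched as a pure selection function. This is an illustrative sketch only: `extensions/discord/src/voice/manager.ts` is not shown on this page, so `pickFollowTarget`, `FollowTarget`, and the `isAllowedChannel` callback are hypothetical names. The sketch also does not model the "latest observed voice-state event wins" tie-break.

```typescript
// Hypothetical types and names; not taken from the OpenClaw source.
interface FollowTarget {
  userId: string;
  channelId: string;
}

// userChannels maps each followed user ID to the voice channel they are
// currently in (users who are not in voice are simply absent from the map).
function pickFollowTarget(
  userChannels: ReadonlyMap<string, string>,
  activeUserId: string | undefined,
  isAllowedChannel: (channelId: string) => boolean,
): FollowTarget | undefined {
  // Prefer the user the bot is already following, then any other tracked user.
  const order: string[] = activeUserId === undefined ? [] : [activeUserId];
  for (const userId of userChannels.keys()) {
    if (userId !== activeUserId) order.push(userId);
  }
  for (const userId of order) {
    const channelId = userChannels.get(userId);
    // A followed user in a disallowed channel is ignored.
    if (channelId !== undefined && isAllowedChannel(channelId)) {
      return { userId, channelId };
    }
  }
  // No followed user is in an allowed channel: a follow-owned session leaves.
  return undefined;
}
```

A caller would compare the returned `channelId` with the bot's current channel to decide between staying, moving, and leaving.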

extensions/discord/src/config-schema.test.ts

Lines changed: 5 additions & 0 deletions
@@ -194,6 +194,8 @@ describe("discord config schema", () => {
 voice: {
   mode: "agent-proxy",
   model: "openai-codex/gpt-5.5",
+  followUsersEnabled: true,
+  followUsers: ["58398277829140480"],
   realtime: {
     provider: "openai",
     model: "gpt-realtime-2",
@@ -214,6 +216,8 @@ describe("discord config schema", () => {

 expect(cfg.voice?.mode).toBe("agent-proxy");
 expect(cfg.voice?.model).toBe("openai-codex/gpt-5.5");
+expect(cfg.voice?.followUsersEnabled).toBe(true);
+expect(cfg.voice?.followUsers).toEqual(["58398277829140480"]);
 expect(cfg.voice?.realtime?.provider).toBe("openai");
 expect(cfg.voice?.realtime?.model).toBe("gpt-realtime-2");
 expect(cfg.voice?.realtime?.voice).toBe("cedar");
@@ -233,6 +237,7 @@ describe("discord config schema", () => {
   { mode: "agent-proxy", realtime: { minBargeInAudioEndMs: -1 } },
   { mode: "agent-proxy", realtime: { minBargeInAudioEndMs: 10_001 } },
   { agentSession: { mode: "target" } },
+  { followUsers: [""] },
 ]) {
   expectInvalidDiscordConfig({ voice });
 }
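
The new invalid case `{ followUsers: [""] }` implies that empty entries are rejected when the config is validated. The schema definition itself is not part of this diff, so the following is only a minimal sketch of an equivalent check in plain TypeScript. `followUsersErrors` is a hypothetical helper, and treating whitespace-only entries as empty is an assumption.

```typescript
// Hypothetical validator; the real schema library and error format are not shown in this commit view.
function followUsersErrors(value: unknown): string[] {
  // The field is optional, so an unset value is valid.
  if (value === undefined) return [];
  if (!Array.isArray(value)) return ["voice.followUsers must be an array of strings"];
  const errors: string[] = [];
  value.forEach((entry: unknown, index: number) => {
    // Assumption: whitespace-only entries count as empty.
    if (typeof entry !== "string" || entry.trim().length === 0) {
      errors.push(`voice.followUsers[${index}] must be a non-empty string`);
    }
  });
  return errors;
}
```

With this sketch, the valid fixture from the first hunk (`["58398277829140480"]`) yields no errors, while `[""]` yields one.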

extensions/discord/src/config-ui-hints.ts

Lines changed: 8 additions & 0 deletions
@@ -201,6 +201,14 @@ export const discordChannelConfigUiHints = {
     label: "Discord Voice Agent Session Target",
     help: 'Discord target used when voice.agentSession.mode="target", for example channel:123.',
   },
+  "voice.followUsersEnabled": {
+    label: "Discord Voice Follow Users Enabled",
+    help: "Toggle Discord voice follow-users behavior without removing the saved voice.followUsers list. Defaults to true when followUsers is configured.",
+  },
+  "voice.followUsers": {
+    label: "Discord Voice Follow Users",
+    help: "Discord user IDs to follow into voice channels. The bot joins when a followed user joins or moves, and leaves when that user disconnects.",
+  },
   "voice.realtime.provider": {
     label: "Discord Realtime Provider",
     help: "Realtime voice provider for agent-proxy or bidi Discord voice modes, such as openai.",
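
The two hints interact: `voice.followUsersEnabled` defaults to true when a list is configured, and the docs change in this commit says raw IDs and `discord:<id>` values are normalized before matching. A rough sketch of how the effective set of followed IDs might be resolved is shown here. `VoiceFollowConfig`, `normalizeFollowUserId`, and `resolveFollowedUserIds` are hypothetical names, not the commit's implementation.

```typescript
// Hypothetical shape: only the two fields this commit adds under `voice`.
interface VoiceFollowConfig {
  followUsersEnabled?: boolean;
  followUsers?: string[];
}

const DISCORD_PREFIX = "discord:";

// Strip an optional `discord:` prefix so both documented forms compare equal.
function normalizeFollowUserId(entry: string): string {
  const trimmed = entry.trim();
  return trimmed.startsWith(DISCORD_PREFIX)
    ? trimmed.slice(DISCORD_PREFIX.length)
    : trimmed;
}

function resolveFollowedUserIds(voice: VoiceFollowConfig): Set<string> {
  const ids = new Set<string>();
  // Only an explicit `false` disables following; the saved list is left untouched.
  if (voice.followUsersEnabled === false) return ids;
  for (const entry of voice.followUsers ?? []) {
    const id = normalizeFollowUserId(entry);
    if (id.length > 0) ids.add(id);
  }
  return ids;
}
```

Returning a `Set` makes the per-event lookup a constant-time membership check and collapses duplicates such as `"discord:123"` and `"123"` into one entry.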
