feat(msteams): Teams live voice support with .NET media worker by lupuletic · Pull Request #57511 · openclaw/openclaw

lupuletic · 2026-03-30T06:09:15Z

Summary

Adds three-tier MS Teams voice support: text bot (anywhere), transcript mode (anywhere), live voice (requires Windows Teams Voice Worker)
TS agent plane with capability negotiation, compliance gate, per-speaker unmixed audio pipeline, streaming STT, cut-through TTS, and gRPC bridge
.NET 6 media worker scaffolding using Microsoft Graph Communications SDK with unmixed ReceiveUnmixedMeetingAudio for per-speaker audio capture
Teams app manifest template (schema 1.24 with RSC permissions)
Full docs page at docs/channels/msteams-voice.md

Architecture

OpenClaw (TS, any OS) ←gRPC→ .NET Media Worker (Windows) ←RTP→ MS Teams

TS control plane: capability negotiation, compliance gate, STT/TTS pipeline, agent routing
.NET media worker: owns call lifecycle via Graph Communications SDK, unmixed audio capture, playback injection
Three capability tiers: live_voice (worker available) → transcript_mode (post-meeting artifacts) → text_only
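
The tier fallback described above can be sketched as a small pure function. This is only an illustration: the type and field names (`VoiceCapability`, `workerReachable`, `transcriptAccessGranted`) are assumptions and may not match the PR's actual types in extensions/msteams/src/voice/.

```typescript
// Hypothetical names; the PR's real types may differ from this sketch.
type VoiceCapability = "live_voice" | "transcript_mode" | "text_only";

interface NegotiationInput {
  workerReachable: boolean;         // a media worker answered a health probe
  transcriptAccessGranted: boolean; // post-meeting transcript artifacts are readable
}

// Picks the highest tier whose prerequisite is met, falling back in order.
function negotiateCapability(input: NegotiationInput): VoiceCapability {
  if (input.workerReachable) return "live_voice";
  if (input.transcriptAccessGranted) return "transcript_mode";
  return "text_only";
}
```

Keeping negotiation free of I/O would make the "no-op capability negotiation" tier easy to unit test before any worker exists.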

Files

TS voice module (11 new files in extensions/msteams/src/voice/):
types, config, compliance-gate, worker-bridge, manager, streaming-stt, cut-through-tts, own-voice-filter, audio-pipeline, transcript-fallback

.NET media worker (10 new files in extensions/msteams/media-worker/):
CallHandler, ComplianceGate, UnmixedAudioCapture, AudioPlayback, QoEMonitor, WorkerRegistry, BridgeService, bridge.proto

Other: manifest template, docs page, config types, plugin schema

Test plan

pnpm check passes (verified)
Unit tests for compliance gate, VTT parser, config parsing
Mock gRPC worker integration test for TS pipeline
.NET media worker builds with dotnet build
E2E test with real Teams meeting via Azure Windows VM worker
RSC permission spike: validate Calls.AccessMedia.Chat for app-hosted media

🤖 Generated with Claude Code

…chitecture Three-tier Teams voice: text bot (anywhere), transcript mode (anywhere), and live voice (requires Windows Teams Voice Worker). Adds TS agent plane with capability negotiation, compliance gate, per-speaker unmixed audio pipeline, streaming STT, cut-through TTS, and gRPC bridge to .NET media worker. Includes .NET 6 media worker scaffolding, Teams app manifest template (schema 1.24 with RSC), and docs page.

steipete · 2026-04-25T03:32:14Z

Codex maintainer review: valuable direction, but not mergeable in this shape.

Main concerns:

Scope is too large for one PR: Teams plugin config, TS voice pipeline, .NET media worker, Graph/RSC docs, SDK surface, STT/TTS routing, and compliance behavior all land together. Split into contract/config/docs first, then worker integration, then live audio pipeline.
The test plan still has unchecked critical gates (dotnet build, mock gRPC integration, real Teams/Azure VM E2E, RSC permission spike). Live voice cannot land without proof for the external worker and permission model.
src/plugin-sdk/msteams.ts adds a public SDK seam. That needs explicit contract review/versioning and should be justified by a generic plugin need, not only this implementation.
Compliance/recording behavior is high-risk. The PR correctly mentions updateRecordingStatus, but the landing bar needs tests or a documented live proof that audio processing is blocked until compliance is active.
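
One way to make the compliance requirement testable is a gate that drops every frame until recording status is confirmed active. The sketch below is an assumption about how such a gate could look, not the PR's ComplianceGate; the class shape and the status values it models are illustrative.

```typescript
// Assumed subset of recording status values relevant to this sketch.
type RecordingStatus = "notRecording" | "recording" | "failed";

class ComplianceGate {
  private status: RecordingStatus = "notRecording";
  private dropped = 0;

  // Intended to be called only after the worker confirms that
  // updateRecordingStatus succeeded for the call.
  setStatus(status: RecordingStatus): void {
    this.status = status;
  }

  get droppedFrames(): number {
    return this.dropped;
  }

  // Passes a frame through only while recording status is active;
  // otherwise counts it as dropped and returns undefined.
  admit<T>(frame: T): T | undefined {
    if (this.status !== "recording") {
      this.dropped++;
      return undefined;
    }
    return frame;
  }
}
```

A unit test can then assert that no frame reaches STT before `setStatus("recording")` and that frames are blocked again after a `failed` status.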

I would keep the idea alive, but ask for a much narrower first PR: documented capability tiers + plugin config schema + no-op capability negotiation, with worker/audio code following after the external build and permission proof exists.

steipete

Codex deep review: requesting changes. The direction is interesting, but this PR is not safe or functional enough to merge.

Findings:

WorkerBridge resolves the proto from the wrong path.

extensions/msteams/src/voice/worker-bridge.ts sets PROTO_PATH to ../../../media-worker/Protos/bridge.proto from extensions/msteams/src/voice. That resolves to extensions/media-worker/Protos/bridge.proto, not extensions/msteams/media-worker/Protos/bridge.proto. The first live connect() will fail before it can load the gRPC service. From the source layout this should be two levels up, or the proto needs to be copied into the built package and resolved via a packaged asset path.
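
The off-by-one can be checked with Node's path module alone. The directory and proto path below come from the review; the variable names are illustrative.

```typescript
import * as path from "node:path";

// Directory that contains worker-bridge.ts, per the review.
const voiceDir = "extensions/msteams/src/voice";
const protoRel = "media-worker/Protos/bridge.proto";

// Three levels up leaves the msteams plugin directory entirely.
const threeUp = path.posix.join(voiceDir, "../../..", protoRel);
// Two levels up stays inside extensions/msteams.
const twoUp = path.posix.join(voiceDir, "../..", protoRel);

console.log(threeUp); // extensions/media-worker/Protos/bridge.proto
console.log(twoUp);   // extensions/msteams/media-worker/Protos/bridge.proto
```

This only covers the source layout. A built package may place the compiled file elsewhere, which is why the review also suggests resolving the proto as a packaged asset.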

Remote worker transport is unauthenticated/plaintext while carrying Teams app secrets.

The .NET worker listens on all interfaces (Program.cs uses ListenAnyIP(grpcPort)), and the TS bridge connects with grpc.credentials.createInsecure(). joinMeeting then sends tenantId, appId, and appSecret over that channel. The docs show remote worker addresses/FQDNs, so this is not purely loopback. This needs mTLS or at least an explicit shared-token/TLS story before any remote worker support can land. At minimum, default to loopback-only and reject non-loopback worker addresses unless a secure transport/auth option is configured.
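
A minimal sketch of the "loopback-only unless secured" default, assuming a `host:port` address string and a boolean that says whether a secure transport is configured. `WorkerTransportOptions` and the function names are hypothetical; this is a guard in front of the gRPC client, not a replacement for TLS or mTLS.

```typescript
import { isIP } from "node:net";

// Hypothetical config shape for this sketch.
interface WorkerTransportOptions {
  address: string;        // "host:port" or "[ipv6]:port"
  tlsConfigured: boolean; // true once TLS/mTLS or a bridge token is set up
}

// Extracts the host from "host:port" or bracketed IPv6 like "[::1]:50051".
function hostOf(address: string): string {
  const bracketed = address.match(/^\[([^\]]+)\](?::\d+)?$/);
  if (bracketed) return bracketed[1];
  const idx = address.lastIndexOf(":");
  return idx === -1 ? address : address.slice(0, idx);
}

function isLoopbackHost(host: string): boolean {
  if (host === "localhost") return true;
  if (isIP(host) === 4) return host.startsWith("127.");
  if (isIP(host) === 6) return host === "::1";
  return false;
}

// Throws before any credentials are sent if the target is remote and insecure.
function assertWorkerAddressAllowed(opts: WorkerTransportOptions): void {
  if (opts.tlsConfigured) return;
  const host = hostOf(opts.address);
  if (!isLoopbackHost(host)) {
    throw new Error(`Refusing insecure connection to non-loopback worker "${host}"`);
  }
}
```

The check is deliberately conservative: anything it cannot positively identify as loopback is rejected when no secure transport is configured.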

SecretRef app passwords are silently dropped.

extensions/msteams/src/voice/manager.ts resolves appSecret only when msteamsConfig.appPassword is a string; if the normal config contains a SecretRef, the manager sends an empty secret to the worker. The rest of the msteams plugin already has secret-input helpers (extensions/msteams/src/token.ts, secret-input.ts) for this exact contract. Voice must use the same secret resolution path and fail early with a useful error if unresolved.
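
The fail-early behavior could look roughly like this. The `SecretRef` shape and the `lookup` callback are placeholders for illustration; the real contract is whatever the plugin's existing secret-input helpers define, and voice should call those rather than a new helper.

```typescript
// Hypothetical shapes; the plugin's actual SecretRef contract is not reproduced here.
type SecretRef = { ref: string };
type SecretInput = string | SecretRef;
type SecretLookup = (ref: string) => string | undefined;

// Resolves the app secret or throws, so an empty secret is never sent to the worker.
function resolveAppSecret(input: SecretInput | undefined, lookup: SecretLookup): string {
  if (input === undefined) {
    throw new Error("msteams appPassword is not configured");
  }
  const value = typeof input === "string" ? input : lookup(input.ref);
  if (!value) {
    throw new Error("msteams appPassword could not be resolved; refusing to join with an empty secret");
  }
  return value;
}
```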

TTS output is sent to the worker as raw PCM, but textToSpeech() does not guarantee raw 16kHz PCM bytes.

extensions/msteams/src/voice/manager.ts reads ttsResult.audioPath and passes those file bytes directly to WorkerBridge.playAudio; AudioPlayback.cs treats the bytes as 16kHz mono signed PCM frames. If the configured TTS provider emits WAV/MP3/Opus or another sample rate, the worker will inject encoded/container bytes as PCM noise. The code needs an explicit decode/resample step to the worker's required format, or the worker protocol should carry MIME/container metadata and decode there.
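
As a first guard, the manager could refuse to forward anything that is not already 16 kHz mono 16-bit PCM. The sketch below handles only uncompressed RIFF/WAVE input and returns the raw sample bytes; it does not decode MP3/Opus or resample, so a real fix would still need a transcode step. The function name and result type are assumptions.

```typescript
type PcmResult =
  | { ok: true; pcm: Uint8Array }
  | { ok: false; reason: string };

// Reads a 4-character chunk tag such as "RIFF" or "data".
function tag(bytes: Uint8Array, offset: number): string {
  let s = "";
  for (let i = 0; i < 4 && offset + i < bytes.length; i++) {
    s += String.fromCharCode(bytes[offset + i]);
  }
  return s;
}

// Returns the PCM payload only if the WAV is 16 kHz, mono, 16-bit, uncompressed.
function extractPcm16kMono(bytes: Uint8Array): PcmResult {
  if (bytes.length < 12 || tag(bytes, 0) !== "RIFF" || tag(bytes, 8) !== "WAVE") {
    return { ok: false, reason: "not a RIFF/WAVE file; transcode before playback" };
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let fmtOk = false;
  let offset = 12;
  while (offset + 8 <= bytes.length) {
    const id = tag(bytes, offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === "fmt ") {
      if (body + 16 > bytes.length) return { ok: false, reason: "truncated fmt chunk" };
      const format = view.getUint16(body, true);        // 1 = linear PCM
      const channels = view.getUint16(body + 2, true);
      const sampleRate = view.getUint32(body + 4, true);
      const bits = view.getUint16(body + 14, true);
      if (format !== 1 || channels !== 1 || sampleRate !== 16000 || bits !== 16) {
        return {
          ok: false,
          reason: `unsupported WAV: format=${format} ch=${channels} rate=${sampleRate} bits=${bits}`,
        };
      }
      fmtOk = true;
    } else if (id === "data") {
      if (!fmtOk) return { ok: false, reason: "data chunk before fmt chunk" };
      return { ok: true, pcm: bytes.subarray(body, Math.min(body + size, bytes.length)) };
    }
    offset = body + size + (size % 2); // RIFF chunks are word-aligned
  }
  return { ok: false, reason: "no data chunk found" };
}
```

Rejecting unknown formats turns the "PCM noise" failure mode into an explicit error the caller can log or route to a transcoder.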

gRPC stream subscriptions leak delegates.

BridgeService.SubscribeUnmixedAudio and SubscribeEvents add callbacks to session.AudioSubscribers / session.EventSubscribers but never remove them in finally. ConcurrentBag has no removal path, so every disconnected TS client leaves a dead subscriber attached for the lifetime of the call. Use a removable subscription collection or per-subscriber channel registry and remove on stream completion/cancellation.
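
The affected code is C#, but the shape of the fix is language-independent: subscribing returns a handle, and the stream handler releases it in `finally`. The TypeScript sketch below illustrates that pattern only; `SubscriberRegistry` and `streamToClient` are hypothetical names, not part of the PR.

```typescript
type AudioFrame = Uint8Array;
type Subscriber = (frame: AudioFrame) => void;

class SubscriberRegistry {
  private readonly subscribers = new Set<Subscriber>();

  // Returns an unsubscribe function so the caller can always clean up.
  subscribe(fn: Subscriber): () => void {
    this.subscribers.add(fn);
    return () => {
      this.subscribers.delete(fn);
    };
  }

  publish(frame: AudioFrame): void {
    this.subscribers.forEach((fn) => fn(frame));
  }

  get size(): number {
    return this.subscribers.size;
  }
}

// Forwards frames until the client disconnects, then removes the subscriber
// even if the wait rejects.
async function streamToClient(
  registry: SubscriberRegistry,
  write: (frame: AudioFrame) => void,
  disconnected: Promise<void>,
): Promise<void> {
  const unsubscribe = registry.subscribe(write);
  try {
    await disconnected;
  } finally {
    unsubscribe();
  }
}
```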

Speaker identity is currently fabricated.

CallHandler.ResolveSpeaker() iterates participants but never maps the media speaker id to a participant; it always returns AadUserId = speakerId.ToString() and DisplayName = Speaker-{speakerId}. That means the TS prompt path will attribute real participants to synthetic speaker labels. If per-speaker unmixed audio is part of the value prop, this needs a real mapping or the user-visible docs should not claim identified speakers yet.
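
A resolver that admits when it cannot identify a speaker might look like the following. The data shapes are assumptions for illustration: in particular, `audioSourceIds` stands in for however the worker learns which media source ids belong to which participant, which this sketch does not cover.

```typescript
// Hypothetical shapes for this sketch.
interface ParticipantInfo {
  aadUserId: string;
  displayName: string;
  audioSourceIds: number[]; // media source ids known to belong to this participant
}

interface ResolvedSpeaker {
  aadUserId?: string;
  displayName: string;
  identified: boolean;
}

// Maps a media speaker id to a participant, or reports the speaker as unidentified
// instead of fabricating an identity from the numeric id.
function resolveSpeaker(speakerId: number, participants: ParticipantInfo[]): ResolvedSpeaker {
  for (const p of participants) {
    if (p.audioSourceIds.indexOf(speakerId) !== -1) {
      return { aadUserId: p.aadUserId, displayName: p.displayName, identified: true };
    }
  }
  return { displayName: "Unknown speaker", identified: false };
}
```

Carrying an explicit `identified` flag lets the prompt path avoid attributing speech to a synthetic label.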

The public SDK seam is not justified by this PR.

src/plugin-sdk/msteams.ts exports MSTeamsVoiceConfig and MSTeamsVoicePermissionMode, creating a public SDK contract for an implementation that has not proven the worker protocol, permission model, or transport security. Keep the first pass plugin-private, or land the SDK addition separately with explicit versioned contract review.

Suggested split: first land a tiny config/status capability tier PR with tests. Then a worker protocol PR with secure transport and build proof. Then audio/STT/TTS integration with real format conversion and a live worker smoke. This PR currently bundles all of those risk surfaces together.

openclaw-barnacle · 2026-04-30T04:40:35Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

clawsweeper · 2026-05-01T00:43:34Z

Codex review: found issues before merge.

Summary
This draft PR adds Microsoft Teams live voice docs/config/schema, a TypeScript voice manager and gRPC worker bridge, a Teams manifest template, and a Windows .NET media-worker scaffold.

Reproducibility: yes, for the review blockers: static inspection of the PR head reproduces the insecure bridge, undeclared gRPC imports, raw TTS-file-to-PCM playback path, receive-only audio socket, and floating NuGet ranges. No live Teams E2E reproduction is available from the supplied context, and the PR test plan leaves that proof unchecked.

Next step before merge
Maintainer next action is product/security/ownership review and PR decomposition, not an automated repair, because the blockers span bridge security, public config/SDK contract, .NET worker dependencies, compliance policy, and live Microsoft validation.

Security
Needs attention: The PR introduces an unauthenticated insecure media-worker bridge and floating .NET dependency ranges that must be fixed before merge.

Review findings

[P1] Authenticate the worker bridge before sending secrets — extensions/msteams/src/voice/worker-bridge.ts:121
[P2] Declare the gRPC client dependencies — extensions/msteams/src/voice/worker-bridge.ts:28-38
[P2] Convert TTS output before streaming it as PCM — extensions/msteams/src/voice/manager.ts:486-489

Review details

Best possible solution:

Land a narrower owner-reviewed sequence: capability-tier docs and config first, then an authenticated worker bridge, deterministic .NET project, compliance-gated audio path, and live/RSC proof in separate PRs.

Do we have a high-confidence way to reproduce the issue?

Yes for the review blockers: static inspection of the PR head reproduces the insecure bridge, undeclared gRPC imports, raw TTS-file-to-PCM playback path, receive-only audio socket, and floating NuGet ranges. No live Teams E2E reproduction is available from the supplied context, and the PR test plan leaves that proof unchecked.

Is this the best way to solve the issue?

No. The current all-in-one PR is not the best way to solve the feature because it combines contract, worker infrastructure, security-sensitive media control, compliance behavior, and realtime audio before the dependency and permission contracts are proven.

Full review comments:

[P1] Authenticate the worker bridge before sending secrets — extensions/msteams/src/voice/worker-bridge.ts:121
connect() creates an insecure gRPC channel, while JoinMeetingRequest carries appSecret and the worker listens on all interfaces. Require TLS/mTLS or a scoped bridge token before sending credentials or call/audio control over this port.
Confidence: 0.92
[P2] Declare the gRPC client dependencies — extensions/msteams/src/voice/worker-bridge.ts:28-38
The new bridge imports @grpc/grpc-js and @grpc/proto-loader, but the Teams plugin package and lockfile do not declare them. Typecheck/package builds and voice.enabled startup will fail before capability negotiation can run.
Confidence: 0.95
[P2] Convert TTS output before streaming it as PCM — extensions/msteams/src/voice/manager.ts:486-489
textToSpeech() returns an audio file in the provider output format, but this code reads those bytes and sends them to a worker API documented as raw 16 kHz mono PCM. Use telephony PCM output or an explicit transcode step before playback.
Confidence: 0.88
[P2] Open a send-capable audio socket for playback — extensions/msteams/media-worker/CallHandler.cs:123-126
The worker creates the call audio socket with StreamDirection.Recvonly, then later uses the same socket for AudioPlayback.Send(). A receive-only media stream cannot satisfy the advertised listen-and-speak live voice path.
Confidence: 0.82
[P2] Pin the media worker package versions — extensions/msteams/media-worker/TeamsMediaWorker.csproj:11-15
The .NET project restores wildcard package ranges such as 1.2.*, 2.*, and 3.*. Pin exact versions so worker builds are reproducible and Graph/gRPC runtime behavior cannot change without a source diff.
Confidence: 0.9
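
A lightweight CI check for the floating-version finding could scan the project file for wildcard versions. This is a regex-based sketch that assumes `Include` precedes `Version` on each `PackageReference` element; the function name is hypothetical, and a real check might prefer an XML parser or a NuGet lock file.

```typescript
// Returns "PackageName Version" for every PackageReference whose version floats.
function findFloatingVersions(csproj: string): string[] {
  const floating: string[] = [];
  const re = /<PackageReference\s+Include="([^"]+)"\s+Version="([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(csproj)) !== null) {
    // A "*" anywhere in the version makes NuGet restore the latest match.
    if (m[2].indexOf("*") !== -1) floating.push(`${m[1]} ${m[2]}`);
  }
  return floating;
}
```

Failing the build when this list is non-empty would keep wildcard ranges from returning after the versions are pinned.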

Overall correctness: patch is incorrect
Overall confidence: 0.91

Security concerns:

[high] Unauthenticated insecure worker bridge — extensions/msteams/src/voice/worker-bridge.ts:121
The TypeScript client uses insecure gRPC while JoinMeetingRequest includes the app secret and call/audio control methods. If the documented worker port is reachable, this can expose credentials, meeting audio, and call control.
Confidence: 0.92
[medium] Floating NuGet package ranges — extensions/msteams/media-worker/TeamsMediaWorker.csproj:11
The media worker uses wildcard package versions for Microsoft Graph Communications, gRPC, protobuf, and Grpc.Tools, weakening supply-chain reproducibility and allowing build/runtime behavior to change without a source diff.
Confidence: 0.9

What I checked:

current main lacks Teams voice docs: The current Microsoft Teams docs describe text, attachments, files, polls, and message actions; the status line does not advertise live voice, transcript mode, or a Windows media worker. Public docs: docs/channels/msteams.md. (docs/channels/msteams.md:8, cc8a8f1df1cd)
current main has no Teams voice config schema: The current Teams plugin manifest still has an empty configSchema.properties object, so the proposed voice configuration is not implemented on main. (extensions/msteams/openclaw.plugin.json:10, cc8a8f1df1cd)
current main monitor only starts the Teams webhook path: The monitor currently registers the Bot Framework message endpoint and starts the HTTP server; there is no voice manager, worker bridge, capability negotiation, or auto-join path in this surface. (extensions/msteams/src/monitor.ts:317, cc8a8f1df1cd)
bridge sends app secret over insecure gRPC: The PR head creates the gRPC client with createInsecure() and JoinMeetingRequest includes appSecret, so the worker bridge needs authentication/TLS before merge. (extensions/msteams/src/voice/worker-bridge.ts:121, 52deda769eb7)
gRPC runtime dependencies are undeclared: The PR imports @grpc/grpc-js and @grpc/proto-loader, while the current Teams plugin package dependencies do not include either package and the lockfile search found no matching entries. (extensions/msteams/package.json:9, cc8a8f1df1cd)
media worker restores floating package ranges: The PR head uses wildcard NuGet versions for Microsoft Graph Communications, gRPC, protobuf, and Grpc.Tools, making the Windows worker build nondeterministic. (extensions/msteams/media-worker/TeamsMediaWorker.csproj:11, 52deda769eb7)

Likely related people:

steipete: Recent main history shows multiple Teams docs/plugin maintenance commits, and the existing maintainer review on this PR names the split/proof requirements for the Teams voice direction. (role: recent maintainer and reviewer; confidence: high; commits: 3f002b10d281, 08c4af0ddf62, dd098596cf34; files: extensions/msteams/src/monitor.ts, docs/channels/msteams.md, extensions/msteams/openclaw.plugin.json)
SidU: Authored the current Teams SDK and AI UX migration that underlies monitor, SDK, JWT validation, streaming/status behavior, and the current Teams channel runtime surface this PR extends. (role: introduced current Teams SDK runtime base; confidence: medium; commits: cd90130877f1; files: extensions/msteams/src/monitor.ts, extensions/msteams/src/sdk.ts, docs/channels/msteams.md)
BradGroux: Recent history shows substantial Teams plugin work and coauthorship around Teams config, streaming, auth, and federated credential support, which are adjacent to this PR's Teams voice/auth surface. (role: adjacent Teams plugin maintainer; confidence: medium; commits: 03c64df39fe7, fce81fccd859, 6b0e74000d9f; files: extensions/msteams/src/monitor.ts, src/config/types.msteams.ts, docs/channels/msteams.md)
HDYA: Authored the federated credential support that expanded the Teams auth/config contract; this PR's media worker and permission model would need to fit that owner-reviewed authentication surface. (role: adjacent Teams authentication owner; confidence: medium; commits: 26f633b604fd; files: src/config/types.msteams.ts, docs/channels/msteams.md, extensions/msteams/src/token.ts)

Remaining risk / open question:

The PR is a draft and its own checklist still lacks dotnet build, mock gRPC integration, live Teams/Azure VM E2E, and RSC permission proof.
The proposed bridge currently exposes call/audio control and app secrets over an unauthenticated insecure channel if the worker port is reachable.
The feature spans public config/SDK contract, external Windows infrastructure, compliance policy, realtime media, and provider audio behavior, so a narrow automated repair would not be safe.

Codex review notes: model gpt-5.5, reasoning high; reviewed against cc8a8f1df1cd.

openclaw-barnacle Bot added the docs, channel: msteams, and size: XL labels Mar 30, 2026

lupuletic force-pushed the feature/msteams-voice-calls branch from 702abc0 to 52deda7 Compare April 5, 2026 13:49

steipete requested changes Apr 25, 2026


BingqingLyu mentioned this pull request Apr 27, 2026

feat(msteams): Teams live voice support with .NET media worker BingqingLyu/openclaw#1669

Open

6 tasks

BradGroux mentioned this pull request Apr 29, 2026

WORKING: All Microsoft Issues and PRs (refresh) #74163

Draft

openclaw-barnacle Bot added the stale label Apr 30, 2026

openclaw-barnacle Bot removed the stale label May 1, 2026

lupuletic wants to merge 1 commit into openclaw:main from lupuletic:feature/msteams-voice-calls
