feat(azure-speech): add realtime transcription provider for voice-call#73456
ottodeng wants to merge 17 commits into openclaw:main from …
Conversation
Greptile Summary

This PR adds Azure Speech as a realtime transcription provider for the voice-call plugin.
Confidence Score: 3/5

Not safe to merge as-is: lazy connect silently drops the first audio frame and fires an unexpected onError on every session that skips explicit connect(). One P1 data-loss bug (first audio frame discarded on lazy connect) with no test coverage catching it, capping confidence at 4; the pattern affects a core advertised feature path, pulling it down to 3.

- extensions/azure-speech/realtime-transcription-session.ts (lazy-connect sendAudio race)
- extensions/azure-speech/realtime-transcription-lifecycle.test.ts (missing assertions on audio delivery and onError in lazy-connect test)
extensions/azure-speech/realtime-transcription-session.ts (lines 281-289):

```ts
sendAudio(audio: Buffer) {
  if (!connected && !connecting) {
    // Lazy connect on first audio frame for parity with sibling providers.
    connect().catch((error) =>
      handleError(error instanceof Error ? error : new Error(String(error))),
    );
  }
  sendAudio(audio);
},
```
**First audio frame dropped on lazy connect**

When `sendAudio` is invoked before `connect()` has resolved, it fires `connect()` asynchronously and then immediately calls the inner `sendAudio`. Because `pushStream` is still `undefined` at that synchronous moment, the inner guard triggers:

```ts
if (!pushStream) {
  handleError(new Error("Azure Speech push stream is not initialized"));
  return;
}
```

The audio from that triggering frame is silently discarded and `onError` fires unexpectedly on every lazy-connect caller. The test at line 353 (`lazily connects on the first audio frame`) only asserts that `startSpy` was called; it never checks `pushStream.writeSpy` or `onError`, so this data-loss path goes undetected.

If parity with sibling providers requires queuing audio until the stream is ready, the first frame should be buffered and flushed once `connect()` resolves rather than being forwarded before `pushStream` exists.
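A minimal sketch of the buffering approach the comment suggests — the `pendingFrames` queue, `writeToPushStream`, and the declared state flags are illustrative stand-ins, not the session's actual internals:

```ts
// Assumed session state and helpers — illustrative, not the provider's real code.
declare let connected: boolean;
declare let connecting: boolean; // assumed to be managed by connect()
declare function connect(): Promise<void>;
declare function writeToPushStream(frame: Buffer): void;
declare function handleError(error: Error): void;

const pendingFrames: Buffer[] = [];

function sendAudio(audio: Buffer): void {
  if (!connected) {
    // Queue instead of forwarding into a push stream that does not exist yet.
    pendingFrames.push(audio);
    if (!connecting) {
      connect()
        .then(() => {
          // Flush everything that queued up while the recognizer was starting.
          for (const frame of pendingFrames.splice(0)) writeToPushStream(frame);
        })
        .catch((error) =>
          handleError(error instanceof Error ? error : new Error(String(error))),
        );
    }
    return;
  }
  writeToPushStream(audio);
}
```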
extensions/azure-speech/realtime-transcription-provider.ts (lines 217-223):

```ts
isConfigured: ({ providerConfig }) => {
  const normalized = normalizeAzureSpeechRealtimeProviderConfig(providerConfig);
  if (!normalized.apiKey) {
    return false;
  }
  return Boolean(normalized.endpoint || normalized.region);
},
```
**Redundant re-normalization inside `isConfigured`**

`providerConfig` passed to `isConfigured` is already the resolved output of `resolveConfig` (which returns an `AzureSpeechRealtimeProviderConfig`). Calling `normalizeAzureSpeechRealtimeProviderConfig` on it again is wasteful and relies on the normalization being idempotent. The check can use the typed fields directly:

```ts
isConfigured: ({ providerConfig }) => {
  const cfg = providerConfig as AzureSpeechRealtimeProviderConfig;
  if (!cfg.apiKey) return false;
  return Boolean(cfg.endpoint || cfg.region);
},
```
Codex review: needs real behavior proof before merge.

Reproducibility: not applicable — this is a feature PR, not a bug report. The merge blocker is source-reproducible by comparing the PR head manifest with the PR head generated plugin docs, and real behavior proof is absent from the discussion.

Best possible solution: keep Azure STT in the azure-speech plugin and merge after the generated plugin docs are regenerated and the contributor attaches redacted live transcription output or logs from a real Azure Speech run.

Is this the best way to solve the issue? Mostly yes: the existing realtime transcription registry and azure-speech plugin are the right boundary for this provider. The current patch is not merge-ready until generated plugin docs are synchronized and after-fix real behavior proof is supplied.

Full review comments:
- Overall correctness: patch is incorrect.
- Acceptance criteria: …
- What I checked: …
- Likely related people: …
- Remaining risk / open question: …
Codex review notes: model gpt-5.5, reasoning high; reviewed against 9fa685e3b3e4.
Force-pushed from 1739205 to dd7626b.
Register Azure Speech as a realtime transcription provider, joining Deepgram, ElevenLabs, Mistral, OpenAI, and xAI in the bundled provider list. Voice Call streaming can now select `azure-speech` (or alias `azure`) instead of being limited to non-Microsoft transcription backends.

The provider uses the official `microsoft-cognitiveservices-speech-sdk` package over Azure's continuous-recognition WebSocket protocol, which already handles partial results, automatic reconnects, and final-utterance detection. The SDK is loaded lazily on first use, so installations that only use Azure Speech TTS pay no extra startup cost.

Configuration lives under `plugins.entries.voice-call.config.streaming.providers.azure-speech.*` and reuses the existing `AZURE_SPEECH_KEY` / `AZURE_SPEECH_REGION` / `SPEECH_KEY` / `SPEECH_REGION` env-var fallbacks. New options:

- `language` (default en-US, BCP-47)
- `sampleRate` (default 8000, matches telephony media streams)
- `encoding` (`pcm` / `mulaw` / `alaw`, default mulaw)
- `initialSilenceTimeoutMs` / `endSilenceTimeoutMs`
- `candidateLanguages` (auto language detection)
- `endpoint` (sovereign cloud / private endpoint)

Tests: 33 unit tests cover config normalization (env fallbacks, provider sub-config, encoding aliases), the registry contract (`isConfigured` / `createSession`), and session lifecycle (connect via subscription/endpoint/auto-detect, partial vs final transcripts, NoMatch handling, error propagation, audio overflow, lazy connect, graceful close). A live integration test synthesizes telephony audio with the existing TTS provider and feeds it back through the new STT provider end-to-end (skipped without `AZURE_SPEECH_KEY`).

Docs: `docs/providers/azure-speech.md` adds an STT section and config table; `docs/plugins/voice-call.md` adds an Azure Speech tab to the streaming provider examples.
Force-pushed from 472d372 to 879e61c.
… isConfigured

Address Greptile review on PR openclaw#73456:

P1 — first audio frame dropped on lazy connect: when sendAudio() ran before connect() resolved, the inner sendAudio was invoked synchronously while pushStream was still undefined, triggering the 'push stream is not initialized' onError and silently discarding the triggering frame. Now buffer pending audio frames (capped at AZURE_SPEECH_REALTIME_MAX_QUEUED_BYTES), kick off connect() once, and flush them after the push stream is ready. The lazy-connect lifecycle test now also asserts the buffered frame is delivered to the push stream and that no unexpected onError fires.

P2 — redundant re-normalization in isConfigured: providerConfig is already the resolved output of resolveConfig, so use the typed AzureSpeechRealtimeProviderConfig fields directly instead of calling normalize again.
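A rough sketch of the strengthened test assertions this commit describes, in Vitest style; `createSession`, `startSpy`, and `writeSpy` are stand-ins for the lifecycle test's actual harness:

```ts
import { describe, expect, it, vi } from "vitest";

// Illustrative stand-ins for the lifecycle test's real harness and spies.
declare const startSpy: ReturnType<typeof vi.fn>;
declare const writeSpy: ReturnType<typeof vi.fn>;
declare function createSession(opts: { onError: (error: Error) => void }): {
  sendAudio(audio: Buffer): void;
};

describe("lazy connect", () => {
  it("delivers the buffered first frame and raises no error", async () => {
    const onError = vi.fn();
    const session = createSession({ onError });

    session.sendAudio(Buffer.from([0x7f]));

    await vi.waitFor(() => expect(startSpy).toHaveBeenCalled());       // recognition started lazily
    await vi.waitFor(() => expect(writeSpy).toHaveBeenCalledTimes(1)); // queued frame flushed to the push stream
    expect(onError).not.toHaveBeenCalled();                            // no "not initialized" error
  });
});
```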
Thanks for the review! Pushed fixes for both findings.

P1 — first audio frame dropped on lazy connect: now buffer pending audio frames in a small queue (capped at the existing `AZURE_SPEECH_REALTIME_MAX_QUEUED_BYTES` limit), kick off `connect()` once, and flush them after the push stream is ready.

P2 — redundant re-normalization in `isConfigured`: use the typed `AzureSpeechRealtimeProviderConfig` fields directly instead of normalizing again.

All 33 azure-speech unit tests still pass locally. The remaining red CI checks on this PR (…).
…ealtime-stt # Conflicts: # docs/.generated/config-baseline.sha256
Resolve conflict in docs/.generated/config-baseline.sha256 by taking the regenerated baseline from main; this PR does not introduce new config schema rows so the latest main hashes are correct.
Voice Call invokes session.connect() in the background, which awaits the async SDK load and recognizer start. If the websocket closes during that window, close() previously set closing=true but the in-flight chain continued to allocate the push stream and recognizer, leaving them and the upstream Azure socket alive past close(). Now the connect chain checks closing after each await and tears down any freshly created push stream / recognizer / started recognition. teardown() also bails when connect() is still pending and nothing has been allocated yet, leaving cleanup to the connect chain itself.
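A minimal sketch of the close-during-connect guard described above, with hypothetical helper names standing in for the session internals:

```ts
// Sketch of the guard pattern: after every await in connect(), re-check `closing` and tear
// down anything allocated so far. Helper names are illustrative, not the real session code.
declare function loadSdk(): Promise<void>;
declare function createPushStream(): { close(): void };
declare function createRecognizer(stream: unknown): { close(): void };
declare function startRecognition(recognizer: unknown): Promise<void>;
declare function stopRecognition(recognizer: unknown): Promise<void>;

let closing = false;

export async function connect(): Promise<void> {
  await loadSdk();
  if (closing) return; // close() raced before anything was allocated — nothing to clean up

  const pushStream = createPushStream();
  const recognizer = createRecognizer(pushStream);
  if (closing) {
    recognizer.close(); // close() raced mid-setup: release what this chain just created
    pushStream.close();
    return;
  }

  await startRecognition(recognizer);
  if (closing) {
    await stopRecognition(recognizer); // close() raced after start: stop and release everything
    recognizer.close();
    pushStream.close();
  }
}
```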
Addressed the three P2 clawsweeper findings:

1. Abort connection when session is closed (…).
2. Regenerated config-doc baseline (…).
3. CHANGELOG entry added under Unreleased › Changes.

Commit 6b1de3f.

Validation: …

HEAD: 6b1de3f
…ealtime-stt # Conflicts: # CHANGELOG.md # docs/.generated/config-baseline.sha256
…ealtime-stt # Conflicts: # CHANGELOG.md
Merged latest `main`.
…installs

Address clawsweeper [P1]: extension-local declaration of microsoft-cognitiveservices-speech-sdk does not survive packaged bundled installs (postinstall does not install plugin package dependencies, bundled runtime staging skips plugin node_modules). Hoisting the SDK to the root manifest makes it resolvable from packaged installs while keeping the provider extension-owned. The dependency stays a runtime dependency (not devDependencies) since the Azure Speech provider lazily loads it at provider-runtime time when realtime transcription is requested.
Addressed clawsweeper [P1] in …

Bug: …

Surface: bundled distribution installs (NPM tarball, Docker image, macOS bundle) loading the Azure Speech extension's realtime transcription provider.

Fix: hoisted `microsoft-cognitiveservices-speech-sdk` to the root manifest so it resolves from packaged installs while the provider stays extension-owned.

Why best: the alternative (moving Azure Speech behind an explicit downloadable plugin install path) is a larger architectural change that affects all SDK-heavy providers; hoisting the dep matches the existing pattern used by …
…ealtime-stt # Conflicts: # CHANGELOG.md
…ealtime-stt # Conflicts: # CHANGELOG.md
…ealtime-stt # Conflicts: # CHANGELOG.md
Force-pushed from 75ba3ff to 07e2403.
Force-pushed from 07e2403 to a8809d3.
…ealtime-stt # Conflicts: # CHANGELOG.md
Closing — this grew into an XL change (size: XL, +1818) without the "Real behavior proof" maintainers expect for external PRs of this scope. Will rework as a smaller, scoped PR (provider config + minimal wiring) with end-to-end transcription evidence on a real device before reopening.
Why

OpenClaw's `voice-call` plugin can stream live call audio to a realtime transcription provider, and it already has a clean pluggable provider interface (added by #68697). Today five providers are registered out of the box — Deepgram, ElevenLabs, Mistral, OpenAI, and xAI — but Azure Speech is missing, even though OpenClaw already ships an `azure-speech` plugin for TTS. That forces users who want to standardize on Microsoft's speech stack (or who already have Azure Speech keys provisioned) to depend on a non-Microsoft service for STT.

This PR closes that gap by registering Azure Speech as a realtime transcription provider and reusing the existing `azure-speech` plugin's config and env-var fallbacks.
What

- `extensions/azure-speech/`:
  - `realtime-transcription-provider.ts` — registry contract, config normalization, and provider builder.
  - `realtime-transcription-session.ts` — the recognizer session: lazy connect, partial/final transcript routing, NoMatch handling, error propagation, audio queueing with overflow protection, idempotent teardown.
  - `realtime-transcription-types.ts` — minimal structural typing of the `microsoft-cognitiveservices-speech-sdk` API surface used here.
- `index.ts` calls `api.registerRealtimeTranscriptionProvider(...)` next to the existing `api.registerSpeechProvider(...)` (see the sketch after this list).
- `openclaw.plugin.json` declares the new `realtimeTranscriptionProviders: ["azure-speech", "azure"]` contract and documents the new schema fields.
- `package.json` adds `microsoft-cognitiveservices-speech-sdk` as a runtime dependency. The SDK is lazy-loaded on first session creation, so users who only use Azure Speech TTS pay no extra startup cost.
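A hedged sketch of what the `index.ts` registration could look like; only the two `register*` calls are named in this PR — the plugin API type and the builder names below are assumptions:

```ts
// Illustrative plugin entry point; the API shape and builder names are hypothetical.
interface PluginApi {
  registerSpeechProvider(provider: unknown): void;
  registerRealtimeTranscriptionProvider(provider: unknown): void;
}

declare function buildAzureSpeechSpeechProvider(): unknown; // existing TTS provider (hypothetical name)
declare function buildAzureSpeechRealtimeTranscriptionProvider(): unknown; // new STT provider (hypothetical name)

export function activate(api: PluginApi): void {
  // Existing text-to-speech registration shipped by the azure-speech plugin.
  api.registerSpeechProvider(buildAzureSpeechSpeechProvider());

  // New in this PR: realtime STT for voice-call streaming ("azure-speech", alias "azure").
  api.registerRealtimeTranscriptionProvider(buildAzureSpeechRealtimeTranscriptionProvider());
}
```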
Why use the official SDK and not a hand-rolled WebSocket?

Azure Speech's continuous-recognition wire protocol (USP) is not publicly documented. The official SDK already handles the parts that matter for a streaming use case — connection setup, USP framing, partial vs final results, automatic reconnects, end-of-utterance detection — and is the pattern Microsoft's docs recommend. Lazy import keeps the cost off the critical path for non-STT users (mirrors the pattern used by the existing `microsoft-speech` plugin with `node-edge-tts`).
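For illustration, a minimal sketch of that lazy-import pattern, assuming a module-level cache (the helper name is an assumption, not the plugin's actual code):

```ts
// Cache the dynamic import so only the first realtime transcription session pays the
// SDK load cost; TTS-only installs never trigger it.
type SpeechSdk = typeof import("microsoft-cognitiveservices-speech-sdk");

let sdkPromise: Promise<SpeechSdk> | undefined;

export async function loadSpeechSdk(): Promise<SpeechSdk> {
  sdkPromise ??= import("microsoft-cognitiveservices-speech-sdk");
  return sdkPromise;
}
```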
Configuration

Reuses the existing `AZURE_SPEECH_KEY` / `AZURE_SPEECH_API_KEY` / `SPEECH_KEY` and `AZURE_SPEECH_REGION` / `SPEECH_REGION` env-var fallbacks. Sovereign clouds and private endpoints can use `endpoint` instead of `region`.

Voice Call config:
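A hypothetical example of that provider block, written as a TypeScript object for illustration; the keys mirror the options listed in this PR, while the exact nesting and the concrete values shown are assumptions:

```ts
// Illustrative only: option names follow the documented config under
// plugins.entries.voice-call.config.streaming.providers.azure-speech.*;
// the "provider" selector field and all values are assumptions.
const streaming = {
  provider: "azure-speech", // alias "azure" is also registered
  providers: {
    "azure-speech": {
      // apiKey / region fall back to AZURE_SPEECH_KEY / AZURE_SPEECH_REGION
      // (or SPEECH_KEY / SPEECH_REGION) when omitted.
      language: "en-US",                 // BCP-47, default en-US
      sampleRate: 8000,                  // default 8000, matches telephony media streams
      encoding: "mulaw",                 // "pcm" | "mulaw" | "alaw", default mulaw
      initialSilenceTimeoutMs: 5000,     // illustrative value
      endSilenceTimeoutMs: 1000,         // illustrative value
      candidateLanguages: ["en-US", "fr-FR"], // enables automatic language detection
      // endpoint: "wss://example.invalid",   // sovereign cloud / private endpoint instead of region
    },
  },
};
```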
Tests

- Config normalization: env fallbacks, `providers.<id>` sub-config, encoding aliases (`linear16` → `pcm`, `g711_ulaw` → `mulaw`, etc.), invalid encoding rejection, candidate-language parsing.
- Registry contract: `id`/aliases, `isConfigured` for the region+key, endpoint+key, missing-key, and missing-location cases, `createSession` error messages.
- Session lifecycle: connect via `fromSubscription` and `fromEndpoint`, auto-detect via `SpeechRecognizer.FromConfig`, partial/final transcripts, NoMatch handling, error vs EndOfStream cancellation, start-failure rejection, audio buffer overflow, lazy connect on first audio frame, idempotent close, ignored-after-close audio.
- Live integration: `azure-speech.live.test.ts` (skipped without `AZURE_SPEECH_KEY` / `AZURE_SPEECH_REGION`) synthesizes a short µ-law clip with the existing TTS provider and feeds it through the new STT provider in 20 ms frames, verifying the round-trip transcript.
- `pnpm vitest run extensions/azure-speech/ extensions/voice-call/`: 328 tests across 43 files, all green.
- `pnpm tsgo:extensions`, `pnpm tsgo:extensions:test`, `pnpm lint:extensions`, `pnpm config:docs:check`, `pnpm config:schema:check`, `pnpm check:loc`: all clean.
Docs

- `docs/providers/azure-speech.md` — adds the realtime transcription config table, a "Realtime transcription" accordion entry, and updates the page summary.
- `docs/plugins/voice-call.md` — adds Azure Speech to the bundled realtime transcription provider list and a full Azure Speech tab in the streaming provider examples.
Related

- … (commit c866820fed0).
- … outbound voice notes and inbound transcription against the same Azure Speech resource.