
feat(azure-speech): add realtime transcription provider for voice-call#73456

Closed
ottodeng wants to merge 17 commits into openclaw:main from ottodeng:feat/azure-speech-realtime-stt

Conversation

@ottodeng Contributor

Why

OpenClaw's voice-call plugin can stream live call audio to a realtime
transcription provider, and it already has a clean pluggable provider
interface (added by #68697). Today five providers are
registered out of the box — Deepgram, ElevenLabs, Mistral, OpenAI, and
xAI — but Azure Speech is missing, even though OpenClaw already ships an
azure-speech plugin for TTS. That forces users who want to standardize on
Microsoft's speech stack (or who already have Azure Speech keys provisioned)
to depend on a non-Microsoft service for STT.

This PR closes that gap by registering Azure Speech as a realtime
transcription provider and reusing the existing azure-speech plugin's
config and env-var fallbacks.

What

  • New provider in extensions/azure-speech/:
    • realtime-transcription-provider.ts — registry contract, config
      normalization, and provider builder.
    • realtime-transcription-session.ts — the recognizer session: lazy
      connect, partial/final transcript routing, NoMatch handling, error
      propagation, audio queueing with overflow protection, idempotent
      teardown.
    • realtime-transcription-types.ts — minimal structural typing of the
      microsoft-cognitiveservices-speech-sdk API surface used here.
  • index.ts calls api.registerRealtimeTranscriptionProvider(...) next to
    the existing api.registerSpeechProvider(...).
  • openclaw.plugin.json declares the new
    realtimeTranscriptionProviders: ["azure-speech", "azure"] contract and
    documents the new schema fields.
  • package.json adds microsoft-cognitiveservices-speech-sdk as a runtime
    dependency. The SDK is lazy-loaded on first session creation, so users
    who only use Azure Speech TTS pay no extra startup cost.

Why use the official SDK and not a hand-rolled WebSocket?

Azure Speech's continuous-recognition wire protocol (USP) is not publicly
documented. The official SDK already handles the parts that matter for a
streaming use case — connection setup, USP framing, partial vs final
results, automatic reconnects, end-of-utterance detection — and is the
pattern Microsoft's docs recommend. Lazy import keeps the cost off the
critical path for non-STT users (mirrors the pattern used by the existing
microsoft-speech plugin with node-edge-tts).
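
The lazy-import side of this is easy to sketch. The following is an illustrative memoized loader, not the plugin's actual code; `loadSdk`, `loadCount`, and the fake module stand in for the real dynamic import of microsoft-cognitiveservices-speech-sdk:

```typescript
// Memoize the dynamic import so the SDK is loaded at most once,
// and only when the first realtime transcription session is created.
type SdkLoader<T> = () => Promise<T>;

function memoizeLoader<T>(load: SdkLoader<T>): SdkLoader<T> {
  let cached: Promise<T> | undefined;
  return () => {
    cached ??= load(); // cache the promise, not the resolved value
    return cached;
  };
}

// Hypothetical usage; the real provider would do
// `import("microsoft-cognitiveservices-speech-sdk")` here instead.
let loadCount = 0;
const loadSdk = memoizeLoader(async () => {
  loadCount += 1;
  return { name: "fake-sdk" };
});
```

Because the promise itself is cached, concurrent first calls share one in-flight import instead of racing.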

Configuration

Reuses the existing AZURE_SPEECH_KEY / AZURE_SPEECH_API_KEY /
SPEECH_KEY and AZURE_SPEECH_REGION / SPEECH_REGION env-var fallbacks.
Voice Call config:

{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "azure-speech",
            streamPath: "/voice/stream",
            providers: {
              "azure-speech": {
                apiKey: "${AZURE_SPEECH_KEY}",
                region: "${AZURE_SPEECH_REGION}", // e.g. eastus, southeastasia
                language: "en-US",
                encoding: "mulaw",                 // pcm | mulaw | alaw
                sampleRate: 8000,                  // matches Twilio media stream
                endSilenceTimeoutMs: 800,
                // Optional Azure auto language detection
                // candidateLanguages: ["en-US", "zh-CN"],
              },
            },
          },
        },
      },
    },
  },
}

Sovereign clouds and private endpoints can use endpoint instead of
region.
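For example (the endpoint URL below is a hypothetical placeholder, not a real Azure host):

```
providers: {
  "azure-speech": {
    apiKey: "${AZURE_SPEECH_KEY}",
    endpoint: "wss://my-speech-resource.example/stt/speech/universal/v2",
    language: "en-US",
  },
},
```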

Tests

  • 33 unit tests, all green:
    • Config normalization: env fallbacks, providers.<id> sub-config,
      encoding aliases (linear16 → pcm, g711_ulaw → mulaw, etc.),
      invalid encoding rejection, candidate-language parsing.
    • Registry contract: id / aliases, isConfigured for the
      region+key, endpoint+key, missing-key, and missing-location cases,
      createSession error messages.
    • Session lifecycle: connect via fromSubscription and fromEndpoint,
      auto-detect via SpeechRecognizer.FromConfig, partial/final
      transcripts, NoMatch handling, error vs EndOfStream cancellation,
      start-failure rejection, audio buffer overflow, lazy connect on
      first audio frame, idempotent close, ignored-after-close audio.
  • Live integration test in azure-speech.live.test.ts (skipped
    without AZURE_SPEECH_KEY/AZURE_SPEECH_REGION): synthesizes a short
    µ-law clip with the existing TTS provider and feeds it through the new
    STT provider in 20 ms frames, verifying the round-trip transcript.
  • Full pnpm vitest run extensions/azure-speech/ extensions/voice-call/:
    328 tests across 43 files, all green.
  • pnpm tsgo:extensions, pnpm tsgo:extensions:test,
    pnpm lint:extensions, pnpm config:docs:check,
    pnpm config:schema:check, pnpm check:loc: all clean.
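
For reference, the live test's 20 ms framing works out to 160 bytes per frame at 8 kHz µ-law (8000 samples/s × 0.02 s × 1 byte/sample). A hypothetical helper that performs this chunking (not the test's actual code):

```typescript
// Split an audio buffer into fixed-duration frames.
// 8000 Hz mu-law audio is 1 byte per sample, so 20 ms => 160 bytes.
function frameAudio(
  audio: Buffer,
  sampleRate: number,
  frameMs: number,
  bytesPerSample = 1,
): Buffer[] {
  const frameBytes = Math.floor((sampleRate * frameMs) / 1000) * bytesPerSample;
  const frames: Buffer[] = [];
  for (let offset = 0; offset < audio.length; offset += frameBytes) {
    frames.push(audio.subarray(offset, offset + frameBytes));
  }
  return frames;
}
```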

Docs

  • docs/providers/azure-speech.md — adds the realtime transcription config
    table, a "Realtime transcription" accordion entry, and updates the page
    summary.
  • docs/plugins/voice-call.md — adds Azure Speech to the bundled
    realtime transcription provider list and a full Azure Speech tab in the
    streaming provider examples.

Related

  • Builds on the realtime transcription provider registry (Pluggable STT Providers for voice-call Plugin #68697,
    commit c866820fed0).
  • Complements the existing Azure Speech TTS provider so a user can run
    outbound voice notes and inbound transcription against the same Azure
    Speech resource.

@openclaw-barnacle openclaw-barnacle Bot added the docs, plugin: azure-speech, and size: XL labels on Apr 28, 2026

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR adds Azure Speech as a realtime transcription provider for the voice-call plugin, reusing existing credentials/config patterns and lazy-loading the SDK. The implementation is well-structured overall, but there is one data-loss bug in the lazy-connect path.

  • P1 — First audio frame dropped on lazy connect (realtime-transcription-session.ts:281–289): sendAudio fires connect() asynchronously and immediately calls the inner sendAudio in the same tick. Since pushStream is always undefined at that point, the guard triggers onError("Azure Speech push stream is not initialized") and discards the audio. The advertised lazy-connect behavior silently eats the triggering frame. The corresponding test only asserts startSpy was called, not that the audio arrived.

Confidence Score: 3/5

Not safe to merge as-is: lazy connect silently drops the first audio frame and fires an unexpected onError on every session that skips explicit connect().

One P1 data-loss bug (first audio frame discarded on lazy connect) with no test coverage catching it, capping confidence at 4; the pattern affects a core advertised feature path, pulling it down to 3.

extensions/azure-speech/realtime-transcription-session.ts (lazy-connect sendAudio race), extensions/azure-speech/realtime-transcription-lifecycle.test.ts (missing assertions on audio delivery and onError in lazy-connect test)

Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/azure-speech/realtime-transcription-session.ts
Line: 281-289

Comment:
**First audio frame dropped on lazy connect**

When `sendAudio` is invoked before `connect()` has resolved, it fires `connect()` asynchronously and then immediately calls the inner `sendAudio`. Because `pushStream` is still `undefined` at that synchronous moment, the inner guard triggers:

```ts
if (!pushStream) {
  handleError(new Error("Azure Speech push stream is not initialized"));
  return;
}
```

The audio from that triggering frame is silently discarded and `onError` fires unexpectedly on every lazy-connect caller. The test at line 353 (`lazily connects on the first audio frame`) only asserts that `startSpy` was called; it never checks `pushStream.writeSpy` or `onError`, so this data-loss path goes undetected.

If parity with sibling providers requires queuing audio until the stream is ready, the first frame should be buffered and flushed once `connect()` resolves rather than being forwarded before `pushStream` exists.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/azure-speech/realtime-transcription-provider.ts
Line: 217-223

Comment:
**Redundant re-normalization inside `isConfigured`**

`providerConfig` passed to `isConfigured` is already the resolved output of `resolveConfig` (which returns an `AzureSpeechRealtimeProviderConfig`). Calling `normalizeAzureSpeechRealtimeProviderConfig` on it again is wasteful and relies on the normalization being idempotent. The check can use the typed fields directly:

```ts
isConfigured: ({ providerConfig }) => {
  const cfg = providerConfig as AzureSpeechRealtimeProviderConfig;
  if (!cfg.apiKey) return false;
  return Boolean(cfg.endpoint || cfg.region);
},
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat(azure-speech): add realtime transcr..."

Comment on lines +281 to +289

```ts
sendAudio(audio: Buffer) {
  if (!connected && !connecting) {
    // Lazy connect on first audio frame for parity with sibling providers.
    connect().catch((error) =>
      handleError(error instanceof Error ? error : new Error(String(error))),
    );
  }
  sendAudio(audio);
},
```

P1 First audio frame dropped on lazy connect

When sendAudio is invoked before connect() has resolved, it fires connect() asynchronously and then immediately calls the inner sendAudio. Because pushStream is still undefined at that synchronous moment, the inner guard triggers:

```ts
if (!pushStream) {
  handleError(new Error("Azure Speech push stream is not initialized"));
  return;
}
```

The audio from that triggering frame is silently discarded and onError fires unexpectedly on every lazy-connect caller. The test at line 353 (lazily connects on the first audio frame) only asserts that startSpy was called; it never checks pushStream.writeSpy or onError, so this data-loss path goes undetected.

If parity with sibling providers requires queuing audio until the stream is ready, the first frame should be buffered and flushed once connect() resolves rather than being forwarded before pushStream exists.


Comment on lines +217 to +223

```ts
isConfigured: ({ providerConfig }) => {
  const normalized = normalizeAzureSpeechRealtimeProviderConfig(providerConfig);
  if (!normalized.apiKey) {
    return false;
  }
  return Boolean(normalized.endpoint || normalized.region);
},
```

P2 Redundant re-normalization inside isConfigured

providerConfig passed to isConfigured is already the resolved output of resolveConfig (which returns an AzureSpeechRealtimeProviderConfig). Calling normalizeAzureSpeechRealtimeProviderConfig on it again is wasteful and relies on the normalization being idempotent. The check can use the typed fields directly:

```ts
isConfigured: ({ providerConfig }) => {
  const cfg = providerConfig as AzureSpeechRealtimeProviderConfig;
  if (!cfg.apiKey) return false;
  return Boolean(cfg.endpoint || cfg.region);
},
```


clawsweeper Bot commented Apr 28, 2026

Codex review: needs real behavior proof before merge.

Summary
The PR adds an Azure Speech realtime transcription provider to the azure-speech plugin, including provider/session code, config/docs/tests, and the Microsoft Speech SDK runtime dependency.

Reproducibility: not applicable; this is a feature PR, not a bug report. The merge blocker is source-reproducible by comparing the PR head manifest with the PR head generated plugin docs, and real behavior proof is absent from the discussion.

Real behavior proof
Missing: the PR includes unit tests and a skipped-unless-env live test definition, but no after-fix terminal output, logs, screenshot, or linked artifact showing Azure Speech realtime transcription working.

Next step before merge
Contributor action is needed for real behavior proof, and automation should not run a repair loop on this PR while that external proof gate is unsatisfied.

Security
Cleared: the diff adds the official Microsoft Speech SDK dependency with lockfile entries and does not touch workflows, permissions, scripts, secret handling, or package sources in a concerning way.

Review findings

  • [P2] Regenerate the plugin inventory docs — extensions/azure-speech/openclaw.plugin.json:29
Review details

Best possible solution:

Keep Azure STT in the azure-speech plugin and merge after the generated plugin docs are regenerated and the contributor attaches redacted live transcription output or logs from a real Azure Speech run.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a feature PR, not a bug report. The merge blocker is source-reproducible by comparing the PR head manifest with the PR head generated plugin docs, and real behavior proof is absent from the discussion.

Is this the best way to solve the issue?

Mostly yes: the existing realtime transcription registry and azure-speech plugin are the right boundary for this provider. The current patch is not merge-ready until generated plugin docs are synchronized and after-fix real behavior proof is supplied.

Full review comments:

  • [P2] Regenerate the plugin inventory docs — extensions/azure-speech/openclaw.plugin.json:29
    Adding realtimeTranscriptionProviders changes the generated plugin surface, but the PR head still publishes Azure Speech as contracts: speechProviders only in the plugin inventory and generated Azure Speech reference page. pnpm plugins:inventory:check compares these docs to manifest metadata and will fail until pnpm plugins:inventory:gen is run and committed.
    Confidence: 0.97

Overall correctness: patch is incorrect
Overall confidence: 0.92

Acceptance criteria:

  • pnpm plugins:inventory:check
  • pnpm config:docs:check
  • pnpm test extensions/azure-speech
  • OPENCLAW_LIVE_TEST=1 pnpm test extensions/azure-speech/azure-speech.live.test.ts


Likely related people:

  • steipete: Introduced the Azure Speech provider, added the generated plugin reference pages, and worked on the realtime transcription SDK seam that this PR extends. (role: feature-history owner; confidence: high; commits: 5b80d0c15e87, 2244ba87b362, 0e7bcf7588d2; files: extensions/azure-speech/index.ts, extensions/azure-speech/openclaw.plugin.json, docs/plugins/plugin-inventory.md)
  • vincentkoc: The provided review context and timeline route this maintainer into the Voice Call/realtime transcription review area, though the strongest concrete local history for this exact patch points to steipete. (role: adjacent owner; confidence: low; files: extensions/voice-call/src/webhook.ts, src/realtime-transcription/provider-registry.ts, extensions/azure-speech/openclaw.plugin.json)

Remaining risk / open question:

  • External Azure Speech realtime behavior remains unproven from PR materials; the live test exists but no successful live run output or redacted logs are attached.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9fa685e3b3e4.

Register Azure Speech as a realtime transcription provider, joining
Deepgram, ElevenLabs, Mistral, OpenAI, and xAI in the bundled provider
list. Voice Call streaming can now select `azure-speech` (or alias
`azure`) instead of being limited to non-Microsoft transcription
backends.

The provider uses the official `microsoft-cognitiveservices-speech-sdk`
package to speak Azure's continuous-recognition WebSocket protocol
(USP); the SDK handles connection setup, partial results, automatic
reconnects, and end-of-utterance detection. The SDK is loaded lazily on
first use, so installations that only use Azure Speech TTS pay no extra
startup cost.

Configuration lives under
`plugins.entries.voice-call.config.streaming.providers.azure-speech.*`
and reuses the existing `AZURE_SPEECH_KEY` / `AZURE_SPEECH_REGION` /
`SPEECH_KEY` / `SPEECH_REGION` env-var fallbacks. New options:

  - `language` (default en-US, BCP-47)
  - `sampleRate` (default 8000, matches telephony media streams)
  - `encoding` (`pcm` / `mulaw` / `alaw`, default mulaw)
  - `initialSilenceTimeoutMs` / `endSilenceTimeoutMs`
  - `candidateLanguages` (auto language detection)
  - `endpoint` (sovereign cloud / private endpoint)

Tests: 33 unit tests cover config normalization (env fallbacks,
provider sub-config, encoding aliases), the registry contract
(`isConfigured` / `createSession`), session lifecycle (connect via
subscription/endpoint/auto-detect, partial vs final transcripts,
NoMatch handling, error propagation, audio overflow, lazy connect,
graceful close). A live integration test synthesizes telephony audio
with the existing TTS provider and feeds it back through the new STT
provider end-to-end (skipped without `AZURE_SPEECH_KEY`).

Docs: `docs/providers/azure-speech.md` adds a STT section and config
table; `docs/plugins/voice-call.md` adds an Azure Speech tab to the
streaming provider examples.
@ottodeng ottodeng force-pushed the feat/azure-speech-realtime-stt branch from 472d372 to 879e61c on April 28, 2026 18:06
… isConfigured

Address Greptile review on PR openclaw#73456:

P1 — first audio frame dropped on lazy connect:
  When sendAudio() ran before connect() resolved, the inner sendAudio
  was invoked synchronously while pushStream was still undefined,
  triggering 'push stream is not initialized' onError and silently
  discarding the triggering frame. Now buffer pending audio frames
  (capped at AZURE_SPEECH_REALTIME_MAX_QUEUED_BYTES), kick off
  connect() once, and flush them after the push stream is ready.

  The lazy-connect lifecycle test now also asserts the buffered frame
  is delivered to the push stream and that no unexpected onError
  fires.

P2 — redundant re-normalization in isConfigured:
  providerConfig is already the resolved output of resolveConfig, so
  use the typed AzureSpeechRealtimeProviderConfig fields directly
  instead of calling normalize again.
@ottodeng Contributor Author

Thanks for the review! Pushed 869a3237 addressing both Greptile findings:

P1 — first audio frame dropped on lazy connect. Now buffer pending audio frames in a small queue (capped at the existing AZURE_SPEECH_REALTIME_MAX_QUEUED_BYTES budget so it can't grow unbounded), kick off connect() once, and flush them after the push stream is ready. The lazy-connect lifecycle test now asserts the triggering frame actually reaches pushStream.write and that onError does not fire — so the silent-drop / spurious onError path is regression-tested.
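
A condensed sketch of that buffer-and-flush shape (class and constant names here are illustrative, not the actual patch):

```typescript
// Queue frames that arrive before the push stream exists, bounded by a
// byte budget so a stalled connect cannot grow the queue without limit.
const MAX_QUEUED_BYTES = 64 * 1024; // illustrative cap

class LazyConnectBuffer {
  private pending: Buffer[] = [];
  private pendingBytes = 0;
  private stream?: { write(chunk: Buffer): void };

  sendAudio(frame: Buffer): void {
    if (!this.stream) {
      // Stream not ready yet: buffer instead of hitting the undefined guard.
      if (this.pendingBytes + frame.length > MAX_QUEUED_BYTES) return; // drop on overflow
      this.pending.push(frame);
      this.pendingBytes += frame.length;
      return;
    }
    this.stream.write(frame);
  }

  // Called once connect() resolves and the push stream is ready:
  // flush buffered frames in arrival order, then write directly.
  attach(stream: { write(chunk: Buffer): void }): void {
    this.stream = stream;
    for (const frame of this.pending) stream.write(frame);
    this.pending = [];
    this.pendingBytes = 0;
  }
}
```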

P2 — redundant re-normalization in isConfigured. providerConfig is already the resolved output of resolveConfig, so it now uses the typed AzureSpeechRealtimeProviderConfig fields directly.

All 33 azure-speech unit tests still pass locally. The remaining red CI checks on this PR (build-artifacts, checks-fast-bundled, checks-node-agentic-*, build-smoke, etc.) look like pre-existing main instability — e.g. doctor-bundled-plugin-runtime-deps.test.ts reports missing grammy@1.37.0 for telegram, which is unrelated to azure-speech. Happy to rebase onto a green main if it would help, but the rebase across the recent large refactor commit is fairly conflict-heavy, so I'd prefer to wait for maintainer guidance before doing that.

…ealtime-stt

# Conflicts:
#	docs/.generated/config-baseline.sha256
@openclaw-barnacle openclaw-barnacle Bot added the channel: line and commands labels on Apr 29, 2026
Resolve conflict in docs/.generated/config-baseline.sha256 by taking the
regenerated baseline from main; this PR does not introduce new config
schema rows so the latest main hashes are correct.
ottodeng added 4 commits May 2, 2026 11:10
Voice Call invokes session.connect() in the background, which awaits the
async SDK load and recognizer start. If the websocket closes during that
window, close() previously set closing=true but the in-flight chain
continued to allocate the push stream and recognizer, leaving them and
the upstream Azure socket alive past close().

Now the connect chain checks closing after each await and tears down any
freshly created push stream / recognizer / started recognition. teardown()
also bails when connect() is still pending and nothing has been
allocated yet, leaving cleanup to the connect chain itself.
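
That pattern (re-check the closing flag after every await, and let whichever side allocated a resource clean it up) can be sketched as follows; the names are illustrative, not the actual session code:

```typescript
// Re-check `closing` after each await so a close() that lands mid-connect
// tears down whatever this chain has allocated so far.
async function connectGuarded(session: {
  closing: boolean;
  loadSdk(): Promise<void>;
  createStream(): Promise<{ close(): void }>;
}): Promise<{ close(): void } | undefined> {
  await session.loadSdk();
  if (session.closing) return undefined; // nothing allocated yet: bail early

  const stream = await session.createStream();
  if (session.closing) {
    stream.close(); // close() already ran, so this chain owns the cleanup
    return undefined;
  }
  return stream;
}
```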

ottodeng commented May 2, 2026

Addressed the three P2 clawsweeper findings:

1. Abort connection when session is closed (realtime-transcription-session.ts)
The connect chain now checks closing after each await and tears down anything it allocated:

  • After loadSdk() resolves (bail early, no Azure resources allocated)
  • After createPushStream() (close push stream, undefined refs)
  • After recognizer creation (close recognizer + push stream)
  • After startContinuousRecognitionAsync resolves (call stopContinuousRecognitionAsync + close)

teardown() now bails when connect() is still pending and nothing has been allocated yet, leaving cleanup to the in-flight chain. Added 2 unit tests covering close-during-loadSdk and close-during-start. Commit a79665d.

2. Regenerated config-doc baseline. pnpm config:docs:gen updated 3 of 4 hashes (config-baseline.json, config-baseline.core.json, config-baseline.plugin.json) to reflect the merged-from-main schema. pnpm config:docs:check is green. Commit 75ac033.

3. CHANGELOG entry added under Unreleased › Changes. Commit 6b1de3f.

Validation

  • extensions/azure-speech vitest: 45/45 passing (4 files, 5.47s) including the 2 new lifecycle tests
  • pnpm check:changed: clean (typecheck + lint + import cycles + sidecar loaders + duplicate-scan + wildcard-reexport guards all green)
  • pnpm config:docs:check: OK

HEAD: 6b1de3f

ottodeng added 2 commits May 2, 2026 20:58
…ealtime-stt

# Conflicts:
#	CHANGELOG.md
#	docs/.generated/config-baseline.sha256

ottodeng commented May 3, 2026

Merged latest upstream/main (CHANGELOG conflict resolved by keeping both sides). Re-pushed.

@openclaw-barnacle openclaw-barnacle Bot removed the commands label on May 3, 2026
…installs

Address clawsweeper [P1]: extension-local declaration of microsoft-cognitiveservices-speech-sdk does not survive packaged bundled installs (postinstall does not install plugin package dependencies, bundled runtime staging skips plugin node_modules). Hoisting the SDK to the root manifest makes it resolvable from packaged installs while keeping the provider extension-owned.

The dependency stays a runtime dependency (not devDependencies) since the Azure Speech provider lazily loads it at provider-runtime time when realtime transcription is requested.

ottodeng commented May 3, 2026

Addressed clawsweeper [P1] in a1e47f6a94:

Bug: microsoft-cognitiveservices-speech-sdk was declared only at the extension level (extensions/azure-speech/package.json). For packaged/bundled installs the postinstall path does not install plugin package dependencies, and bundled runtime staging skips plugin node_modules, so the lazy-loaded SDK would fail to resolve at provider runtime when Voice Call requested realtime transcription.

Surface: bundled distribution installs (NPM tarball, Docker image, macOS bundle) loading the Azure Speech extension's realtime transcription provider.

Fix: hoisted microsoft-cognitiveservices-speech-sdk: ^1.49.0 into the root package.json dependencies block (same version range the extension already declared). Kept it as a runtime dependency (not devDependencies) because the provider lazy-loads it at runtime, not at build time. Lockfile regenerated via pnpm install --lockfile-only.

Why best: the alternative (moving Azure Speech behind an explicit downloadable plugin install path) is a larger architectural change that affects all SDK-heavy providers; hoisting the dep matches the existing pattern used by @aws-sdk/client-bedrock, @google/genai, and other provider SDKs already declared at root. Keeps the provider extension-owned per the core/extension boundary rule.
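
Based on the version range quoted above, the root-manifest change amounts to a fragment like:

```
{
  "dependencies": {
    "microsoft-cognitiveservices-speech-sdk": "^1.49.0"
  }
}
```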

@openclaw-barnacle openclaw-barnacle Bot added the agents label on May 3, 2026
@openclaw-barnacle openclaw-barnacle Bot added the commands label on May 4, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the commands and agents labels on May 4, 2026
@ottodeng ottodeng force-pushed the feat/azure-speech-realtime-stt branch from 75ba3ff to 07e2403 on May 4, 2026 20:36
@ottodeng ottodeng force-pushed the feat/azure-speech-realtime-stt branch from 07e2403 to a8809d3 on May 4, 2026 22:41
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof label on May 5, 2026

ottodeng commented May 5, 2026

Closing — this grew into an XL change (size: XL, +1818) without the Real behavior proof maintainers expect for external PRs of this scope. Will rework as a smaller, scoped PR (provider config + minimal wiring) with end-to-end transcription evidence on a real device before reopening.

@ottodeng ottodeng closed this May 5, 2026

Labels

  • channel: line (Channel integration: line)
  • docs (Improvements or additions to documentation)
  • plugin: azure-speech (Azure Speech plugin)
  • size: XL
  • triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)
