[Feature Request] Add server-side STT option for webchat voice input #47311

## Summary

Add a server-side speech-to-text (STT) option for webchat, so that voice input can be transcribed by local Whisper (or another configured STT backend) instead of relying on the browser's Web Speech API.

## Problem

The current webchat voice input uses the browser's native `SpeechRecognition` API (Web Speech API), which has significant limitations:

- **Safari**: Limited/broken support, especially on macOS
- **Privacy**: Chrome's Web Speech API sends audio to Google servers
- **Network dependency**: Doesn't work offline or in restricted network environments
- **Inconsistent**: Behavior varies across browsers and versions

## Proposed Solution

Add a second voice input button (or toggle) that:

1. Uses the `MediaRecorder` API to capture audio locally (widely supported, including Safari)
2. Sends the audio blob to the Gateway via WebSocket or HTTP upload
3. Has the Gateway transcribe it using the configured `tools.media.audio` backend (e.g., local Whisper)
4. Returns the transcribed text to the input field
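
The client side of these steps could look roughly like the sketch below. It is only an illustration under stated assumptions: the endpoint path `/api/webchat/transcribe`, the `{ text }` response shape, and the helper names `pickMimeType`, `recordAudio`, and `transcribe` are all hypothetical and not part of OpenClaw today. It uses the HTTP upload variant; a WebSocket transport would replace the `fetch` call.

```typescript
// Hypothetical endpoint path; this issue does not define one.
const TRANSCRIBE_URL = "/api/webchat/transcribe";

// Container formats to try, in order. Safari has historically recorded
// audio/mp4, while Chrome and Firefox usually record audio/webm.
const CANDIDATE_MIME_TYPES = ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"];

/** Returns the first supported candidate, or undefined to use the browser default. */
function pickMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: readonly string[] = CANDIDATE_MIME_TYPES,
): string | undefined {
  return candidates.find((candidate) => isSupported(candidate));
}

/** Records from the microphone until `stop` resolves, then returns one Blob. */
async function recordAudio(stop: Promise<void>): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);

  const chunks: Blob[] = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  const stopped = new Promise<void>((resolve) => {
    recorder.onstop = () => resolve();
  });

  recorder.start();
  await stop; // e.g. resolved when the user releases the mic button
  recorder.stop();
  await stopped; // the final chunk is delivered before onstop fires

  stream.getTracks().forEach((track) => track.stop()); // release the microphone
  return new Blob(chunks, { type: recorder.mimeType });
}

/** Uploads the recording and returns the transcript (response shape is assumed). */
async function transcribe(audio: Blob): Promise<string> {
  const response = await fetch(TRANSCRIBE_URL, {
    method: "POST",
    headers: { "Content-Type": audio.type || "application/octet-stream" },
    body: audio,
  });
  if (!response.ok) {
    throw new Error(`Transcription failed: HTTP ${response.status}`);
  }
  const payload = (await response.json()) as { text: string };
  return payload.text;
}
```

A "Server STT" button handler would call `recordAudio`, pass the result to `transcribe`, and write the returned string into the input field. Because the recorded container differs between browsers, the Gateway side would need to accept more than one audio format or convert before transcription.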

### UI Suggestion

- Keep existing browser STT button (for users who prefer it)
- Add new "Server STT" button with distinct icon (e.g., server + mic)
- Or: single button with config option to choose backend

### Backend

The existing `tools.media.audio` config already supports a local Whisper CLI:

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { type: "cli", command: "whisper", args: ["{{MediaPath}}", "--model", "tiny", "--language", "zh"] }
        ]
      }
    }
  }
}
```

The only missing piece is a new endpoint that accepts audio uploads from webchat and returns the transcription.
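
As a rough sketch of what that endpoint would do once the upload is saved to disk, the snippet below fills the `{{MediaPath}}` placeholder from the config above and hands the command to a runner. The names `CliModelConfig`, `expandArgs`, `transcribeFile`, and `RunCli` are hypothetical, and two things are assumptions rather than documented behavior: that `{{MediaPath}}` is replaced with the path of the uploaded file, and that the CLI prints the transcript to stdout. The Gateway's existing media pipeline may already handle both differently.

```typescript
// Shape of one entry in tools.media.audio.models, as shown in the config above.
interface CliModelConfig {
  type: "cli";
  command: string;
  args: string[];
}

// Injected process runner (hypothetical). A real handler might wrap
// child_process.execFile; it resolves with the command's stdout.
type RunCli = (command: string, args: string[]) => Promise<string>;

const MEDIA_PATH_PLACEHOLDER = "{{MediaPath}}";

/** Replaces every {{MediaPath}} occurrence in the configured args. */
function expandArgs(model: CliModelConfig, mediaPath: string): string[] {
  return model.args.map((arg) => arg.split(MEDIA_PATH_PLACEHOLDER).join(mediaPath));
}

/** Runs the configured CLI against an uploaded file and returns the trimmed transcript. */
async function transcribeFile(
  model: CliModelConfig,
  mediaPath: string,
  run: RunCli,
): Promise<string> {
  const stdout = await run(model.command, expandArgs(model, mediaPath));
  return stdout.trim();
}
```

Passing the arguments as an array, rather than building a shell string, keeps an uploaded file name from being interpreted by a shell.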

## Benefits

- Works consistently across all browsers (Safari, Firefox, Chrome)
- Privacy-friendly (audio stays on the user's own server)
- Works offline/air-gapped
- Leverages existing media understanding infrastructure
- Users can choose accuracy vs speed (tiny/base/medium models)

## Environment

- OpenClaw version: 2026.3.8
- Browser: Safari (macOS)
- Server: Ubuntu ARM64 with Whisper installed
