[Bug] Voice STT: empty moonshine transcripts passed as raw JSON to LLM, clogging serialized processing queue

## Summary

When moonshine-tiny-en is used for Discord voice STT, silent or noisy segments produce a raw JSON string with an empty `"text"` field, and that string is passed to the LLM as a transcript instead of being filtered out. Each such segment wastes ~4 seconds and ~24k input tokens and blocks the serialized processing queue, making the bot appear unresponsive in voice.

## Reproduction

1. Configure OpenClaw with `voice.mode = "stt-tts"` and moonshine-tiny-en as the STT model
2. Join a voice channel with background noise or short utterances
3. Observe that short/noisy audio segments produce empty transcripts: `{"lang": "", "emotion": "", "event": "", "text": "", "timestamps": [], "durations": [], "tokens":[], "ys_log_probs": [], "words": []}`
4. These empty JSON strings are sent to the LLM as "transcripts" instead of being filtered
5. The LLM returns NO_REPLY (correct behavior), but each call wastes ~4s and ~24k tokens
6. The serialized processing queue (`entry.processingQueue`) blocks until each call completes
7. With ~35% of segments being empty JSON, the pipeline appears to "stop" responding

## Root Cause

In `manager.runtime`, `transcribeVoiceAudio()` passes the STT result through `normalizeOptionalString()`, which returns `undefined` for empty strings. However, the sherpa-onnx CLI prints the entire JSON object on its last output line, and `mediaUnderstanding.transcribeAudioFile()` appears to return that full JSON string as `text` even when the `"text"` field inside it is empty.

The check at line ~1441 (`if (!transcript)`) catches `undefined` but NOT the full JSON string with an empty `"text"` field. So `{"text": "", ...}` passes through as a non-empty string transcript.
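A minimal sketch of the kind of guard that would close this gap, assuming the STT result reaches the check as a single string. `extractTranscriptText` is a hypothetical helper name, not an existing OpenClaw function, and the real fix may belong inside `mediaUnderstanding.transcribeAudioFile()` instead:

```typescript
// Hypothetical helper (not part of OpenClaw): returns the spoken text, or
// undefined when there is nothing worth sending to the LLM.
function extractTranscriptText(raw: string | undefined): string | undefined {
  const trimmed = raw?.trim();
  if (!trimmed) return undefined;

  // Plain-text transcripts pass through unchanged.
  if (!trimmed.startsWith("{")) return trimmed;

  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    // Starts with "{" but is not valid JSON: treat it as spoken text.
    return trimmed;
  }

  // sherpa-onnx style result object: use only its "text" field.
  if (typeof parsed === "object" && parsed !== null && "text" in parsed) {
    const text = (parsed as { text: unknown }).text;
    if (typeof text === "string") {
      const spoken = text.trim();
      return spoken.length > 0 ? spoken : undefined;
    }
  }

  // JSON without a string "text" field: leave the decision to the caller.
  return trimmed;
}
```

With a guard like this in front of the existing `if (!transcript)` check, the empty JSON object shown in the reproduction steps would resolve to `undefined` and the segment would be skipped without an LLM call.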

## Evidence

Session logs show:
```
Voice transcript from speaker "[CK] Alex the 'guin":
{"lang": "", "emotion": "", "event": "", "text": "", "timestamps": [], "durations": [], "tokens":[], "ys_log_probs": [], "words": []}
```

100% of NO_REPLY responses (8 out of 8 in a recent session) were triggered by these empty JSON transcripts. The bot responded correctly to all real transcripts but was blocked during empty JSON processing.

52 segment files accumulated in 10 minutes. Only 10 TTS outputs were generated. The pipeline was processing empty JSON ~35% of the time.

## Expected Behavior

1. When the STT model returns `"text": ""` (or equivalent empty transcript), the segment should be skipped entirely — no LLM call needed
2. The serialized processing queue should have a max depth or stale-segment discard mechanism to prevent pipeline stalls
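For the second point, one possible shape is a bounded queue that drops the oldest segment when full and skips segments that have waited too long. This is only a sketch: `BoundedSegmentQueue`, `maxDepth`, and `maxAgeMs` are hypothetical names, and how `entry.processingQueue` is actually implemented is not known here.

```typescript
// Hypothetical bounded queue (not OpenClaw's actual processingQueue).
interface QueuedSegment<T> {
  payload: T;
  enqueuedAt: number; // milliseconds, from the injected clock
}

class BoundedSegmentQueue<T> {
  private items: QueuedSegment<T>[] = [];
  private readonly maxDepth: number;
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(maxDepth: number, maxAgeMs: number, now: () => number = Date.now) {
    this.maxDepth = maxDepth;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Enqueue a segment; when the queue is over capacity, drop the oldest.
  push(payload: T): void {
    this.items.push({ payload, enqueuedAt: this.now() });
    while (this.items.length > this.maxDepth) {
      this.items.shift();
    }
  }

  // Return the next segment that is still fresh, discarding stale ones.
  next(): T | undefined {
    const cutoff = this.now() - this.maxAgeMs;
    let item = this.items.shift();
    while (item !== undefined) {
      if (item.enqueuedAt >= cutoff) return item.payload;
      item = this.items.shift();
    }
    return undefined;
  }

  get depth(): number {
    return this.items.length;
  }
}
```

Dropping the oldest entries favors recent speech, which seems to fit a live voice conversation better than replying late to utterances that are already several seconds old.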

## Environment

- OpenClaw 2026.5.18
- sherpa-onnx moonshine-tiny-en (int8)
- Discord voice mode: stt-tts
- Platform: Linode 4 vCPU, 8GB RAM

## Workaround

Reducing `captureSilenceGraceMs` (from 1500 to 1000) and `timeoutSeconds` (from 300 to 120), together with periodic cleanup of stale `/tmp/openclaw/discord-voice-*/segment.wav` files, helps only marginally. The core issue remains: empty transcripts should be filtered out before they reach the LLM.

Issue #84660