[Bug] Voice STT: empty moonshine transcripts passed as raw JSON to LLM, clogging serialized processing queue #84660
Closed
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
- clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
- clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
Summary
When using moonshine-tiny-en for Discord voice STT, empty/noisy transcripts are passed as raw JSON strings to the LLM instead of being filtered out. This wastes ~4 seconds and ~24k input tokens per empty segment and clogs the serialized processing queue, making the bot appear unresponsive in voice.
Reproduction
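- Configure `voice.mode = "stt-tts"` and moonshine-tiny-en as the STT model.
- For an empty or noisy segment, the transcript passed to the LLM is the raw JSON string `{"lang": "", "emotion": "", "event": "", "text": "", "timestamps": [], "durations": [], "tokens":[], "ys_log_probs": [], "words": []}`.
- The serialized processing queue (`entry.processingQueue`) blocks until each call completes.
Root Cause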
In `manager.runtime`, `transcribeVoiceAudio()` calls `normalizeOptionalString()` on the STT result, which returns `undefined` for empty strings. However, the sherpa-onnx CLI output includes the entire JSON object on the last line, and the `mediaUnderstanding.transcribeAudioFile()` result appears to include the full JSON string as `text` even when the `"text"` field within it is empty.
The check at line ~1441 (`if (!transcript)`) catches `undefined` but NOT the full JSON string with an empty `"text"` field. So `{"text": "", ...}` passes through as a non-empty string transcript.
Evidence
Session logs show:
100% of NO_REPLY responses (8 out of 8 in a recent session) were triggered by these empty JSON transcripts. The bot responded correctly to all real transcripts but was blocked during empty JSON processing.
52 segment files accumulated in 10 minutes. Only 10 TTS outputs were generated. The pipeline was processing empty JSON ~35% of the time.
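The empty JSON transcripts counted above get past the guard for the reason given under Root Cause. Here is a minimal sketch of that failure mode. The `normalizeOptionalString` below is a hypothetical stand-in written for illustration (trim, then return `undefined` for empty input); the real helper may behave differently.

```typescript
// Hypothetical stand-in for normalizeOptionalString(): trims the input and
// returns undefined for empty strings. The real helper may differ.
function normalizeOptionalString(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// The raw payload from the reproduction: valid JSON whose "text" field is empty.
const rawSttOutput =
  '{"lang": "", "emotion": "", "event": "", "text": "", "timestamps": [], ' +
  '"durations": [], "tokens":[], "ys_log_probs": [], "words": []}';

const transcript = normalizeOptionalString(rawSttOutput);

// A truthiness guard only rejects undefined or empty values, so the
// non-empty JSON string is not skipped.
const skipped = !transcript;
console.log(skipped); // false

// A genuinely empty string is normalized to undefined and would be skipped.
console.log(normalizeOptionalString("") === undefined); // true
```

The guard works as intended for empty strings. It misses the case where the emptiness is nested inside a serialized object.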
Expected Behavior
When the STT result has `"text": ""` (or an equivalent empty transcript), the segment should be skipped entirely; no LLM call is needed.
Environment
Workaround
Reducing `captureSilenceGraceMs` (from 1500 to 1000) and `timeoutSeconds` (from 300 to 120) helps marginally, plus periodic cleanup of stale `/tmp/openclaw/discord-voice-*/segment.wav` files. But the core issue is that empty transcripts should be filtered before reaching the LLM.
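One possible shape for such a filter is sketched below. `extractTranscriptText` is a hypothetical helper, not an existing function in the codebase. It assumes the transcript arrives either as plain text or as a single JSON object string with a `text` field, as in the payload shown in the reproduction.

```typescript
// Hypothetical helper: returns the spoken text, or undefined when the
// segment should be skipped. Not part of the existing codebase.
function extractTranscriptText(raw: string | undefined): string | undefined {
  const trimmed = raw?.trim();
  if (!trimmed) return undefined;

  // Plain-text transcripts do not start with "{"; pass them through unchanged.
  if (!trimmed.startsWith("{")) return trimmed;

  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed === "object" && parsed !== null && "text" in parsed) {
      const text = (parsed as { text: unknown }).text;
      // Empty, whitespace-only, or non-string "text" means there is nothing to send.
      return typeof text === "string" && text.trim() ? text.trim() : undefined;
    }
  } catch {
    // Not valid JSON after all; fall through and treat it as plain text.
  }
  return trimmed;
}

console.log(extractTranscriptText('{"text": "", "tokens": []}')); // undefined
console.log(extractTranscriptText('{"text": " hello "}'));        // "hello"
console.log(extractTranscriptText("turn on the lights"));         // "turn on the lights"
```

With a helper along these lines, the existing `if (!transcript)` check could stay as it is, provided the transcript is unwrapped before the check runs. Whether that unwrapping belongs in `transcribeVoiceAudio()` or closer to the sherpa-onnx output parsing is a design choice this sketch does not settle.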