feat(media): add parakeet-mlx as local audio transcription provider #7552
mac-110 wants to merge 1 commit into openclaw:main
- Add resolveParakeetEntry() for auto-detecting parakeet-mlx CLI
- Add resolveParakeetOutputPath() for reading transcription output
- Prioritize parakeet-mlx over other local STT providers

parakeet-mlx (Nvidia Parakeet V3) provides:
- Better accuracy than Whisper (6% WER vs 10-12%)
- Faster transcription (3386x realtime on Apple Silicon)
- No API costs (runs locally)
- Native MLX support for M-series Macs

Install: uv tool install parakeet-mlx

Fixes openclaw#6130
    async function resolveParakeetEntry(): Promise<MediaUnderstandingModelConfig | null> {
      if (!(await hasBinary("parakeet-mlx"))) {
        return null;
      }
      return {
        type: "cli",
        command: "parakeet-mlx",
        args: ["{{MediaPath}}", "--output-format", "txt", "--output-dir", "{{OutputDir}}"],
      };
    }
[P1] Parakeet output file name may not match path.parse(mediaPath).name
resolveParakeetOutputPath assumes the CLI writes ${path.parse(mediaPath).name}.txt into --output-dir, but some CLIs derive output names from the input file name including extension or from an internal temp name. In this runner, mediaPath comes from cache.getPath() and may be a temp file name that doesn’t match the original attachment name, so output detection can fail and we’ll fall back to trimming stdout (which may be empty if the CLI writes only to file). This would manifest as “empty output” skips even when the transcription succeeded.
If parakeet-mlx supports an explicit output filename flag, prefer using that (so output resolution is deterministic), or adapt resolution to parakeet’s actual naming scheme.
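One way to make output detection less dependent on a single naming scheme is sketched below. It is not the PR's code and does not reflect any documented parakeet-mlx behavior: `findTranscriptFile` is a hypothetical helper that first tries the stem-based name, then the full input file name plus `.txt`, and finally accepts the directory's only `.txt` file. The last step assumes the output directory is a fresh temp directory used by a single CLI run.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not part of the PR): locate the transcript a CLI wrote
// into outputDir without assuming exactly one naming scheme.
function findTranscriptFile(outputDir: string, mediaPath: string): string | null {
  const parsed = path.parse(mediaPath);
  // Candidate names, most likely first: "<stem>.txt", then "<stem><ext>.txt".
  const candidates = [`${parsed.name}.txt`, `${parsed.base}.txt`];
  for (const candidate of candidates) {
    const full = path.join(outputDir, candidate);
    if (fs.existsSync(full)) {
      return full;
    }
  }
  // Fallback: if the directory holds exactly one .txt file, assume it is the
  // transcript. This is only reasonable for a per-run temp directory.
  let entries: string[];
  try {
    entries = fs.readdirSync(outputDir);
  } catch {
    return null; // directory missing or unreadable
  }
  const txtFiles = entries.filter((name) => name.toLowerCase().endsWith(".txt"));
  const [only] = txtFiles;
  return txtFiles.length === 1 && only !== undefined ? path.join(outputDir, only) : null;
}
```

If the CLI does turn out to offer an explicit output-filename flag, passing that flag remains the more deterministic option, as the comment suggests.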
    function resolveParakeetOutputPath(args: string[], mediaPath: string): string | null {
      const outputFormat = findArgValue(args, ["--output-format"]);
      const outputDir = findArgValue(args, ["--output-dir"]);
      if (!outputDir || outputFormat !== "txt") {
        return null;
      }
      const base = path.parse(mediaPath).name;
      return path.join(outputDir, `${base}.txt`);
    }
[P3] resolveParakeetOutputPath is stricter than the Whisper resolver
This returns null unless --output-format is exactly txt. If parakeet-mlx ever defaults to txt when --output-format is omitted (or accepts aliases like text), we’d miss the output file even though it exists. Consider treating missing --output-format as “txt” (since resolveParakeetEntry always sets it anyway) or loosening the check to accept common equivalents.
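A minimal sketch of the looser check follows. `findArgValue` here is a hypothetical stand-in for the runner's helper of the same name, and `resolveParakeetOutputPathLenient` is an illustrative name, not the PR's function. Treating a missing `--output-format` as `txt` is only safe if the CLI really writes a `.txt` file by default. That default is an assumption here and should be checked against the CLI's help output before adopting this.

```typescript
import * as path from "node:path";

// Hypothetical stand-in for the runner's findArgValue helper: returns the value
// following the first matching flag, or undefined when the flag is absent.
function findArgValue(args: string[], flags: string[]): string | undefined {
  for (let i = 0; i < args.length - 1; i++) {
    if (flags.includes(args[i] ?? "")) {
      return args[i + 1];
    }
  }
  return undefined;
}

// Illustrative variant: a missing --output-format is treated as "txt"
// (ASSUMPTION: unverified CLI default); explicit non-txt formats are rejected.
function resolveParakeetOutputPathLenient(args: string[], mediaPath: string): string | null {
  const outputDir = findArgValue(args, ["--output-dir"]);
  if (!outputDir) {
    return null;
  }
  const outputFormat = findArgValue(args, ["--output-format"]) ?? "txt";
  if (outputFormat !== "txt") {
    return null;
  }
  return path.join(outputDir, `${path.parse(mediaPath).name}.txt`);
}
```

Because `resolveParakeetEntry` always passes `--output-format txt`, the lenient branch only matters for hand-written configs that omit the flag.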
Update: Working out of the box in 2026.2.2?
After updating to OpenClaw 2026.2.2-3, I noticed that parakeet-mlx transcription is working without this PR being merged. 🤔
What we observed:
However, analyzing the code:
Possible explanations:
Could a maintainer clarify if there's new generic CLI output handling in 2026.2.2 that we missed? If so, this PR might be redundant. If not, our transcription might be using a fallback without us realizing it. Either way, keeping this PR open for explicit parakeet-mlx support seems valuable for documentation and reliability.
Add resolveParakeetOutputPath to read transcripts from parakeet-mlx
output files. Parakeet writes to --output-dir/basename.txt similar
to whisper's --output_dir but with hyphen instead of underscore.
This enables local speech-to-text via parakeet-mlx without requiring
a wrapper script.
Config example:
```json
{
  "tools": {
    "media": {
      "audio": {
        "models": [{
          "type": "cli",
          "command": "parakeet-mlx",
          "args": ["{{MediaPath}}", "--output-format", "txt", "--output-dir", "{{OutputDir}}"]
        }]
      }
    }
  }
}
```
Fixes: Previous issue openclaw#7552 was incorrectly closed as duplicate of
an unrelated Windows path bug (openclaw#7536).
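The `{{MediaPath}}` and `{{OutputDir}}` tokens in the config example are template placeholders that the runner fills in before launching the CLI. How OpenClaw performs that substitution is not shown in this PR, so the following is only a sketch of the general idea. `expandArgs` and its `vars` parameter are hypothetical names.

```typescript
// Hypothetical sketch: replace {{Name}} placeholders in CLI args with concrete
// values. Unknown placeholders are left untouched rather than dropped.
function expandArgs(args: string[], vars: Record<string, string>): string[] {
  return args.map((arg) =>
    arg.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
      Object.prototype.hasOwnProperty.call(vars, key) ? (vars[key] as string) : match,
    ),
  );
}

// Example: expand the args from the config for one cached audio file.
const expanded = expandArgs(
  ["{{MediaPath}}", "--output-format", "txt", "--output-dir", "{{OutputDir}}"],
  { MediaPath: "/tmp/cache/voice.ogg", OutputDir: "/tmp/out" },
);
console.log(expanded.join(" "));
```

With this scheme the output resolver sees the already-expanded `--output-dir` value, which is why it can join that directory with the media file's base name.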
Closing as duplicate of #9177. If this is incorrect, comment and we can reopen.
Summary
Adds support for parakeet-mlx (Nvidia Parakeet V3) as a local audio transcription option.
Changes
- resolveParakeetEntry() for auto-detecting parakeet-mlx CLI
- resolveParakeetOutputPath() for reading transcription output
Why parakeet-mlx?
Installation
uv tool install parakeet-mlx # or: pip install parakeet-mlx
Related
Helps address #6130 by providing a reliable local STT option that avoids sending audio binaries to the LLM.
Testing
Tested manually on M4 Mac Mini — transcribes voice messages in real-time with excellent quality.
Greptile Overview
Greptile Summary
This PR adds support for the parakeet-mlx CLI as a local audio transcription option in the media-understanding runner.
Concretely, it:
- Adds resolveParakeetEntry() to auto-detect the parakeet-mlx binary and wire it as a cli model entry.
- Adds resolveParakeetOutputPath() and hooks it into resolveCliOutput() so the runner can read the transcription from the CLI’s output file when available.
The change fits into the existing “auto local CLI selection” path (resolveLocalAudioEntry + runCliEntry) by adding another CLI-backed provider that uses the same temp output directory and post-run output-file readback logic used by other CLIs.
Confidence Score: 4/5
Context used:
- dashboard - CLAUDE.md (source)
- dashboard - AGENTS.md (source)