Skip to content

feat(media): add parakeet-mlx as local audio transcription provider#7552

Closed
mac-110 wants to merge 1 commit intoopenclaw:mainfrom
mac-110:feature/parakeet-mlx-local-stt
Closed

feat(media): add parakeet-mlx as local audio transcription provider#7552
mac-110 wants to merge 1 commit intoopenclaw:mainfrom
mac-110:feature/parakeet-mlx-local-stt

Conversation

@mac-110
Copy link

@mac-110 mac-110 commented Feb 3, 2026

Summary

Adds support for parakeet-mlx (Nvidia Parakeet V3) as a local audio transcription option.

Changes

  • Add resolveParakeetEntry() for auto-detecting parakeet-mlx CLI
  • Add resolveParakeetOutputPath() for reading transcription output
  • Prioritize parakeet-mlx in the local STT provider chain

Why parakeet-mlx?

Metric Parakeet V3 OpenAI Whisper
WER 6.05% 10-12%
Speed 3386x RTFx ~216x RTFx
Cost FREE (local) $0.006/min
  • Native MLX support for Apple Silicon (M1/M2/M3/M4)
  • Better accuracy than Whisper
  • Significantly faster transcription
  • No API costs

Installation

uv tool install parakeet-mlx
# or: pip install parakeet-mlx

Related

Helps address #6130 by providing a reliable local STT option that avoids sending audio binaries to the LLM.

Testing

Tested manually on M4 Mac Mini — transcribes voice messages in real-time with excellent quality.

Greptile Overview

Greptile Summary

This PR adds support for the parakeet-mlx CLI as a local audio transcription option in the media-understanding runner.

Concretely, it:

  • Introduces resolveParakeetEntry() to auto-detect the parakeet-mlx binary and wire it as a cli model entry.
  • Prioritizes parakeet in the local audio provider selection chain (ahead of sherpa-onnx and whisper-based fallbacks).
  • Adds resolveParakeetOutputPath() and hooks it into resolveCliOutput() so the runner can read the transcription from the CLI’s output file when available.

The change fits into the existing “auto local CLI selection” path (resolveLocalAudioEntry + runCliEntry) by adding another CLI-backed provider that uses the same temp output directory and post-run output-file readback logic used by other CLIs.

Confidence Score: 4/5

  • This PR is likely safe to merge, with a small risk of parakeet output detection failing on some installations.
  • The change is narrow (single file) and follows the existing CLI-provider pattern, but correctness depends on parakeet-mlx’s output naming/behavior matching the resolver; if it differs, transcriptions may be skipped despite successful CLI execution.
  • src/media-understanding/runner.ts

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

Context used:

  • Context from dashboard - CLAUDE.md (source)
  • Context from dashboard - AGENTS.md (source)

- Add resolveParakeetEntry() for auto-detecting parakeet-mlx CLI
- Add resolveParakeetOutputPath() for reading transcription output
- Prioritize parakeet-mlx over other local STT providers

parakeet-mlx (Nvidia Parakeet V3) provides:
- Better accuracy than Whisper (6% WER vs 10-12%)
- Faster transcription (3386x realtime on Apple Silicon)
- No API costs (runs locally)
- Native MLX support for M-series Macs

Install: uv tool install parakeet-mlx

Fixes openclaw#6130
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 326 to +336

async function resolveParakeetEntry(): Promise<MediaUnderstandingModelConfig | null> {
if (!(await hasBinary("parakeet-mlx"))) {
return null;
}
return {
type: "cli",
command: "parakeet-mlx",
args: ["{{MediaPath}}", "--output-format", "txt", "--output-dir", "{{OutputDir}}"],
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Parakeet output file name may not match path.parse(mediaPath).name

resolveParakeetOutputPath assumes the CLI writes ${path.parse(mediaPath).name}.txt into --output-dir, but some CLIs derive output names from the input file name including extension or from an internal temp name. In this runner, mediaPath comes from cache.getPath() and may be a temp file name that doesn’t match the original attachment name, so output detection can fail and we’ll fall back to trimming stdout (which may be empty if the CLI writes only to file). This would manifest as “empty output” skips even when the transcription succeeded.

If parakeet-mlx supports an explicit output filename flag, prefer using that (so output resolution is deterministic), or adapt resolution to parakeet’s actual naming scheme.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/media-understanding/runner.ts
Line: 326:336

Comment:
[P1] Parakeet output file name may not match `path.parse(mediaPath).name`

`resolveParakeetOutputPath` assumes the CLI writes `${path.parse(mediaPath).name}.txt` into `--output-dir`, but some CLIs derive output names from the *input file name including extension* or from an internal temp name. In this runner, `mediaPath` comes from `cache.getPath()` and may be a temp file name that doesn’t match the original attachment name, so output detection can fail and we’ll fall back to trimming `stdout` (which may be empty if the CLI writes only to file). This would manifest as “empty output” skips even when the transcription succeeded.

If parakeet-mlx supports an explicit output filename flag, prefer using that (so output resolution is deterministic), or adapt resolution to parakeet’s actual naming scheme.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +663 to +670
function resolveParakeetOutputPath(args: string[], mediaPath: string): string | null {
const outputFormat = findArgValue(args, ["--output-format"]);
const outputDir = findArgValue(args, ["--output-dir"]);
if (!outputDir || outputFormat !== "txt") {
return null;
}
const base = path.parse(mediaPath).name;
return path.join(outputDir, `${base}.txt`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] resolveParakeetOutputPath is stricter than the Whisper resolver

This returns null unless --output-format is exactly txt. If parakeet-mlx ever defaults to txt when --output-format is omitted (or accepts aliases like text), we’d miss the output file even though it exists. Consider treating missing --output-format as “txt” (since resolveParakeetEntry always sets it anyway) or loosening the check to accept common equivalents.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/media-understanding/runner.ts
Line: 663:670

Comment:
[P3] `resolveParakeetOutputPath` is stricter than the Whisper resolver

This returns `null` unless `--output-format` is exactly `txt`. If parakeet-mlx ever defaults to txt when `--output-format` is omitted (or accepts aliases like `text`), we’d miss the output file even though it exists. Consider treating missing `--output-format` as “txt” (since `resolveParakeetEntry` always sets it anyway) or loosening the check to accept common equivalents.

How can I resolve this? If you propose a fix, please make it concise.

@mac-110
Copy link
Author

mac-110 commented Feb 4, 2026

Update: Working out of the box in 2026.2.2?

After updating to OpenClaw 2026.2.2-3, I noticed that parakeet-mlx transcription is working without this PR being merged. 🤔

What we observed:

  • Config uses parakeet-mlx with --output-format txt --output-dir {{OutputDir}}
  • Voice messages are being transcribed correctly
  • No patches applied

However, analyzing the code:

  • resolveCliOutput() only has special handling for whisper, whisper-cli, gemini, sherpa-onnx-offline
  • For parakeet-mlx, it should fall back to stdout.trim()
  • But parakeet only outputs "transcription complete..." to stdout, not the actual transcript

Possible explanations:

  1. There's a fallback to OpenAI Whisper API when CLI output doesn't look like a transcript?
  2. Generic OutputDir file reading was added somewhere?
  3. Something else we're missing?

Could a maintainer clarify if there's new generic CLI output handling in 2026.2.2 that we missed? If so, this PR might be redundant. If not, our transcription might be using a fallback without us realizing it.

Either way, keeping this PR open for explicit parakeet-mlx support seems valuable for documentation and reliability.

mac-110 pushed a commit to mac-110/openclaw that referenced this pull request Feb 4, 2026
Add resolveParakeetOutputPath to read transcripts from parakeet-mlx
output files. Parakeet writes to --output-dir/basename.txt similar
to whisper's --output_dir but with hyphen instead of underscore.

This enables local speech-to-text via parakeet-mlx without requiring
a wrapper script.

Config example:
```json
{
  "tools": {
    "media": {
      "audio": {
        "models": [{
          "type": "cli",
          "command": "parakeet-mlx",
          "args": ["{{MediaPath}}", "--output-format", "txt", "--output-dir", "{{OutputDir}}"]
        }]
      }
    }
  }
}
```

Fixes: Previous issue openclaw#7552 was incorrectly closed as duplicate of
an unrelated Windows path bug (openclaw#7536).
@sebslight
Copy link
Member

Closing as duplicate of #9177. If this is incorrect, comment and we can reopen.

@sebslight sebslight closed this Feb 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants