stt is an MLX-first local speech-to-text CLI for macOS, designed for agents and automation.
It answers one practical question cleanly:
Which local transcription backend should I use for this file right now, and how do I transcribe it?
It is intentionally focused on transcription only:
- recommend the best local backend
- run the transcription
- benchmark the local backends
- inspect the current runtime
No summarization layer. No “media router”. No prompt framework.
Local transcription on Apple Silicon is fragmented:
- `parakeet-mlx` is great for fast local English transcription and subtitle-style workflows
- direct `mlx-audio` Parakeet can be even faster on short clips
- `Qwen3-ASR` is better for Spanish and multilingual audio
Agents need a small, deterministic MLX-first CLI that tells them which one to use, then runs it.
That is what stt does.
Models:
- `mlx-community/parakeet-tdt-0.6b-v3`
- `mlx-community/Qwen3-ASR-0.6B-8bit`
- `mlx-community/Qwen3-ASR-1.7B-8bit`
Open-source projects:
- `mlx-audio` for direct MLX-backed inference
- `parakeet-mlx` for the CLI subtitle/transcription path
- `ffmpeg` for media conversion and probing
- English, short clip, speed matters: `mlx-parakeet`
- English, subtitles or long audio: `parakeet-mlx`
- Spanish / multilingual / unknown language: `qwen3-asr-0.6b`
- Higher multilingual accuracy: `qwen3-asr-1.7b`
These defaults are based on local benchmarking logic built into the tool.
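The decision table above can be sketched as a small pure function. This is a hypothetical illustration of the routing defaults, not stt's actual implementation; the function name and parameters are invented:

```python
def recommend_backend(language: str, prefer_speed: bool = False,
                      long_audio: bool = False,
                      high_accuracy: bool = False) -> str:
    """Hypothetical sketch of the routing defaults above (not stt's real code)."""
    if language == "english":
        # English: Parakeet variants; direct MLX is fastest on short clips,
        # parakeet-mlx handles subtitles and long audio.
        return "mlx-parakeet" if prefer_speed and not long_audio else "parakeet-mlx"
    # Spanish / multilingual / unknown: Qwen3-ASR, larger model for accuracy.
    return "qwen3-asr-1.7b" if high_accuracy else "qwen3-asr-0.6b"

recommend_backend("english", prefer_speed=True)  # "mlx-parakeet"
```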
This gives you:
- the `stt` CLI
- an isolated runtime under `~/Library/Application Support/mlx-stt` with `mlx-audio` and `parakeet-mlx`
- pre-downloaded core models
```
curl -fsSL https://raw.githubusercontent.com/nachoal/mlx-stt/main/install.sh | bash
```
If you prefer to start from Homebrew-managed tools:
```
brew install uv ffmpeg
uv tool install git+https://github.com/nachoal/mlx-stt
stt setup --download-models core
```
Or install the CLI directly with uv:
```
uv tool install git+https://github.com/nachoal/mlx-stt
```
For local development:
```
uv tool install --force -e .
```
Then create the isolated runtime:
```
stt setup --download-models core
```
This creates a dedicated runtime and stores its paths in `~/Library/Application Support/mlx-stt/config.json`.
If you already have your own MLX Python environment and want to use that instead:
```
export STT_SHARED_PYTHON=/path/to/python-with-mlx-audio
```
`stt doctor --json` will show exactly which runtime is active.
```
stt recommend /path/to/file.wav --language english --speed --json
```
Example output:
```
{
  "backend": "mlx-parakeet",
  "model": "mlx-community/parakeet-tdt-0.6b-v3"
}
```
```
stt transcribe /path/to/file.wav --language english --speed --json
```
`stt transcribe` automatically normalizes video inputs and compressed audio that benefits from ffmpeg preprocessing, including Telegram-style `.ogg`/Opus voice notes, into mono 16 kHz WAV before handing the file to the selected backend.
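That normalization step amounts to an ffmpeg resample to mono 16 kHz WAV. A sketch of building such a command; the exact flags stt uses internally are an assumption:

```python
def ffmpeg_normalize_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that converts any readable input (video,
    .ogg/Opus voice notes, etc.) into mono 16 kHz PCM WAV.
    The exact flags stt uses internally are an assumption."""
    return [
        "ffmpeg", "-y",    # overwrite the output file if it exists
        "-i", src,         # any container/codec ffmpeg can decode
        "-ac", "1",        # downmix to mono
        "-ar", "16000",    # resample to 16 kHz
        "-f", "wav",       # force the WAV container
        dst,
    ]

cmd = ffmpeg_normalize_cmd("voice_note.ogg", "voice_note.wav")
# run with: subprocess.run(cmd, check=True)
```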
Force a backend:
```
stt transcribe /path/to/file.wav --backend qwen3-asr-0.6b --json
stt transcribe /path/to/file.wav --backend mlx-parakeet --json
stt transcribe /path/to/file.wav --backend parakeet-mlx --output-format srt --json
```
Write output files:
```
stt transcribe /path/to/file.wav --output-dir ./out --output-name transcript --json
```
Single file:
```
stt benchmark /path/to/file.wav --reference-text "expected transcript" --language spanish --json
```
Fixture-based suite:
```
export STT_SAMPLE_ENGLISH=/path/to/english.wav
export STT_SAMPLE_ENGLISH_TEXT="Hello. This is a test."
export STT_SAMPLE_SPANISH=/path/to/spanish.wav
export STT_SAMPLE_SPANISH_TEXT="..."
stt benchmark --suite repo-samples --json
```
```
stt doctor --json
```
This reports:
- whether `ffmpeg` is installed
- whether `parakeet-mlx` is in `PATH`
- which Python runtime will be used for `mlx-audio`
- detected versions for `parakeet-mlx`, `mlx-audio`, and `transformers`
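The first two checks boil down to `PATH` lookups. A minimal sketch of that kind of environment probe, not stt's actual doctor code:

```python
import shutil

def probe_environment() -> dict[str, bool]:
    """Check whether required executables are reachable on PATH,
    similar in spirit to the first two `stt doctor` checks
    (not stt's actual implementation)."""
    return {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "parakeet-mlx": shutil.which("parakeet-mlx") is not None,
    }

status = probe_environment()
```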
```
stt setup --download-models core
```
Options:
- `--download-models none|core|all`
- `--install-ffmpeg`
- `--runtime-dir /custom/path`
- `STT_SHARED_PYTHON`: Python executable with `mlx-audio` installed
- `STT_SAMPLE_ENGLISH`: optional benchmark fixture
- `STT_SAMPLE_ENGLISH_TEXT`: optional reference transcript
- `STT_SAMPLE_SPANISH`: optional benchmark fixture
- `STT_SAMPLE_SPANISH_TEXT`: optional reference transcript
Run unit tests:
```
uv run --with pytest pytest
```
The tests cover:
- backend recommendation logic
- benchmark row shaping
- fixture-driven benchmark suite behavior
This project is optimized for agent ergonomics:
- small command surface
- JSON-first output
- explicit backend recommendation
- deterministic local execution
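That JSON-first contract means an agent can drive the CLI from a short script. A sketch that assumes only the `{"backend": ..., "model": ...}` output shape shown earlier; the helper names are invented:

```python
import json
import subprocess

def parse_recommendation(json_text: str) -> str:
    """Extract the backend name from `stt recommend --json` output,
    assuming the {"backend": ..., "model": ...} shape shown earlier."""
    return json.loads(json_text)["backend"]

def query_backend(path: str, language: str) -> str:
    """Shell out to stt and parse the result (requires stt on PATH)."""
    out = subprocess.run(
        ["stt", "recommend", path, "--language", language, "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_recommendation(out)

example = '{"backend": "mlx-parakeet", "model": "mlx-community/parakeet-tdt-0.6b-v3"}'
backend = parse_recommendation(example)  # "mlx-parakeet"
```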
It is inspired by the thin, shell-friendly CLI style used in several of steipete’s OSS tools, while staying Python-native because the actual local MLX/Qwen inference stack is Python-first.