stt

stt is an MLX-first local speech-to-text CLI for macOS, designed for agents and automation.

It answers one practical question cleanly:

Which local transcription backend should I use for this file right now, and then how do I transcribe it?

It is intentionally focused on transcription only:

recommend the best local backend
run the transcript
benchmark the local backends
inspect the current runtime

No summarization layer. No “media router”. No prompt framework.

Why This Exists

Local transcription on Apple Silicon is fragmented:

parakeet-mlx is great for fast local English transcription and subtitle-style workflows
direct mlx-audio Parakeet can be even faster on short clips
Qwen3-ASR is better for Spanish and multilingual audio

Agents need a small, deterministic MLX-first CLI that tells them which one to use, then runs it.

That is what stt does.

What It Uses

Models:

Open-source projects:

mlx-audio for direct MLX-backed inference
parakeet-mlx for the CLI subtitle/transcription path
ffmpeg for media conversion and probing

Best Local Defaults

English, short clip, speed matters: mlx-parakeet
English, subtitles or long audio: parakeet-mlx
Spanish / multilingual / unknown language: qwen3-asr-0.6b
Higher multilingual accuracy: qwen3-asr-1.7b

These defaults are based on local benchmarking logic built into the tool.

Installation

Fastest install

This gives you:

the stt CLI
an isolated runtime under ~/Library/Application Support/mlx-stt
mlx-audio
parakeet-mlx
pre-downloaded core models

curl -fsSL https://raw.githubusercontent.com/nachoal/mlx-stt/main/install.sh | bash

Homebrew-oriented install

If you prefer to start from Homebrew-managed tools:

brew install uv ffmpeg
uv tool install git+https://github.com/nachoal/mlx-stt
stt setup --download-models core

Manual install

uv tool install git+https://github.com/nachoal/mlx-stt

For local development:

uv tool install --force -e .

Then create the isolated runtime:

stt setup --download-models core

This creates a dedicated runtime and stores its paths in ~/Library/Application Support/mlx-stt/config.json.

If you already have your own MLX Python environment and want to use that instead:

export STT_SHARED_PYTHON=/path/to/python-with-mlx-audio

stt doctor --json will show exactly which runtime is active.

Commands

Recommend

stt recommend /path/to/file.wav --language english --speed --json

Example output:

{
  "backend": "mlx-parakeet",
  "model": "mlx-community/parakeet-tdt-0.6b-v3"
}

Transcribe

stt transcribe /path/to/file.wav --language english --speed --json

stt transcribe automatically normalizes video inputs and compressed audio that benefits from ffmpeg preprocessing, including Telegram-style .ogg/Opus voice notes, into mono 16 kHz WAV before handing the file to the selected backend.

Force a backend:

stt transcribe /path/to/file.wav --backend qwen3-asr-0.6b --json
stt transcribe /path/to/file.wav --backend mlx-parakeet --json
stt transcribe /path/to/file.wav --backend parakeet-mlx --output-format srt --json

Write output files:

stt transcribe /path/to/file.wav --output-dir ./out --output-name transcript --json

Benchmark

Single file:

stt benchmark /path/to/file.wav --reference-text "expected transcript" --language spanish --json

Fixture-based suite:

export STT_SAMPLE_ENGLISH=/path/to/english.wav
export STT_SAMPLE_ENGLISH_TEXT="Hello. This is a test."
export STT_SAMPLE_SPANISH=/path/to/spanish.wav
export STT_SAMPLE_SPANISH_TEXT="..."
stt benchmark --suite repo-samples --json

Doctor

stt doctor --json

This reports:

whether ffmpeg is installed
whether parakeet-mlx is in PATH
which Python runtime will be used for mlx-audio
detected versions for parakeet-mlx, mlx-audio, and transformers

Setup

stt setup --download-models core

Options:

--download-models none|core|all
--install-ffmpeg
--runtime-dir /custom/path

Environment

STT_SHARED_PYTHON: Python executable with mlx-audio installed
STT_SAMPLE_ENGLISH: optional benchmark fixture
STT_SAMPLE_ENGLISH_TEXT: optional reference transcript
STT_SAMPLE_SPANISH: optional benchmark fixture
STT_SAMPLE_SPANISH_TEXT: optional reference transcript

Tests

Run unit tests:

uv run --with pytest pytest

The tests cover:

backend recommendation logic
benchmark row shaping
fixture-driven benchmark suite behavior

Design Notes

This project is optimized for agent ergonomics:

small command surface
JSON-first output
explicit backend recommendation
deterministic local execution

It is inspired by the thin, shell-friendly CLI style used in several of steipete’s OSS tools, while staying Python-native because the actual local MLX/Qwen inference stack is Python-first.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.github/workflows		.github/workflows
stt		stt
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

stt

Why This Exists

What It Uses

Best Local Defaults

Installation

Fastest install

Homebrew-oriented install

Manual install

Commands

Recommend

Transcribe

Benchmark

Doctor

Setup

Environment

Tests

Design Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

stt

Why This Exists

What It Uses

Best Local Defaults

Installation

Fastest install

Homebrew-oriented install

Manual install

Commands

Recommend

Transcribe

Benchmark

Doctor

Setup

Environment

Tests

Design Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages