A macOS menu bar app for on-device speech-to-text. Hold a hotkey, speak, release — transcribed text is pasted into the active application automatically.
All processing runs locally using Qwen3 ASR models via Apple's MLX framework. No audio leaves your machine.
- Push-to-talk hotkey — configurable custom key combinations (including left/right modifiers) with global detection via a CGEvent tap
- Multiple model options — Qwen3 ASR 0.6B (8-bit), 1.7B (8-bit), and 1.7B (4-bit) with on-demand downloading and per-model cache management
- Smart paste — transcribed text is written to the pasteboard, Cmd+V is simulated via Accessibility, and the original clipboard contents are restored afterward; a space is prepended when the cursor follows non-whitespace
- Visual feedback — animated floating overlay with a MeshGradient whose speed responds to real-time audio level
- Menu bar UI — model selector with download/delete controls, permission status indicators, inline hotkey capture, and run-on-startup toggle
- Privacy-first — fully offline inference, no network calls after model download
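The smart-paste space rule above is plain string logic. A minimal sketch, assuming a helper shaped like this (the function name and signature are illustrative, not the app's actual API; in the app the preceding character would be read via Accessibility):

```swift
import Foundation

// Illustrative helper (not the app's actual API): decide what to paste
// given the character immediately before the cursor. A space is prepended
// only when that character exists and is not whitespace.
func textToPaste(_ transcript: String, charBeforeCursor: Character?) -> String {
    guard let prev = charBeforeCursor, !prev.isWhitespace else { return transcript }
    return " " + transcript
}
```

So pasting after `Hello` yields `" world"`, while pasting at the start of a line or after a space inserts the transcript unchanged.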
- macOS (Apple Silicon recommended for MLX performance)
- Xcode (for building from source)
- Microphone permission
- Accessibility permission (for simulating paste keystrokes and detecting cursor context)
1. Open `whisper.xcodeproj` in Xcode. Go to the **whisper** target, then **Signing & Capabilities**, enable **Automatically manage signing**, and select your **Team** (a Personal Team works for local use).

2. Build a Release app bundle:

   ```sh
   xcodebuild -project whisper.xcodeproj -scheme whisper -configuration Release -derivedDataPath build clean build
   ```

3. Copy the built `.app` into `/Applications`:

   ```sh
   cp -R "build/Build/Products/Release/whisper.app" /Applications/
   ```

4. Launch from `/Applications` (not from DerivedData):

   ```sh
   open /Applications/whisper.app
   ```

5. Grant Microphone and Accessibility permissions when prompted.

**Why `/Applications` matters** — the Run on Startup toggle uses `SMAppService.mainApp`, which works most reliably when the app is installed in `/Applications` and properly signed.

**If macOS blocks launch** — right-click the app and choose **Open**, or remove quarantine:

```sh
xattr -dr com.apple.quarantine /Applications/whisper.app
```
- A global CGEvent tap listens for the configured key combination (left/right modifier aware).
- On key-down, `AVAudioEngine` begins capturing microphone input at the native sample rate.
- On key-up, recording stops. Audio is resampled to 16 kHz and passed to the Qwen3 ASR model running on-device via MLX.
- The transcribed text is placed on the pasteboard, a Cmd+V keystroke is simulated through the Accessibility API, and the original pasteboard contents are restored.
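The 16 kHz resampling step can be sketched as simple linear interpolation. This is an illustrative stand-in only; on macOS the app would more likely use `AVAudioConverter` for quality and performance:

```swift
import Foundation

// Naive linear-interpolation resampler, for illustration only; a real
// implementation would use AVAudioConverter. Maps `input` captured at
// `srcRate` onto a new buffer at `dstRate` (e.g. 48 kHz -> 16 kHz).
func resample(_ input: [Float], from srcRate: Double, to dstRate: Double) -> [Float] {
    guard !input.isEmpty, srcRate > 0, dstRate > 0 else { return [] }
    let outCount = Int((Double(input.count) * dstRate / srcRate).rounded(.down))
    let step = srcRate / dstRate
    var out = [Float](repeating: 0, count: outCount)
    for i in 0..<outCount {
        let pos = Double(i) * step          // fractional source index
        let j = Int(pos)
        let frac = Float(pos - Double(j))
        let a = input[j]
        let b = input[min(j + 1, input.count - 1)]
        out[i] = a + (b - a) * frac         // interpolate between neighbors
    }
    return out
}
```

One second of 48 kHz audio (48,000 samples) comes out as 16,000 samples, the rate Qwen3 ASR expects.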
```
whisperApp.swift          App entry point, hotkey wiring, lifecycle
AppState.swift            Observable state machine (idle/recording/transcribing/pasting/error)
Services/
  TranscriptionService    Actor-isolated ML inference, model download & cache
  AudioRecorder           AVAudioEngine capture, RMS level, 16 kHz resampling
  PasteController         Pasteboard snapshot/restore, Cmd+V simulation
Views/
  MenuBarView             Dropdown menu (models, permissions, settings)
  RecordingOverlay        Animated MeshGradient circle
  OverlayManager          Overlay lifecycle
  OverlayPanel            Non-activating transparent NSPanel
Models/
  STTModelDefinition      Model registry (name, HuggingFace repo, quantization)
Hotkey/
  HotkeyDefinitions       CGEvent tap, custom key combos, UserDefaults persistence + legacy migration
```
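`HotkeyDefinitions` persists custom combos to UserDefaults. A hedged sketch of how a left/right-aware combo might be modeled and round-tripped for storage (the type and field names here are assumptions, not the app's actual schema):

```swift
import Foundation

// Illustrative model of a persisted key combo (field names are assumptions,
// not HotkeyDefinitions' actual schema). Left/right modifiers are kept
// distinct, matching the app's left/right-aware hotkey matching.
struct KeyCombo: Codable, Equatable {
    var keyCode: UInt16?          // optional non-modifier key
    var modifiers: Set<Modifier>  // modifier keys that must be held

    enum Modifier: String, Codable {
        case leftCommand, rightCommand, leftOption, rightOption
        case leftControl, rightControl, leftShift, rightShift, fn
    }
}

// Round-trip through JSON, as one would before storing in UserDefaults.
let combo = KeyCombo(keyCode: nil, modifiers: [.rightCommand, .rightOption])
let data = try! JSONEncoder().encode(combo)
let decoded = try! JSONDecoder().decode(KeyCombo.self, from: data)
```

Encoding to `Data` first keeps the stored value a single opaque blob, which also makes the legacy-migration path (old format in, new format out) straightforward.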
| Package | Requirement | Products used | Purpose |
|---|---|---|---|
| mlx-audio-swift | revision: `cc3b3880be05caf908970729e15ec209d018f06d` | MLXAudioSTT, MLXAudioCore | On-device speech-to-text and audio ML pipeline |
See LICENSE for details.