Push-to-talk voice-to-text that works on any desktop. Hold a key, speak, release. Your words appear at the cursor.
$ voxtype
[INFO] Voxtype v0.6.0 starting...
[INFO] Using model: base.en
[INFO] Hotkey: SCROLLLOCK
[INFO] Ready! Hold SCROLLLOCK to record.
# User holds ScrollLock and speaks...
[INFO] Recording started...
[INFO] Recording stopped (2.1s)
[INFO] Transcribing...
[INFO] Transcribed: "Hello world, this is a test."
[INFO] Typed 30 characters
Built specifically for the modern Linux desktop
Works on GNOME, KDE, Sway, Hyprland, River—Wayland or X11. Use built-in hotkeys or native compositor keybindings.
Use your compositor's keybindings instead of evdev—no input group required.
Bind any key combo like Super+V for push-to-talk.
All speech recognition happens locally using whisper.cpp. Your voice never leaves your machine. No cloud, no subscriptions, no tracking.
Natural workflow: hold a key to record, release to transcribe. No "wake words", no accidental activations, just controlled dictation.
Choose your hotkey, Whisper model size, output mode (type or clipboard), and more. Works the way you want it to.
Types directly at cursor via wtype, with full CJK/Unicode support. Falls back to ydotool or clipboard. Always works, one way or another.
Written in Rust for performance. Single binary, minimal dependencies. No Python, no virtual environments—just works.
Optional Vulkan, CUDA, Metal, and ROCm support for blazing-fast inference. Sub-second transcription with large-v3-turbo on modern GPUs.
Optional status indicator shows recording state in your bar. See when Voxtype is idle, recording, or transcribing at a glance. 10 built-in icon themes (Nerd Font, Material Design, emoji, and more) or define your own.
Configurable events surface through notify-send, so you always know when recording starts, stops, and what was transcribed.
Pipe transcriptions through local LLMs for translation, domain-specific vocabulary, or custom workflows. Falls back gracefully on errors.
Offload transcription to a self-hosted GPU server running whisper.cpp, so your laptop can lean on more powerful remote hardware. Connecting to a cloud API is also possible for those who opt in.
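The typing fallback described above (wtype first, then ydotool, then the clipboard) can be pictured with a small shell sketch. This is not Voxtype's actual implementation, which lives inside the Rust binary. The function names `pick_backend` and `emit_text` are hypothetical, and the sketch only assumes the standard `wtype`, `ydotool`, and `wl-copy` command-line tools.

```shell
#!/bin/sh
# Illustrative only: choose the first available output tool, mirroring the
# wtype -> ydotool -> clipboard order described above.

# pick_backend CMD...: print the first candidate found on PATH, or fail.
pick_backend() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# emit_text TEXT: send TEXT to whichever backend is available.
emit_text() {
    backend=$(pick_backend wtype ydotool wl-copy) || {
        echo "no output backend found" >&2
        return 1
    }
    case "$backend" in
        wtype)   wtype "$1" ;;                  # types at the cursor (Wayland)
        ydotool) ydotool type "$1" ;;           # needs the ydotool daemon
        wl-copy) printf '%s' "$1" | wl-copy ;;  # clipboard as last resort
    esac
}
```

Calling `emit_text "Hello world"` would type the text if a typing tool is installed and otherwise leave it on the clipboard for pasting.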
Watch Voxtype transform voice into text
Watch as Voxtype captures speech and types a prompt directly into an AI coding assistant. Perfect for hands-free interaction with agentic tools.
Balance speed and accuracy for your needs. With GPU acceleration, even large-v3 achieves sub-second inference!
| Model | Size | Best For |
|---|---|---|
| tiny.en | 39 MB | Quick notes, low-end hardware |
| base.en (Recommended) | 142 MB | Most users |
| small.en | 466 MB | Higher accuracy needs |
| medium.en | 1.5 GB | Professional transcription |
| Model | Size | Best For |
|---|---|---|
| tiny | 75 MB | Quick notes, any language |
| base | 142 MB | General multilingual use |
| small | 466 MB | Better multilingual accuracy |
| medium | 1.5 GB | Professional multilingual |
| large-v3 | 3.1 GB | Maximum accuracy |
| large-v3-turbo (GPU Recommended) | 1.6 GB | Fast + accurate |
| Engine / Model | Size | Best For |
|---|---|---|
| parakeet-tdt-0.6b-v3-int8 (English) | 670 MB | Best English accuracy, built-in punctuation |
| moonshine-base | 237 MB | Fastest CPU inference, English |
| sensevoice-small (CJK) | 239 MB | Chinese, Japanese, Korean, Cantonese, English |
| paraformer-zh | 487 MB | Chinese + English bilingual |
| dolphin-base | 198 MB | 40 languages + 22 Chinese dialects |
| omnilingual-large | 3.9 GB | 1600+ languages, rare and low-resource |
.en models are English-only but faster and more accurate for English. All ONNX engines require the ONNX binary variant. Switch with voxtype setup onnx --enable.
Get up and running in minutes
Other Linux voice typing tools require you to clone a repo, run an install script, set up a Python virtual environment, and remember to activate it every time you reboot. Voxtype is different: it's a single binary. Install it from your package manager, download a Whisper model, and enable the systemd user service. No virtual environments, no dependency conflicts, no activation scripts. It just works, every time you log in.
# Arch Linux (AUR): using paru
paru -S voxtype
# Or using yay
yay -S voxtype
# Install optional dependencies
sudo pacman -S wtype wl-clipboard
Requires Ubuntu 24.04+ or Debian Trixie+ (glibc 2.38 or newer). Older releases can build from source.
# Download and install (Ubuntu 24.04+, Debian Trixie+)
wget https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype_0.6.3-1_amd64.deb
sudo dpkg -i voxtype_0.6.3-1_amd64.deb
sudo apt-get install -f
# Install optional dependencies
sudo apt install wtype wl-clipboard
Requires Fedora 39+ (glibc 2.38 or newer).
# Download and install (Fedora 39+)
wget https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype-0.6.3-1.x86_64.rpm
sudo dnf install ./voxtype-0.6.3-1.x86_64.rpm
# Install optional dependencies
sudo dnf install wtype wl-clipboard
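Both prebuilt packages above depend on glibc 2.38, so a quick pre-flight check may save a failed install. The following is a minimal sketch, not part of Voxtype: `version_ge` is a hypothetical helper, and the check assumes `getconf GNU_LIBC_VERSION` and `sort -V` are available, as they are on typical glibc-based distributions.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B (relies on sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# getconf prints e.g. "glibc 2.39" on glibc systems; keep the last field.
installed=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $NF}') || installed=""

if [ -z "$installed" ]; then
    echo "Could not detect glibc (non-glibc system?)"
elif version_ge "$installed" 2.38; then
    echo "glibc $installed: the prebuilt package should install"
else
    echo "glibc $installed is older than 2.38: consider building from source"
fi
```

Version-aware sorting matters here because a plain string or numeric comparison would rank 2.4 above 2.38.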
# Build from source: install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install build dependencies (Arch)
sudo pacman -S base-devel clang alsa-lib
# Clone and build
git clone https://github.com/peteonrails/voxtype
cd voxtype
cargo build --release
# Install
sudo cp target/release/voxtype /usr/local/bin/
For Hyprland, Sway, and River users. Uses native compositor keybindings—no input group required!
# Download whisper model and configure
voxtype setup --download
# Disable built-in hotkey (we'll use compositor keybindings)
cat >> ~/.config/voxtype/config.toml << 'EOF'
[hotkey]
enabled = false
EOF
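# Enable state file (required for toggle mode)
# Caution: appended after [hotkey], TOML reads this key as hotkey.state_file;
# if Voxtype expects it at top level, place it above the first [table] by hand.
echo 'state_file = "auto"' >> ~/.config/voxtype/config.toml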
# Install as systemd service
voxtype setup systemd
# Optional: Fix modifier key interference (if using SUPER+key combos)
voxtype setup compositor hyprland # or: sway
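Because the commands above append to config.toml rather than edit it, it may be worth checking that the file has not ended up with two [hotkey] tables: TOML does not allow the same table to be defined twice. A minimal sketch follows; `count_table` is a hypothetical helper, and the path is the one used above.

```shell
#!/bin/sh
# count_table FILE NAME: print how many lines of FILE are the table
# header [NAME] (leading whitespace allowed).
count_table() {
    grep -c "^[[:space:]]*\[$2]" "$1"
}

config="$HOME/.config/voxtype/config.toml"
if [ -f "$config" ] && [ "$(count_table "$config" hotkey)" -gt 1 ]; then
    echo "Duplicate [hotkey] table in $config: merge the entries by hand" >&2
fi
```

If the warning fires, moving `enabled = false` into the existing [hotkey] table and deleting the appended header should resolve it.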
Then add keybindings to your compositor config:
# Hyprland push-to-talk: hold Super+V to record, release to transcribe
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop
# Or use Scroll Lock
bind = , SCROLL_LOCK, exec, voxtype record start
bindr = , SCROLL_LOCK, exec, voxtype record stop
# Sway push-to-talk: hold $mod+v to record, release to transcribe
bindsym $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop
# River push-to-talk: hold Super+V to record, release to transcribe
riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'
For GNOME, KDE, and other desktops. Uses kernel-level hotkey detection.
# Add user to input group (required for hotkey detection)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect
# Download whisper model and configure
voxtype setup --download
# Install as systemd service (starts on login)
voxtype setup systemd
# Check status
systemctl --user status voxtype
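If the hotkey does nothing after setup, the usual culprit is that the group change has not reached the current login session yet. A small sketch for checking this is shown below; `in_group` is a hypothetical helper, and the only external command assumed is the standard `id -nG`.

```shell
#!/bin/sh
# in_group GROUP LIST: succeed if GROUP appears as a whole word in the
# space-separated LIST (the format printed by `id -nG`).
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# `id -nG` without a user name reports the groups of the running session,
# which is what matters for reading input devices.
if in_group input "$(id -nG)"; then
    echo "Current session is in the input group"
else
    echo "Not in the input group yet: log out and back in after usermod"
fi
```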
Tested on all major Linux desktops. Optimized for Wayland, works on X11 too.
Voxtype is a young project and your feedback helps make it better
If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please open an issue. I actively monitor and respond to all reports.
Report an Issue
A star on GitHub helps others discover the project. A vote on the AUR package increases the likelihood of inclusion in the Arch extra repository.
Start dictating on your Linux desktop today.