Voxtype - Push-to-Talk Voice-to-Text for Linux and macOS

Why Voxtype?

Built specifically for the modern Linux desktop

Works on Any Linux Desktop

Works on GNOME, KDE, Sway, Hyprland, River—Wayland or X11. Use built-in hotkeys or native compositor keybindings.

Native Hyprland, Sway & River Support

Use your compositor's keybindings instead of evdev—no input group required. Bind any key combo like Super+V for push-to-talk.

Fully Offline & Private

All speech recognition happens locally using whisper.cpp. Your voice never leaves your machine. No cloud, no subscriptions, no tracking.

Push-to-Talk

Natural workflow: hold a key to record, release to transcribe. No "wake words", no accidental activations, just controlled dictation.

Highly Configurable

Choose your hotkey, Whisper model size, output mode (type or clipboard), and more. Works the way you want it to.

Smart Fallbacks

Types directly at the cursor via wtype, with full CJK/Unicode support. If wtype is unavailable, Voxtype falls back to ydotool or the clipboard, so transcribed text is still delivered one way or another.
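
The fallback idea can be sketched in a few lines of shell. This is only an illustration, not Voxtype's implementation: `pick_output_backend` is a hypothetical helper, and using `wl-copy` (from wl-clipboard) as the clipboard tool is an assumption.

```shell
# Hypothetical helper, not part of Voxtype: print the first output tool
# found on PATH, trying wtype, then ydotool, then wl-copy (assumed clipboard tool).
pick_output_backend() {
    local tool
    for tool in wtype ydotool wl-copy; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1  # none of the tools is installed
}

# Report which tool would be used on this machine.
backend="$(pick_output_backend)" || backend="none"
echo "Output backend: $backend"
```

Checking with `command -v` keeps the probe cheap and avoids running any of the tools just to see whether they exist.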

Fast & Lightweight

Written in Rust for performance. Single binary, minimal dependencies. No Python, no virtual environments—just works.

GPU Acceleration

Optional Vulkan, CUDA, Metal, and ROCm support for blazing-fast inference. Sub-second transcription with large-v3-turbo on modern GPUs.

Waybar Integration

Optional status indicator shows recording state in your bar. See at a glance whether Voxtype is idle, recording, or transcribing. Choose from 10 built-in icon themes (Nerd Font, Material Design, emoji, and more) or define your own.

Desktop Notifications

Configurable events surface through notify-send, so you always know when recording starts and stops, and what was transcribed.

LLM Post-Processing

Pipe transcriptions through local LLMs for translation, domain-specific vocabulary, or custom workflows. Falls back gracefully on errors.
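
To make "falls back gracefully" concrete, here is a minimal sketch of the pattern. It does not show Voxtype's own configuration: `post_process` is a hypothetical helper, and the filter command passed to it stands in for whatever local LLM wrapper you use.

```shell
# Hypothetical sketch: send text through a filter command on stdin.
# If the command fails or prints nothing, return the original text unchanged.
post_process() {
    local text="$1"
    shift
    local out
    if out="$(printf '%s' "$text" | "$@" 2>/dev/null)" && [ -n "$out" ]; then
        printf '%s\n' "$out"
    else
        printf '%s\n' "$text"
    fi
}

# Example with stand-in filters: `tr` succeeds, `false` always fails.
post_process "hello world" tr a-z A-Z
post_process "hello world" false
```

The point of the design is that a broken or slow filter never costs you the dictation itself: the raw transcription is always available as the fallback.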

Remote Whisper Servers

Offload transcription to a self-hosted GPU server running whisper.cpp, so your laptop can use more powerful remote hardware. Cloud APIs are also supported if you opt in.

Choose Your Model

Balance speed and accuracy for your needs. With GPU acceleration on modern hardware, even large-v3 can reach sub-second inference.

Whisper Models (English-only)

Model	Size	Best For
`tiny.en`	75 MB	Quick notes, low-end hardware
`base.en` (Recommended)	142 MB	Most users
`small.en`	466 MB	Higher accuracy needs
`medium.en`	1.5 GB	Professional transcription

Whisper Models (Multilingual)

Model	Size	Best For
`tiny`	75 MB	Quick notes, any language
`base`	142 MB	General multilingual use
`small`	466 MB	Better multilingual accuracy
`medium`	1.5 GB	Professional multilingual
`large-v3`	3.1 GB	Maximum accuracy
`large-v3-turbo` (GPU Recommended)	1.6 GB	Fast + accurate

ONNX Engines (require ONNX binary variant)

Engine / Model	Size	Best For
`parakeet-tdt-0.6b-v3-int8` (English)	670 MB	Best English accuracy, built-in punctuation
`moonshine-base`	237 MB	Fastest CPU inference, English
`sensevoice-small` (CJK)	239 MB	Chinese, Japanese, Korean, Cantonese, English
`paraformer-zh`	487 MB	Chinese + English bilingual
`dolphin-base`	198 MB	40 languages + 22 Chinese dialects
`omnilingual-large`	3.9 GB	1600+ languages, rare and low-resource

Models with the `.en` suffix are English-only, but faster and more accurate for English. All ONNX engines require the ONNX binary variant; switch to it with `voxtype setup onnx --enable`.
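
The recommendations in the tables above can be condensed into a small helper. This is a hypothetical sketch: `suggest_model` is not a Voxtype command, and the rule it encodes (a GPU means `large-v3-turbo`, otherwise `base.en` for English and `base` for other languages) is just one reading of the "Recommended" markers.

```shell
# Hypothetical helper, not part of Voxtype.
# Usage: suggest_model LANG GPU   (LANG: "en" or anything else; GPU: "yes" or "no")
suggest_model() {
    local lang="$1"
    local gpu="$2"
    if [ "$gpu" = "yes" ]; then
        echo "large-v3-turbo"   # marked "GPU Recommended" in the table
    elif [ "$lang" = "en" ]; then
        echo "base.en"          # marked "Recommended" for most users
    else
        echo "base"             # general multilingual use
    fi
}

suggest_model en no
suggest_model de yes
```

Treat the result as a starting point: moving to `small` or `medium` trades speed for accuracy, as the tables indicate.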

Installation

Get up and running in minutes

Many other Linux voice typing tools require you to clone a repo, run an install script, set up a Python virtual environment, and remember to activate it after every reboot. Voxtype is different: it's a single binary. Install it from your package manager, download a Whisper model, and enable the systemd user service. No virtual environments, no dependency conflicts, no activation scripts. It just works, every time you log in.

Install from AUR

# Using paru
paru -S voxtype

# Or using yay
yay -S voxtype

# Install optional dependencies
sudo pacman -S wtype wl-clipboard

Requires Ubuntu 24.04+ or Debian Trixie+ (glibc 2.38 or newer). On older releases, build from source instead.
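
If you are unsure which glibc your system ships, a quick check before downloading can save a failed install. The sketch below is not part of Voxtype; `version_ge` and `glibc_version` are illustrative helper names, and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# On glibc systems, getconf prints a line such as "glibc 2.39".
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

if version_ge "$(glibc_version)" 2.38; then
    echo "glibc is new enough for the prebuilt packages"
else
    echo "glibc is older than 2.38 or was not detected; consider building from source"
fi
```

A version-aware comparison matters here because a plain string or decimal comparison would rank 2.4 above 2.38.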

Install .deb package

# Download and install (Ubuntu 24.04+, Debian Trixie+)
wget https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype_0.6.3-1_amd64.deb
sudo dpkg -i voxtype_0.6.3-1_amd64.deb
sudo apt-get install -f

# Install optional dependencies
sudo apt install wtype wl-clipboard

Requires Fedora 39+ (glibc 2.38 or newer).

Install RPM package

# Download and install (Fedora 39+)
wget https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype-0.6.3-1.x86_64.rpm
sudo dnf install ./voxtype-0.6.3-1.x86_64.rpm

# Install optional dependencies
sudo dnf install wtype wl-clipboard

Build from source

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install build dependencies (Arch)
sudo pacman -S base-devel clang alsa-lib

# Clone and build
git clone https://github.com/peteonrails/voxtype
cd voxtype
cargo build --release

# Install
sudo cp target/release/voxtype /usr/local/bin/

Quick Start

For Hyprland, Sway, and River users. Uses native compositor keybindings—no input group required!

# Download whisper model and configure
voxtype setup --download

# Disable built-in hotkey (we'll use compositor keybindings)
cat >> ~/.config/voxtype/config.toml << 'EOF'

[hotkey]
enabled = false
EOF

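# Enable state file (required for toggle mode)
# Caution: appended after [hotkey], this key is parsed as hotkey.state_file;
# if Voxtype does not pick it up, move the line above the first [section].
echo 'state_file = "auto"' >> ~/.config/voxtype/config.toml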

# Install as systemd service
voxtype setup systemd

# Optional: Fix modifier key interference (if using SUPER+key combos)
voxtype setup compositor hyprland  # or: sway

Then add keybindings to your compositor config:

Hyprland (~/.config/hypr/hyprland.conf)

# Push-to-talk: hold Super+V to record, release to transcribe
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop

# Or use Scroll Lock
bind = , SCROLL_LOCK, exec, voxtype record start
bindr = , SCROLL_LOCK, exec, voxtype record stop

Sway (~/.config/sway/config)

# Push-to-talk: hold $mod+v to record, release to transcribe
bindsym $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop

River (~/.config/river/init)

# Push-to-talk: hold Super+V to record, release to transcribe
riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'

For GNOME, KDE, and other desktops. Uses kernel-level hotkey detection.

# Add user to input group (required for hotkey detection)
sudo usermod -aG input "$USER"
# Log out and back in for group change to take effect

# Download whisper model and configure
voxtype setup --download

# Install as systemd service (starts on login)
voxtype setup systemd

# Check status
systemctl --user status voxtype
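
If the hotkey does not respond after setup, one thing worth checking is whether your current session has actually picked up the input group. The helper below is a hypothetical sketch (`in_group` is not a Voxtype command). It relies on `id -nG` without a user argument, which reports the groups of the running session rather than the account database, so it shows whether the re-login took effect.

```shell
# in_group GROUP [GROUP_LIST]: succeed if GROUP is in the space-separated list.
# Without a second argument, the groups of the current session are used.
in_group() {
    local want="$1"
    local list
    if [ "$#" -ge 2 ]; then
        list="$2"
    else
        list="$(id -nG)"
    fi
    case " $list " in
        *" $want "*) return 0 ;;
        *) return 1 ;;
    esac
}

if in_group input; then
    echo "This session is in the input group"
else
    echo "Not in the input group yet (log out and back in after usermod)"
fi
```

Padding the list with spaces before matching avoids false positives on group names that merely contain "input".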