Voice to Text for Linux and macOS

Push-to-talk voice-to-text that works on any desktop. Hold a key, speak, release. Your words appear at the cursor.

Fully Offline · Open Source · Wayland Optimized
Terminal
$ voxtype
[INFO] Voxtype v0.6.0 starting...
[INFO] Using model: base.en
[INFO] Hotkey: SCROLLLOCK
[INFO] Ready! Hold SCROLLLOCK to record.

# User holds ScrollLock and speaks...
[INFO] Recording started...
[INFO] Recording stopped (2.1s)
[INFO] Transcribing...
[INFO] Transcribed: "Hello world, this is a test."
[INFO] Typed 28 characters

Why Voxtype?

Built specifically for the modern Linux desktop

Works on Any Linux Desktop

Works on GNOME, KDE, Sway, Hyprland, River—Wayland or X11. Use built-in hotkeys or native compositor keybindings.

Native Hyprland, Sway & River Support

Use your compositor's keybindings instead of evdev—no input group required. Bind any key combo like Super+V for push-to-talk.

Fully Offline & Private

All speech recognition happens locally using whisper.cpp. Your voice never leaves your machine. No cloud, no subscriptions, no tracking.

Push-to-Talk

Natural workflow: hold a key to record, release to transcribe. No "wake words", no accidental activations, just controlled dictation.

Highly Configurable

Choose your hotkey, Whisper model size, output mode (type or clipboard), and more. Works the way you want it to.

Smart Fallbacks

Types directly at cursor via wtype, with full CJK/Unicode support. Falls back to ydotool or clipboard. Always works, one way or another.
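
To see which step of that fallback chain your own system would land on, a small shell check is enough. This is only a sketch and not how Voxtype detects tools internally; `first_available` is a hypothetical helper, and `wl-copy` (from wl-clipboard) standing in for the clipboard step is an assumption.

```shell
# Return the first command from the argument list that exists on PATH.
# Illustrative only: Voxtype performs its own detection internally.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Same order as the fallback chain above. wl-copy is an assumed
# stand-in for the clipboard step.
first_available wtype ydotool wl-copy || echo "no output tool found"
```

If this prints `ydotool` or `wl-copy`, installing wtype would likely give you direct typing at the cursor.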

Fast & Lightweight

Written in Rust for performance. Single binary, minimal dependencies. No Python, no virtual environments—just works.

GPU Acceleration

Optional Vulkan, CUDA, Metal, and ROCm support for blazing-fast inference. Sub-second transcription with large-v3-turbo on modern GPUs.

Waybar Integration

Optional status indicator shows recording state in your bar. See when Voxtype is idle, recording, or transcribing at a glance. Choose from 10 built-in icon themes (Nerd Font, Material Design, emoji, and more) or define your own.

Desktop Notifications

Configurable events surface through notify-send, so you always know when recording starts, stops, and what was transcribed.

LLM Post-Processing

Pipe transcriptions through local LLMs for translation, domain-specific vocabulary, or custom workflows. Falls back gracefully on errors.
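
The "fall back on errors" idea can be sketched as a stdin-to-stdout wrapper: run a filter over the text, and if the filter fails or returns nothing, pass the original through untouched. This assumes a post-processor that reads the transcription on stdin and writes the result to stdout; that interface, the helper name `postprocess_or_passthrough`, and the `sed` filter standing in for an LLM are all assumptions made for illustration.

```shell
# Run a filter command over stdin. If the filter fails or produces no
# output, print the original text instead.
postprocess_or_passthrough() {
    original=$(cat)
    if result=$(printf '%s' "$original" | "$@" 2>/dev/null) && [ -n "$result" ]; then
        printf '%s\n' "$result"
    else
        printf '%s\n' "$original"
    fi
}

# A working filter rewrites a misheard domain term...
echo "open the cube control dashboard" | postprocess_or_passthrough sed 's/cube control/kubectl/'

# ...while a failing filter leaves the text unchanged.
echo "hello world" | postprocess_or_passthrough false
```

The point of the wrapper is that a broken or slow post-processing step never costs you the dictated text.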

Remote Whisper Servers

Offload transcription to a self-hosted GPU server running whisper.cpp. Use your laptop while leveraging powerful remote hardware. Connecting to cloud APIs is also possible if you opt in.

See It In Action

Watch Voxtype transform voice into text

Voxtype on Omarchy

Video courtesy of Omarchy, Basecamp, and DHH.

Interactive Demos

[Interactive demo with three scenes: dictating into an AI coding assistant in a foot terminal, into project-notes.md in gedit, and into main.rs in opencode. Press HOME to record.]

Dictate an AI Prompt

Watch as Voxtype captures speech and types a prompt directly into an AI coding assistant. Perfect for hands-free interaction with agentic tools.

Choose Your Model

Balance speed and accuracy for your needs. With GPU acceleration, even large-v3 achieves sub-second inference!

Whisper Models (English-only)

Model        Size      Best For
tiny.en      39 MB     Quick notes, low-end hardware
small.en     466 MB    Higher accuracy needs
medium.en    1.5 GB    Professional transcription

Whisper Models (Multilingual)

Model        Size      Best For
tiny         75 MB     Quick notes, any language
base         142 MB    General multilingual use
small        466 MB    Better multilingual accuracy
medium       1.5 GB    Professional multilingual
large-v3     3.1 GB    Maximum accuracy

ONNX Engines (require ONNX binary variant)

Engine / Model       Size      Best For
moonshine-base       237 MB    Fastest CPU inference, English
paraformer-zh        487 MB    Chinese + English bilingual
dolphin-base         198 MB    40 languages + 22 Chinese dialects
omnilingual-large    3.9 GB    1600+ languages, rare and low-resource

.en models are English-only but faster and more accurate for English. All ONNX engines require the ONNX binary variant. Switch with voxtype setup onnx --enable.
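
If disk space or download size is your main constraint, one rough way to choose is to take the largest multilingual model that fits a budget. The sketch below uses the sizes listed above and assumes that larger models are the more accurate ones, as the "Best For" descriptions suggest. `largest_model_within` is a hypothetical helper and not a Voxtype command.

```shell
# Multilingual models and their download sizes in MB, smallest first.
# GB figures from the table are converted with 1 GB = 1000 MB.
MODELS="tiny:75 base:142 small:466 medium:1500 large-v3:3100"

# Print the largest model whose size fits within a budget given in MB.
# Returns non-zero when nothing fits.
largest_model_within() {
    budget=$1
    choice=""
    for entry in $MODELS; do
        name=${entry%%:*}
        size=${entry##*:}
        if [ "$size" -le "$budget" ]; then
            choice=$name
        fi
    done
    if [ -n "$choice" ]; then
        printf '%s\n' "$choice"
    else
        return 1
    fi
}

largest_model_within 500   # prints "small"
```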

Installation

Get up and running in minutes

Other Linux voice typing tools require you to clone a repo, run an install script, set up a Python virtual environment, and remember to activate it every time you reboot. Voxtype is different: it's a single binary. Install it from your package manager, download a Whisper model, and enable the systemd user service. No virtual environments, no dependency conflicts, no activation scripts. It just works, every time you log in.

Install from AUR
# Using paru
paru -S voxtype

# Or using yay
yay -S voxtype

# Install optional dependencies
sudo pacman -S wtype wl-clipboard

Requires Ubuntu 24.04+ or Debian Trixie+ (glibc 2.38). Older versions can build from source.
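
Not sure which glibc you have? A quick check before downloading can save a failed install. This sketch relies on `getconf GNU_LIBC_VERSION` (glibc systems only) and on `sort -V` from GNU coreutils; `version_ge` is a hypothetical helper, and the wording of the messages is ours.

```shell
# True when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# getconf prints e.g. "glibc 2.39" on glibc systems; empty elsewhere.
libc_version=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')

if [ -n "$libc_version" ] && version_ge "$libc_version" 2.38; then
    echo "glibc $libc_version: prebuilt packages should work"
else
    echo "glibc ${libc_version:-not detected}: consider building from source"
fi
```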

Install .deb package
# Download and install (Ubuntu 24.04+, Debian Trixie+)
wget https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype_0.6.3-1_amd64.deb
sudo dpkg -i voxtype_0.6.3-1_amd64.deb
sudo apt-get install -f

# Install optional dependencies
sudo apt install wtype wl-clipboard

Requires Fedora 39+ (glibc 2.38).

Install RPM package
# Download and install (Fedora 39+)
wget https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype-0.6.3-1.x86_64.rpm
sudo dnf install ./voxtype-0.6.3-1.x86_64.rpm

# Install optional dependencies
sudo dnf install wtype wl-clipboard

Build from source
# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install build dependencies (Arch)
sudo pacman -S base-devel clang alsa-lib

# Clone and build
git clone https://github.com/peteonrails/voxtype
cd voxtype
cargo build --release

# Install
sudo cp target/release/voxtype /usr/local/bin/

Quick Start

For Hyprland, Sway, and River users. Uses native compositor keybindings—no input group required!

# Download whisper model and configure
voxtype setup --download

# Disable built-in hotkey (we'll use compositor keybindings)
cat >> ~/.config/voxtype/config.toml << 'EOF'

[hotkey]
enabled = false
EOF

# Enable state file (required for toggle mode)
echo 'state_file = "auto"' >> ~/.config/voxtype/config.toml

# Install as systemd service
voxtype setup systemd

# Optional: Fix modifier key interference (if using SUPER+key combos)
voxtype setup compositor hyprland  # or: sway
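
One thing worth checking after running the appends above: in TOML, a key written after a [table] header belongs to that table. The sketch below replays the two appends on a scratch file and prints each key with the table it ends up in. Whether Voxtype expects state_file at the top level or inside a table is not stated here, so treat this as a way to inspect your own config rather than a statement about the correct layout. `show_keys` is a hypothetical helper with a deliberately simplified parser.

```shell
# Replay the two appends from the Quick Start on a scratch file.
cfg=$(mktemp)
cat >> "$cfg" << 'EOF'

[hotkey]
enabled = false
EOF
echo 'state_file = "auto"' >> "$cfg"

# Print every key prefixed with the table it falls under.
# Simplified: handles plain [table] headers and key = value lines only.
show_keys() {
    awk '
        /^\[[^]]+\]/ { section = substr($1, 2, length($1) - 2); next }
        /^[A-Za-z_][A-Za-z0-9_-]*[ \t]*=/ {
            split($0, kv, "=")
            key = kv[1]; gsub(/[ \t]/, "", key)
            print (section == "" ? key : section "." key)
        }
    ' "$1"
}

show_keys "$cfg"
rm -f "$cfg"
```

On the scratch file this prints `hotkey.enabled` and `hotkey.state_file`. If a key should sit at the top level, it needs to appear in the file before the first [table] header. Run `show_keys ~/.config/voxtype/config.toml` to see where your keys landed.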

Then add keybindings to your compositor config:

Hyprland (~/.config/hypr/hyprland.conf)
# Push-to-talk: hold Super+V to record, release to transcribe
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop

# Or use Scroll Lock
bind = , SCROLL_LOCK, exec, voxtype record start
bindr = , SCROLL_LOCK, exec, voxtype record stop

Sway (~/.config/sway/config)
# Push-to-talk: hold $mod+v to record, release to transcribe
bindsym $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop

River (~/.config/river/init)
# Push-to-talk: hold Super+V to record, release to transcribe
riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'
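
If you are unsure which of the two Quick Start paths applies to you, the session environment usually gives a hint. This sketch maps a desktop name to a route. It assumes your session sets XDG_CURRENT_DESKTOP to something containing the compositor name, which is common but not guaranteed. `hotkey_route` and the labels "compositor" and "evdev" are ours.

```shell
# Map a desktop name to the hotkey route described in this Quick Start.
# Matching is case-insensitive; unknown desktops fall back to evdev.
hotkey_route() {
    desktop=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$desktop" in
        *hyprland*|*sway*|*river*) echo "compositor" ;;
        *) echo "evdev" ;;
    esac
}

hotkey_route "${XDG_CURRENT_DESKTOP:-unknown}"
```

"compositor" means the keybinding setup above applies. "evdev" means the built-in hotkey path, which needs the input group.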

For GNOME, KDE, and other desktops. Uses kernel-level hotkey detection.

# Add user to input group (required for hotkey detection)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect

# Download whisper model and configure
voxtype setup --download

# Install as systemd service (starts on login)
voxtype setup systemd

# Check status
systemctl --user status voxtype
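
If the hotkey does not respond, the usual cause is that the group change has not reached your current session yet. A quick check, using only `id` from coreutils (`has_group` is a hypothetical helper):

```shell
# True when the space-separated group list in $1 contains the group $2.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# `id -nG` with no user argument lists the groups of the running session,
# which is what matters here (not what the group database says).
if has_group "$(id -nG)" input; then
    echo "input group active in this session"
else
    echo "input group not active yet: log out and back in after usermod"
fi
```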

Works Everywhere

Tested on all major Linux desktops. Optimized for Wayland, works on X11 too.

GNOME
KDE Plasma
Sway
Hyprland
River
Any Wayland

We Want to Hear From You

Voxtype is a young project and your feedback helps make it better

Ready to try Voxtype?

Start dictating on your Linux desktop today.