Local-first audio inference engine for TTS, ASR, and voice AI workflows.
Izwi is a privacy-focused audio AI platform that runs entirely on your machine. No cloud services, no API keys, no data leaving your device.
Core capabilities:
- Voice Mode — Real-time voice conversations with AI
- Text-to-Speech — Generate natural speech from text
- Studio — Build long-form TTS projects and exports
- Speech Recognition — Convert audio to text with high accuracy
- Speaker Diarization — Identify and separate multiple speakers
- Voice Cloning — Clone any voice from a short audio sample
- Voice Design — Create custom voices from text descriptions
- Forced Alignment — Word-level audio-text alignment
- Chat — Text-based AI conversations
The server exposes OpenAI-compatible API routes under /v1.
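As a rough illustration of what "OpenAI-compatible" could look like in practice, the sketch below posts an audio file to a transcription route once the server is running. It assumes the server listens on http://localhost:8080 (the address used in the quick start) and that Izwi mirrors OpenAI's `/v1/audio/transcriptions` route and form fields; neither the route nor the field names are confirmed here, so check the documentation. `API_BASE` and `transcribe_via_api` are hypothetical names used only for this example.

```shell
# Assumption: `izwi serve` is running locally and mirrors OpenAI's
# /v1/audio/transcriptions route. API_BASE is a local helper variable,
# not an Izwi setting.
API_BASE="${API_BASE:-http://localhost:8080/v1}"

# Hypothetical helper: upload an audio file as multipart form data.
#   $1 = path to the audio file
#   $2 = model ID (for example Parakeet-TDT-0.6B-v3)
transcribe_via_api() {
  curl -sS "${API_BASE}/audio/transcriptions" \
    -F "file=@$1" \
    -F "model=$2"
}

# Usage (requires a running server and a pulled model):
# transcribe_via_api audio.wav Parakeet-TDT-0.6B-v3
```

Because the routes follow the OpenAI layout, existing OpenAI client libraries can usually be pointed at the local base URL instead of hand-written `curl` calls.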
On macOS, download the latest .dmg from GitHub Releases, then:
- Open the .dmg file
- Drag Izwi.app to Applications
- Launch Izwi
On Linux (Debian/Ubuntu):
wget https://github.com/izwi-ai/izwi/releases/latest/download/izwi_amd64.deb
sudo dpkg -i izwi_amd64.deb
On Windows, download and run the installer from GitHub Releases.
Full installation guides: macOS • Linux • Windows • From Source
izwi serve
Open http://localhost:8080 in your browser.
izwi pull Qwen3-TTS-12Hz-0.6B-Base
izwi tts "Hello from Izwi!" --output hello.wav
izwi pull Parakeet-TDT-0.6B-v3
izwi transcribe audio.wav
Long-form ASR is handled automatically: Izwi now chunks long recordings, stitches overlapping transcripts, and returns a full transcript instead of only the first model window.
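To make the chunk-and-overlap idea concrete, here is a minimal sketch of how overlapping windows might be laid out over a recording. It is not Izwi's actual chunker: it assumes fixed-length windows and whole seconds, and how the engine really picks cut points is not described here. `chunk_windows` is a hypothetical helper, and the values 24 and 3 simply echo the chunk target and overlap defaults listed below.

```shell
# Illustrative only: fixed-length overlapping windows, not Izwi's real chunker.
# Prints one "start end" pair (whole seconds) per chunk.
#   $1 = recording duration, $2 = chunk length, $3 = overlap
chunk_windows() {
  local duration=$1 chunk=$2 overlap=$3
  local step=$(( chunk - overlap ))   # distance between consecutive window starts
  if [ "$step" -le 0 ]; then
    echo "overlap must be smaller than the chunk length" >&2
    return 1
  fi
  local start=0 end
  while [ "$start" -lt "$duration" ]; do
    end=$(( start + chunk ))
    # Clamp the final window to the end of the recording.
    if [ "$end" -gt "$duration" ]; then
      end=$duration
    fi
    echo "$start $end"
    if [ "$end" -eq "$duration" ]; then
      break
    fi
    start=$(( start + step ))
  done
}

chunk_windows 60 24 3
# prints:
# 0 24
# 21 45
# 42 60
```

Each window shares its last 3 seconds with the start of the next one, which is what gives a stitching step duplicated words to align on.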
Optional tuning knobs:
IZWI_ASR_CHUNK_TARGET_SECS=24
IZWI_ASR_CHUNK_MAX_SECS=30
IZWI_ASR_CHUNK_OVERLAP_SECS=3
# Optional: preload models at server startup to reduce first-request cold latency.
# Comma-separated model IDs (for example Whisper-Large-v3-Turbo,Qwen3.5-4B)
IZWI_PRELOAD_MODELS=Whisper-Large-v3-Turbo
# Optional: run a short synthetic ASR warmup after preloading (enabled by default).
IZWI_WARMUP_PRELOADED_MODELS=1
IZWI_ASR_WARMUP_DURATION_MS=800
# Optional: tune text streaming queue depth when using per-character ASR streaming.
IZWI_STREAM_TEXT_QUEUE_CAPACITY=4096
Izwi desktop supports optional, opt-in anonymous usage analytics powered by Aptabase.
- Disabled by default until users explicitly opt in.
- Can be enabled during onboarding or later in Settings.
- Users can opt out at any time.
- No prompts, transcripts, audio payloads, local paths, or personal identifiers are sent.
To enable analytics transport in the desktop shell, set the app key in the runtime environment:
APTABASE_APP_KEY=A-US-XXXXXXXXXXXXXXX
Use the exact key from Aptabase (for example A-US-... or A-EU-...).
Without this variable, analytics calls are treated as no-op events.
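If you want to confirm how the variable is set before launching the desktop shell, a small pre-flight check along these lines may help. This is a sketch, not part of Izwi: `aptabase_key_status` is a hypothetical helper, and it only recognizes the two prefixes mentioned above, so a key with another prefix is reported as unrecognized rather than invalid.

```shell
# Hypothetical pre-flight check; not an Izwi or Aptabase command.
#   $1 = candidate key (pass "$APTABASE_APP_KEY")
aptabase_key_status() {
  case "${1:-}" in
    "")            echo "disabled" ;;      # unset or empty: analytics calls are no-ops
    A-US-*|A-EU-*) echo "ok" ;;            # matches a prefix named in this README
    *)             echo "unrecognized" ;;  # other prefix: compare with your Aptabase dashboard
  esac
}

aptabase_key_status "${APTABASE_APP_KEY:-}"
```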
| Category | Models |
|---|---|
| TTS | Qwen3-TTS 12Hz (0.6B Base/CustomVoice, 1.7B Base/CustomVoice/VoiceDesign), Kokoro-82M |
| ASR | Qwen3-ASR GGUF (0.6B, 1.7B), Parakeet-TDT-0.6B-v3, Whisper-Large-v3-Turbo |
| Diarization | Sortformer 4-speaker |
| Chat | Qwen3 GGUF (0.6B, 1.7B, 4B, 8B), Qwen3.5 GGUF (0.8B, 2B, 4B, 9B), LFM2.5 (1.2B Instruct/Thinking GGUF), Gemma 3 (1B) |
| Audio | LFM2.5-Audio-1.5B-GGUF |
| Alignment | Qwen3-ForcedAligner-0.6B (full, 4-bit) |
Run izwi list to see all available models.
Full model documentation: Models Guide
| Resource | Link |
|---|---|
| Getting Started | izwiai.com/docs/getting-started |
| Installation | izwiai.com/docs/installation |
| Features | izwiai.com/docs/features |
| CLI Reference | izwiai.com/docs/cli |
| Models | izwiai.com/docs/models |
| Troubleshooting | izwiai.com/docs/troubleshooting |
Apache 2.0
- Qwen3-TTS by Alibaba
- Parakeet by NVIDIA
- Gemma by Google
- HuggingFace Hub for model hosting
