Features
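Izwi provides a suite of audio AI capabilities, from real-time voice conversations to transcription and voice cloning. Every feature is available in the web UI, desktop app, and API; some are also available from the command line (see Feature Comparison).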
Core Features
| Feature | Description | Guide |
|---|---|---|
| Voice | Real-time voice conversations with AI | Voice Guide |
| Chat | Text-based AI conversations | Chat Guide |
| Text-to-Speech | Generate natural speech from text | TTS Guide |
| Studio | Manage long-form TTS projects and exports | Studio Guide |
| Transcription | Convert audio to text | Transcription Guide |
| Diarization | Identify and separate multiple speakers in a recording | Diarization Guide |
| Voice Cloning | Clone voices from audio samples | Voice Cloning Guide |
| Voice Design | Create voices from descriptions | Voice Design Guide |
Feature Comparison
| Feature | Web UI | Desktop | CLI | API |
|---|---|---|---|---|
| Voice | ✓ | ✓ | — | ✓ |
| Chat | ✓ | ✓ | ✓ | ✓ |
| Text-to-Speech | ✓ | ✓ | ✓ | ✓ |
| Studio | ✓ | ✓ | — | ✓ |
| Transcription | ✓ | ✓ | ✓ | ✓ |
| Diarization | ✓ | ✓ | — | ✓ |
| Voice Cloning | ✓ | ✓ | ✓ | ✓ |
| Voice Design | ✓ | ✓ | ✓ | ✓ |
Getting Started
Start the server:

```bash
izwi serve
```

Open the web UI at http://localhost:8080.

Download the required models:

```bash
izwi pull Qwen3-TTS-12Hz-0.6B-Base
izwi pull Qwen3-ASR-0.6B-GGUF
izwi pull Qwen3-8B-GGUF
```

Model Requirements
Different features require different models:
| Feature | Required Models |
|---|---|
| Voice | TTS + ASR + chat model (or the unified LFM2.5-Audio-1.5B-GGUF model) |
| Chat | Chat model (Qwen3, Qwen3.5, LFM2.5, or Gemma) |
| Text-to-Speech | TTS model |
| Studio | TTS model |
| Transcription | ASR model (Parakeet-TDT-0.6B-v3 default; Qwen3/Whisper/LFM2.5 also supported) |
| Diarization | diar_streaming_sortformer_4spk-v2.1 (plus optional ASR and forced-aligner models) |
| Forced Alignment | Qwen3-ForcedAligner-0.6B (or -4bit) |
| Voice Cloning | Qwen3 TTS Base model (Qwen3-TTS-12Hz-*-Base*) |
| Voice Design | Qwen3 TTS VoiceDesign model (Qwen3-TTS-12Hz-1.7B-VoiceDesign*) |
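If you only need a few features, the table above can be turned into a small pull script. The sketch below is a hypothetical helper, not part of the Izwi CLI: the function names `models_for` and `pull_for` and the `DRY_RUN` variable are illustrative, and the feature-to-model mapping is an assumption that reuses the model IDs from the Getting Started pull commands (so transcription maps to `Qwen3-ASR-0.6B-GGUF` rather than the Parakeet default). The only real command it runs is `izwi pull`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the Izwi CLI. The feature-to-model mapping
# is an assumption based on the Getting Started pull commands. Diarization and
# Voice Design are left unmapped; add their model IDs to the case statement.
# Voice uses TTS + ASR + chat here (a unified LFM2.5-Audio model is an alternative).

# Print the model IDs a feature needs, separated by spaces.
models_for() {
  case "$1" in
    chat)                     echo "Qwen3-8B-GGUF" ;;
    tts|studio|voice-cloning) echo "Qwen3-TTS-12Hz-0.6B-Base" ;;
    transcription)            echo "Qwen3-ASR-0.6B-GGUF" ;;
    voice)                    echo "Qwen3-TTS-12Hz-0.6B-Base Qwen3-ASR-0.6B-GGUF Qwen3-8B-GGUF" ;;
    *)                        echo "no model mapping for feature: $1" >&2; return 1 ;;
  esac
}

# Pull each required model once, in order. With DRY_RUN=1, print the
# `izwi pull` commands instead of running them.
pull_for() {
  local feature model models
  local seen=" "
  for feature in "$@"; do
    models=$(models_for "$feature") || return 1
    for model in $models; do
      case "$seen" in *" $model "*) continue ;; esac  # skip models already handled
      seen="$seen$model "
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "izwi pull $model"
      else
        izwi pull "$model" || return 1
      fi
    done
  done
}
```

For example, `DRY_RUN=1 pull_for voice chat` prints three `izwi pull` commands, because the chat model is already covered by Voice. Deduplicating by model ID keeps the script from downloading the same model twice when features share one.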
Next Steps
Choose a feature to learn more:
- Voice Mode — Real-time conversations
- Text-to-Speech — Generate speech
- Studio — Build long-form TTS projects
- Transcription — Convert audio to text