Izwi

Features

Izwi provides a comprehensive suite of audio AI capabilities. Each feature is accessible via the web UI, desktop app, and command line.

Core Features

| Feature | Description | Guide |
| --- | --- | --- |
| Voice | Real-time voice conversations with AI | Voice Guide |
| Chat | Text-based AI conversations | Chat Guide |
| Text-to-Speech | Generate natural speech from text | TTS Guide |
| Studio | Manage long-form TTS projects and exports | Studio Guide |
| Transcription | Convert audio to text | Transcription Guide |
| Diarization | Identify multiple speakers | Diarization Guide |
| Voice Cloning | Clone voices from audio samples | Voice Cloning Guide |
| Voice Design | Create voices from descriptions | Voice Design Guide |

Feature Comparison

Feature | Web UI | Desktop | CLI | API
Voice
Chat
Text-to-Speech
Studio
Transcription
Diarization
Voice Cloning
Voice Design

Getting Started

Start the server:

izwi serve

Open the web UI:

http://localhost:8080

Download required models:

izwi pull Qwen3-TTS-12Hz-0.6B-Base
izwi pull Qwen3-ASR-0.6B-GGUF
izwi pull Qwen3-8B-GGUF
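
When the model downloads need to be scripted rather than typed one at a time, the pull step can be wrapped in a few lines of Python. The following is a minimal sketch, not part of Izwi itself: it assumes only the `izwi pull <model>` command shown above, and the helper names `pull_commands` and `pull_all` are hypothetical. If `izwi` is not on the `PATH`, or `dry_run` is set, the commands are printed instead of executed.

```python
import shutil
import subprocess

# Model names copied from the pull step above.
DEFAULT_MODELS = [
    "Qwen3-TTS-12Hz-0.6B-Base",
    "Qwen3-ASR-0.6B-GGUF",
    "Qwen3-8B-GGUF",
]


def pull_commands(models):
    """Build one `izwi pull <model>` argument list per model name."""
    return [["izwi", "pull", name] for name in models]


def pull_all(models, dry_run=False):
    """Run each pull command in order and return the command list.

    Hypothetical helper: prints the commands instead of running them
    when dry_run is True or when `izwi` cannot be found on PATH.
    """
    commands = pull_commands(models)
    if dry_run or shutil.which("izwi") is None:
        for cmd in commands:
            print(" ".join(cmd))
        return commands
    for cmd in commands:
        # check=True raises CalledProcessError if a pull fails,
        # so later models are not attempted after an error.
        subprocess.run(cmd, check=True)
    return commands
```

Calling `pull_all(DEFAULT_MODELS, dry_run=True)` prints the three commands without downloading anything, which is a cheap way to check the list before committing to a large download.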

Model Requirements

Different features require different models:

| Feature | Required Models |
| --- | --- |
| Voice | TTS + ASR + Chat model (or unified LFM2.5-Audio-1.5B-GGUF) |
| Chat | Chat model (Qwen3, Qwen3.5, LFM2.5, or Gemma) |
| Text-to-Speech | TTS model |
| Studio | TTS model |
| Transcription | ASR model (Parakeet-TDT-0.6B-v3 default; Qwen3/Whisper/LFM2.5 also supported) |
| Diarization | diar_streaming_sortformer_4spk-v2.1 (+ optional ASR and aligner models) |
| Forced Alignment | Qwen3-ForcedAligner-0.6B (or -4bit) |
| Voice Cloning | Qwen3 TTS Base model (Qwen3-TTS-12Hz-*-Base*) |
| Voice Design | Qwen3 TTS VoiceDesign model (Qwen3-TTS-12Hz-1.7B-VoiceDesign*) |
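
To work out which kinds of model a given set of features needs, the requirements above can be expressed as a small lookup. This is an illustrative sketch only: the kind labels (`"tts"`, `"asr"`, `"chat"`, and so on) and the function `required_kinds` are hypothetical names, not Izwi identifiers. It also simplifies the requirements by ignoring the unified LFM2.5-Audio alternative for Voice and the optional ASR and aligner models for Diarization.

```python
# Feature -> kinds of model it needs, following the requirements above.
# The kind labels are hypothetical shorthand, not Izwi identifiers.
REQUIRED_KINDS = {
    "Voice": {"tts", "asr", "chat"},
    "Chat": {"chat"},
    "Text-to-Speech": {"tts"},
    "Studio": {"tts"},
    "Transcription": {"asr"},
    "Diarization": {"diarization"},
    "Forced Alignment": {"aligner"},
    "Voice Cloning": {"tts-base"},
    "Voice Design": {"tts-voicedesign"},
}


def required_kinds(features):
    """Return the sorted union of model kinds needed by the given features."""
    kinds = set()
    for feature in features:
        if feature not in REQUIRED_KINDS:
            raise KeyError(f"unknown feature: {feature}")
        kinds |= REQUIRED_KINDS[feature]
    return sorted(kinds)
```

For example, `required_kinds(["Chat", "Studio"])` returns `["chat", "tts"]`, so a setup that only needs those two features can skip the ASR download.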

Next Steps

Choose a feature to learn more: