izwi-ai/izwi

Izwi

Local-first audio inference engine for TTS, ASR, and voice AI workflows.

Website · Documentation · Releases · Getting Started


Overview

Izwi is a privacy-focused audio AI platform that runs entirely on your machine. No cloud services, no API keys, no data leaving your device.

Core capabilities:

  • Voice Mode — Real-time voice conversations with AI
  • Text-to-Speech — Generate natural speech from text
  • Studio — Build long-form TTS projects and exports
  • Speech Recognition — Convert audio to text with high accuracy
  • Speaker Diarization — Identify and separate multiple speakers
  • Voice Cloning — Clone any voice from a short audio sample
  • Voice Design — Create custom voices from text descriptions
  • Forced Alignment — Word-level audio-text alignment
  • Chat — Text-based AI conversations

The server exposes OpenAI-compatible API routes under /v1.
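
Because the routes follow OpenAI's conventions, a plain HTTP client should be enough to talk to them. The sketch below assumes the server is listening on http://localhost:8080 (the address used in Quick Start) and that it mirrors OpenAI's GET /v1/models route; this README does not confirm which routes exist. IZWI_BASE_URL is a hypothetical variable used only by this script, not an Izwi setting.

```shell
#!/bin/sh
# Build a URL under the /v1 prefix.
# IZWI_BASE_URL is a hypothetical convenience variable for this script only.
izwi_url() {
  printf '%s/v1/%s\n' "${IZWI_BASE_URL:-http://localhost:8080}" "$1"
}

# Assumed route: OpenAI exposes GET /v1/models; Izwi may or may not mirror it.
izwi_list_models() {
  curl -sS "$(izwi_url models)"
}
```

With the server running, calling izwi_list_models should print the response body if the route is available.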


Quick Install

macOS

Download the latest .dmg from GitHub Releases:

  1. Open the .dmg file
  2. Drag Izwi.app to Applications
  3. Launch Izwi

Linux

wget https://github.com/izwi-ai/izwi/releases/latest/download/izwi_amd64.deb
sudo dpkg -i izwi_amd64.deb
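
The package name suggests an amd64 (x86_64) build, so it may help to check the machine architecture before installing. A minimal sketch of that check; is_amd64 and install_izwi_deb are hypothetical helper names, and the download and install commands are the same two shown above.

```shell
#!/bin/sh
# Succeeds when the given machine name (as printed by `uname -m`)
# matches the amd64 package.
is_amd64() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# Download and install only on a matching architecture.
install_izwi_deb() {
  if ! is_amd64 "$(uname -m)"; then
    echo "izwi_amd64.deb targets x86_64; this machine is $(uname -m)" >&2
    return 1
  fi
  wget https://github.com/izwi-ai/izwi/releases/latest/download/izwi_amd64.deb &&
    sudo dpkg -i izwi_amd64.deb
}
```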

Windows

Download and run the installer from GitHub Releases.

Full installation guides: macOS · Linux · Windows · From Source


Quick Start

1. Start the server

izwi serve

Open http://localhost:8080 in your browser.

2. Download a model

izwi pull Qwen3-TTS-12Hz-0.6B-Base

3. Generate speech

izwi tts "Hello from Izwi!" --output hello.wav

4. Transcribe audio

izwi pull Parakeet-TDT-0.6B-v3
izwi transcribe audio.wav

Long-form ASR is handled automatically: Izwi splits long recordings into overlapping chunks, stitches the overlapping transcripts together, and returns the full transcript rather than only the first model window.
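
To get a feel for how many chunks a recording produces, the arithmetic can be sketched as below. This assumes fixed-length windows where each new chunk starts (window minus overlap) seconds after the previous one; Izwi's actual chunking logic is not described here and may differ. asr_chunk_count is a hypothetical helper, and all values are whole seconds.

```shell
#!/bin/sh
# Usage: asr_chunk_count DURATION_SECS WINDOW_SECS OVERLAP_SECS
# Prints the number of overlapping windows needed to cover the recording.
asr_chunk_count() {
  duration=$1
  window=$2
  overlap=$3
  stride=$((window - overlap))   # distance between consecutive chunk starts
  if [ "$stride" -le 0 ]; then
    echo "overlap must be smaller than the window" >&2
    return 1
  fi
  if [ "$duration" -le "$window" ]; then
    echo 1                       # a single window covers the whole recording
    return 0
  fi
  # One initial window, plus ceil((duration - window) / stride) further windows.
  echo $(( (duration - window + stride - 1) / stride + 1 ))
}
```

For example, asr_chunk_count 600 24 3 prints 29: a 10-minute recording with a 24-second window and a 3-second overlap.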

Optional tuning knobs:

IZWI_ASR_CHUNK_TARGET_SECS=24
IZWI_ASR_CHUNK_MAX_SECS=30
IZWI_ASR_CHUNK_OVERLAP_SECS=3
# Optional: preload models at server startup to reduce first-request cold latency.
# Comma-separated model IDs (for example Whisper-Large-v3-Turbo,Qwen3.5-4B)
IZWI_PRELOAD_MODELS=Whisper-Large-v3-Turbo
# Optional: run a short synthetic ASR warmup after preloading (enabled by default).
IZWI_WARMUP_PRELOADED_MODELS=1
IZWI_ASR_WARMUP_DURATION_MS=800
# Optional: tune text streaming queue depth when using per-character ASR streaming.
IZWI_STREAM_TEXT_QUEUE_CAPACITY=4096
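
One way to apply these settings is to export them in the shell that launches the server. The sketch below assumes izwi serve reads the variables from its process environment, and that the chunk settings should satisfy overlap < target <= max; that constraint is inferred from the variable names, not documented here. check_chunk_settings and start_izwi_tuned are hypothetical helpers.

```shell
#!/bin/sh
# Usage: check_chunk_settings TARGET MAX OVERLAP
# Succeeds when 0 <= overlap < target <= max (assumed constraint).
check_chunk_settings() {
  target=$1 max=$2 overlap=$3
  [ "$overlap" -ge 0 ] && [ "$overlap" -lt "$target" ] && [ "$target" -le "$max" ]
}

# Export the knobs, validate them, then start the server.
start_izwi_tuned() {
  export IZWI_ASR_CHUNK_TARGET_SECS=24
  export IZWI_ASR_CHUNK_MAX_SECS=30
  export IZWI_ASR_CHUNK_OVERLAP_SECS=3
  export IZWI_PRELOAD_MODELS=Whisper-Large-v3-Turbo
  if ! check_chunk_settings "$IZWI_ASR_CHUNK_TARGET_SECS" \
      "$IZWI_ASR_CHUNK_MAX_SECS" "$IZWI_ASR_CHUNK_OVERLAP_SECS"; then
    echo "inconsistent ASR chunk settings" >&2
    return 1
  fi
  izwi serve
}
```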

Anonymous Analytics (Desktop)

Izwi desktop supports optional, opt-in anonymous usage analytics powered by Aptabase.

  • Disabled by default until users explicitly opt in.
  • Can be enabled during onboarding or later in Settings.
  • Users can opt out at any time.
  • No prompts, transcripts, audio payloads, local paths, or personal identifiers are sent.

To enable analytics transport in the desktop shell, set the app key in the runtime environment:

APTABASE_APP_KEY=A-US-XXXXXXXXXXXXXXX

Use the exact key from Aptabase (for example A-US-... or A-EU-...).

Without this variable, analytics calls are treated as no-op events.


Supported Models

Category     Models
TTS          Qwen3-TTS 12Hz (0.6B Base/CustomVoice, 1.7B Base/CustomVoice/VoiceDesign), Kokoro-82M
ASR          Qwen3-ASR GGUF (0.6B, 1.7B), Parakeet-TDT-0.6B-v3, Whisper-Large-v3-Turbo
Diarization  Sortformer 4-speaker
Chat         Qwen3 GGUF (0.6B, 1.7B, 4B, 8B), Qwen3.5 GGUF (0.8B, 2B, 4B, 9B), LFM2.5 (1.2B Instruct/Thinking GGUF), Gemma 3 (1B)
Audio        LFM2.5-Audio-1.5B-GGUF
Alignment    Qwen3-ForcedAligner-0.6B (full, 4-bit)

Run izwi list to see all available models.

Full model documentation: Models Guide


Documentation

Resource         Link
Getting Started  izwiai.com/docs/getting-started
Installation     izwiai.com/docs/installation
Features         izwiai.com/docs/features
CLI Reference    izwiai.com/docs/cli
Models           izwiai.com/docs/models
Troubleshooting  izwiai.com/docs/troubleshooting

License

Apache 2.0
