CrisperWeaver

On-device speech recognition + text-to-speech. No cloud. 40+ model families, one app.

CrisperWeaver is a cross-platform Flutter app for fully-offline audio transcription and speech synthesis. Drop in a file, paste a URL, or record with the mic — audio never leaves the device. 40+ open-weight ASR families and 20+ TTS families are supported through a single unified engine (CrispASR): Whisper (+ Distil-Whisper), Parakeet (TDT/CTC/RNNT variants + Japanese), Canary, Voxtral, Qwen3-ASR (+ Mega-ASR LoRA), Cohere, Granite (3.x + 4.1 family), FastConformer-CTC, Canary-CTC, Wav2Vec2 / HuBERT / Data2Vec, OmniASR (CTC + LLM + streaming), FireRed, Kyutai-STT, GLM-ASR, Moonshine (+ German variants), VibeVoice ASR, MiMo ASR, Gemma4-E2B (140+ languages), FunASR (+ 31-lang MLT), Paraformer-ZH, SenseVoice (built-in language ID) — plus Kokoro / VibeVoice / Qwen3-TTS (0.6B + 1.7B + VoiceDesign + Gwen-TTS) / Orpheus (+ German variants) / Chatterbox (+ Turbo + Kartoffelbox + Lahgtna Arabic) / IndexTTS / VoxCPM2 / CosyVoice3 / F5-TTS / Bark / CSM / Dia / FastPitch / MeloTTS / OuteTTS / Parler-TTS / Pocket TTS / SpeechT5 / KugelAudio / Piper for synthesis, Pyannote v3 for ML diarisation, FireRedPunc + fullstop-punc + BiLSTM truecaser for punctuation/capitalisation restoration, CLD3 + GlotLID + FastText LID-176 for text language detection, Silero LID for audio language detection, and FireRed/MarbleNet/Whisper-VAD-EncDec VAD options.

Part of the Crisp ecosystem

Project	Role
CrispASR	C++ ASR + TTS engine powering this app — 40+ ASR + 20+ TTS backends, CLI + C-ABI
CrisperWeaver	This app — Flutter GUI for CrispASR
CrispEmbed	Text embedding engine (ggml) — XLM-R, Qwen3-Embed, Gemma3, dense + sparse + ColBERT
Susurrus	Python ASR GUI with 9 backends (faster-whisper, mlx-whisper, voxtral, ...)

What you can do

Transcribe any audio file — WAV / MP3 / FLAC decoded on-device, no ffmpeg required.
Batch-process multiple files — drop many onto the window or onto the Batch Queue card, "Transcribe all" drains serially; each run lands in History.
Record from the mic and transcribe live.
Paste a URL to a remote file; CrisperWeaver downloads and processes it.
Drag and drop files onto the transcription screen (desktop) or directly on the batch queue.
Receive shared audio from the OS share sheet (Android / iOS / macOS).
Choose your model family and quantisation — q4_0 / q5_0 / q4_k / q5_k / q6_k / q8_0 variants plus f16 originals. Model picker filters by name + backend; Model Management screen auto-probes HuggingFace to discover every available quant.
Add from HuggingFace repo — link-icon button on the Models screen accepts any OWNER/NAME HF repo (e.g. cstr/voxtral-mini-3b-2507-GGUF), lists the GGUF / .bin files under your chosen backend, and registers them as downloadable models. Mirror of crispasr --hf-repo on the CLI side; escape hatch for repos not yet in the baked catalogue.
Download and manage models from a built-in browser — parallel queue, resume, SHA-1 verify, cancel, delete. Companion files (codec / voice / tokenizer GGUFs) auto-download alongside their parent model, including for runtime-discovered HF quants.
Advanced decoding knobs — translate-to-English (Whisper), beam search, initial-prompt vocabulary bias (huge win for domain audio), audio Q&A prompt for instruct-tuned LLM backends (Voxtral / Qwen3), source + target language pickers for true speech translation.
Tune the decoder live — best-of-N slider (1–10, picks the highest-scoring of N decodes; works on every backend) and decoder temperature slider for every backend crispasr_session_set_temperature honours (canary, cohere, parakeet, moonshine, voxtral, qwen3, granite, glm-asr, gemma4, omniasr-llm, kyutai-stt).
Tune the VAD — pick between Silero (bundled), FireRedVAD (F1 97.57%), MarbleNet, Whisper-VAD-EncDec; live sliders for threshold, min-speech-ms, min-silence-ms, speech-pad-ms.
Pick the diarisation method — vad-turns (default, mono-friendly), pyannote (ML, GGUF), stereo energy, stereo cross-correlation.
Pick the language-detection method — Whisper-encoder LID (reuses an existing model) or Silero 95-langs (faster + smaller).
See live performance numbers — real-time factor, words per second, wall-clock.
Get word-level timestamps and language auto-detection (via Whisper).
Stream transcription from long-running mic input — partial transcripts appear in the output card while you talk (10 s sliding window / 3 s step).
Watch long files transcribe in real time — chunked Whisper splits >60 s files into 30 s windows and streams segments through as each finishes, instead of waiting until the end.
Export to .txt, .srt, .vtt, .json, .csv, .lrc (lyrics), or .wts (Whisper Text Segments debug) through the system share sheet.
Review history — every run is persisted as JSON and browseable / re-exportable, with speaker renames preserved across launches.
Rename diariser speakers — tap a speaker chip in the output to override the auto-assigned label ("Speaker 1" → "Alice"); the new name persists in history.
See where storage went — Settings → Storage breakdown lists per-backend disk usage with a one-click "delete all of X" action.
Diagnose with logs — in-app viewer with filter / search / copy / export, optional file sink.
Synthesize speech (TTS) — pick a downloaded TTS model + voice + codec on the Synthesize screen, type text, hit Synthesize; output plays in-app and saves as WAV. Advanced section adds trim-silence, 0.25×–4× speed slider, reference-transcript field for runtime voice cloning, and natural-language voice-design prompts (qwen3-tts VoiceDesign).
Translate text — dedicated Translate screen wired to CrispASR's crispasr_session_translate_text. Pick from M2M-100 (100 langs, any-to-any), WMT21 Dense (en↔X, two separate checkpoints), or MADLAD-400 (419 langs); source/target language dropdowns with one-click swap, max-tokens slider, copy-to-clipboard.
Bake your own voice — Bake voice screen drives CrispASR's bake-chatterbox-voice-from-wav.py via Process.start: pick a WAV reference, set the output filename, hit Bake. Stdout + stderr stream into a live tail panel + the in-app Log viewer; the resulting GGUF is dropped straight into the models directory so it shows up in the voicepack picker on the next open. Desktop-only — mobile has no Python runtime. Available from the cake icon in the Synthesize app-bar.
Restore punctuation — FireRedPunc (ZH+EN) or fullstop-punc (EN/DE/FR/IT) toggle in Advanced Options; turns wav2vec2 / fastconformer-ctc / firered-asr lowercase output into properly punctuated text. Picker chooses which family runs when both are downloaded.
Use CrisperWeaver in English or German — full i18n scaffold via flutter_localizations, 866 keys per locale covering every user-facing string (guarded by an ARB-consistency test).
Capture system audio — "Transcribe what's playing in Zoom / YouTube / any app." ScreenCaptureKit on macOS 13+, parec on Linux, ffmpeg-WASAPI loopback on Windows, MediaProjection on Android 10+; iOS deliberately unsupported (Apple's sandbox forbids it).
Boost custom vocabulary — chip list in Advanced Options; routed per-backend-class via initial_prompt (whisper / moonshine), setAsk prefix (audio-LLM backends), or no-op (CTC-style with explanatory helper text).
Edit any segment inline — long-press → edit dialog. Edits update both AppState AND the on-disk history JSON so they survive a reload. Edited segments get a pencil-icon marker.
Search history — substring filter on title + transcript with yellow-highlighted matches; auto-expand of matching entries.
Edit audio with waveform sync — dedicated audio editor screen with trim / cut middle / split into chapters + an optional collapsible transcript pane on the same screen. Tap a segment → playhead jumps. Long-press a segment → Select / Trim to / Mark for split. Tap the waveform → matching segment highlights. Pure-Dart, no FFmpeg, identical on every platform.
Tidy any transcript — pure-Dart deterministic pass (filler removal, repeat collapsing, sentence-case, punctuation fix, optional annotation strip) with a before/after preview of the first three segments. Optional opt-in LLM pass on top via a three-mode selector: Off (deterministic only), Cloud (bring your own OpenAI-compatible endpoint — OpenAI / Anthropic via proxy / OpenRouter / Groq / Cerebras / Together / local llama-server; key stays on device), or Local (point at a GGUF chat model on disk and run it on-device via Metal / CUDA / CPU through libcrispasr's chat ABI — no network, no API key, model stays warm across passes).
Summarise meetings — Action Items / Key Topics / Decisions as structured Markdown, via either the cloud BYOK endpoint or the on-device chat model — same three-mode chooser as the cleanup pass.
Save presets — bundle (backend, modelId, language, AdvancedOptions) into a named preset. One tap restores all four atomically. JSON-backed with schema-versioned migration so a stale preset on a newer build doesn't crash.
Push-to-transcribe from anywhere — desktop-only system hotkey (macOS / Linux / Windows). Push-to-talk OR toggle behaviour; combo parser handles modifier aliases (cmd / command / win / super → meta, ctrl → control, option → alt).
Clone a voice in three steps — guided wizard launched from the Synthesize screen: record 10 s OR pick a WAV → type the reference transcript → hand off to Synthesize with everything pre-populated. Runs on top of the existing chatterbox / indextts / qwen3-tts-base / vibevoice runtime-cloning surface.
Whisper subtitle formatting — tokens-per-segment cap + split-on-word toggle in Advanced Options. Produces SRT-friendly short subtitle lines instead of long-paragraph segments. Split-on-punct additionally breaks at sentence-ending punctuation (. ! ?) — works with any backend, not just whisper.
Semantic transcript search — search your history by meaning, not just keywords. When a small embedding model is downloaded (~23 MB), History search uses real vector embeddings (cosine similarity) instead of substring matching. Persisted embeddings avoid re-encoding. Cross-modal audio embeddings supported with larger models.
Subtitle overlay / teleprompter — fullscreen always-on-top transparent overlay showing live streaming transcription as subtitles. Font size, position, and background controls. On macOS the window floats above other apps via platform channel.
Compare transcripts side by side — re-transcribe the same audio with two models in parallel (two single-worker pools via Future.wait) and see a word-level diff with Jaccard similarity stats. Helps pick the best model for your domain.
Tag transcript segments — annotate segments with bookmark, action-item, question, important, highlight, decision, or follow-up tags. Tags persist in history and render as emoji badges.
Keyboard-navigate transcripts — J/K to jump segments, Space to play/pause, Enter to edit, Tab to jump to next low-confidence word. Desktop power-user feature.
Export to note-taking tools — Obsidian (YAML frontmatter + timestamped bullets), Notion (speaker headers), Logseq (indented blocks), YouTube chapters (HH:MM:SS lines).
Watch a folder for new audio — point CrisperWeaver at a directory and new audio files are auto-transcribed. Duplicate files are auto-skipped via audio fingerprinting. Desktop-only; configurable in Settings.
TTS pronunciation lexicon — user-editable word-to-pronunciation override table for proper nouns, acronyms, and domain terms. Applied before TTS synthesis.
Detect chapters automatically — topic-shift detection via vocabulary analysis, exportable as YouTube chapters or Podcasting 2.0 JSON.
Confidence heatmap — toggle in the transcript output menu to color-code words by decoder confidence (background gradient: green → yellow → orange → red).
Synthetic content compliance — every TTS-generated WAV is watermarked (native CrispASR spread-spectrum, upgradeable to AudioSeal neural watermark via GGUF; pure-Dart LSB fallback on older dylibs) and carries LIST INFO provenance metadata identifying it as AI-generated. MP3 exports carry ID3v2.3 TXXX frames for AI provenance (AI_GENERATED, GENERATOR, AI_CONTENT_NOTICE). Voice-cloned TTS output is prepended with a 3-beep 880 Hz audio disclaimer (EU AI Act Art. 50(4)). Post-embed watermark verification detects the watermark immediately after embedding and warns if absent. Consent attestation audit logging ([CONSENT] ts=… model=… voice=… attestation="user consent") records every voice-cloned synthesis in the same format used by CrispASR and CrispTTS. Speaker enrollment requires explicit biometric consent (GDPR Art. 9); consent records are persisted and deleted alongside speaker profiles. Exports support an optional machine-readable disclosure notice. See About → Synthetic Content Compliance in the app.

Supported models

One dispatcher (CrispasrSession) handles every backend; bundled libcrispasr reports at startup which are linked in the current build, and the Models screen filter chips group them by kind (ASR / TTS / Voices / Codecs / Post-processors). The Model Management screen also probes CrispASR's built-in C-side registry on every open, so any backend the bundled libcrispasr knows about appears even if it isn't hardcoded in the app catalog.

ASR

Family	Sizes	Languages	Notes
Whisper	tiny → large-v3 + q4_0/q5_0/q8_0	99	Word-level ts, lang-detect, streaming
Distil-Whisper	distil-large-v3 (f16 / q5_0 / q4_k)	99	~6× faster than large-v3, ~50% smaller
Parakeet (NVIDIA)	TDT 0.6b v2/v3, 1.1b; CTC 0.6b/1.1b; TDT+CTC 110m/1.1b; RNNT 0.6b/1.1b; JA 0.6b	25 EU (auto-detect) / JA	Fast, native word timestamps
Canary (NVIDIA)	1b-v2	25 EU (explicit src/tgt)	Speech translation X ↔ en
Qwen3-ASR	0.6b, 1.7b	30 + 22 Chinese dialects	Multilingual
Mega-ASR	1.7b (Qwen3-ASR + robustness LoRA)	30 + 22 Chinese dialects	LoRA merged offline, qwen3 runtime
Cohere	03-2026	13	High-accuracy Conformer decoder
Granite Speech	3.2-8b, 3.3-2b/8b, 4.0-1b, 4.1-2b	en fr de es pt ja	Instruction-tuned
Granite Speech 4.1	2B / 4.1+ / 4.1-NAR	en fr de es pt ja	Instruction-tuned + parallel decode
FastConformer-CTC	small → xxlarge	en	Low-latency CTC
Canary-CTC	1b	25 EU	CTC variant of canary
Voxtral Mini	3B (2507), 4B realtime (2602)	8 / 13	Speech translation; realtime 4B
Wav2Vec2 / HuBERT / Data2Vec	large-xlsr-53 (EN, DE, multi) + HuBERT-Large + Data2Vec-Base	per-model	Self-supervised CTC family
OmniASR	CTC 300M/1B, LLM 300M/1B	1600+ languages	CTC + LLM variants
OmniASR LLM unlim.	300M v2 streaming	1600+ languages	Streaming, 15 s protocol, unlimited audio
FireRed ASR2	aed-2b	zh / en	AED-style
Kyutai STT	1b	en	Streaming-style
GLM-ASR Nano	nano	multilingual	GLM-family
Moonshine	tiny / base + streaming + DE variants	en / de	Tiny CPU-friendly, BPE tokenizer companion
VibeVoice ASR	large	multilingual	Large multilingual ASR (~4.5 GB)
MiMo ASR	2.5B + tokenizer companion	en zh	XiaomiMiMo, two-file (model + codec)
Gemma4-E2B	2B (q4_k)	140+ languages	USM Conformer + Gemma-4 35L
FunASR Nano	2512 (f16 / q4_k), MLT 2512	zh + en (nano) / multi (mlt)	Alibaba's compact ASR, 70-block SANM
Paraformer-ZH	base (f16 / q4_k / q8_0)	zh	FunASR family, NAR Mandarin ASR
SenseVoice	small (f16 / q4_k / q8_0)	zh en ja ko yue	Built-in LID + use_itn punctuation
MOSS-Audio	4B Instruct (q4_k ~3.8 GB)	multilingual	ASR + audio QA + scene description (Whisper enc + Qwen3 LLM)

TTS

Family	Sizes	Notes
Kokoro	82M + voicepacks	Multilingual, espeak-ng phonemiser bundled
VibeVoice	realtime 0.5B (f32 + tokenizer), 1.5B base	+ voicepack via `setVoice`; 1.5B supports runtime WAV cloning
Qwen3-TTS	0.6B base + customvoice + codec, 1.7B base/CV/VoiceDesign, Gwen-TTS (Vietnamese)	Customvoice has 9 baked speakers; VoiceDesign + Parler-TTS accept natural-language voice descriptions via `setInstruct`
Orpheus	3B + SNAC codec + German variants (lex-au, Kartoffel natural/synthetic)	8 baked English speakers; SNAC via `setCodecPath`
Chatterbox	850 MB base + turbo / Kartoffelbox (DE) / Lahgtna (Arabic)	T3 AR + S3Gen flow-matching; voice cloning via baked GGUF
IndexTTS	1.6 GB	GPT-2 AR + BigVGAN; zero-shot WAV cloning, ZH+EN
VoxCPM2	q4_k (1.6 GB) + f16	Tokenizer-free diffusion AR; zero-shot, 29 languages, 48 kHz native
CosyVoice3	0.5B (LLM + flow + HiFT + voices)	9 languages + 18 Chinese dialects; zero-shot voice cloning
F5-TTS	v1 Base (~953 MB)	DiT flow-matching; zero-shot voice clone from 3-15 s WAV
Bark	small (~500 MB)	3-stage GPT-2; multilingual, 10 German speakers
CSM	1B (q4_k ~1.4 GB)	Sesame CSM-1B conversational TTS, single EN voice
Dia	1.6B (f16 ~3 GB) + DAC codec	Dialogue TTS with `[S1]`/`[S2]` speaker tags, English
FastPitch	60M (~120 MB)	NVIDIA non-autoregressive parallel TTS, deterministic, EN
MeloTTS	v2 (4 EN speakers) + v3 + BERT companion	VITS2, 44.1 kHz
OuteTTS	0.3 1B + WavTokenizer decoder	OLMo-1B + VQ-GAN; voice clone via JSON speaker, EN
Parler-TTS	Mini v1.1 (~900 MB)	Describe voice in natural language via `setInstruct`, EN
Pocket TTS	100M (~220 MB)	Kyutai continuous-latent AR; voice clone from WAV, EN
SpeechT5	80M (~300 MB)	Microsoft AR mel decoder + HiFi-GAN, EN
Zonos	v0.1 (q4_k ~872 MB, f16 ~3.1 GB) + DAC codec	Zyphra 500M — emotion, pitch, rate control, speaker cloning, 44.1 kHz
KugelAudio	0 Open (f16 ~14 GB)	Large TTS model
Piper	15-60 MB per voice	VITS, 250+ community voices, 30+ languages

Post-processors

Family	Languages	Notes
PCS	47 languages	All-in-one: punctuation + truecasing + sentence boundaries (XLM-R)
FireRedPunc	ZH + EN	BERT-based punctuation + capitalisation
Fullstop-punc	EN / DE / FR / IT	Multilingual punctuation restoration
Truecaser LSTM	DE / EN / ES / RU	BiLSTM character-level truecasing (97.9% F1 German)

Diarisation / LID / VAD GGUFs

Family	Role	Notes
Pyannote v3 seg	Diarisation	ML segmentation, up to 3 speakers per slice
TitaNet-Large	Speaker embedding	192-d L2-normalised; pair with enrolled speakers
Silero LID 95	Audio language ID	95-language classifier, ~16 MB GGUF — faster than Whisper LID
ECAPA-TDNN LID	Audio language ID	107 languages, ~42 MB
FireRed LID	Audio language ID	120 languages, ~300 MB
CLD3	Text language ID	109 languages, ~440 KB — powers Translate auto-detect
GlotLID v3	Text language ID	2102 languages (ISO 639-3), ~250 MB
FastText LID-176	Text language ID	176 languages, ~63 MB
FireRedVAD	VAD	F1 97.57%, ~3 MB
MarbleNet VAD	VAD	Small (~500 KB), EN/DE/FR/ES/RU/ZH
Whisper-VAD-EncDec	VAD (experimental)	Japanese-trained (works on EN too), ~22 MB

Downloads pull f16 from ggerganov/whisper.cpp and quantised variants from cstr/whisper-ggml-quants and other cstr/*-GGUF repos. Skip-checksum toggle in Settings for custom or mirrored GGUFs.

Platforms

Platform	State
macOS	✅ Released — `.app.zip`, Metal-enabled, all 24+ backend dylibs (incl. kokoro / orpheus / mimo-asr) bundled, espeak-ng auto-bundled for kokoro phonemisation
Linux	✅ Released — `.tar.gz` bundle
Windows	✅ Released — portable `.zip` + installable `.msix` with Explorer "Open With" registration for audio + subtitle types
Android	✅ Released — real-ASR APK (`arm64-v8a`) with `libwhisper.so` cross-built in CI. External models dir picker requests "All files access" runtime permission (Android 11+) so reinstalls don't lose access to `/storage/emulated/0/...` paths.
iOS	⚠️ Unsigned IPA — sideload via SideStore / AltStore / Feather
Web (PWA)	✅ Live at `crisperweaver-web.vercel.app` — ASR + TTS via CrispASR HF Space cloud engine (9 ASR + 5 TTS backends). Client-side text embeddings via CrispEmbed WASM (~19 MB model, ~50-100ms/sentence). Auto-deploys from GitHub Actions.

Roadmap and blockers: see PLAN.md.

Building

Prerequisites

Flutter 3.38.x (stable)
Xcode + CocoaPods for iOS / macOS
JDK 17 + Android SDK for Android
CMake + a C++ toolchain to build libwhisper from CrispASR
Optional: Metal (macOS), CUDA / Vulkan (Linux/Windows), Core ML (iOS), NNAPI (Android)

Clone the two repos side-by-side

mkdir crisperweaver && cd crisperweaver
git clone https://github.com/CrispStrobe/CrispASR.git
git clone https://github.com/CrispStrobe/CrisperWeaver.git

pubspec.yaml refers to the Dart FFI package via path: ../CrispASR/flutter/crispasr.

Desktop one-shot (recommended)

Each desktop platform has an end-to-end script that configures + builds CrispASR's libwhisper / whisper.dll, runs flutter build, and bundles every needed dynamic library next to the runner. They expect the sibling CrispASR checkout described above.

# macOS
./scripts/build_macos.sh release

# Linux
./scripts/build_linux.sh release

# Windows (cmd.exe — picks up pwsh if installed, falls back to powershell.exe)
build_windows.bat release
# or directly:
pwsh -File scripts\build_windows.ps1 release

Pass debug instead of release for a debug build; add --rebuild-cmake (or -RebuildCmake on Windows) to force a fresh CrispASR cmake configure. Each script's output ends with a path to the runnable bundle / .app / .exe.

The bundlers can also be invoked standalone (scripts/bundle_macos_dylibs.sh, scripts/bundle_linux_libs.sh, scripts/bundle_windows_dlls.ps1) if you've already built CrispASR and flutter build <platform> separately and just want to drop the libs in place.

At runtime, CrispASREngine resolves the library by probing platform-specific names — crispasr.dll / libcrispasr.dylib / libcrispasr.so first, then the whisper-named alias — under the bundle, the user's CrispASR checkout, system lib dirs, and any user-supplied override path.

Manual / iterative dev

cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DBUILD_SHARED_LIBS=ON -DWHISPER_METAL=ON
cmake --build build --parallel --target crispasr

cd ../CrisperWeaver
flutter pub get
flutter run -d macos        # or: linux, windows, android, ios

flutter run picks up the freshly-built CrispASR shared library directly from ../CrispASR/build/src/ for fast inner-loop work.

Using it

First run: open Settings → Manage models and download the model you want. Default pick is Whisper base (~140 MB, covers 99 languages).
Transcribe a file: back to the main screen, drop a file or click the picker. Language auto-detects by default.
Record from the mic: use the recorder card. Stop → transcribe.
Stream from mic (Whisper): toggle Stream in the recorder. Partial text arrives as you speak.
Synthesize speech (TTS): tap the Synthesize icon in the app-bar (next to Models). Pick a downloaded TTS model + voice + codec, type text, hit Synthesize — output plays in-app and can be saved as WAV via the share sheet.
Export the result: share-sheet button → pick TXT / SRT / VTT / JSON.
Browse past runs: History in the drawer — everything is persisted as JSON under <app-docs>/history/.

Useful Settings:

Preferred engine — CrispASR is the default.
Default model / language — pre-select across new transcriptions.
Custom models directory — point CrisperWeaver at an existing GGUF library on an external disk (the Models directory picker), e.g. /Volumes/backups/ai/crispasr, to reuse already-downloaded models instead of re-fetching into the app sandbox. Cross-platform; the model picker reads flat GGUFs from that folder and "Use default" restores the sandbox <app-docs>/models/whisper_cpp location. (On Android 11+ an external/shared-storage path needs the one-time "All files access" grant.)
Skip checksum verification — custom / mirrored GGUFs.
Log level + file sink — <app-docs>/logs/session.log for bug reports.

Testing

The default test pass is fast and offline:

flutter analyze    # 0 issues — see analysis_options.yaml for the strict rule set
flutter test       # 378 tests, ~20 s — widget unit + service / persistence / batch / dispatch

test/backend_dispatch_test.dart always runs the cheap dispatch checks (every backend appears in CrispasrSession.availableBackends(), opens with bogus paths fail cleanly). The expensive end-to-end synth / transcribe roundtrips are tagged slow and skip themselves silently unless their CRISPASR_TEST_<BACKEND>_MODEL env var points at a downloaded GGUF. Opt in with:

M=/path/to/crispasr-models
CRISPASR_TEST_KOKORO_MODEL=$M/kokoro-82m-q8_0.gguf \
CRISPASR_TEST_KOKORO_VOICE=$M/kokoro-voice-af_heart.gguf \
CRISPASR_TEST_QWEN3_TTS_MODEL=$M/qwen3-tts-12hz-0.6b-customvoice-q8_0.gguf \
CRISPASR_TEST_QWEN3_TTS_CODEC=$M/qwen3-tts-tokenizer-12hz.gguf \
CRISPASR_TEST_VIBEVOICE_MODEL=$M/vibevoice-realtime-0.5b-tts-f32-tokenizer.gguf \
CRISPASR_TEST_VIBEVOICE_VOICE=$M/vibevoice-voice-en-Emma_woman.gguf \
CRISPASR_TEST_ORPHEUS_MODEL=$M/orpheus-3b-base-q8_0.gguf \
CRISPASR_TEST_ORPHEUS_CODEC=$M/snac-24khz.gguf \
CRISPASR_TEST_MIMO_ASR_MODEL=$M/mimo/mimo-asr-q4_k.gguf \
CRISPASR_TEST_MIMO_ASR_TOKENIZER=$M/mimo/mimo-tokenizer-q4_k.gguf \
CRISPASR_TEST_COSYVOICE3_MODEL=$M/cosyvoice3-llm-q4_k.gguf \
CRISPASR_TEST_WHISPER_MODEL=$M/ggml-tiny.bin \
flutter test --tags slow test/backend_dispatch_test.dart

The cosyvoice3 roundtrip needs its flow / hift / voices companions sitting beside the LLM GGUF (auto-discovered by filename). There is also an f5-tts roundtrip (CRISPASR_TEST_F5TTS_MODEL=$M/f5-tts-v1-base-f16.gguf, zero-shot clone from test/jfk.wav) — left out of the line above on purpose: its DiT synth is extremely slow (~50 min for one short sentence on M1 Metal), so opt into it only deliberately and on its own.

A single sweep takes ~25 min on M1 Metal end-to-end (full sweep — 6 backends including a 3 B Llama and a 4 GB f32 vibevoice). Each backend skips if its env var isn't set, so partial sweeps are cheap. Coverage:

flutter test --coverage      # writes coverage/lcov.info
genhtml coverage/lcov.info -o coverage/html  # if lcov is installed

See dart_test.yaml for the tag config and PLAN.md §5.18 for the test-suite speed roadmap (the architectural win is a persistent ggml-metal pipeline cache via MTLBinaryArchive, which would cut the cold-start kernel-JIT cost from minutes to seconds).

CI & releases

ci.yml — on push / PR: flutter analyze + flutter test on Ubuntu and macOS, plus debug .app and Linux bundle uploaded as workflow artifacts. Checks out sibling CrispStrobe/CrispASR automatically. Slow tests (--tags slow) are skipped by default; CI can opt in by setting the CRISPASR_TEST_* env vars and adding --tags slow to the test command. Analyzer is configured to error on use_build_context_synchronously, avoid_print, unused_*, inference_failure_*, and deprecated_member_use — a regression fails the build instead of silently piling up.
release.yml — on a vX.Y.Z tag (or manual dispatch): builds and uploads
- crisper_weaver-macos.zip — Metal-enabled .app, ad-hoc signed, all 24+ backend dylibs (kokoro / orpheus / mimo-asr included as of CrispASR ba7d6ed) bundled.
- crisper_weaver-linux-x64.tar.gz — GTK-3 desktop bundle with all backend .so's.
- crisper_weaver-windows-x64.zip — portable zip with runner.exe + whisper.dll + sibling backend DLLs.
- crisper_weaver-windows-x64.msix — installable MSIX that registers "Open With → CrisperWeaver" for audio + subtitle types in Explorer. Unsigned for now (see Windows install below).
- crisper_weaver-android-arm64.apk — real ASR. CrispASR cross-built via build-android.sh --abi arm64-v8a; libwhisper.so and sibling backend .so's dropped into jniLibs/arm64-v8a/.
- crisper_weaver-ios-unsigned.ipa — sideload-compatible (see below).

Both workflows honour CRISPASR_REPO / CRISPASR_REF env vars at the top of each file, in case you maintain a fork of CrispASR.

Sideloading iOS

We don't pay for the Apple Developer Program yet, so the iOS IPA in releases is unsigned. To install it on your own device, use a sideload service:

SideStore — iOS app installer that uses your own Apple ID for signing (free: 7-day re-sign; paid ADP: 1-year). Self-hosted via StosVPN or pair-with-computer.
AltStore — the original self-sign flow; requires a desktop-side companion (AltServer).
Feather — open-source alternative.

All three accept the .ipa directly: download crisper_weaver-ios-unsigned.ipa from the release page, open in SideStore (Files → share to SideStore), tap Install. The free-tier 7-day limit means you'll need to re-sign weekly unless you also have a paid Apple Developer account.

Installing on Windows

Two artefacts ship per release; pick one:

crisper_weaver-windows-x64.zip — portable. Unzip anywhere, run crisper_weaver.exe. No installer, no file associations, doesn't survive across machines cleanly. Good for quick trials and tech users.
crisper_weaver-windows-x64.msix — installable. Lands in Apps & features, uninstalls cleanly, and registers "Open With → CrisperWeaver" in Explorer for .wav / .mp3 / .m4a / .flac / .ogg / .aac / .opus / .wma / .srt / .vtt. Recommended for everyday use.

The MSIX is not signed by a Microsoft Store-trusted CA yet (Microsoft Store registration is on the roadmap), so the first install needs a one-time trust step:

Download crisper_weaver-windows-x64.msix from the release page.
Right-click → Properties → tick Unblock (under the General tab, bottom right).
Double-click the .msix → Install. Windows Security may warn ("Untrusted publisher") — accept once.
After install, right-click any audio file in Explorer → Open With → CrisperWeaver.

Once we publish to Microsoft Store, that warning goes away — winget install CrisperWeaver and the Store listing will both ship the same MSIX, signed.

Roadmap

The short version:

CoreML acceleration for Whisper on iOS/macOS (enable WHISPER_USE_COREML inside CrispASR, ship the paired .mlmodelc next to the .bin).
Swap the MFCC/k-means diarization stopgap for the shared-lib crispasr_diarize_segments_abi introduced in CrispASR v0.4.5 (pyannote GGUF + energy/xcorr/vad-turns in one call).
Wire the shared crispasr_detect_language_pcm (v0.4.6) for auto-language on backends that lack native LID.
Wire crispasr_align_words_abi (v0.4.7) to give word-level timestamps to LLM-based backends (qwen3 / voxtral / granite) that don't emit them natively.
Expose more CrispASR power-user knobs in the UI: beam search / best-of-N, temperature, source/target languages for translation backends, audio Q&A mode, streaming UI (see PLAN.md §5.8). VAD and initialPrompt already shipped in v0.1.7.
Windows CI job (build script + DLL bundler shipped — scripts/build_windows.ps1; CI workflow still TODO).
Android CI: cross-build libwhisper.so + sibling backends, drop into jniLibs/, ship a real-ASR APK.
iOS pod install + device-build CI verification.
Finish i18n migration for widgets + older Settings strings.
Backend-specific UX (source/target language UI for Canary, Voxtral audio Q&A mode, Granite --ask flag, etc.).

Full per-item breakdown with file paths, risks, and verification steps: PLAN.md.

Technical learnings collected during development (FFI quirks, dylib bundling, macOS chrome, CI patterns): LEARNINGS.md.

License & author

CrisperWeaver is GNU AGPL-3.0-or-later. See LICENSE for full text and the in-app About screen for the auto-aggregated third-party license list (showLicensePage).

Name		Name	Last commit message	Last commit date
Latest commit History 432 Commits
.github/workflows		.github/workflows
android		android
assets		assets
docs		docs
ios		ios
lib		lib
linux		linux
macos		macos
scripts		scripts
test		test
web		web
windows		windows
.gitignore		.gitignore
.metadata		.metadata
AppLogo.png		AppLogo.png
CHANGELOG.md		CHANGELOG.md
HISTORY.md		HISTORY.md
LEARNINGS.md		LEARNINGS.md
LICENSE		LICENSE
PLAN.md		PLAN.md
README.md		README.md
analysis_options.yaml		analysis_options.yaml
build_all.sh		build_all.sh
build_and_run.sh		build_and_run.sh
build_android.sh		build_android.sh
build_ios.sh		build_ios.sh
build_windows.bat		build_windows.bat
dart_test.yaml		dart_test.yaml
l10n.yaml		l10n.yaml
pubspec.lock		pubspec.lock
pubspec.yaml		pubspec.yaml
vercel.json		vercel.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CrisperWeaver

Part of the Crisp ecosystem

What you can do

Supported models

ASR

TTS

Post-processors

Diarisation / LID / VAD GGUFs

Platforms

Building

Prerequisites

Clone the two repos side-by-side

Desktop one-shot (recommended)

Manual / iterative dev

Using it

Testing

CI & releases

Sideloading iOS

Installing on Windows

Roadmap

License & author

About

Uh oh!

Releases 71

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CrisperWeaver

Part of the Crisp ecosystem

What you can do

Supported models

ASR

TTS

Post-processors

Diarisation / LID / VAD GGUFs

Platforms

Building

Prerequisites

Clone the two repos side-by-side

Desktop one-shot (recommended)

Manual / iterative dev

Using it

Testing

CI & releases

Sideloading iOS

Installing on Windows

Roadmap

License & author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 71

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages