Whisp is an Oh My Zsh plugin that adds idempotency, convenience features, and speaker diarization to the WhisperX CLI tool. It helps you efficiently transcribe audio files without duplicating work.
- Idempotent Processing: Skip files that already have transcriptions unless explicitly forced
- Speaker Diarization: Identify who is speaking with --diarize (powered by pyannote.audio)
- Batch Processing: Transcribe multiple files with a single command
- Extension Filtering: Process files of specific audio types
- Model Selection: Easily switch between WhisperX models
- Recursive Searching: Optionally find audio files in subdirectories
- Output Control: View WhisperX's real-time output or suppress it
- Resource Management: Limit thread usage to prevent system slowdown
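Extension filtering and recursive searching can be pictured as a single `find` call. The following is a minimal sketch, not whisp's actual implementation: `find_audio` and its arguments are hypothetical names chosen for illustration.

```shell
# Hypothetical helper (not part of whisp): list files with a given extension.
# Usage: find_audio <dir> <ext> [yes|no]   (third argument enables recursion)
find_audio() {
  local dir="$1" ext="$2" recursive="${3:-no}"
  if [ "$recursive" = "yes" ]; then
    # Descend into subdirectories, as --subdir is described to do.
    find "$dir" -type f -iname "*.$ext"
  else
    # -maxdepth 1 keeps the search in the top-level directory only.
    find "$dir" -maxdepth 1 -type f -iname "*.$ext"
  fi
}
```

For example, `find_audio . mp3 yes` would print every MP3 file under the current directory, including those in subdirectories.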
- Oh My Zsh
- WhisperX CLI tool properly installed and available in your PATH
- For diarization: A HuggingFace API token with access to pyannote models
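Before installing, it can help to confirm that the requirements above are met. This is a minimal sketch rather than something shipped with the plugin: `whisp_check_prereqs` is a hypothetical helper, and it assumes the WhisperX executable is named `whisperx` and that Oh My Zsh lives in `$ZSH` or `~/.oh-my-zsh`.

```shell
# Hypothetical helper (not part of whisp): report missing prerequisites.
whisp_check_prereqs() {
  local missing=0
  # Oh My Zsh sets $ZSH to its install directory; fall back to the default path.
  if [ ! -d "${ZSH:-$HOME/.oh-my-zsh}" ]; then
    echo "missing: Oh My Zsh"
    missing=1
  fi
  # Assumption: the WhisperX CLI is installed under the name `whisperx`.
  if ! command -v whisperx >/dev/null 2>&1; then
    echo "missing: whisperx (not found in PATH)"
    missing=1
  fi
  # The token matters only for --diarize, so this is a note, not a failure.
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "note: HF_TOKEN is not set (needed for diarization unless you pass --hf-token)"
  fi
  return "$missing"
}
```

The function returns 0 when Oh My Zsh and `whisperx` are both found, and 1 otherwise.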
- Clone this repository:
  git clone https://github.com/yourusername/whisp.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/whisp
- Add the plugin to your .zshrc file:
  plugins=(... whisp)
- Reload your shell:
  source ~/.zshrc
# Transcribe all supported audio files in the current directory
whisp
# Transcribe a specific file
whisp file.mp3
# Transcribe all files with a specific extension
whisp mp3
# Transcribe files with any of multiple extensions
whisp mp3 m4a wav
# Transcribe multiple specific files
whisp file1.mp3 file2.m4a
# Choose which WhisperX model to use (default is turbo)
whisp --model tiny
whisp --model base
whisp --model small
whisp --model medium
whisp --model large
whisp --model turbo
# Force transcription even if a transcription already exists
whisp --force
# Specify language for transcription
whisp --language en
# Search for audio files in subdirectories
whisp --subdir
# Run silently (suppress WhisperX output)
whisp --silent
# Limit threads used (reduces system load)
whisp --cores 2
# Set compute type (default: float32, also: float16, int8)
whisp --compute-type float32
# Combine options
whisp mp3 --model medium --force --subdir --cores 4
Speaker diarization identifies who is speaking and when. To use it:
- Create a HuggingFace account
- Accept the pyannote model agreements:
- Create an access token at HuggingFace Settings
- Either set HF_TOKEN in your environment or pass --hf-token
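The last step offers two ways to supply the token. The sketch below shows one way that choice might be resolved. It assumes a value passed on the command line takes precedence over the environment variable, which the text above does not state, and `resolve_hf_token` is a hypothetical helper, not a whisp function.

```shell
# Hypothetical helper (not part of whisp): choose which token to use.
# Assumption: a token given via --hf-token wins over HF_TOKEN.
resolve_hf_token() {
  local cli_token="${1:-}"
  if [ -n "$cli_token" ]; then
    printf '%s\n' "$cli_token"
  elif [ -n "${HF_TOKEN:-}" ]; then
    printf '%s\n' "$HF_TOKEN"
  else
    echo "error: set HF_TOKEN or pass --hf-token" >&2
    return 1
  fi
}
```

To make the token available in every new shell, a line such as `export HF_TOKEN="hf_abc123"` (with your own token in place of the placeholder) can go in your .zshrc.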
# Transcribe with speaker identification
whisp --diarize meeting.mp3
# Pass HuggingFace token directly
whisp --diarize --hf-token hf_abc123 meeting.mp3
# Specify expected number of speakers
whisp --diarize --min-speakers 2 --max-speakers 4 call.mp3
- Single File Mode: If a transcription exists, prompts you before creating a new one
- Batch Mode: Automatically skips files with existing transcriptions
- Force Mode: Creates uniquely named transcriptions without overwriting existing ones
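The modes above amount to one decision per input file. The sketch below illustrates the skip and force branches only (the interactive prompt of single file mode is left out). It rests on two assumptions the text does not confirm: that a transcription is a `.txt` file next to the audio file, and that forced runs get a numeric suffix. `next_transcript_path` is a hypothetical name.

```shell
# Hypothetical helper (not part of whisp): print the path a new transcript
# should be written to, or return 1 when the file should be skipped.
# Assumptions: transcripts are "<name>.txt" beside the audio file, and forced
# runs are named "<name>_1.txt", "<name>_2.txt", and so on.
next_transcript_path() {
  local audio="$1" force="${2:-no}"
  local base="${audio%.*}"
  local out="$base.txt"
  if [ ! -e "$out" ]; then
    printf '%s\n' "$out"            # no existing transcript: use the plain name
    return 0
  fi
  if [ "$force" != "yes" ]; then
    return 1                        # batch behaviour: skip existing work
  fi
  local n=1
  while [ -e "${base}_${n}.txt" ]; do
    n=$((n + 1))
  done
  printf '%s\n' "${base}_${n}.txt"  # force behaviour: never overwrite
}
```

Because the function only ever returns a path that does not exist yet, running it repeatedly cannot overwrite earlier transcripts.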
- mp3
- mp4
- m4a
- wav
- flac
- aac
- ogg
- wma
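A check against the extensions listed above could look like the following. This is an illustrative sketch: `is_supported_audio` is a hypothetical helper, and treating extensions as case-insensitive is an assumption rather than documented behaviour.

```shell
# Hypothetical helper (not part of whisp): succeed when the file name ends in
# one of the listed extensions.
# Assumption: letter case is ignored, so "TALK.MP3" counts as mp3.
is_supported_audio() {
  local ext
  # A name without a dot has no extension at all.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|m4a|wav|flac|aac|ogg|wma) return 0 ;;
    *) return 1 ;;
  esac
}
```

It can be used as a guard, for example `is_supported_audio "$f" && whisp "$f"`.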
whisp mp3 --model medium
whisp --subdir
whisp interview.mp3 --force
whisp mp3 wav --silent
whisp --diarize --min-speakers 2 meeting.mp3
This has only been tested on macOS Sequoia 15. YMMV.
MIT © Jacob Reiff
Contributions are welcome! Please feel free to submit a Pull Request.