A Claude Code skill for transcribing audio and video files using OpenAI's Whisper with context-grounding from markdown files.
- Audio/Video Transcription: Convert media files to text using OpenAI Whisper
- Context Grounding: Uses markdown files in the same directory to improve accuracy for technical terms, names, and jargon
- Multi-format Support: Works with mp3, wav, m4a, mp4, webm, and more
- Cross-platform: Supports macOS (Homebrew) and Linux installations
- Automated Workflow: Python script handles the full transcription pipeline
The easiest way to install this skill is using the skilz universal installer:
npx skilz install SpillwaveSolutions_whisper-transcribe/whisper-transcribe
This command automatically downloads and configures the skill for Claude Code.
View on Skilz Marketplace: whisper-transcribe
Clone the repository to your Claude Code skills directory:
git clone https://github.com/SpillwaveSolutions/whisper-transcribe.git ~/.claude/skills/whisper-transcribe
After installing the skill, you need to install Whisper and ffmpeg on your system.
# macOS
brew install ffmpeg openai-whisper
# Linux: install ffmpeg
sudo apt install ffmpeg # Debian/Ubuntu
# Linux: install Whisper
pip install openai-whisper
# Verify the installation
whisper --version
ffmpeg -version
# Basic transcription
whisper /path/to/audio.mp3 --output_dir /path/to/output
# Transcription with context grounding
python scripts/transcribe_with_context.py /path/to/audio.mp3 --model base --language en
The script will:
- Find markdown context files in the same directory
- Run Whisper transcription
- Apply corrections based on context (technical terms, names)
- Save both original and grounded transcripts
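The first two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in scripts/transcribe_with_context.py: the function names find_context_files and build_whisper_command are hypothetical, and the correction and saving steps are left out. The flags --model, --language, --output_dir, and --output_format are standard Whisper CLI options.

```python
from pathlib import Path


def find_context_files(audio_path):
    """Return the markdown files that sit next to the audio file, sorted by path."""
    return sorted(Path(audio_path).resolve().parent.glob("*.md"))


def build_whisper_command(audio_path, model="base", language="en", output_dir=None):
    """Build the Whisper CLI call as an argument list (hypothetical helper)."""
    # Default to writing the transcript next to the audio file.
    out = Path(output_dir) if output_dir else Path(audio_path).resolve().parent
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--language", language,
        "--output_dir", str(out),
        "--output_format", "txt",
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with file names that contain spaces.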
| Model | Speed | Accuracy | RAM Required | Best For |
|---|---|---|---|---|
| tiny | Fastest | Lower | ~1 GB | Quick drafts, testing |
| base | Fast | Good | ~1 GB | General use |
| small | Medium | Better | ~2 GB | Important recordings |
| medium | Slower | High | ~5 GB | Professional transcription |
| large | Slowest | Highest | ~10 GB | Critical accuracy needs |
On a MacBook Pro with Apple Silicon, the small or medium model is recommended for the best balance of speed and accuracy.
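As a rough way to apply the table, the sketch below picks the most accurate model whose approximate RAM requirement fits a given budget. The RAM figures are copied from the table; the function name pick_model and the idea of choosing by RAM alone are assumptions made for illustration, and speed is not considered.

```python
# Approximate RAM needs in GB, taken from the table above, least to most accurate.
MODEL_RAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_gb):
    """Return the most accurate model that fits in available_gb, or None if none fit."""
    best = None
    for name, ram_gb in MODEL_RAM_GB:
        if ram_gb <= available_gb:
            best = name  # later entries are more accurate, so keep overwriting
    return best
```

For example, pick_model(8) returns "medium", since large needs roughly 10 GB.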
Create markdown files in the same directory as your audio to improve transcription accuracy.
# Meeting Context
## Speakers
- Richard Hightower (host)
- Jane Smith (engineering lead)
## Technical Terms
- Kubernetes (container orchestration)
- FastAPI (Python web framework)
- AlloyDB (Google Cloud database)
## Acronyms
- CI/CD - Continuous Integration/Continuous Deployment
- PR - Pull Request

See assets/context-template.md for a complete template.
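To show how a context file like the one above could feed a correction step, here is a minimal sketch. It assumes the layout of the sample (section headings starting with "## " and bullet items starting with "- ") and restores the canonical spelling of each term in a transcript. The function names parse_context_terms and apply_canonical_spelling are hypothetical, and the bundled script may correct transcripts differently.

```python
import re


def parse_context_terms(markdown_text):
    """Collect the term from each bullet item, grouped by its '## ' heading."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Drop a trailing "(description)" or " - expansion", keeping only the term.
            term = re.split(r"\s+\(|\s+-\s+", line[2:], maxsplit=1)[0].strip()
            if term:
                sections[current].append(term)
    return sections


def apply_canonical_spelling(transcript, terms):
    """Replace case-insensitive whole-word matches with the spelling from the context."""
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        # A function replacement avoids treating backslashes in the term as escapes.
        transcript = pattern.sub(lambda match, t=term: t, transcript)
    return transcript
```

This only fixes capitalization of terms Whisper already spelled correctly; misheard words (for example a name transcribed as a different word) would need fuzzy matching, which is outside this sketch.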
whisper-transcribe/
├── SKILL.md # Skill definition
├── README.md # This file
├── scripts/
│ └── transcribe_with_context.py # Automated transcription script
├── references/
│ └── whisper-options.md # Complete Whisper CLI reference
└── assets/
    └── context-template.md # Template for context files
This skill activates when users mention:
- whisper, transcribe, transcription
- audio to text, video to text, speech to text
- meeting transcript, convert recording
- File extensions: .mp3, .wav, .m4a, .mp4, .webm
# macOS
brew install openai-whisper
# Linux
pip install openai-whisper
export PATH="$HOME/.local/bin:$PATH"

# macOS
brew install ffmpeg
# Linux
sudo apt install ffmpeg

Use a smaller model:
whisper "audio.mp3" --model tiny

- Use the tiny or base model for faster results
- Ensure the correct architecture is being used (Apple Silicon vs Intel)
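Before working through the fixes above, it can help to confirm which tool is actually missing. The following sketch uses Python's standard shutil.which to check whether whisper and ffmpeg are on PATH; the function name missing_tools is hypothetical and this check is not part of the skill itself.

```python
import shutil


def missing_tools(tools=("whisper", "ffmpeg")):
    """Return the names of required executables that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("whisper and ffmpeg are both available")
```

If whisper is reported missing on Linux after a pip install, the PATH export shown above is the likely fix.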
Contributions are welcome! Please feel free to submit a Pull Request.
MIT