English · 中文
Turn a long podcast, YouTube video, or X Space into a resumable transcript on Apple Silicon — without a 90-minute whisper job that looks hung or writes 0 bytes.
One command: URL or local file → download (subtitle-first for YouTube/Bilibili) → chunked mlx-whisper → {slug}.transcript.txt. Optional digest stub for your AI agent.
- Unified ingest — YouTube, Bilibili, Xiaoyuzhou (小宇宙), X Space, Apple Podcasts, or local audio/video in one CLI.
- Chunked transcription — ffmpeg splits long audio; mlx-whisper runs serially; each segment appends to the output file (
tail -ffriendly). - Resume after interrupt —
manifest.json+--resume; no full restart on 60+ minute sources. - Subtitle-first for video — tries VTT/subs before whisper when available.
- Digest stub (v0.2+) —
digest.py preparewrites chapter skeleton + agent prompt; flags mlx repetition hallucinations on intro/outro segments.
URL or local file
↓ ingest.py (platform detect → download or subs)
↓ transcribe_chunked.py (if no usable subtitles)
↓ {slug}.transcript.txt + {slug}.ingest.json
↓ digest.py prepare (optional)
↓ your Agent fills {slug}-digest.md
Input (Xiaoyuzhou episode):
python3 ingest.py "https://www.xiaoyuzhoufm.com/episode/EPISODE_ID" \
--out-dir ./output --language zh --resumeOutput:
output/
episode-slug.m4a
episode-slug.transcript.txt # incremental, segment markers
episode-slug.transcript.txt.chunks/ # manifest + segment mp3s
episode-slug.ingest.json # run metadata
Short video with subtitles — ingest may skip whisper entirely and write transcript from VTT.
Clone and install dependencies:
git clone https://github.com/runesleo/long-media-cli.git
cd long-media-cli
chmod +x ingest.sh ingest.py digest.py space_pipeline.sh long_media.sh
brew install ffmpeg yt-dlp
pipx install mlx-whisperAny AI coding agent:
Point your agent at docs/ingest.md + docs/digest-template.md. The pipeline is plain Python + shell — no framework lock-in.
- macOS Apple Silicon (M-series) — primary tested platform
ffmpeg,ffprobe,yt-dlpmlx_whisper(pipx install mlx-whisper) — default engine- Optional:
faster-whisper(--engine faster), OpenAI API (--engine openai+OPENAI_API_KEY) - YouTube / X Space download:
--cookies-from-browser chrome(default) or your browser of choice
Privacy note: By default, ingest.py and download_twitter_space.py pass --cookies-from-browser chrome to yt-dlp so authenticated downloads work. That reads cookies from your local Chrome profile on this machine only — nothing is uploaded by this CLI. To disable: --cookies-from-browser "" or set an empty value in the shell wrapper.
# Podcast / Xiaoyuzhou
./ingest.sh "https://www.xiaoyuzhoufm.com/episode/..." ./output zh
# YouTube (subs first, else whisper)
./ingest.sh "https://www.youtube.com/watch?v=..." ./output zh
# X Space (480s segments)
./ingest.sh "https://x.com/i/spaces/SPACE_ID" ./output zh 480
# Local file
./ingest.sh ./episode.m4a ./output zh 600
# Digest stub after transcript complete
python3 digest.py status ./output/episode.transcript.txt
python3 digest.py prepare ./output/episode.transcript.txtLow-level transcribe only:
python3 transcribe_chunked.py episode.m4a \
--out episode.transcript.txt \
--language zh --engine mlx --segment-sec 600 --resumeTested on MacBook Pro Apple Silicon · mlx_whisper · production runs (not re-run for every tag).
| Source | Duration | Segments | Result |
|---|---|---|---|
| Xiaoyuzhou podcast | ~139 min | 14 × 600 s | 14/14 · incremental transcript |
| X Space replay | ~71 min | 9 × 480 s | 9/9 · seg_0/seg_8 may hallucinate (mlx intro/outro) |
See docs/chunked-local.md for resume/progress details.
- Long audio only — designed for >20 min sources; short clips may over-chunk.
- mlx intro/outro hallucination — some Spaces/podcasts repeat lyrics-like garbage in first/last segment;
digest.py prepareflags these — review or re-run those segments. - Bilibili — uses direct API (yt-dlp 412 workaround); may break if API changes.
- Xiaoyuzhou — scrapes CDN URL from page HTML; SPA changes may require script update.
- Digest body — CLI writes stub + agent prompt only; LLM summary is Agent-driven, not built-in.
- No Windows/Linux CI — Apple Silicon + mlx is the happy path;
fasterengine is the CPU fallback.
Ingest
- pyproject.toml +
pip installentry point -
--engineauto-detect when mlx unavailable
Transcribe
- Optional segment re-run by index (fix hallucinated seg_0/seg_8 without full job)
- Speaker diarization hook (short files / per-segment)
Digest
- Optional LLM backend behind env flag (opt-in, not default)
Leo (@runes_leo) — AI × Crypto independent builder. Trading on Polymarket, building data and content pipelines with Claude Code and Codex.
leolabs.me — writing · community · open-source tools · indie projects · all platforms.
X Subscription — paid content weekly, or just buy me a coffee 😁
Learn in public, Build in public.
MIT — see LICENSE.