long-media-cli

English · 中文

Turn a long podcast, YouTube video, or X Space into a resumable transcript on Apple Silicon — without a 90-minute whisper job that looks hung or writes 0 bytes.

One command: URL or local file → download (subtitle-first for YouTube/Bilibili) → chunked mlx-whisper → {slug}.transcript.txt. Optional digest stub for your AI agent.

What you get

Unified ingest — YouTube, Bilibili, Xiaoyuzhou (小宇宙), X Space, Apple Podcasts, or local audio/video in one CLI.
Chunked transcription — ffmpeg splits long audio; mlx-whisper runs serially; each segment appends to the output file (tail -f friendly).
Resume after interrupt — manifest.json + --resume; no full restart on 60+ minute sources.
Subtitle-first for video — tries VTT/subs before whisper when available.
Digest stub (v0.2+) — digest.py prepare writes chapter skeleton + agent prompt; flags mlx repetition hallucinations on intro/outro segments.
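
The resume and incremental-append behavior described above can be pictured with a minimal sketch. The manifest layout (a JSON object with a `done` list), the `[seg_N]` marker format, and the `transcribe_segment` callback are assumptions made for illustration; they are not taken from transcribe_chunked.py.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_resume(
    segment_count: int,
    out_path: Path,
    manifest_path: Path,
    transcribe_segment: Callable[[int], str],
) -> list[int]:
    """Transcribe pending segments in order; return the indices handled in this run."""
    # Hypothetical manifest layout: {"done": [0, 1, ...]}.
    done: set[int] = set()
    if manifest_path.exists():
        done = set(json.loads(manifest_path.read_text(encoding="utf-8"))["done"])

    processed = []
    for index in range(segment_count):
        if index in done:
            continue  # finished in an earlier run
        text = transcribe_segment(index)
        # Append and flush so `tail -f` shows progress right away.
        with out_path.open("a", encoding="utf-8") as out:
            out.write(f"[seg_{index}]\n{text}\n")
            out.flush()
        # Record completion only after the text is on disk.
        done.add(index)
        manifest_path.write_text(json.dumps({"done": sorted(done)}), encoding="utf-8")
        processed.append(index)
    return processed
```

Writing the manifest after each segment means an interrupted run loses at most the segment that was in flight.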

How it works

URL or local file
    ↓ ingest.py (platform detect → download or subs)
    ↓ transcribe_chunked.py (if no usable subtitles)
    ↓ {slug}.transcript.txt + {slug}.ingest.json
    ↓ digest.py prepare (optional)
    ↓ your Agent fills {slug}-digest.md
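
As a rough illustration of the "platform detect" step, the sketch below classifies a source string by host name. The function name, the returned labels, and the host table are hypothetical; the real ingest.py may detect platforms differently.

```python
from urllib.parse import urlparse

# Hypothetical host-to-platform table covering the sources this README lists.
HOST_PLATFORMS = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "bilibili.com": "bilibili",
    "xiaoyuzhoufm.com": "xiaoyuzhou",
    "podcasts.apple.com": "apple_podcasts",
}


def detect_platform(source: str) -> str:
    """Classify a source as a known platform, an unknown URL, or a local file."""
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https"):
        return "local"  # anything that is not a web URL is treated as a file path
    host = (parsed.hostname or "").lower()
    # X Space links use the /i/spaces/ path on x.com (twitter.com is the older host).
    if host in ("x.com", "www.x.com", "twitter.com", "www.twitter.com"):
        return "x_space" if "/i/spaces/" in parsed.path else "unknown"
    for suffix, platform in HOST_PLATFORMS.items():
        if host == suffix or host.endswith("." + suffix):
            return platform
    return "unknown"
```

Matching on a host suffix lets `www.` and mobile subdomains resolve to the same platform without listing each variant.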

Input (Xiaoyuzhou episode):

python3 ingest.py "https://www.xiaoyuzhoufm.com/episode/EPISODE_ID" \
  --out-dir ./output --language zh --resume

Output:

output/
  episode-slug.m4a
  episode-slug.transcript.txt          # incremental, segment markers
  episode-slug.transcript.txt.chunks/  # manifest + segment mp3s
  episode-slug.ingest.json             # run metadata

Short video with subtitles: ingest.py may skip whisper entirely and write the transcript directly from the VTT file.
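
To show what writing a transcript from VTT can involve, here is a minimal sketch that keeps only cue text from a WebVTT document. The function name and the de-duplication rule are assumptions; the project's own converter is not shown in this README.

```python
import re

# Matches WebVTT cue timing lines such as "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(r"^(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
TAG = re.compile(r"<[^>]+>")  # inline tags like <c> or <00:00:01.500>


def vtt_to_text(vtt: str) -> str:
    """Return the cue text of a WebVTT document, one caption line per output line."""
    lines: list[str] = []
    in_cue = False
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line:
            in_cue = False  # a blank line ends the current cue or header block
            continue
        if TIMING.match(line):
            in_cue = True
            continue
        if not in_cue:
            continue  # WEBVTT header, NOTE blocks, cue identifiers
        text = TAG.sub("", line).strip()
        # Auto-generated captions often repeat the previous line; keep one copy.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return "\n".join(lines)
```

Only consecutive duplicates are dropped, so a phrase that legitimately recurs later in the video is preserved.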

Setup

Clone and install dependencies:

git clone https://github.com/runesleo/long-media-cli.git
cd long-media-cli
chmod +x ingest.sh ingest.py digest.py space_pipeline.sh long_media.sh

brew install ffmpeg yt-dlp
pipx install mlx-whisper

Any AI coding agent:

Point your agent at docs/ingest.md + docs/digest-template.md. The pipeline is plain Python + shell — no framework lock-in.

Requirements

macOS Apple Silicon (M-series) — primary tested platform
ffmpeg, ffprobe, yt-dlp
mlx_whisper (pipx install mlx-whisper) — default engine
Optional: faster-whisper (--engine faster), OpenAI API (--engine openai + OPENAI_API_KEY)
YouTube / X Space download: --cookies-from-browser chrome (default) or your browser of choice

Privacy note: By default, ingest.py and download_twitter_space.py pass --cookies-from-browser chrome to yt-dlp so authenticated downloads work. That reads cookies from your local Chrome profile on this machine only — nothing is uploaded by this CLI. To disable: --cookies-from-browser "" or set an empty value in the shell wrapper.
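
The cookie behavior in the privacy note can be sketched as follows. This is an assumed shape for how an empty value could translate into omitting the flag; `build_ytdlp_command` and its parameters are hypothetical, while the yt-dlp options used (`--extract-audio`, `--audio-format`, `-o`, `--cookies-from-browser`) are real yt-dlp flags.

```python
def build_ytdlp_command(url: str, out_template: str, cookies_browser: str = "chrome") -> list[str]:
    """Build a yt-dlp argument list; an empty cookies_browser omits the cookie flag."""
    cmd = ["yt-dlp", "--extract-audio", "--audio-format", "m4a", "-o", out_template]
    if cookies_browser:
        # Only read browser cookies when a browser name was actually given.
        cmd += ["--cookies-from-browser", cookies_browser]
    cmd.append(url)
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` as a list avoids shell quoting issues with URLs that contain `&` or `?`.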

Quick start

# Podcast / Xiaoyuzhou
./ingest.sh "https://www.xiaoyuzhoufm.com/episode/..." ./output zh

# YouTube (subs first, else whisper)
./ingest.sh "https://www.youtube.com/watch?v=..." ./output zh

# X Space (480s segments)
./ingest.sh "https://x.com/i/spaces/SPACE_ID" ./output zh 480

# Local file
./ingest.sh ./episode.m4a ./output zh 600

# Digest stub after transcript complete
python3 digest.py status ./output/episode.transcript.txt
python3 digest.py prepare ./output/episode.transcript.txt

Low-level transcribe only:

python3 transcribe_chunked.py episode.m4a \
  --out episode.transcript.txt \
  --language zh --engine mlx --segment-sec 600 --resume
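
The effect of `--segment-sec` can be illustrated with a small planning sketch. It assumes fixed-length segments with a shorter final one and one ffmpeg cut per segment; `plan_segments` and `ffmpeg_cut_args` are hypothetical helpers, not functions from transcribe_chunked.py.

```python
import math


def plan_segments(duration_sec: float, segment_sec: int = 600) -> list[tuple[int, float, float]]:
    """Return (index, start, length) for each fixed-length segment of the source."""
    if duration_sec <= 0 or segment_sec <= 0:
        raise ValueError("duration and segment length must be positive")
    count = math.ceil(duration_sec / segment_sec)
    plan = []
    for index in range(count):
        start = float(index * segment_sec)
        # The last segment is shorter when the duration is not a multiple.
        length = min(float(segment_sec), duration_sec - start)
        plan.append((index, start, length))
    return plan


def ffmpeg_cut_args(src: str, dst: str, start: float, length: float) -> list[str]:
    """Arguments for cutting one audio-only MP3 segment with ffmpeg."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{length:.3f}",
            "-i", src, "-vn", "-c:a", "libmp3lame", dst]
```

Under these assumptions a 139-minute source at 600 s yields 14 segments and a 71-minute source at 480 s yields 9.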

Verified

Tested on a MacBook Pro (Apple Silicon) with mlx_whisper. The figures below come from production runs and are not re-run for every tag.

Source	Duration	Segments	Result
Xiaoyuzhou podcast	~139 min	14 × 600 s	14/14 · incremental transcript
X Space replay	~71 min	9 × 480 s	9/9 · seg_0/seg_8 may hallucinate (mlx intro/outro)

See docs/chunked-local.md for resume/progress details.

Known limitations (v0.2)

Long audio only — designed for >20 min sources; short clips may over-chunk.
mlx intro/outro hallucination — some Spaces/podcasts repeat lyrics-like garbage in first/last segment; digest.py prepare flags these — review or re-run those segments.
Bilibili — uses direct API (yt-dlp 412 workaround); may break if API changes.
Xiaoyuzhou — scrapes CDN URL from page HTML; SPA changes may require script update.
Digest body — CLI writes stub + agent prompt only; LLM summary is Agent-driven, not built-in.
No Windows/Linux CI — Apple Silicon + mlx is the happy path; faster engine is the CPU fallback.
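
One way to picture the hallucination flagging mentioned above is a simple line-repetition heuristic. This is an assumed approach with made-up thresholds, not the check that digest.py prepare actually performs.

```python
from collections import Counter


def looks_repetitive(segment_text: str, min_lines: int = 4, threshold: float = 0.5) -> bool:
    """Heuristic: True when one line makes up at least `threshold` of the segment's lines."""
    lines = [ln.strip() for ln in segment_text.splitlines() if ln.strip()]
    if len(lines) < min_lines:
        return False  # too short to judge
    # Share of the segment taken up by its single most frequent line.
    _, top_count = Counter(lines).most_common(1)[0]
    return top_count / len(lines) >= threshold
```

A line-based check misses repetition inside one long line, so a flagged or unflagged segment still deserves a manual look at the first and last chunks.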

Roadmap

Ingest

pyproject.toml + pip install entry point
--engine auto-detect when mlx unavailable

Transcribe

Optional segment re-run by index (fix hallucinated seg_0/seg_8 without full job)
Speaker diarization hook (short files / per-segment)

Digest

Optional LLM backend behind env flag (opt-in, not default)

About the author

Leo (@runes_leo) — AI × Crypto independent builder. Trading on Polymarket, building data and content pipelines with Claude Code and Codex.

leolabs.me — writing · community · open-source tools · indie projects · all platforms.

X Subscription — paid content weekly, or just buy me a coffee 😁

Learn in public, Build in public.

License

MIT — see LICENSE.
