Fast summaries from URLs, files, and media. Works in the terminal, as a Chrome Side Panel, and as a Firefox Sidebar.
0.10.0 preview (unreleased): this README reflects the upcoming release.
- Chrome Side Panel chat (streaming agent + history) inside the sidebar.
- YouTube slides: screenshots + OCR + transcript cards, timestamped seek, OCR/Transcript toggle.
- Media-aware summaries: auto‑detect video/audio vs page content.
- Streaming Markdown + metrics + cache‑aware status.
- CLI supports URLs, files, podcasts, YouTube, audio/video, PDFs.
- URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, podcasts, RSS.
- Slide extraction for video sources (YouTube/direct media) with OCR + timestamped cards.
- Transcript-first media flow: published transcripts when available, Whisper fallback when not.
- Streaming output with Markdown rendering, metrics, and cache-aware status.
- Local, paid, and free models: OpenAI‑compatible local endpoints, paid providers, plus an OpenRouter free preset.
- Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.
One‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.
Chrome Web Store: Summarize Side Panel
YouTube slide screenshots (from the browser):
- Install the CLI (choose one):
  - npm (cross-platform): `npm i -g @steipete/summarize`
  - Homebrew (macOS arm64): `brew install steipete/tap/summarize`
- Install the extension (Chrome Web Store link above) and open the Side Panel.
- The panel shows a token + install command. Run it in Terminal:
  ```
  summarize daemon install --token <TOKEN>
  ```
Why a daemon/service?
- The extension can’t run heavy extraction inside the browser. It talks to a local background service on `127.0.0.1` for fast streaming and media tools (yt-dlp, ffmpeg, OCR, transcription).
- The service autostarts (launchd/systemd/Scheduled Task) so the Side Panel is always ready.
If you only want the CLI, you can skip the daemon install entirely.
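To confirm the service is healthy after install, the daemon subcommands shown in this README can be chained (the token comes from the Side Panel):

```
# install with the token from the Side Panel, then verify it responds
summarize daemon install --token <TOKEN>
summarize daemon status
```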
Notes:
- Summarization only runs when the Side Panel is open.
- Auto mode summarizes on navigation (incl. SPAs); otherwise use the button.
- Daemon is localhost-only and requires a shared token.
- Autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).
- Tip: configure `free` via `summarize refresh-free` (needs `OPENROUTER_API_KEY`). Add `--set-default` to set `model=free`.
More:
- Step-by-step install: apps/chrome-extension/README.md
- Architecture + troubleshooting: docs/chrome-extension.md
- Firefox compatibility notes: apps/chrome-extension/docs/firefox.md
- Select Video + Slides in the Summarize picker.
- Slides render at the top; expand to full‑width cards with timestamps.
- Click a slide to seek the video; toggle Transcript/OCR when OCR is significant.
- Requirements: `yt-dlp` + `ffmpeg` for extraction; `tesseract` for OCR. Missing tools show an in-panel notice.
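On macOS, one way to get all three tools is Homebrew (formula names are the standard ones; adjust for your package manager):

```
# yt-dlp + ffmpeg for slide extraction, tesseract for OCR
brew install yt-dlp ffmpeg tesseract
```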
- Build + load the extension (unpacked):
  - Chrome:
    - `pnpm -C apps/chrome-extension build`
    - `chrome://extensions` → Developer mode → Load unpacked
    - Pick: `apps/chrome-extension/.output/chrome-mv3`
  - Firefox:
    - `pnpm -C apps/chrome-extension build:firefox`
    - `about:debugging#/runtime/this-firefox` → Load Temporary Add-on
    - Pick: `apps/chrome-extension/.output/firefox-mv3/manifest.json`
- Open Side Panel/Sidebar → copy token.
- Install the daemon in dev mode: `pnpm summarize daemon install --token <TOKEN> --dev`
Requires Node 22+.
- npx (no install): `npx -y @steipete/summarize "https://example.com"`
- npm (global): `npm i -g @steipete/summarize`
- npm (library / minimal deps): `npm i @steipete/summarize-core`, then `import { createLinkPreviewClient } from '@steipete/summarize-core/content'`
- Homebrew (custom tap): `brew install steipete/tap/summarize` (Apple Silicon only, arm64)
- CLI only: install via npm/Homebrew and run `summarize ...` (no daemon needed).
- Chrome/Firefox extension: install the CLI and run `summarize daemon install --token <TOKEN>` so the Side Panel can stream results and use local tools.
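After either path, a quick sanity check (only `node --version` is new here; the other command is from this README):

```
# Node 22+ is required
node --version
# prints the full flag reference
summarize --help
```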
summarize "https://example.com"URLs or local paths:
summarize "/path/to/file.pdf" --model google/gemini-3-flash-preview
summarize "https://example.com/report.pdf" --model google/gemini-3-flash-preview
summarize "/path/to/audio.mp3"
summarize "/path/to/video.mp4"YouTube (supports youtube.com and youtu.be):
summarize "https://youtu.be/dQw4w9WgXcQ" --youtube autoPodcast RSS (transcribes latest enclosure):
summarize "https://feeds.npr.org/500005/podcast.xml"Apple Podcasts episode page:
summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"Spotify episode page (best-effort; may fail for exclusives):
summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"--length controls how much output we ask for (guideline), not a hard cap.
summarize "https://example.com" --length long
summarize "https://example.com" --length 20k- Presets:
short|medium|long|xl|xxl - Character targets:
1500,20k,20000 - Optional hard cap:
--max-output-tokens <count>(e.g.2000,2k)- Provider/model APIs still enforce their own maximum output limits.
- If omitted, no max token parameter is sent (provider default).
- Prefer
--lengthunless you need a hard cap.
- Minimums:
--lengthnumeric values must be >= 50 chars;--max-output-tokensmust be >= 16. - Preset targets (source of truth:
packages/core/src/prompts/summary-lengths.ts):- short: target ~900 chars (range 600-1,200)
- medium: target ~1,800 chars (range 1,200-2,500)
- long: target ~4,200 chars (range 2,500-6,000)
- xl: target ~9,000 chars (range 6,000-14,000)
- xxl: target ~17,000 chars (range 14,000-22,000)
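For example, a character target combined with a hard token cap (both flags are documented above; the URL is a placeholder):

```
summarize "https://example.com" --length 20k --max-output-tokens 2k
```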
Best effort and provider-dependent. These usually work well:

- Text: `text/*` and common structured text (`.txt`, `.md`, `.json`, `.yaml`, `.xml`, ...)
  - Text-like files are inlined into the prompt for better provider compatibility.
- PDFs: `application/pdf` (provider support varies; Google is the most reliable here)
- Images: `image/jpeg`, `image/png`, `image/webp`, `image/gif`
- Audio/Video: `audio/*`, `video/*` (local MP3/WAV/M4A/OGG/FLAC/MP4/MOV/WEBM files are transcribed automatically when the model supports it)
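As an illustration, an image input paired with a Gemini model (the model id appears elsewhere in this README; the file path is hypothetical):

```
summarize "/path/to/screenshot.png" --model google/gemini-3-flash-preview
```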
Notes:
- If a provider rejects a media type, the CLI fails fast with a friendly message.
- xAI models do not support attaching generic files (like PDFs) via the AI SDK; use Google/OpenAI/Anthropic for those.
Use gateway-style ids: `<provider>/<model>`.

Examples:

- `openai/gpt-5-mini`
- `anthropic/claude-sonnet-4-5`
- `xai/grok-4-fast-non-reasoning`
- `google/gemini-3-flash-preview`
- `zai/glm-4.7`
- `openrouter/openai/gpt-5-mini` (force OpenRouter)
Note: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).
- Text inputs over 10 MB are rejected before tokenization.
- Text prompts are preflighted against the model input limit (LiteLLM catalog), using a GPT tokenizer.
```
summarize <input> [flags]
```

Use `summarize --help` or `summarize help` for the full help text.
- `--model <provider/model>`: which model to use (defaults to `auto`)
  - `--model auto`: automatic model selection + fallback (default)
  - `--model <name>`: use a config-defined model (see Configuration)
- `--timeout <duration>`: `30s`, `2m`, `5000ms` (default `2m`)
- `--retries <count>`: LLM retry attempts on timeout (default `1`)
- `--length short|medium|long|xl|xxl|s|m|l|<chars>`
- `--language, --lang <language>`: output language (`auto` = match source)
- `--max-output-tokens <count>`: hard cap for LLM output tokens
- `--cli [provider]`: use a CLI provider (`--model cli/<provider>`). If omitted, uses auto selection with CLI enabled.
- `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)
- `--plain`: keep raw output (no ANSI/OSC Markdown rendering)
- `--no-color`: disable ANSI colors
- `--format md|text`: website/file content format (default `text`)
- `--markdown-mode off|auto|llm|readability`: HTML -> Markdown mode (default `readability`)
- `--preprocess off|auto|always`: controls `uvx markitdown` usage (default `auto`)
  - Install `uvx`: `brew install uv` (or https://astral.sh/uv/)
- `--extract`: print extracted content and exit (URLs only)
  - Deprecated alias: `--extract-only`
- `--slides`: extract slide screenshots for YouTube/direct video URLs
- `--slides-ocr`: run OCR on extracted slides (requires `tesseract`)
- `--slides-dir <dir>`: base output dir for slide images (default `./slides`)
- `--slides-scene-threshold <value>`: scene detection threshold (0.1-1.0)
- `--slides-max <count>`: maximum slides to extract
- `--slides-min-duration <seconds>`: minimum seconds between slides
- `--json`: machine-readable output with diagnostics, prompt, metrics, and optional summary
- `--verbose`: debug/diagnostics on stderr
- `--metrics off|on|detailed`: metrics output (default `on`)
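Combining several flags from the reference above (the URL and values are placeholders):

```
summarize "https://example.com" \
  --model openai/gpt-5-mini \
  --length medium \
  --timeout 5m \
  --retries 2 \
  --metrics detailed
```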
`--model auto` builds candidate attempts from built-in rules (or your `model.rules` overrides).

CLI tools are not used in auto mode unless you enable them via `cli.enabled` in config.
Why: CLI adds ~4s latency per attempt and higher variance.
Shortcut: `--cli` (with no provider) uses auto selection with CLI enabled.

When enabled, auto prepends CLI attempts in the order listed in `cli.enabled`
(recommended: `["gemini"]`), then tries the native provider candidates
(with OpenRouter fallbacks when configured).
Enable CLI attempts:

```json
{
  "cli": { "enabled": ["gemini"] }
}
```

Disable CLI attempts:

```json
{
  "cli": { "enabled": [] }
}
```

Note: when `cli.enabled` is set, it also acts as an allowlist for explicit `--cli` / `--model cli/...`.
Non-YouTube URLs go through a fetch -> extract pipeline. When direct fetch/extraction is blocked or too thin, `--firecrawl auto` can fall back to Firecrawl (if configured).

- `--firecrawl off|auto|always` (default `auto`)
- `--extract --format md|text` (default `text`; if `--format` is omitted, `--extract` defaults to `md` for non-YouTube URLs)
- `--markdown-mode off|auto|llm|readability` (default `readability`)
  - `auto`: use an LLM converter when configured; may fall back to `uvx markitdown`
  - `llm`: force LLM conversion (requires a configured model key)
  - `off`: disable LLM conversion (may still return Firecrawl Markdown when configured)
- Plain-text mode: use `--format text`.
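For example, forcing Firecrawl and printing the extracted Markdown without summarizing (flags from the list above):

```
summarize "https://example.com" --extract --format md --firecrawl always
```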
`--youtube auto` tries best-effort web transcript endpoints first. When captions are not available, it falls back to:

- Apify (if `APIFY_API_TOKEN` is set): uses a scraping actor (`faVsWy9VTSNVIhWpR`)
- yt-dlp + Whisper (if `yt-dlp` is available): downloads audio, then transcribes with local `whisper.cpp` when installed (preferred), otherwise falls back to OpenAI (`OPENAI_API_KEY`) or FAL (`FAL_KEY`)
Environment variables for yt-dlp mode:
- `YT_DLP_PATH`: optional path to the yt-dlp binary (otherwise `yt-dlp` is resolved via `PATH`)
- `SUMMARIZE_WHISPER_CPP_MODEL_PATH`: optional override for the local `whisper.cpp` model file
- `SUMMARIZE_WHISPER_CPP_BINARY`: optional override for the local binary (default: `whisper-cli`)
- `SUMMARIZE_DISABLE_LOCAL_WHISPER_CPP=1`: disable local `whisper.cpp` (force remote)
- `OPENAI_API_KEY`: OpenAI Whisper transcription
- `FAL_KEY`: FAL AI Whisper fallback
Apify costs money but tends to be more reliable when captions exist.
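For example, forcing the remote OpenAI Whisper path (variables from the list above; the key value is a placeholder):

```
SUMMARIZE_DISABLE_LOCAL_WHISPER_CPP=1 OPENAI_API_KEY=sk-... \
  summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto
```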
Extract slide screenshots (scene detection via ffmpeg) and optional OCR:
summarize "https://www.youtube.com/watch?v=..." --slides
summarize "https://www.youtube.com/watch?v=..." --slides --slides-ocrOutputs are written under ./slides/<videoId>/ (or --slides-dir). OCR results are included in JSON output
(--json) and stored in slides.json inside the slide directory. When scene detection is too sparse, the
extractor also samples at a fixed interval to improve coverage.
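A tuned extraction using the slide flags from the reference above (values are illustrative):

```
summarize "https://www.youtube.com/watch?v=..." --slides --slides-ocr \
  --slides-dir ./out --slides-max 40 --slides-min-duration 5
```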
Format the extracted transcript as Markdown (headings + paragraphs) via an LLM:
summarize "https://www.youtube.com/watch?v=..." --extract --format md --markdown-mode llmLocal audio/video files are transcribed first, then summarized. --video-mode transcript forces
direct media URLs (and embedded media) through Whisper first. Prefers local whisper.cpp when available; otherwise requires
OPENAI_API_KEY or FAL_KEY.
Run: `summarize <url>`
- Apple Podcasts
- Spotify
- Amazon Music / Audible podcast pages
- Podbean
- Podchaser
- RSS feeds (Podcasting 2.0 transcripts when available)
- Embedded YouTube podcast pages (e.g. JREPodcast)
Transcription: prefers local whisper.cpp when installed; otherwise uses OpenAI Whisper or FAL when keys are set.
`--language` / `--lang` controls the output language of the summary (and other LLM-generated text). Default is `auto`.
When the input is audio/video, the CLI needs a transcript first. The transcript comes from one of these paths:
- Existing transcript (preferred)
  - YouTube: uses `youtubei/captionTracks` when available.
  - Podcasts: uses Podcasting 2.0 RSS `<podcast:transcript>` (JSON/VTT) when the feed publishes it.
- Whisper transcription (fallback)
  - YouTube: falls back to yt-dlp (audio download) + Whisper transcription when configured; Apify is a last resort.
  - Prefers local `whisper.cpp` when installed and a model is available.
  - Otherwise uses cloud Whisper (OpenAI `OPENAI_API_KEY`) or FAL (`FAL_KEY`).
For direct media URLs, use `--video-mode transcript` to force transcribe -> summarize:

```
summarize https://example.com/file.mp4 --video-mode transcript --lang en
```

Single config location: `~/.summarize/config.json`
Supported keys today:
```json
{
  "model": { "id": "openai/gpt-5-mini" }
}
```

Shorthand (equivalent):

```json
{
  "model": "openai/gpt-5-mini"
}
```

Also supported:

- `model: { "mode": "auto" }` (automatic model selection + fallback; see docs/model-auto.md)
- `model.rules` (customize candidates / ordering)
- `models` (define presets selectable via `--model <preset>`)
- `media.videoMode: "auto" | "transcript" | "understand"`
- `openai.useChatCompletions: true` (force OpenAI-compatible chat completions)
Note: the config is parsed leniently (JSON5), but comments are not allowed. Unknown keys are ignored.
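A combined example using only keys documented above (values are illustrative):

```json
{
  "model": "free",
  "cli": { "enabled": ["gemini"] },
  "media": { "videoMode": "transcript" },
  "openai": { "useChatCompletions": true }
}
```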
Precedence:

1. `--model`
2. `SUMMARIZE_MODEL`
3. `~/.summarize/config.json`
4. default (`auto`)
Set the key matching your chosen `--model`:

- `OPENAI_API_KEY` (for `openai/...`)
- `ANTHROPIC_API_KEY` (for `anthropic/...`)
- `XAI_API_KEY` (for `xai/...`)
- `Z_AI_API_KEY` (for `zai/...`; supports the `ZAI_API_KEY` alias)
- `GEMINI_API_KEY` (for `google/...`)
  - also accepts `GOOGLE_GENERATIVE_AI_API_KEY` and `GOOGLE_API_KEY` as aliases
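Keys can also be passed inline for a one-off run (standard shell environment syntax; the key value is a placeholder):

```
GEMINI_API_KEY=... summarize "https://example.com" --model google/gemini-3-flash-preview
```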
OpenAI-compatible chat completions toggle: `OPENAI_USE_CHAT_COMPLETIONS=1` (or set `openai.useChatCompletions` in config).
OpenRouter (OpenAI-compatible):
- Set `OPENROUTER_API_KEY=...`
- Prefer forcing OpenRouter per model id: `--model openrouter/<author>/<slug>`
- Built-in preset: `--model free` (uses a default set of OpenRouter `:free` models)
Quick start: make `free` the default (keep `auto` available):

```
summarize refresh-free --set-default
summarize "https://example.com"
summarize "https://example.com" --model auto
```

`summarize refresh-free` regenerates the free preset (`models.free` in `~/.summarize/config.json`) by:
- Fetching OpenRouter `/models`, filtering `:free`
- Skipping models that look very small (<27B by default) based on the model id/name
- Testing which ones return non-empty text (concurrency 4, timeout 10s)
- Picking a mix of smart-ish (bigger `context_length` / output cap) and fast models
- Refining timings and writing the sorted list back
If `--model free` stops working, run:

```
summarize refresh-free
```

Flags:

- `--runs 2` (default): extra timing runs per selected model (total runs = 1 + runs)
- `--smart 3` (default): how many smart-first picks (the rest are filled by the fastest)
- `--min-params 27b` (default): ignore models with an inferred size smaller than N billion parameters
- `--max-age-days 180` (default): ignore models older than N days (set 0 to disable)
- `--set-default`: also sets `"model": "free"` in `~/.summarize/config.json`
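For instance, a stricter refresh (flags from the list above; values are illustrative):

```
summarize refresh-free --runs 3 --smart 4 --min-params 13b --set-default
```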
Example:

```
OPENROUTER_API_KEY=sk-or-... summarize "https://example.com" --model openrouter/meta-llama/llama-3.1-8b-instruct:free
```

If your OpenRouter account enforces an allowed-provider list, make sure at least one provider is allowed for the selected model. When routing fails, summarize prints the exact providers to allow.

Legacy: `OPENAI_BASE_URL=https://openrouter.ai/api/v1` (and either `OPENAI_API_KEY` or `OPENROUTER_API_KEY`) also works.
Z.AI (OpenAI-compatible):

- `Z_AI_API_KEY=...` (or `ZAI_API_KEY=...`)
- Optional base URL override: `Z_AI_BASE_URL=...`
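Example run (model id from the examples above; the key value is a placeholder):

```
Z_AI_API_KEY=... summarize "https://example.com" --model zai/glm-4.7
```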
Optional services:

- `FIRECRAWL_API_KEY` (website extraction fallback)
- `YT_DLP_PATH` (path to the yt-dlp binary for audio extraction)
- `FAL_KEY` (FAL AI API key for audio transcription via Whisper)
- `APIFY_API_TOKEN` (YouTube transcript fallback)
The CLI uses the LiteLLM model catalog for model limits (like max output tokens):

- Downloaded from: https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json
- Cached at: `~/.summarize/cache/`
Recommended (minimal deps):

- `@steipete/summarize-core/content`
- `@steipete/summarize-core/prompts`

Compatibility (pulls in CLI deps):

- `@steipete/summarize/content`
- `@steipete/summarize/prompts`
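A minimal library sketch: the import path and factory name come from the install section above; calling the factory with no arguments is an assumption, so check the package's type definitions for the real options.

```ts
// Import path and factory name are taken from this README; the no-argument
// call is an assumption, not documented API. Consult the package types for
// the actual constructor options and client methods.
import { createLinkPreviewClient } from '@steipete/summarize-core/content';

const client = createLinkPreviewClient();
```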
```
pnpm install
pnpm check
```

- Docs index: docs/README.md
- CLI providers and config: docs/cli.md
- Auto model rules: docs/model-auto.md
- Website extraction: docs/website.md
- YouTube handling: docs/youtube.md
- Media pipeline: docs/media.md
- Config schema and precedence: docs/config.md
- "Receiving end does not exist": Chrome did not inject the content script yet.
- Extension details -> Site access -> On all sites (or allow this domain)
- Reload the tab once.
- "Failed to fetch" / daemon unreachable:
summarize daemon status- Logs:
~/.summarize/logs/daemon.err.log
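When the daemon seems down, these two steps (command and log path from this section; `tail` is the standard utility) usually pinpoint the cause:

```
summarize daemon status
tail -n 50 ~/.summarize/logs/daemon.err.log
```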
License: MIT


