Summarize
Extract clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without hand-coding brittle scraper logic.
Use this skill when you need deterministic extraction for YouTube, podcast feeds, PDFs, scanned images, or local media files.
Terminology used in this file:
- DOM: Document Object Model, the page element structure used by browser-based extractors.
- OCR: Optical character recognition (extracting text from images/scans).
- ANSI codes: Terminal color/control sequences; `--plain` removes them for machine parsing.
Setup
```bash
brew tap steipete/tap
brew install summarize
```
- Claude Code: copy this skill folder into `.claude/skills/summarize/`.
- Codex CLI: append this SKILL.md content to your project's root `AGENTS.md`.
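You can script the two agent install options roughly as follows. This is a minimal sketch, not an official installer. The function names are hypothetical, you pass in the location of the downloaded skill folder as an argument, and both functions assume they run from your project root.

```bash
# Hypothetical install helpers (not shipped with the skill).
# $1 is the path to the downloaded skill folder containing SKILL.md.
install_skill_claude() {
  # Copy the folder's contents (including dotfiles) into the Claude Code skills dir.
  mkdir -p .claude/skills/summarize &&
    cp -R "$1"/. .claude/skills/summarize/
}

install_skill_codex() {
  # Append rather than overwrite, so existing AGENTS.md content is kept.
  cat "$1/SKILL.md" >> AGENTS.md
}

# Usage (from the project root):
#   install_skill_claude ~/Downloads/summarize-skill
#   install_skill_codex  ~/Downloads/summarize-skill
```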
For the full installation walkthrough (prerequisites, optional dependencies, verification, troubleshooting), see references/installation-guide.md.
Staying Updated
This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.
After installing, tell your agent: "Check UPDATES.md in the summarize skill for any new features or changes."
When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."
Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.
Quick Start
Run one extraction flow end-to-end:
```bash
summarize --version
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
summarize --extract "/path/to/document.pdf" --plain
```
Use `--extract --plain` as the default pattern for deterministic, non-ANSI output.
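A small loop is enough to apply the same `--extract --plain` pattern to many sources at once. The sketch below is a hypothetical helper, not part of summarize. It reads one URL or file path per line and skips blank lines and `#` comments. Each result goes to a numbered file in an output directory you choose.

```bash
# Hypothetical batch helper (not part of summarize).
# Usage: extract_all LIST_FILE OUTDIR  -> writes OUTDIR/1.txt, OUTDIR/2.txt, ...
# Returns non-zero if any single extraction failed.
extract_all() {
  list=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  n=0
  failed=0
  # "|| [ -n "$src" ]" also processes a final line that lacks a trailing newline.
  while IFS= read -r src || [ -n "$src" ]; do
    case $src in
      '' | '#'*) continue ;;  # skip blank lines and comments
    esac
    n=$((n + 1))
    # </dev/null keeps summarize from reading the rest of the list via stdin.
    if ! summarize --extract "$src" --plain </dev/null >"$outdir/$n.txt"; then
      echo "extraction failed: $src" >&2
      failed=1
    fi
  done <"$list"
  return "$failed"
}

# Usage: extract_all sources.txt extracted/
```

Redirecting stdin from `/dev/null` matters in a `while read` loop. Without it, any command inside the loop that reads stdin would consume the remaining lines of the list.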
Decision Tree: summarize vs Other Tools
```text
Need content from the web?
|
+-- Static web page (article, docs, blog)?
| --> WebFetch (built-in, zero deps, faster)
| --> Jina r.jina.ai (zero install alternative)
| --> summarize ONLY if above tools fail or return garbage
|
+-- JS-heavy SPA / dynamic content?
| --> Crawl4AI crwl (full browser rendering)
| --> summarize will NOT help here (no JS rendering)
|
+-- Anti-bot / paywalled / Cloudflare-protected?
| --> summarize --firecrawl always (requires FIRECRAWL_API_KEY)
| --> browser-based workflow as fallback
|
+-- YouTube video?
| --> summarize --extract (ONLY option for transcript)
| --> Add --youtube web for captions-only (faster)
| --> Add --slides for visual slide extraction
|
+-- Podcast / RSS feed?
| --> summarize --extract (ONLY option)
| --> Supports Apple Podcasts, Spotify, RSS feeds, Podbean, etc.
|
+-- PDF (URL or local file)?
| --> summarize --extract (ONLY CLI option)
| --> Requires: uvx/markitdown (brew install uv)
|
+-- Image (OCR)?
| --> summarize --extract (ONLY CLI option)
| --> Requires: tesseract
|
+-- Audio / video file?
--> summarize --extract (ONLY CLI option)
--> Requires: whisper-cli (local) or OPENAI_API_KEY (cloud)
```
Rule of thumb: summarize is the default for media extraction (YouTube, podcasts, audio, video, images). For web pages, prefer WebFetch/Jina/Crawl4AI depending on DOM complexity (how hard the page structure is to parse). Use summarize for web only when other tools fail.
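An agent that needs to apply this decision tree mechanically can use a string-based classifier for the cases a URL or file extension reveals. This is a heuristic sketch with hypothetical names. The extension lists and hostnames are assumptions. It also cannot tell a JS-heavy or anti-bot page from a static one, so it reports every other `http(s)` URL as `web`.

```bash
# Hypothetical helper (not part of summarize): classify a source string.
# Prints one of: youtube, podcast, pdf, image, media, web, unknown.
classify_source() {
  # Drop any query string, then lowercase so extension checks ignore case.
  base=${1%%\?*}
  base=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
  case $base in
    *youtube.com/watch* | *youtu.be/*)                        echo youtube ;;
    *podcasts.apple.com/* | *open.spotify.com/episode/* | *.rss | *.xml) echo podcast ;;
    *.pdf)                                                    echo pdf ;;
    *.png | *.jpg | *.jpeg | *.tif | *.tiff)                  echo image ;;
    *.mp3 | *.m4a | *.wav | *.mp4 | *.mov | *.mkv | *.webm)   echo media ;;
    http://* | https://*)                                     echo web ;;
    *)                                                        echo unknown ;;
  esac
}

# Rule of thumb: everything except plain web pages goes to summarize --extract.
src="https://www.youtube.com/watch?v=VIDEO_ID"
case $(classify_source "$src") in
  web)     echo "Try WebFetch or Jina first; fall back to summarize." ;;
  unknown) echo "Inspect the source manually." ;;
  *)       echo "summarize --extract \"$src\" --plain" ;;
esac
```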
Extraction Mode (Primary)
`--extract` prints raw extracted content and exits. No LLM is involved.
Use this first. You can handle any downstream synthesis in your own workflow.
```bash
# Web page extraction (plain text, default)
summarize --extract "https://example.com" --plain
# Web page extraction (markdown format)
summarize --extract "https://example.com" --format md --plain
# YouTube transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
# YouTube transcript with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain
# YouTube transcript formatted as markdown (requires LLM -- uses API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain
# YouTube slides + transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --slides --plain
# Podcast (RSS feed)
summarize --extract "https://feeds.example.com/podcast.xml" --plain
# Apple Podcasts episode
summarize --extract "https://podcasts.apple.com/us/podcast/EPISODE_ID" --plain
# PDF from URL
summarize --extract "https://example.com/document.pdf" --plain
# PDF from local file
summarize --extract "/path/to/document.pdf" --plain
# Image OCR
summarize --extract "/path/to/image.png" --plain
# Audio transcription
summarize --extract "/path/to/audio.mp3" --plain
# Video transcription
summarize --extract "/path/to/video.mp4" --plain
# Stdin (pipe content)
pbpaste | summarize --extract - --plain
cat document.pdf | summarize --extract - --plain
```
Always use --plain when extracting for agent consumption. It suppresses ANSI/OSC rendering.
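When a script captures the output, a simple guard can confirm that no terminal escape sequences slipped through, for example because `--plain` was forgotten. The check below is a hypothetical helper, not a summarize feature. It only looks for the ESC byte (0x1B) that begins ANSI and OSC sequences.

```bash
# Hypothetical guard (not part of summarize): succeed (status 0) if the
# given text contains an ESC byte, i.e. likely ANSI/OSC styling.
has_escape_codes() {
  esc=$(printf '\033')
  case $1 in
    *"$esc"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage:
#   out=$(summarize --extract "https://example.com" --plain)
#   if has_escape_codes "$out"; then echo "warning: styled output" >&2; fi
```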
Extraction defaults:
- URLs default to `--format md` in extract mode.
- Files default to `--format text`.
- PDF extraction requires uvx/markitdown (`--preprocess auto`, which is the default).
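Following the decision tree, a script can try a normal extraction first and fall back to Firecrawl only when that attempt fails or comes back empty. This sketch rests on several assumptions. The wrapper name is hypothetical. It assumes `--firecrawl always` can be combined with `--extract` and that `FIRECRAWL_API_KEY` is configured. The `2m` timeout is an arbitrary choice.

```bash
# Hypothetical wrapper (not part of summarize): plain extraction first,
# then one retry through Firecrawl if the first attempt fails or prints
# only whitespace. Assumes FIRECRAWL_API_KEY is configured and that
# --firecrawl always can be combined with --extract.
extract_with_fallback() {
  url=$1
  out=$(summarize --extract "$url" --plain --timeout 2m 2>/dev/null) || out=""
  # Strip all whitespace to decide whether anything was extracted.
  if [ -z "$(printf '%s' "$out" | tr -d '[:space:]')" ]; then
    out=$(summarize --extract "$url" --plain --firecrawl always --timeout 2m) || return 1
  fi
  printf '%s\n' "$out"
}

# Usage: extract_with_fallback "https://example.com/protected-article" > article.txt
```

Treating whitespace-only output as a failure is a judgment call. It spends a paid Firecrawl request only when the free path clearly produced nothing.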
LLM Summarization Mode (Secondary)
Use this mode only when you explicitly want summarize to perform synthesis itself.
```bash
# Summarize a URL (requires API key for the chosen model)
summarize "https://example.com" --model anthropic/claude-sonnet-4-5 --length long
# Summarize with a custom prompt
summarize "https://example.com" --prompt "Extract key technical decisions and their rationale"
# Summarize YouTube video
summarize "https://www.youtube.com/watch?v=VIDEO_ID" --length xl
# JSON output with metrics
summarize "https://example.com" --json --model openai/gpt-5-mini
```
API keys for LLM mode (set in `~/.summarize/config.json` or env vars):
- `ANTHROPIC_API_KEY` -- for `anthropic/` models
- `OPENAI_API_KEY` -- for `openai/` models
- `GEMINI_API_KEY` -- for `google/` models
- `XAI_API_KEY` -- for `xai/` models
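Before switching to LLM mode, a script can check that the key for the chosen `--model` prefix is present. This sketch uses hypothetical function names and only inspects environment variables. It does not detect keys stored in `~/.summarize/config.json`, so a failed check is not conclusive when you configure keys that way.

```bash
# Hypothetical preflight (not part of summarize): map a --model value to
# the environment variable listed above and check that it is non-empty.
# Keys saved only in ~/.summarize/config.json are not seen by this check.
required_key() {
  case $1 in
    anthropic/*) echo ANTHROPIC_API_KEY ;;
    openai/*)    echo OPENAI_API_KEY ;;
    google/*)    echo GEMINI_API_KEY ;;
    xai/*)       echo XAI_API_KEY ;;
    *)           return 1 ;;  # provider not covered by the list above
  esac
}

model_key_is_set() {
  var=$(required_key "$1") || return 1
  # Indirect lookup of the variable whose name is in $var; $var only ever
  # holds one of the fixed names above, so eval is safe here.
  eval "val=\${$var:-}"
  [ -n "$val" ]
}

# Usage: model_key_is_set anthropic/claude-sonnet-4-5 || echo "set ANTHROPIC_API_KEY" >&2
```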
Dependency Matrix
| Feature | Required Deps |
|---|---|
| Web page extraction | None |
| YouTube transcript (captions) | None (web mode) |
| YouTube transcript (no captions) | yt-dlp + whisper or API key |
| YouTube slides | yt-dlp + ffmpeg |
| Podcast transcription | yt-dlp + whisper or API key |
| PDF extraction | uvx/markitdown |
| Image OCR | tesseract |
| Audio/video transcription | whisper-cli (local) or OPENAI_API_KEY |
| Anti-bot sites (Firecrawl) | FIRECRAWL_API_KEY |
| Slide OCR | tesseract |
What is not installed (by design):
- `whisper-cli` / whisper.cpp -- heavy binary; install it when audio transcription is needed
- Firecrawl API key -- paid service; configure it when anti-bot extraction is needed
- LLM API keys in summarize config -- only add them if you use LLM Summarization Mode
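A preflight check based on the dependency matrix can tell an agent what to install before it attempts an extraction. This is a hypothetical helper, not part of summarize. The feature names are invented for the sketch. The binary names (`uvx` for markitdown, `whisper-cli` for local transcription) are assumptions. The two rows that say "whisper or API key" are left out because the matrix does not say which key applies.

```bash
# Hypothetical preflight check (not part of summarize): print what is
# missing for one feature from the dependency matrix; status 0 means
# nothing is missing, 1 means something is, 2 means unknown feature.
missing_deps() {
  need=""
  missing=""
  case $1 in
    web | youtube-captions) ;;              # no extra dependencies
    youtube-slides) need="yt-dlp ffmpeg" ;;
    pdf) need="uvx" ;;
    image | slide-ocr) need="tesseract" ;;
    audio | video) need="whisper-cli" ;;
    firecrawl) [ -n "${FIRECRAWL_API_KEY:-}" ] || missing="FIRECRAWL_API_KEY" ;;
    *) echo "unknown feature: $1" >&2; return 2 ;;
  esac
  for bin in $need; do
    command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
  done
  # OPENAI_API_KEY is the documented cloud alternative to local whisper-cli.
  case $1 in
    audio | video) [ -n "${OPENAI_API_KEY:-}" ] && missing="" ;;
  esac
  printf '%s\n' "${missing# }"
  [ -z "$missing" ]
}

# Usage: missing_deps pdf || echo "install the tools listed above first" >&2
```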
Key Flags Quick Reference
| Flag | Purpose | Example |
|---|---|---|
| `--extract` | Raw content extraction, no LLM | `summarize --extract URL` |
| `--plain` | No ANSI rendering (agent-safe output) | Always use for agents |
| `--format md\|text` | Output format (`md` is the default for URLs in extract mode) | `--format md` |
| `--youtube auto\|web\|yt-dlp` | YouTube transcript source | `--youtube web` (captions only) |
| `--slides` | Extract video slides with ffmpeg | `--slides --slides-ocr` |
| `--timestamps` | Include timestamps in transcripts | `--timestamps` |
| `--firecrawl off\|auto\|always` | Firecrawl for anti-bot sites | `--firecrawl always` |
| `--preprocess off\|auto\|always` | Preprocessing (markitdown for PDFs) | Default `auto` |
| `--markdown-mode` | HTML-to-Markdown conversion mode | `--markdown-mode readability` |
| `--timeout` | Fetch/LLM timeout | `--timeout 2m` |
| `--verbose` | Debug output to stderr | Troubleshooting |
| `--json` | Structured JSON output with metrics | `--json` |
| `--length` | Summary length (LLM mode only) | `--length xl` |
--model | LLM model (LLM mode only) | `--model anth |