Summarize

Extract clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without hand-coding brittle scraper logic.

Use this skill when you need deterministic extraction for YouTube, podcast feeds, PDFs, scanned images, or local media files.

Terminology used in this file:

DOM: Document Object Model, the page element structure used by browser-based extractors.
OCR: Optical character recognition (extracting text from images/scans).
ANSI codes: Terminal color/control sequences; --plain removes them for machine parsing.

Setup

brew tap steipete/tap
brew install summarize

Claude Code: copy this skill folder into .claude/skills/summarize/
Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, optional dependencies, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the summarize skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.

Quick Start

Run one extraction flow end-to-end:

summarize --version
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
summarize --extract "/path/to/document.pdf" --plain

Use --extract --plain as the default pattern for deterministic, non-ANSI output.
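
The default pattern above can be wrapped for use in scripts. This is a minimal sketch, not part of the skill: the function name extract_plain and its exit codes are illustrative assumptions, and it relies only on the --extract and --plain flags shown in this file.

```shell
# Hypothetical helper: run the default extract pattern and fail loudly.
# Extracted content goes to stdout; diagnostics go to stderr.
extract_plain() {
  if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "usage: extract_plain <url-or-path>" >&2
    return 2
  fi
  if ! command -v summarize >/dev/null 2>&1; then
    echo "summarize not found on PATH; see Setup" >&2
    return 127
  fi
  summarize --extract "$1" --plain
}

# Example call (commented out so that sourcing this file has no side effects):
# text=$(extract_plain "/path/to/document.pdf") || exit 1
```

Keeping diagnostics on stderr means the captured stdout contains only extracted content.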

Decision Tree: summarize vs Other Tools

Need content from the web?
  |
  +-- Static web page (article, docs, blog)?
  |     --> WebFetch (built-in, zero deps, faster)
  |     --> Jina r.jina.ai (zero install alternative)
  |     --> summarize ONLY if above tools fail or return garbage
  |
  +-- JS-heavy SPA / dynamic content?
  |     --> Crawl4AI crwl (full browser rendering)
  |     --> summarize will NOT help here (no JS rendering)
  |
  +-- Anti-bot / paywalled / Cloudflare-protected?
  |     --> summarize --firecrawl always (requires FIRECRAWL_API_KEY)
  |     --> browser-based workflow as fallback
  |
  +-- YouTube video?
  |     --> summarize --extract (ONLY option for transcript)
  |     --> Add --youtube web for captions-only (faster)
  |     --> Add --slides for visual slide extraction
  |
  +-- Podcast / RSS feed?
  |     --> summarize --extract (ONLY option)
  |     --> Supports Apple Podcasts, Spotify, RSS feeds, Podbean, etc.
  |
  +-- PDF (URL or local file)?
  |     --> summarize --extract (ONLY CLI option)
  |     --> Requires: uvx/markitdown (brew install uv)
  |
  +-- Image (OCR)?
  |     --> summarize --extract (ONLY CLI option)
  |     --> Requires: tesseract
  |
  +-- Audio / video file?
        --> summarize --extract (ONLY CLI option)
        --> Requires: whisper-cli (local) or OPENAI_API_KEY (cloud)

Rule of thumb: summarize is the default for media and document extraction (YouTube, podcasts, audio, video, images, PDFs). For ordinary web pages, prefer WebFetch, Jina, or Crawl4AI depending on DOM complexity (how hard the page structure is to parse). Use summarize for web pages only when those tools fail or the site needs --firecrawl.
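
The URL-detectable branches of the decision tree can be sketched as a small router. Everything here is an assumption for illustration: the function name pick_tool, its output labels, and the file-extension lists (the tree names content types, not extensions). JS-heavy and anti-bot pages cannot be recognized from the URL alone, so those branches still need a human or agent judgment.

```shell
# Hypothetical router mirroring the decision tree above.
# It only prints a suggestion; it does not run any tool.
pick_tool() {
  case "$1" in
    *youtube.com/watch*|*youtu.be/*)       echo "summarize" ;;      # YouTube transcript
    *podcasts.apple.com/*|*.xml|*.rss)     echo "summarize" ;;      # podcast / RSS feed
    *.pdf|*.PDF)                           echo "summarize" ;;      # PDF, URL or local
    *.png|*.jpg|*.jpeg|*.tif|*.tiff)       echo "summarize" ;;      # image OCR
    *.mp3|*.wav|*.m4a|*.mp4|*.mov|*.mkv)   echo "summarize" ;;      # audio / video file
    http://*|https://*)                    echo "webfetch-first" ;; # plain web page
    *)                                     echo "unknown" ;;
  esac
}
```

Media and document patterns are listed before the generic http(s) branch so that a PDF or feed URL is not misrouted as a plain web page. URLs with a query string after the extension (for example .pdf?download=1) fall through to the web branch in this sketch.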

Extraction Mode (Primary)

--extract prints the raw extracted content and exits. No LLM is involved unless you opt into --markdown-mode llm. Use this mode first and handle any downstream synthesis in your own workflow.

# Web page extraction (default format; URLs default to md in extract mode)
summarize --extract "https://example.com" --plain

# Web page extraction (markdown format pinned explicitly)
summarize --extract "https://example.com" --format md --plain

# YouTube transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain

# YouTube transcript with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain

# YouTube transcript formatted as markdown (requires LLM -- uses API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain

# YouTube slides + transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --slides --plain

# Podcast (RSS feed)
summarize --extract "https://feeds.example.com/podcast.xml" --plain

# Apple Podcasts episode
summarize --extract "https://podcasts.apple.com/us/podcast/EPISODE_ID" --plain

# PDF from URL
summarize --extract "https://example.com/document.pdf" --plain

# PDF from local file
summarize --extract "/path/to/document.pdf" --plain

# Image OCR
summarize --extract "/path/to/image.png" --plain

# Audio transcription
summarize --extract "/path/to/audio.mp3" --plain

# Video transcription
summarize --extract "/path/to/video.mp4" --plain

# Stdin (pipe content)
pbpaste | summarize --extract - --plain
cat document.pdf | summarize --extract - --plain

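Always use --plain when extracting for agent consumption. It suppresses ANSI/OSC terminal escape sequences (colors, cursor control, hyperlinks) that would otherwise pollute machine-parsed output.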
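
If output was captured without --plain (an old log, for instance), a defensive filter can remove the most common escape sequences afterwards. This is a sketch under assumptions: strip_ansi is a hypothetical helper, it handles only CSI sequences such as color codes, and it does not handle OSC sequences. It is a fallback, not a replacement for --plain.

```shell
# Hypothetical fallback filter: strip ANSI CSI sequences (e.g. ESC[31m) from stdin.
# OSC sequences (such as terminal hyperlinks) are NOT handled here.
strip_ansi() {
  esc=$(printf '\033')                  # literal ESC byte, built portably
  sed "s/${esc}\[[0-9;]*[A-Za-z]//g"    # ESC [ params final-letter
}

# Example call (commented out so that sourcing this file has no side effects):
# strip_ansi < captured-output.log > clean-output.txt
```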

Extraction defaults:

URLs default to --format md in extract mode
Files default to --format text
PDF extraction requires uvx/markitdown (used by --preprocess auto, which is the default)

LLM Summarization Mode (Secondary)

Use this mode only when you explicitly want summarize to perform synthesis itself.

# Summarize a URL (requires API key for the chosen model)
summarize "https://example.com" --model anthropic/claude-sonnet-4-5 --length long

# Summarize with a custom prompt
summarize "https://example.com" --prompt "Extract key technical decisions and their rationale"

# Summarize YouTube video
summarize "https://www.youtube.com/watch?v=VIDEO_ID" --length xl

# JSON output with metrics
summarize "https://example.com" --json --model openai/gpt-5-mini

API keys for LLM mode (set in ~/.summarize/config.json or env vars):

ANTHROPIC_API_KEY -- for anthropic/ models
OPENAI_API_KEY -- for openai/ models
GEMINI_API_KEY -- for google/ models
XAI_API_KEY -- for xai/ models
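
Before calling LLM mode from a script, it can help to check that the matching key is present. The sketch below assumes only the prefix-to-variable mapping listed above. The function names are hypothetical, and the check looks at environment variables only, so it cannot see keys stored in ~/.summarize/config.json.

```shell
# Map a provider/model ID to the environment variable listed above.
# Returns 1 for a provider prefix this file does not mention.
key_var_for_model() {
  case "$1" in
    anthropic/*) echo "ANTHROPIC_API_KEY" ;;
    openai/*)    echo "OPENAI_API_KEY" ;;
    google/*)    echo "GEMINI_API_KEY" ;;
    xai/*)       echo "XAI_API_KEY" ;;
    *)           return 1 ;;
  esac
}

# Return 0 if the key for the given model is set and non-empty in the environment.
has_key_for_model() {
  var=$(key_var_for_model "$1") || return 1
  # var is one of the fixed names above, so this eval is safe.
  eval "val=\${$var:-}"
  [ -n "$val" ]
}

# Example call (commented out so that sourcing this file has no side effects):
# has_key_for_model "openai/gpt-5-mini" || echo "OPENAI_API_KEY is not set" >&2
```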

Dependency Matrix

| Feature | Required deps |
| --- | --- |
| Web page extraction | None |
| YouTube transcript (captions) | None (web mode) |
| YouTube transcript (no captions) | yt-dlp + whisper or API key |
| YouTube slides | yt-dlp + ffmpeg |
| Podcast transcription | yt-dlp + whisper or API key |
| PDF extraction | uvx/markitdown |
| Image OCR | tesseract |
| Audio/video transcription | whisper-cli (local) or OPENAI_API_KEY |
| Anti-bot sites (Firecrawl) | FIRECRAWL_API_KEY |
| Slide OCR | tesseract |

What is not installed (by design):

whisper-cli / whisper.cpp -- heavy binary, install when audio transcription is needed
Firecrawl API key -- paid service, configure when anti-bot extraction is needed
LLM API keys in summarize config -- only add if you use LLM Summarization Mode
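
Because several dependencies are left uninstalled by design, a preflight check can report what a given feature still needs. This is a sketch: the feature labels passed to deps_for are hypothetical, and the binary names are taken from the matrix above. API-key alternatives (for example OPENAI_API_KEY for cloud transcription) are not checked here.

```shell
# Print each named command that is not on PATH.
# Returns 1 if at least one is missing, 0 otherwise.
missing_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical feature labels mapped to the binaries in the matrix above.
deps_for() {
  case "$1" in
    youtube-slides) echo "yt-dlp ffmpeg" ;;
    pdf)            echo "uvx" ;;
    ocr)            echo "tesseract" ;;
    transcribe)     echo "whisper-cli" ;;  # local path only; cloud uses an API key
    *)              echo "" ;;
  esac
}

# Example call (commented out so that sourcing this file has no side effects).
# $(deps_for ...) is left unquoted on purpose so it splits into separate names:
# missing_cmds $(deps_for youtube-slides) || echo "install the tools listed above" >&2
```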

Key Flags Quick Reference

| Flag | Purpose | Example |
| --- | --- | --- |
| `--extract` | Raw content extraction, no LLM | `summarize --extract URL` |
| `--plain` | No ANSI rendering (agent-safe output) | Always use for agents |
| `--format md\|text` | Output format (md default for URLs in extract) | `--format md` |
| `--youtube auto\|web\|yt-dlp` | YouTube transcript source | `--youtube web` (captions only) |
| `--slides` | Extract video slides with ffmpeg | `--slides --slides-ocr` |
| `--timestamps` | Include timestamps in transcripts | `--timestamps` |
| `--firecrawl off\|auto\|always` | Firecrawl for anti-bot sites | `--firecrawl always` |
| `--preprocess off\|auto\|always` | Preprocessing (markitdown for PDFs) | Default `auto` |
| `--markdown-mode` | HTML-to-MD conversion mode | `--markdown-mode readability` |
| `--timeout` | Fetch/LLM timeout | `--timeout 2m` |
| `--verbose` | Debug output to stderr | Troubleshooting |
| `--json` | Structured JSON output with metrics | `--json` |
| `--length` | Summary length (LLM mode only) | `--length xl` |
| `--model` | LLM model (LLM mode only) | `--model anth |
