/watch — Claude watches a video
You don't have a video input; this skill gives you one. A Python script downloads the video, extracts frames as JPEGs, gets a timestamped transcript (native captions first, then a local Whisper engine as fallback: mlx-whisper or openai-whisper, running on-device with no API and no key), and prints the frame paths. You then Read each frame path to see the images and combine them with the transcript to answer the user.
Step 0 — Setup preflight (runs every /watch invocation, silent on success)
Python interpreter: every python3 ... command in this skill is for macOS/Linux. On Windows, substitute python — the python3 command on Windows is the Microsoft Store stub and will not run the script.
Before every /watch run, verify that dependencies are in place:
python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py" --check
This is a <100ms lookup. On exit 0, the script emits nothing — proceed to Step 1 without comment. Do NOT announce "setup is complete" to the user — they don't need a status message on every turn. The only acceptable user-visible output from Step 0 is when remediation is required.
On non-zero exit, follow the table:
| Exit | Meaning | Action |
|---|---|---|
| 2 | Missing binaries (ffmpeg / ffprobe / yt-dlp) | Run installer |
| 3 | No local whisper engine (mlx-whisper / openai-whisper) | Run installer, then tell user the pip3 command it prints |
| 4 | Both missing | Run installer |
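The table above can be read as a simple lookup. The sketch below is only an illustration of that dispatch: `CHECK_ACTIONS` and `action_for_exit` are hypothetical names, and the fallback for unlisted exit codes is an assumption, since the table does not define one.

```python
# Hypothetical helper: maps `setup.py --check` exit codes to the action in the table.
CHECK_ACTIONS = {
    0: "proceed silently to Step 1",
    2: "run installer (missing ffmpeg / ffprobe / yt-dlp)",
    3: "run installer, then relay the pip3 command it prints",
    4: "run installer (binaries and whisper engine both missing)",
}


def action_for_exit(code: int) -> str:
    """Return the documented action for an exit code.

    Codes not listed in the table are an assumption here: surface the
    script's output rather than guessing.
    """
    return CHECK_ACTIONS.get(code, "unexpected exit code: show the script output to the user")


if __name__ == "__main__":
    for code in (0, 2, 3, 4, 1):
        print(code, "->", action_for_exit(code))
```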
The installer is idempotent — safe to re-run:
python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py"
On macOS with Homebrew, it auto-installs ffmpeg and yt-dlp. On Linux/Windows, it prints the exact install commands for the user to run. For transcription it checks for a local whisper engine (mlx-whisper preferred on Apple Silicon, openai-whisper as a CPU fallback) and prints the pip3 install command if neither is present. No API key, no config file, no .env — transcription runs entirely on-device.
If no whisper engine is installed: run the installer and relay to the user the exact pip3 install … command it prints (mlx-whisper on Apple Silicon, openai-whisper on Windows/Linux/Intel Macs; do not assume mlx, since it only installs on Apple Silicon). If the user doesn't want to install it, proceed with --no-whisper and tell them that videos without native captions will come back frames-only.
Structured mode (optional): python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py" --json emits {status, missing_binaries, whisper_backend, has_whisper, platform} where status is one of ready | needs_install | needs_whisper | needs_install_and_whisper.
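A minimal sketch of how the structured output might be consumed, assuming the field names listed above. The function name `summarize_setup` and the sample payload are made up for illustration; the real values of `whisper_backend` and `platform` are not specified here.

```python
import json

# The four status values named in the structured-mode description.
VALID_STATUSES = {"ready", "needs_install", "needs_whisper", "needs_install_and_whisper"}


def summarize_setup(report_json: str) -> dict:
    """Turn the --json report into the decisions Step 0 cares about."""
    report = json.loads(report_json)
    status = report["status"]
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return {
        # Anything other than "ready" calls for the installer.
        "run_installer": status != "ready",
        # The pip3 command only matters when a whisper engine is missing.
        "relay_pip_command": status in {"needs_whisper", "needs_install_and_whisper"},
        "missing_binaries": list(report.get("missing_binaries", [])),
    }


if __name__ == "__main__":
    # Sample payload is hypothetical, not captured from the real script.
    sample = '{"status": "needs_install", "missing_binaries": ["ffmpeg"], "has_whisper": true}'
    print(summarize_setup(sample))
```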
Exception: within a single session, you may skip Step 0 on follow-up /watch calls. Once --check has returned 0, the environment is not expected to change between turns.
When to use
- User pastes a video URL (YouTube, Vimeo, X, TikTok, Twitch clip, most yt-dlp-supported sites) and asks about it.
- User points at a local video file (.mp4, .mov, .mkv, .webm, etc.) and asks about it.
- User types /watch <url-or-path> [question].
Recommended limits
- Best accuracy: videos under 10 minutes. Frame coverage scales inversely with duration.
- Hard caps: 100 frames total and 2 fps. Token cost grows with frame count, so the script targets a frame budget by duration (and never exceeds 2 fps even when the budget would imply more):
  - ≤30s → ~1-2 fps (up to 30 frames)
  - 30s-1min → ~40 frames
  - 1-3min → ~60 frames
  - 3-10min → ~80 frames
  - >10min → 100 frames, sparsely spaced (warning printed)
- If the user hands you a long video, consider asking whether they want a specific section before burning tokens on a sparse scan.
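The duration tiers above can be sketched as a small function. This is an approximation built only from the listed numbers, not the actual logic in watch.py; `frame_budget` and its parameter names are hypothetical, and the tier boundaries are treated as inclusive upper bounds by assumption.

```python
def frame_budget(duration_s: float, max_frames: int = 100, max_fps: float = 2.0) -> int:
    """Approximate target frame count for a video of the given length."""
    if duration_s <= 30:
        target = 30
    elif duration_s <= 60:
        target = 40
    elif duration_s <= 180:
        target = 60
    elif duration_s <= 600:
        target = 80
    else:
        target = 100
    # Never exceed the fps cap, even when the tier would imply more frames.
    cap_by_fps = int(duration_s * max_fps)
    return max(1, min(target, max_frames, cap_by_fps))


if __name__ == "__main__":
    for seconds in (10, 45, 120, 300, 3600):
        frames = frame_budget(seconds)
        print(f"{seconds:>5}s -> {frames} frames (~{frames / seconds:.2f} fps)")
```

Under these assumptions a one-hour video gets roughly one frame every 36 seconds, which is why asking for a specific section first tends to be worthwhile.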
How to invoke
Step 1 — parse the user input. Separate the video source (URL or path) from any question the user asked. Example: /watch https://youtu.be/abc what language is this in? → source = https://youtu.be/abc, question = what language is this in?.
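The split described in Step 1 might look like the sketch below. `split_watch_args` is a hypothetical helper, and the handling of quoted paths containing spaces is an assumption; the skill itself only states that the source and the question are separated.

```python
def split_watch_args(text: str) -> tuple[str, str]:
    """Split '/watch <source> [question]' into (source, question)."""
    body = text.strip()
    if body.startswith("/watch"):
        body = body[len("/watch"):].lstrip()
    if not body:
        return "", ""
    # Assumption: a quoted source (e.g. a local path with spaces) ends at the matching quote.
    if body[0] in "\"'":
        end = body.find(body[0], 1)
        if end != -1:
            return body[1:end], body[end + 1:].strip()
    parts = body.split(maxsplit=1)
    source = parts[0]
    question = parts[1] if len(parts) > 1 else ""
    return source, question


if __name__ == "__main__":
    print(split_watch_args("/watch https://youtu.be/abc what language is this in?"))
```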
Step 2 — run the watch script. Pass the source verbatim. Do not shell-escape it yourself beyond normal quoting:
python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" "<source>"
Optional flags:
- --start T / --end T: focus on a section. Accepts SS, MM:SS, or HH:MM:SS. When either is set, fps auto-scales denser (see "Focusing on a section" below).
- --max-frames N: lower the cap for a tighter token budget (e.g. --max-frames 40)
- --resolution W: change frame width in px (default 512; bump to 1024 only if the user needs to read on-screen text)
- --fps F: override auto-fps (clamped to 2 fps max)
- --out-dir DIR: keep working files somewhere specific (default: an auto-generated tmp dir)
- --cookies-from-browser B: read cookies from a local browser (chrome, firefox, safari, edge, brave, …) for login-gated sources
- --cookies FILE: path to a Netscape-format cookies.txt (alternative to --cookies-from-browser)
- --whisper mlx|openai-whisper: force a specific local Whisper engine (default: prefer mlx-whisper, fall back to openai-whisper)
- --no-whisper: disable the local Whisper fallback entirely (frames-only if no captions)
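For reference, the three timestamp forms accepted by --start and --end (SS, MM:SS, HH:MM:SS) can be converted to seconds as in the sketch below. `parse_timestamp` is a hypothetical name, and rejecting fractional seconds is an assumption; the script's own parser may be more lenient.

```python
def parse_timestamp(value: str) -> int:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' to a whole number of seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected SS, MM:SS, or HH:MM:SS, got {value!r}")
    seconds = 0
    # Each field to the left is worth 60x the one after it.
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    for stamp in ("90", "01:30", "1:02:03"):
        print(stamp, "->", parse_timestamp(stamp))
```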
Login-gated sources (Instagram, X, private/age-restricted videos)
Public videos (most of YouTube, Vimeo, TikTok, Loom, etc.) download with no auth. But some sources gate the download behind a login: Instagram, X/Twitter, age-restricted or private/unlisted YouTube, members-only content. Those need the user's own cookies.
Do NOT pass cookies pre-emptively. Always try the plain download first. Only reach for cookies when it fails with a login / private / 403 / "login required" / "rate-limit" error. The user never types the flag themselves — you add it and re-run. When that happens, walk the user through it (these are sub-steps of the main Step 2, not the main flow):
(a) Ask which browser they're logged into. "To grab this Instagram video I need to borrow the cookies from a browser where you're logged into Instagram. Which one are you logged in on — Chrome, Safari, Firefox, Edge, or Brave?" Supported values: chrome, firefox, safari, edge, brave, chromium, opera, vivaldi.
(b) Re-run with that browser (on Windows use python, not python3 — see Step 0):
python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" "https://www.instagram.com/reel/XXXX/" --cookies-from-browser chrome
(c) Handle the common per-browser snags (tell the user the specific fix, don't just retry):
- Chrome on macOS locks its cookie DB while open and its cookies are encrypted. Two things may happen: (1) extraction fails with "could not copy/open the cookie database" → tell the user to fully quit Chrome (Cmd-Q, not just close the window) and retry; (2) a macOS Keychain prompt pops up ("… wants to use your confidential information stored in Chrome Safe Storage") → tell the user to click Always Allow. If Chrome keeps fighting it, suggest they switch to Safari or Firefox.
- Chrome on Windows also locks the DB while running → tell the user to fully close it (check the system tray) and retry.
- Safari on macOS requires the app running the command (the terminal / Claude Code) to have Full Disk Access (System Settings → Privacy & Security → Full Disk Access). If Safari extraction fails, point them there, or fall back to another browser.
- Firefox usually works without closing it — good fallback on any OS when Chrome is stubborn.
(d) Manual fallback if browser extraction just won't cooperate (most reliable, works on macOS / Windows / Linux): guide the user to export a cookies.txt and pass it with --cookies:
- Install a cookies-export extension — "Get cookies.txt LOCALLY" (open-source, exports Netscape format) for Chrome/Edge, or "cookies.txt" for Firefox.
- Open and log into the site (e.g. instagram.com).
- Click the extension → Export → save the file (e.g. ~/Downloads/cookies.txt).
- Re-run: `python3 "${CLAUDE_SKILL_DIR}/scripts/watch.