SSkilltecabyclaudinhocode
Enviar skill
← Voltar para o catálogo

watch

Marketing

Watch a video from YouTube, Instagram, X/Twitter, Vimeo, TikTok or any of ~1800 yt-dlp sites (or a local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or local mlx-whisper fallback, no API key), and hands the result to Claude so it can answer questions about what's in the video.

3estrelas
Ver no GitHub ↗Autor: mathiaschuLicença: MIT

/watch — Claude watches a video

You don't have a video input; this skill gives you one. A Python script downloads the video, extracts frames as JPEGs, gets a timestamped transcript (native captions first, then local mlx-whisper as fallback — runs on-device, no API and no key), and prints frame paths. You then Read each frame path to see the images and combine them with the transcript to answer the user.

Step 0 — Setup preflight (runs every /watch invocation, silent on success)

Python interpreter: every python3 ... command in this skill is for macOS/Linux. On Windows, substitute python — the python3 command on Windows is the Microsoft Store stub and will not run the script.

Before every /watch run, verify that dependencies are in place:

python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py" --check

This is a <100ms lookup. On exit 0, the script emits nothing — proceed to Step 1 without comment. Do NOT announce "setup is complete" to the user — they don't need a status message on every turn. The only acceptable user-visible output from Step 0 is when remediation is required.

On non-zero exit, follow the table:

ExitMeaningAction
2Missing binaries (ffmpeg / ffprobe / yt-dlp)Run installer
3No local whisper engine (mlx-whisper / openai-whisper)Run installer, then tell user the pip3 command it prints
4Both missingRun installer

The installer is idempotent — safe to re-run:

python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py"

On macOS with Homebrew, it auto-installs ffmpeg and yt-dlp. On Linux/Windows, it prints the exact install commands for the user to run. For transcription it checks for a local whisper engine (mlx-whisper preferred on Apple Silicon, openai-whisper as a CPU fallback) and prints the pip3 install command if neither is present. No API key, no config file, no .env — transcription runs entirely on-device.

If no whisper engine is installed: run the installer and relay the exact pip3 install … command it prints (mlx-whisper on Apple Silicon, openai-whisper on Windows/Linux/Intel Macs — do not assume mlx, it only installs on Apple Silicon). If they don't want to install it, proceed with --no-whisper and tell them videos without native captions will come back frames-only.

Structured mode (optional): python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py" --json emits {status, missing_binaries, whisper_backend, has_whisper, platform} where status is one of ready | needs_install | needs_whisper | needs_install_and_whisper.

Within a single session, you can skip Step 0 on follow-up /watch calls — once --check returned 0, nothing about the environment changes between turns.

When to use

  • User pastes a video URL (YouTube, Vimeo, X, TikTok, Twitch clip, most yt-dlp-supported sites) and asks about it.
  • User points at a local video file (.mp4, .mov, .mkv, .webm, etc.) and asks about it.
  • User types /watch <url-or-path> [question].

Recommended limits

  • Best accuracy: videos under 10 minutes. Frame coverage scales inversely with duration.
  • Hard caps: 100 frames total and 2 fps. Token cost grows with frame count, so the script targets a frame budget by duration (and never exceeds 2 fps even when the budget would imply more):
    • ≤30s → ~1-2 fps (up to 30 frames)
    • 30s-1min → ~40 frames
    • 1-3min → ~60 frames
    • 3-10min → ~80 frames
    • >10min → 100 frames, sparsely spaced (warning printed)
  • If the user hands you a long video, consider asking whether they want a specific section before burning tokens on a sparse scan.

How to invoke

Step 1 — parse the user input. Separate the video source (URL or path) from any question the user asked. Example: /watch https://youtu.be/abc what language is this in? → source = https://youtu.be/abc, question = what language is this in?.

Step 2 — run the watch script. Pass the source verbatim. Do not shell-escape it yourself beyond normal quoting:

python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" "<source>"

Optional flags:

  • --start T / --end T — focus on a section. Accepts SS, MM:SS, or HH:MM:SS. When either is set, fps auto-scales denser (see "Focusing on a section" below).
  • --max-frames N — lower the cap for tighter token budget (e.g. --max-frames 40)
  • --resolution W — change frame width in px (default 512; bump to 1024 only if the user needs to read on-screen text)
  • --fps F — override auto-fps (clamped to 2 fps max)
  • --out-dir DIR — keep working files somewhere specific (default: an auto-generated tmp dir)
  • --cookies-from-browser B — read cookies from a local browser (chrome, firefox, safari, edge, brave, …) for login-gated sources
  • --cookies FILE — path to a Netscape-format cookies.txt (alternative to --cookies-from-browser)
  • --whisper mlx|openai-whisper — force a specific local Whisper engine (default: prefer mlx-whisper, fall back to openai-whisper)
  • --no-whisper — disable the local Whisper fallback entirely (frames-only if no captions)

Login-gated sources (Instagram, X, private/age-restricted videos)

Public videos (most of YouTube, Vimeo, TikTok, Loom, etc.) download with no auth. But some sources gate the download behind a login: Instagram, X/Twitter, age-restricted or private/unlisted YouTube, members-only content. Those need the user's own cookies.

Do NOT pass cookies pre-emptively. Always try the plain download first. Only reach for cookies when it fails with a login / private / 403 / "login required" / "rate-limit" error. The user never types the flag themselves — you add it and re-run. When that happens, walk the user through it (these are sub-steps of the main Step 2, not the main flow):

(a) Ask which browser they're logged into. "To grab this Instagram video I need to borrow the cookies from a browser where you're logged into Instagram. Which one are you logged in on — Chrome, Safari, Firefox, Edge, or Brave?" Supported values: chrome, firefox, safari, edge, brave, chromium, opera, vivaldi.

(b) Re-run with that browser (on Windows use python, not python3 — see Step 0):

python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" "https://www.instagram.com/reel/XXXX/" --cookies-from-browser chrome

(c) Handle the common per-browser snags (tell the user the specific fix, don't just retry):

  • Chrome on macOS locks its cookie DB while open and its cookies are encrypted. Two things may happen: (1) extraction fails with "could not copy/open the cookie database" → tell the user to fully quit Chrome (Cmd-Q, not just close the window) and retry; (2) a macOS Keychain prompt pops up ("… wants to use your confidential information stored in Chrome Safe Storage") → tell the user to click Always Allow. If Chrome keeps fighting it, suggest they switch to Safari or Firefox.
  • Chrome on Windows also locks the DB while running → tell the user to fully close it (check the system tray) and retry.
  • Safari on macOS needs the app running this (the terminal / Claude Code) to have Full Disk Access (System Settings → Privacy & Security → Full Disk Access). If Safari extraction fails, point them there, or fall back to another browser.
  • Firefox usually works without closing it — good fallback on any OS when Chrome is stubborn.

(d) Manual fallback if browser extraction just won't cooperate (most reliable, works on macOS / Windows / Linux): guide the user to export a cookies.txt and pass it with --cookies:

  1. Install a cookies-export extension — "Get cookies.txt LOCALLY" (open-source, exports Netscape format) for Chrome/Edge, or "cookies.txt" for Firefox.
  2. Open and log into the site (e.g. instagram.com).
  3. Click the extension → Export → save the file (e.g. ~/Downloads/cookies.txt).
  4. Re-run: `python3 "${CLAUDE_SKILL_DIR}/scripts/watch.

Como adicionar

/plugin marketplace add mathiaschu/watch

O comando exato pode variar conforme o repositório. Confira o README no GitHub.

Comentários · Nenhum comentário

Entre para comentar. Entrar

  • Ainda não há comentários. Seja o primeiro.