Animation Review
Review animations and interactions by recording the browser and sending the video to Gemini for structured analysis.
Prerequisites
ffmpeginstalled (brew install ffmpeg)playwrightPython package with Chromium (pip install playwright && playwright install chromium)google-genaiPython package (pip install google-genai)GEMINI_API_KEYenvironment variable set- For manual recording only: macOS Screen Recording permission granted to terminal app
How Gemini fits in
Gemini is the eyes — it watches the recording and describes what it sees with precision. You are the hands — you translate those observations into code changes with full codebase context.
Treat all Gemini analysis as observational evidence, not authoritative diagnosis. Gemini cannot see the code. When it suggests root causes or implementation fixes, treat these as hypotheses from an external observer who can see the symptoms but not the source. Its frame-level descriptions of what happens visually are reliable. Its theories about why are informed guesses that you should verify against the actual code.
This applies across all modes, but especially to diagnose (where Gemini hypothesizes about bugs) and inspire (where Gemini decomposes effects without knowing your tech stack).
Interpreting timestamps and durations
Gemini can only observe what's in the sampled frames. Its temporal precision depends on the analysis FPS:
| Mode | FPS | Frame interval | What this means |
|---|---|---|---|
| check | 5 | 200ms | A "300ms animation" could be anywhere from 200-400ms (1-2 frames). Timestamps are ±200ms. Good enough for "does it happen" but not for timing accuracy. |
| review | 12 | 83ms | A "300ms animation" is ~3-4 frames. Easing character is visible. Durations are accurate within ~80ms — enough to judge "too fast" vs "too slow". |
| diagnose/inspire | 24 | 42ms | A "300ms animation" is ~7 frames. Easing curves, stagger offsets, and glitch moments are precisely observable. Durations are accurate within ~40ms. |
When Gemini reports a duration like "~400ms with ease-out", consider the mode's precision. At 5fps that could really be 200-600ms. At 24fps it's likely 360-440ms. The system prompt tells Gemini to report in frame counts alongside milliseconds so you can judge precision yourself.
If you're debugging a timing issue and Gemini's temporal precision isn't sufficient, re-run at a higher mode. A check that reveals "something's off with the timing" can be followed up with a diagnose for frame-precise detail. You can also escalate just a specific time range — see "Escalating a region" below.
Workflow
1. Determine source and mode
There are two entry points — figure out which applies:
User provides a video file. The user has a screen recording (.mov, .mp4, .webm) — either of their own UI or a reference they want to recreate. Skip straight to step 3 (Analyze). No recording step needed. User-provided videos are typically 30-60fps, which is more than enough for any analysis mode — Gemini downsamples to the mode's FPS automatically.
Agent captures from the browser. The animation is running in a local dev server and needs to be recorded. Proceed to step 2 (Record), then step 3 (Analyze).
Choose the analysis mode based on what question you're trying to answer:
| Mode | FPS | Model | Question | When to use |
|---|---|---|---|---|
| check | 5 | Flash | "Does it work?" | First pass — verify the animation fires, completes, and doesn't visually break. Early development, after wiring up a new animation, smoke testing. |
| review | 12 | Flash | "How does it feel?" | Design and polish — evaluate easing, timing, choreography, and overall quality. Use when the animation works but needs refinement, or for design review before shipping. |
| diagnose | 24 | Pro | "What's going wrong?" | Bug investigation — the animation has a specific visual problem (jump, stutter, misalignment, timing glitch). Need frame-precise evidence to locate the root cause in code. |
| inspire | 24 | Pro | "What's happening here?" | Reference analysis — the user has a video showing a desired effect and wants a technical decomposition to guide implementation. |
Decision guide for agents:
- User says "check if this works" / "does the animation play" / just implemented something new → check
- User says "how does this look" / "review the animation" / "is the timing right" / design feedback → review
- User says "there's a bug" / "it jumps" / "something's wrong with" / describes a specific visual issue → diagnose
- User provides a video of someone else's UI / "I want it to look like this" / "recreate this effect" → inspire
- For inspire: if the user hasn't said which part of the video or which specific effect to focus on, ask them before running the analysis. A full-video decomposition without focus will be vague and unhelpful.
If the mode isn't clear from context, ask the user.
2. Record (skip if user provided a video)
The recording step captures browser interactions as video. Record generously, analyze precisely — Playwright's video always includes the full session (page load through close), so capture more than you need. The recorder always records at 24fps (the Gemini analysis maximum) so that any analysis mode can sample at full resolution, including escalation of a specific time range to diagnose. The recorder logs a timeline showing when each action executed relative to the video start — use these timestamps with --start/--end on analyze.py to focus Gemini on the relevant window.
Automated (preferred) — Claude drives the browser
Use record_browser.py when Claude is performing the interactions. Playwright records the page internally, then transcodes to mp4. No screen recording permissions needed.
# Record a page load animation
python3 ~/.claude/skills/animation-review/scripts/record_browser.py http://localhost:3000
# Click a button, wait for animation, record everything
python3 ~/.claude/skills/animation-review/scripts/record_browser.py http://localhost:3000 \
-a 'click:.play-btn' -a 'wait:2000'
# Carousel: click through slides
python3 ~/.claude/skills/animation-review/scripts/record_browser.py http://localhost:5173/carousel \
-a 'wait:1000' -a 'click:.next' -a 'wait:800' -a 'click:.next' -a 'wait:800'
# Scroll-triggered animation
python3 ~/.claude/skills/animation-review/scripts/record_browser.py http://localhost:3000 \
-a 'scroll:500' -a 'wait:1000' -a 'scroll:500' -a 'wait:1000'
# Custom viewport size
python3 ~/.claude/skills/animation-review/scripts/record_browser.py http://localhost:3000 \
-W 1920 -H 1080 -a 'click:.hero-cta' -a 'wait:3000'
Action format for -a:
wait:MS— wait N millisecondsclick:SELECTOR— click an elementscroll:PIXELS— scroll down (negative for up)hover:SELECTOR— hover over an elementpress:KEY— press a key (Enter, Tab, ArrowRight, etc.)type:SELECTOR|TEXT— type into an element (pipe separates selector from text)
Construct actions from what you know about the animation trigger. For page-load animations, no actions are needed — the recording captures from navigation onward.
Use --wait-before to adjust the pause after page load (default 500ms) and --wait-after to adjust the pause after the last action (default 1000ms).
The recorder prints a timeline to stderr showing when each action ran:
→ click:.play-btn (at 1.8s)
→ wait:2000 (at 1.9s)
Timeline: actions 1.8s–3.9s, total 4.9s
These timestamps are measured Python-side (wall clock), not from the video encoder, so they're approximate — always add ~1s buffer on each side when using --start/--end. For the example above, use --start 1s --end 5s.
After recording: notify the user
Once the recording finishes, **immediately move it to `.animatio