fb-ad-video-studio
Direct-response video ads built as code. HyperFrames (HeyGen's HTML-video framework) is the engine; this skill is the opinionated FB/IG/TikTok ad recipe layered on top — distilled from 23 production iterations of a real founder ad.
It produces two families:
- Motion-graphics spot — kinetic typography + Lottie/CSS motion + VO + SFX. No presenter footage. Fully self-serve, most templatable, fastest.
- Talking-head founder ad — presenter video with a motion-graphic overlay rail, the proven speaker PIP arc, synced captions, and an SFX rail. Needs a user-supplied recording.
Prerequisites (do not reimplement the framework)
This skill assumes HyperFrames is available. It depends on, and defers framework mechanics to, these skills/tools — invoke them, don't duplicate them:
npx hyperframesCLI (Node ≥ 22, FFmpeg, Docker for render on Pop!_OS/Linux)hyperframesskill — composition authoring,data-*semantics, GSAP timeline registrationhyperframes-cliskill — init / lint / inspect / preview / renderhyperframes-mediaskill — TTS / transcribe / background removal
If those skills aren't installed: npx skills add heygen-com/hyperframes.
Brand fidelity (DESIGN.md) — check FIRST
Before any client ad, run the pre-flight check. If a DESIGN.md exists
(composition project root, or ~/claude_work/brand-kits/<slug>/), it is the
source of truth. HyperFrames reads DESIGN.md/design.md natively, so
syncing the client file into the composition project root makes brand
colors/fonts authoritative with zero code change; also map its tokens onto the
template :root vars (--accent, --ink, --bg, --mute) so captions,
kinetic type, lockup, and PIP borders are on-brand. Unresolved TODO: VERIFY
or lint errors → stop and resolve with the user. No DESIGN.md → use the
brand-kit
skill to extract + verify one. Never guess a client's brand. Keep the
proven structure/pacing; only the brand layer comes from DESIGN.md.
When to use / not
Use: any paid social video ad — motion-graphics spot, founder/UGC talking-head ad, kinetic-type promo, Reel/Short, "make a video like this winning ad".
Don't use: static images → image-studio. Long-form/explainer/YouTube content → generic hyperframes. AI-generated talking avatars → HeyGen avatar tools. Diagrams → excalidraw.
The two templates
Copy a folder from templates/ into a fresh project and edit.
| Template | Length | Needs | Best for |
|---|---|---|---|
motion-graphics-spot/ | 15–30s | VO script only | Offer/feature ads, retargeting, no on-camera talent |
talking-head-founder-ad/ | 45–75s | Presenter recording + VO | Founder story, trust/authority, cold traffic |
Both ship as real HyperFrames compositions (index.html + hyperframes.json + meta.json) with generic brand tokens (--accent, --ink, …) and [BRACKET] placeholder copy. They are scaffolds — rework freely.
Production pipeline (the proven order)
Run scripts from scripts/. This order is the one that survived 23 iterations:
- Script →
script.md. One idea, problem-first, reader is the hero. Keep beats ≤ 5s. - VO →
scripts/tts.py(ElevenLabs Chris, settings baked in). Talking-head: record the presenter instead. - Cut silences (talking-head only) →
scripts/cut-silences.py— whisper VAD + ffmpeg, trims to 0.18s gaps. Typical 77s→67s. - Re-whisper the cut →
scripts/whisper-words.py— word-level timestamps drive every caption, SFX, and visual sync. Timings shift after cutting; always re-whisper. - Re-encode footage (talking-head only) →
scripts/reencode-footage.sh— fixes sparse-keyframe seek failures. - Build the composition — edit the template HTML; anchor captions/SFX/reveals to the whisper word starts.
- SFX →
scripts/fetch-sfx.sh— Mixkit, tight-trimmed. VO + SFX beats high-converting ads; music is optional (mix −18 dB if used). - Lint + inspect →
npx hyperframes lint && npx hyperframes inspect. - Render →
scripts/render.sh(wrapsnpx hyperframes render --docker).
See references/ for the full reasoning behind every step.
Reverse-template workflow (the unlock)
Same superpower as image-studio, for video. When the user shares a winning video ad ("make one like this", "match this competitor", "build a template from this"):
- Get the reference (file/URL). Extract the structure, not the content.
- Map the scene arc (hook / problem / agitate / solution / proof / CTA) with timestamps.
- Measure pacing — cut cadence, word-reveal duration, beat length, SFX density.
- Note the device — talking-head PIP arc? pure kinetic type? screen-recording inset?
- Write a parametric HyperFrames composition with variables for the swappable parts (VO, presenter clip, brand tokens, headline beats, proof points, CTA, offer).
- Render a build to validate the structure matches.
- Save it in
templates/as<vertical>-<hook>.html. The template is the deliverable; future ads reuse it with new VO/footage/brand.
Battle-tested patterns (apply by default)
Full detail in references/patterns.md and references/audio-sources.md. The non-negotiables:
- Cold-open with the motion graphic alone (speaker hidden 0–3s) — strongest hook.
- Speaker PIP arc: hidden → PIP → full → PIP → full → hidden (lockup). Rhythmic variety, never "covered".
- Lottie clip wrappers alive from t=0 (
data-start="0"full duration); control visibility with GSAP opacity, not latedata-start. Looping seek must be clip-relative. - No CSS
transformon GSAP-animated elements — GSAP overwrites the whole property. Center withgsap.set(el,{xPercent:-50}). - Captions at y≈1180–1400, anchored to actual whisper word starts (not script estimates). Split beats > 5s.
- Pacing: 0.18s drift cuts (not 0.5s fades), 0.25s word reveals, ElevenLabs
<break>≤ 0.4s. - Audio: Mixkit SFX (previews are full clips), trim tight (whoosh 0.6s, punch 0.22s, ding 0.65s). Skip music for direct-response, or −18 dB. ElevenLabs SFX = mushy, avoid.
- Render: Docker required on Pop!_OS. Fix
media_missing_id(silent audio), add hard-killtl.set(el,{opacity:0})after fades for seek safety.
Output
A rendered MP4 (1080×1920 vertical default; 1080×1080 and 1080×1350 supported) plus the reusable composition. Deliver renders to the user; save the generalized composition back to templates/ when it's a new reusable structure.