wjs-overlaying-video

Post-production for a video clip: cover, captions, illustrations, CTA, custom motion graphics — all composed in ONE HyperFrames project and rendered in a SINGLE final encode. No cascade of decodes/re-encodes (each cascade pass degrades quality and burns time).

When to use

Downstream of /wjs-segmenting-video — the segmentation skill hands you cropped clips + per-clip SRTs; this skill turns them into upload-ready MP4s with cover/captions/illustrations/CTA.
User has a finished video and wants to dress it up with motion graphics: opening hook, key-quote callout, closing slogan, chapter cards, AI-generated cover as first frame.
User wants HTML/CSS-quality captions on a video (kinetic word-by-word highlighting, custom fonts, large outlined text, seekable per cue).
User wants illustration overlays at specific hook moments — diagrams, big text emphasis, flow charts.

Don't use for:

Splitting one long video into clips → use /wjs-segmenting-video.
Creating the source SRT → use /wjs-transcribing-audio (then /wjs-translating-subtitles if you need a different language).
Full HyperFrames productions where the source isn't a fixed video → use hyperframes directly.
WeChat Channels (微信视频号) / Douyin (抖音) upload (neither has a public API) → this skill produces the MP4; upload is manual.

What this skill IS — and IS NOT

| Is | Is not |
| --- | --- |
| Everything that goes ON TOP of a video clip: cover, caption, chapter, illustration, CTA | Cutting / cropping a video (that's `/wjs-segmenting-video` + `/wjs-reframing-video`) |
| One HyperFrames composition per clip = ONE final encode | A multi-step decode/encode cascade |
| `cover` is the literal first frame of the output (platforms auto-pick it as thumbnail) | A separate thumbnail file the user uploads alongside |
| Captions are HTML/CSS, with `-webkit-text-stroke` for white-on-anything readability | libass burn-in (deprecated) |
| Illustrations: re-usable `stack` / `hammer` patterns + custom escape hatch | One bespoke HTML/CSS per illustration without re-use |
| AI covers regenerated at native target aspect (1024×1792 for vertical, 1536×1024 for horizontal) | Single 1024×1536 default that letterboxes or crops on the platform |

The pipeline

clip.mp4 + clip.zh-CN.burn.srt   (from /wjs-segmenting-video hand-off)
   ↓
1. (Optional) Generate AI cover via gpt-image-2
   make_cover.py --segments S.json --out output/ --size 1024x1792
   cover_NN_slug.png

2. Scaffold a HyperFrames project per clip
   hf_clip_NN/1080/{index.html, clip.mp4, cover.png, captions.json}

3. Compose: cover scene + body video + caption track + chapter chip
            + 1-2 illustrations at hook moments + CTA scene

4. npm run check (lint + validate + visual inspect)
   npm run render → upload-ready MP4

A 2-minute vertical 1080×1920 composition renders in about 2-3 min on an M-series Mac.

Color: tone-map HLG/HDR source → SDR BEFORE compositing

Only tone-map genuinely HLG/HDR sources. If the body clip is ALREADY Rec.709 SDR (e.g. a graded multicam render, or polysync output where an S-Log3→709 LUT was already applied), running the HLG tone-map recipe on it washes out or darkens color that was already correct. tonemap_to_sdr in build_hf_clips.py therefore probes color_transfer (_is_hlg_hdr): HLG/PQ sources are tone-mapped; anything else gets a straight re-encode with no tone-map. Either way you still get the -g 30 dense-keyframe encode HyperFrames needs.
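
The probe-then-branch decision can be sketched roughly as below. This is not the actual `_is_hlg_hdr` from build_hf_clips.py: the names `needs_tonemap` and `probe_color_transfer` are hypothetical, and reading `color_transfer` from ffprobe's JSON output is an assumption about how the probe might be done.

```python
import json
import subprocess

# Transfer characteristics treated as HDR: HLG (arib-std-b67) and PQ (smpte2084).
HDR_TRANSFERS = {"arib-std-b67", "smpte2084"}


def needs_tonemap(color_transfer):
    """Return True only for HLG/PQ sources; bt709 or untagged clips are left alone."""
    return (color_transfer or "").strip().lower() in HDR_TRANSFERS


def probe_color_transfer(path, ffprobe="ffprobe"):
    """Read color_transfer of the first video stream (hypothetical helper)."""
    out = subprocess.run(
        [ffprobe, "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=color_transfer", "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    streams = json.loads(out).get("streams", [])
    return streams[0].get("color_transfer") if streams else None
```

A caller would tone-map only when `needs_tonemap(probe_color_transfer(clip))` is true and otherwise fall through to the plain dense-keyframe re-encode.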

iPhone / modern-camera footage is often HLG HDR (bt2020 / arib-std-b67). If you feed that straight into HyperFrames it either renders washed-out ("发白") or, with a naive --sdr, too dark ("发黑"); and the HDR x265 path can hang the renderer. Pre-convert the body clip to SDR (bt709) 30fps h264 with a locked zscale tone-map, then composite the SDR clip.

The verified recipe is tonemap_to_sdr() in build_hf_clips.py. npl=203 matches the macOS-native (qlmanage) reference brightness and hable keeps contrast, so the result preserves the ORIGINAL look (natural skin / foliage / brick) with no wash and no darkening:

# zscale-capable ffmpeg — Homebrew's lacks zscale/tonemap.
# imageio-ffmpeg ships one: .../imageio_ffmpeg/binaries/ffmpeg-macos-aarch64-v7.1
TONEMAP_VF = ("zscale=tin=arib-std-b67:min=bt2020nc:pin=bt2020:t=linear:npl=203,"
              "format=gbrpf32le,tonemap=tonemap=hable:desat=0,"
              "zscale=t=bt709:m=bt709:p=bt709:r=tv,format=yuv420p,fps=30")
# encode: libx264 -crf 18 -color_primaries/-trc/-colorspace bt709
#         -g 30 -keyint_min 30 -movflags +faststart   ← see gotcha below
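
For illustration, the filter string and encode flags above could be assembled into one ffmpeg argument list like this. `sdr_encode_cmd` is a hypothetical helper, not the body of `tonemap_to_sdr()`; the filter used for the no-tone-map branch and leaving audio at ffmpeg's defaults are both assumptions.

```python
TONEMAP_VF = (
    "zscale=tin=arib-std-b67:min=bt2020nc:pin=bt2020:t=linear:npl=203,"
    "format=gbrpf32le,tonemap=tonemap=hable:desat=0,"
    "zscale=t=bt709:m=bt709:p=bt709:r=tv,format=yuv420p,fps=30"
)

# Assumed filter for sources that are already SDR: no tone-map, just 30 fps yuv420p.
PLAIN_VF = "format=yuv420p,fps=30"


def sdr_encode_cmd(ffmpeg, src, dst, tonemap=True):
    """Build the argv list for the SDR pre-conversion (hypothetical helper)."""
    return [
        ffmpeg, "-y", "-i", src,
        "-vf", TONEMAP_VF if tonemap else PLAIN_VF,
        "-c:v", "libx264", "-crf", "18",
        "-color_primaries", "bt709", "-color_trc", "bt709", "-colorspace", "bt709",
        "-g", "30", "-keyint_min", "30",  # dense keyframes for frame-by-frame seeking
        "-movflags", "+faststart",
        dst,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. If the imageio-ffmpeg package is installed, `imageio_ffmpeg.get_ffmpeg_exe()` returns the path of its bundled binary, which could be used as the `ffmpeg` argument.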

Dense-keyframe gotcha. HyperFrames seeks the body video frame-by-frame. A clip with sparse keyframes (long GOP) makes it freeze on stale frames; the render log warns Video "video" has sparse keyframes. Always encode the SDR clip with -g 30 -keyint_min 30 (one keyframe per second at 30 fps) so every seek lands clean.
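
To check keyframe density before rendering, rather than waiting for the warning, something like the following sketch could be used. The helper names are hypothetical; the probe relies on ffprobe's `-skip_frame nokey` option to list only keyframe timestamps.

```python
import subprocess


def max_keyframe_gap(keyframe_times):
    """Largest spacing in seconds between consecutive keyframes (0.0 if fewer than two)."""
    times = sorted(keyframe_times)
    return max((b - a for a, b in zip(times, times[1:])), default=0.0)


def probe_keyframe_times(path, ffprobe="ffprobe"):
    """List keyframe timestamps of the first video stream (hypothetical helper)."""
    out = subprocess.run(
        [ffprobe, "-v", "error", "-select_streams", "v:0", "-skip_frame", "nokey",
         "-show_entries", "frame=pts_time", "-of", "csv=p=0", path],
        check=True, capture_output=True, text=True,
    ).stdout
    times = []
    for line in out.splitlines():
        field = line.split(",")[0].strip()
        try:
            times.append(float(field))
        except ValueError:
            continue  # skip blank lines and "N/A" timestamps
    return times
```

For a clip encoded with -g 30 at 30 fps, `max_keyframe_gap(probe_keyframe_times(clip))` should come out at about one second; a much larger value suggests the clip needs re-encoding.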

Verify the render log says No HDR sources detected — rendering SDR. If it says HDR detected, your clip wasn't tone-mapped — fix that first.
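
A minimal sketch of automating this log check, assuming the render log has been captured as text. `render_log_problems` is a hypothetical helper, and the matched strings are copied from the messages quoted in this section, so other HyperFrames versions may word them differently.

```python
def render_log_problems(log_text):
    """Return a list of problems found in a captured render log (empty if none)."""
    problems = []
    # Wording taken from the messages quoted in this document; treat as an assumption.
    if "No HDR sources detected" not in log_text:
        problems.append("no SDR confirmation: the body clip may still be HDR")
    if "sparse keyframes" in log_text:
        problems.append("sparse keyframes: re-encode with -g 30 -keyint_min 30")
    return problems
```

An empty list means both conditions passed; anything else is worth fixing before upload.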

Version stamp (every output)

Stamp "skill name + version number" bottom-right, shown during the END/CTA scene, so every render is traceable to the pipeline version that made it. Bump VERSION in build_hf_clips.py on each pipeline change.

#ver-stamp { position: absolute; right: 28px; bottom: 28px; z-index: 30;
  font-size: 20px; color: rgba(150,150,156,0.55); letter-spacing: 0.06em; }

<div id="ver-stamp" class="clip" data-start="{cta_start}" data-duration="{cta_dur}"
     data-track-index="2">wjs-overlaying-video v1.3</div>
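
The `{cta_start}` and `{cta_dur}` placeholders suggest the stamp is filled in from Python. A sketch of that templating, assuming plain `str.format`; `version_stamp_html` is a hypothetical name, and how build_hf_clips.py actually does this is not shown here.

```python
VERSION = "1.3"  # value from the example above; bump on each pipeline change

STAMP_TEMPLATE = (
    '<div id="ver-stamp" class="clip" data-start="{cta_start}" data-duration="{cta_dur}"\n'
    '     data-track-index="2">wjs-overlaying-video v{version}</div>'
)


def version_stamp_html(cta_start, cta_dur, version=VERSION):
    """Render the stamp so it appears only during the END/CTA scene."""
    return STAMP_TEMPLATE.format(cta_start=cta_start, cta_dur=cta_dur, version=version)
```

Keeping the version in one constant means the stamp text cannot drift from the pipeline version it reports.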

Standard overlay types (the 6 building blocks)

Every clip's final composition is built from some combination of these. The agent picks the right ones per clip — typically all 6 for a podcast highlight, or just 1-2 for a single annotation overlay.

1. `cover` — full-frame AI image as first frame

The cover IS the first frame (no animation, no zoom) so platforms that auto-pick the first frame as the thumbnail get your designed cover by default. Always verify with ffmpeg -ss 0 -vframes 1 — frame 0 must NOT be black or platform thumbnails will be black.

HTML:

<div id="cover" class="clip" data-start="0" data-duration="1.6"
     data-track-index="1" data-layout-allow-overflow>
  <img src="cover.png" alt="" data-layout-allow-overflow />
</div>

CSS:

#cover { position: absolute; inset: 0; background: #0c0d10; overflow: hidden; }
#cover img { position: absolute; inset: 0; width: 100%; height: 100%; object-fit: cover; }
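
The frame-0 check mentioned above can be scripted instead of eyeballed. The sketch below decodes the first frame to a small grayscale image and compares its mean luma to a threshold. The helper names and the threshold of 20 are assumptions; a deliberately dark cover design may need a lower value.

```python
import subprocess


def mean_luma(gray_bytes):
    """Average 8-bit luma of a raw grayscale frame (0 = black, 255 = white)."""
    if not gray_bytes:
        raise ValueError("empty frame: ffmpeg produced no pixels")
    return sum(gray_bytes) / len(gray_bytes)


def looks_black(gray_bytes, threshold=20.0):
    """True when the mean luma is at or below the (assumed) threshold."""
    return mean_luma(gray_bytes) <= threshold


def first_frame_gray(path, ffmpeg="ffmpeg"):
    """Decode frame 0 as a 64x64 raw grayscale image (hypothetical helper)."""
    return subprocess.run(
        [ffmpeg, "-v", "error", "-ss", "0", "-i", path, "-frames:v", "1",
         "-vf", "scale=64:64,format=gray", "-f", "rawvideo", "-"],
        check=True, capture_output=True,
    ).stdout
```

Running `looks_black(first_frame_gray("out.mp4"))` on the rendered file flags the black-thumbnail case before upload.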

Generation: use /wjs-segmenting-video/scripts/make_cover.py (wraps gpt-image-2 images edit with the midpoint frame as ref):

# For 1080×1920 vertical output (WeChat Channels / Douyin):
make_cover.py --segments S.json --out output/ --size 1024x1792 [--single N]

# For 1920×1080 horizontal output (YouTube / Bilibili):
make_cover.py --segments S.json --out output/ --size 1536x1024

Aspect must match the output frame. --size 1024x1536 (2:3, the script default) gets letterboxed or cropped on 9:16 output, so always pass 1024x1792 for vertical. The viewer sees the cover full-frame, so any mismatch is visible. Re-roll a single cover with --single N; the codex provider can fail transiently mid-batch.
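
To put a number on the mismatch, the fraction of a cover image that `object-fit: cover` crops away can be computed from the two sizes. This is a sketch with a hypothetical function name; it assumes the image is scaled uniformly until it fills the frame, which is what `object-fit: cover` does.

```python
def cover_crop_fraction(img_w, img_h, frame_w, frame_h):
    """Fraction of the image area hidden when it is scaled to fill the frame."""
    # Uniform scale so that both frame dimensions are fully covered.
    scale = max(frame_w / img_w, frame_h / img_h)
    scaled_area = (img_w * scale) * (img_h * scale)
    return 1.0 - (frame_w * frame_h) / scaled_area
```

Under these assumptions a 1024x1792 cover loses about 1.6% of its area on a 1080×1920 frame, against about 15.6% for the 1024x1536 default.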

Codex auth required: the script calls codex CLI via gpt-image-2-skill. If ~/.codex/auth.json is missing, the script errors. See gpt-image-2-skill for setup.

Reference frame must match the OUTPUT orientation. make_cover reads output/frame_NN_slug.jpg as the photographic background it keeps. For a vertical clip that came from a horizontal two-person source, the default frame_NN is the horizontal two-shot — feeding that to a 1024x1792 cover crams both people into portrait awkwardly. Replace frame_NN_slug.jpg with a vertical single-speaker frame pulled from the already-cropped body clip first (ffmpeg -ss <t> -i clip_vert.mp4 -frames:v 1 frame_NN_slug.jpg), then run make_cover. The cover
