Video Generation & Editing
Guide for generating, editing, analyzing, and post-processing videos using AI models and FFmpeg-backed tools exposed through the Hyper MCP.
Requirements
This skill assumes the Hyper MCP is connected to your agent so the tools below are available. The underlying providers (OpenAI Sora, Google Veo, ByteDance Seedance, OpenAI TTS, transcription, etc.) are configured under your Hyper integrations.
Tool surface
| Group | Tools |
|---|---|
| Generation | generate_video, sora_remix_video, sora_delete_video |
| Analysis | analyze_video, capture_video_frame, transcribe_video |
| Subtitles & captions | generate_subtitles, burn_subtitles, burn_highlighted_captions |
| Audio | text_to_speech, add_audio_to_video |
| Editing | clip_video, stitch_videos, overlay_text |
Out of scope
- Image generation, ad creative composition, brand extraction: use image-generation or ad-creative-generation.
- Posting finished videos to social platforms: use tiktok, instagram, or linkedin.
- Running paid video campaigns: use google-ads, meta-ads, tiktok-ads.
Available Tools
| Tool | Purpose | Runs in Background |
|---|---|---|
| generate_video | Generate video from text / image prompt | Yes |
| sora_remix_video | Modify existing Sora video | Yes |
| sora_delete_video | Delete a Sora video | No |
| capture_video_frame | Extract frame as image | No |
| analyze_video | Watch and understand video content | No |
| transcribe_video | Extract audio transcript | No |
| generate_subtitles | Create SRT / VTT subtitle file | No |
| burn_subtitles | Burn subtitles onto video | Yes |
| burn_highlighted_captions | TikTok / karaoke-style word-by-word captions | Yes |
| text_to_speech | Generate voiceover audio from text | No |
| add_audio_to_video | Add / replace audio track on video | Yes |
| clip_video | Extract a time segment from video | Yes |
| stitch_videos | Concatenate multiple clips | Yes |
| overlay_text | Add text / titles to video | Yes |
Video Understanding
You can watch and analyze any video using analyze_video. This sends the video to a multimodal AI that sees both visual and audio content.
When to use analyze_video
- After generating a video: check if it matches your intent
- Before stitching: verify scene consistency across clips
- Quality review: check for glitches, character drift, lighting issues
- Content understanding: "what happens in this video?"
Analysis Types
analyze_video(file_id="...", analysis_type="general")
analyze_video(file_id="...", analysis_type="quality_review")
analyze_video(file_id="...", analysis_type="scene_breakdown")
analyze_video(file_id="...", question="Does this match: [original prompt]?")
Self-Review Workflow
Always review generated videos before delivering to the user:
result = generate_video(prompt="...", model="veo-3.1-generate-preview")
review = analyze_video(file_id="video_file_id", analysis_type="quality_review")
# If issues found, regenerate with adjustments. If quality is good, proceed to editing.
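The review-then-regenerate loop can be sketched as a small helper. This is a minimal illustration, not part of the Hyper MCP: `generate`, `review`, and `adjust` are hypothetical callables that you would wire to `generate_video`, `analyze_video`, and your own prompt-adjustment logic, and the cap of three attempts is an assumption.

```python
from typing import Callable, List, Optional


def generate_with_review(
    prompt: str,
    generate: Callable[[str], str],           # hypothetical: prompt -> file id
    review: Callable[[str], List[str]],       # hypothetical: file id -> list of issues
    adjust: Callable[[str, List[str]], str],  # hypothetical: (prompt, issues) -> new prompt
    max_attempts: int = 3,                    # assumed cap, tune to your budget
) -> Optional[str]:
    """Return the first file id that passes review, or None if all attempts fail."""
    for _ in range(max_attempts):
        file_id = generate(prompt)
        issues = review(file_id)
        if not issues:
            return file_id
        # Feed the reported issues back into the next prompt.
        prompt = adjust(prompt, issues)
    return None
```

Passing the tools in as callables keeps the loop testable with stubs and makes the retry budget explicit, which matters because each generation call may be slow and billable.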
Script Planning
For longer, cohesive videos, plan the FULL SCRIPT before generating:
1. Scene Breakdown
- Scenes: break story into segments
- Sora: 4 / 8 / 12 seconds per scene
- Veo: 4-8 seconds per scene
- Seedance: 4-15 seconds per scene (native audio with lip-sync)
- Camera: shot type (wide, close-up, tracking), angles, movement
- Transitions: how each scene connects to the next
- Consistency: character descriptions, color palette, visual style
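The per-scene limits above can be checked up front while planning. The sketch below assumes the short keys `"sora"`, `"veo"`, and `"seedance"` as hypothetical labels (they are not model identifiers) and uses only the durations listed in this section.

```python
import math

# Per-scene limits in seconds, copied from the scene breakdown list.
SORA_LENGTHS = (4, 8, 12)                              # Sora accepts fixed lengths
RANGE_LIMITS = {"veo": (4, 8), "seedance": (4, 15)}    # inclusive ranges


def is_valid_scene_length(model: str, seconds: int) -> bool:
    """Check one planned scene length against the model's limits."""
    if model == "sora":
        return seconds in SORA_LENGTHS
    low, high = RANGE_LIMITS[model]
    return low <= seconds <= high


def min_scene_count(model: str, total_seconds: int) -> int:
    """Fewest scenes needed to cover total_seconds at the longest allowed length."""
    longest = max(SORA_LENGTHS) if model == "sora" else RANGE_LIMITS[model][1]
    return math.ceil(total_seconds / longest)
```

For a 30-second script, for example, `min_scene_count("veo", 30)` gives 4, so the script needs at least four Veo scenes before any prompt is written.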
Scene Chaining Technique
To create seamless multi-scene videos:
Scene 1 (text-to-video)
generate_video(prompt="...", model="veo-3.1-generate-preview")
Scene 2+ (image-to-video)
capture_video_frame(video_file_id="scene1_file_id", frame_position="last")
generate_video(prompt="continuation: ...", image_file_id="captured_frame_id")
Repeat: extract last frame → generate next scene.
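The chaining steps above can be sketched as a loop. This is an illustration under assumptions: `generate` and `capture_last_frame` are hypothetical wrappers you would write around `generate_video` and `capture_video_frame`, each returning a file id as a string.

```python
from typing import Callable, List, Optional


def chain_scenes(
    prompts: List[str],
    generate: Callable[[str, Optional[str]], str],  # hypothetical: (prompt, seed frame id) -> video id
    capture_last_frame: Callable[[str], str],       # hypothetical: video id -> frame id
) -> List[str]:
    """Generate scenes in order, seeding each one with the previous scene's last frame."""
    video_ids: List[str] = []
    frame_id: Optional[str] = None  # scene 1 has no seed frame (text-to-video)
    for index, prompt in enumerate(prompts):
        video_id = generate(prompt, frame_id)
        video_ids.append(video_id)
        # Only capture a frame when another scene still needs it.
        if index < len(prompts) - 1:
            frame_id = capture_last_frame(video_id)
    return video_ids
```

The returned list is already in scene order, so it can be handed to the stitching step unchanged.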
Stitching Scenes Together
After generating all scenes, combine them:
stitch_videos(video_file_ids=["scene1_id", "scene2_id", "scene3_id"])
stitch_videos(
    video_file_ids=["scene1_id", "scene2_id", "scene3_id"],
    transition="crossfade",
    crossfade_duration=0.5,
)
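When planning narration or captions against a stitched video, it helps to estimate the final length. The sketch below assumes that a crossfade overlaps adjacent clips, so each transition shortens the result by the crossfade duration; whether `stitch_videos` behaves exactly this way is not confirmed here.

```python
from typing import Sequence


def stitched_duration(clip_seconds: Sequence[float], crossfade_seconds: float = 0.0) -> float:
    """Estimate total length, assuming each crossfade overlaps two neighbouring clips."""
    if not clip_seconds:
        return 0.0
    if crossfade_seconds < 0:
        raise ValueError("crossfade_seconds must not be negative")
    if any(clip <= crossfade_seconds for clip in clip_seconds):
        raise ValueError("every clip must be longer than the crossfade")
    transitions = len(clip_seconds) - 1
    return sum(clip_seconds) - transitions * crossfade_seconds
```

Under that assumption, three 8-second scenes with a 0.5-second crossfade come to 23 seconds rather than 24.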
Subtitle / Caption Workflow
Full pipeline: Video → Transcript → Subtitles → Burned Video
transcript = transcribe_video(file_id="video_file_id")
subs = generate_subtitles(file_id="video_file_id", transcript=transcript, format="srt")
burn_subtitles(
    video_file_id="video_file_id",
    subtitle_file_id=subs.file_id,
    style="bold_outline",
    position="bottom",
)
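For reference, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the text and a blank line. The sketch below shows that layout; the `(start, end, text)` cue tuples are an assumed input shape for illustration, not the output format of `transcribe_video`.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as SRT blocks, numbered from 1."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```

Knowing the layout makes it easier to inspect or hand-correct a generated subtitle file before burning it in.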
Subtitle Styles
| Style | Effect |
|---|---|
| default | Plain white text |
| bold_outline | Bold white with black outline (recommended) |
| shadow | White text with drop shadow |
| box | White text on semi-transparent black box |
Text Overlays
Add titles, lower-thirds, CTAs, and other graphics:
overlay_text(
    video_file_id="video_file_id",
    overlays=[
        {
            "text": "Episode 1: The Beginning",
            "start_time": 0.0,
            "end_time": 3.0,
            "position": "center",
            "font_size": 48,
            "color": "white",
            "background": "black@0.5",
        },
        {
            "text": "Subscribe for more!",
            "start_time": 10.0,
            "end_time": 14.0,
            "position": "bottom-right",
            "font_size": 28,
        },
    ],
)
Overlay Positions
top, bottom, center, top-left, top-right, bottom-left, bottom-right
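Because `overlay_text` runs in the background, it can save a round trip to sanity-check the overlay list before submitting it. The checker below is a hypothetical helper: it treats `text`, `start_time`, and `end_time` as required, which is an assumption based on the example above rather than a documented rule, and it uses the position names listed in this section.

```python
from typing import Dict, List

ALLOWED_POSITIONS = {
    "top", "bottom", "center",
    "top-left", "top-right", "bottom-left", "bottom-right",
}


def validate_overlays(overlays: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: List[str] = []
    for index, overlay in enumerate(overlays):
        if not overlay.get("text"):
            problems.append(f"overlay {index}: missing text")
        start, end = overlay.get("start_time"), overlay.get("end_time")
        if start is None or end is None:
            problems.append(f"overlay {index}: missing start_time or end_time")
        elif not 0 <= start < end:
            problems.append(f"overlay {index}: start_time must be >= 0 and before end_time")
        # Position is only checked when present; the tool's default is not assumed.
        position = overlay.get("position")
        if position is not None and position not in ALLOWED_POSITIONS:
            problems.append(f"overlay {index}: unknown position {position!r}")
    return problems
```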
Voiceover / Narration
Generate natural-sounding voiceover with TTS and add it to any video:
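# Generate the narration audio
audio = text_to_speech(
    text="Welcome to our product. Here's how it works...",
    voice="nova",
    model="tts-1",
)

# Option A: replace the video's existing audio track with the narration
add_audio_to_video(
    video_file_id="video_id",
    audio_file_id=audio.file_id,
    mode="replace",
)

# Option B: mix the narration with the video's existing audio
add_audio_to_video(
    video_file_id="video_id",
    audio_file_id=audio.file_id,
    mode="mix",
    audio_volume=0.8,
)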
Available Voices
alloy, ash, coral, echo, fable, nova (recommended), onyx, sage, shimmer
Highlighted Captions (TikTok / Reels Style)
Add word-by-word highlighted captions that light up as spoken:
burn_highlighted_captions(
    video_file_id="video_id",
    style="tiktok",
    highlight_color="#3B82F6",
    base_color="white",
    words_per_group=3,
    position="center",
)

burn_highlighted_captions(
    video_file_id="video_id",
    style="karaoke",
    highlight_color="yellow",
    base_color="white",
    background="black@0.6",
    words_per_group=4,
    position="bottom",
)
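To get a feel for what `words_per_group` controls, the sketch below groups word-level timings into on-screen caption groups. It is an illustration of the general idea only: the `(word, start, end)` tuples are an assumed input shape, and the tool's real grouping logic may differ.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start seconds, end seconds), assumed shape


def group_words(words: List[Word], words_per_group: int) -> List[Tuple[str, float, float]]:
    """Split timed words into consecutive groups shown together on screen."""
    if words_per_group < 1:
        raise ValueError("words_per_group must be at least 1")
    groups = []
    for i in range(0, len(words), words_per_group):
        chunk = words[i:i + words_per_group]
        text = " ".join(word for word, _, _ in chunk)
        # A group spans from its first word's start to its last word's end.
        groups.append((text, chunk[0][1], chunk[-1][2]))
    return groups
```

Smaller groups change more often and suit fast-paced vertical video; larger groups read more like conventional subtitles.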
Video Clipping
Extract segments from longer videos:
clip_video(
    video_file_id="long_video_id",
    start_time=45.0,
    end_time=60.0,
)
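For readers who want to reproduce a clip locally, a roughly equivalent FFmpeg invocation can be built as below. This is a sketch of one common approach, not a description of how `clip_video` is implemented; the file names are placeholders. It uses `-ss` before `-i` for input seeking, `-t` for the clip duration, and `-c copy` to avoid re-encoding, which means cut points snap to keyframes.

```python
from typing import List


def ffmpeg_clip_args(source: str, target: str, start_time: float, end_time: float) -> List[str]:
    """Build an ffmpeg argument list that copies [start_time, end_time) from source."""
    if not 0 <= start_time < end_time:
        raise ValueError("start_time must be >= 0 and before end_time")
    duration = end_time - start_time
    return [
        "ffmpeg",
        "-ss", str(start_time),  # seek in the input before decoding
        "-i", source,
        "-t", str(duration),     # keep this many seconds after the seek point
        "-c", "copy",            # stream copy: fast, but cuts land on keyframes
        target,
    ]
```

The list can be passed to `subprocess.run` on a machine with FFmpeg installed; drop `-c copy` if frame-accurate cuts matter more than speed.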
UGC / TikTok Production Workflow
Complete workflow for producing UGC-style content:
- Script: plan scenes, dialogue, and visual style
- Generate: create each scene with generate_video
- Review: use analyze_video to check each scene for quality
- Chain: extract last frames with capture_video_frame, generate next scenes
- Stitch: combine all scenes with stitch_videos
- Narrate: generate voiceover with text_to_speech + add_audio_to_video
- Caption: add TikTok-style captions with burn_highlighted_captions
- Overlay: add titles / CTAs with overlay_text
- Final review: use analyze_video on the final video for quality check
Example: Narrated UGC Video
generate_video(prompt="...", model="veo-3.1-generate-preview")
audio = text_to_speech(text="Your narration script here...", voice="nova")
add_audio_to_video(video_file_id="generated_video_id", audio_file_id=audio.file_id)
bur