
movie-maker-fast


Fast cinematic video generator built on LTX 2.3 (distilled fp8, video-only) + abliterated Gemma text encoder + physics/control/motion LoRA stack. Produces a full screenplay-driven film at roughly 10-15x the speed of the WAN MultiTalk pipeline (Movie Maker Slow WAN). Use this skill when the user wants a movie / film / cinematic video / animated scene production and speed matters more than strict li

Author: AEON-7 · License: MIT

Movie Maker Fast — LTX 2.3 cinematic video pipeline

📖 For execution, read AGENTS.md first — it has the glossary, decision tree, literal copy-pasteable recipes, and a troubleshooting table optimized for AI agents. This file (SKILL.md) is the deep-dive reference for prompt-engineering recipes, chunking strategy details, and advanced configuration. Use SKILL.md when you need to understand why something works the way it does; use AGENTS.md when you just need to do the thing.

Companion to radio-drama-production (audio only), music-producer (standalone music), and tts-voice-designer (voice casting). This skill is the video engine; it imports all three of those for the audio passes.

0. Target host + tool

  • Host: ${SSH_USER}@127.0.0.1 (Workstation — RTX 5090, 64 GB RAM)
  • Tool: ${COMFYUI_ROOT}\scene_production_tool\movie_maker_fast.py
  • Companion tools (invoked by this one for audio):
    • scene_production_tool/radio_drama.py — dialogue TTS + SFX priority chain
    • music_tool/music_maker.py — music cues via ACE Step XL base + APG chain
  • ComfyUI endpoint: http://127.0.0.1:8188
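Before queueing a render it can help to confirm that the ComfyUI endpoint listed above is answering. The following is a minimal sketch, not part of the tool itself: the helper names `stats_url` and `comfyui_reachable` are hypothetical, and it assumes the server exposes ComfyUI's standard `/system_stats` route.

```python
import http.client
import urllib.error
import urllib.request

# Endpoint taken from the list above.
COMFYUI_ENDPOINT = "http://127.0.0.1:8188"


def stats_url(endpoint: str = COMFYUI_ENDPOINT) -> str:
    """Build the URL of the system_stats route (assumed to be enabled)."""
    return endpoint.rstrip("/") + "/system_stats"


def comfyui_reachable(endpoint: str = COMFYUI_ENDPOINT, timeout: float = 3.0) -> bool:
    """Return True if the server answers the stats route with HTTP 200."""
    try:
        with urllib.request.urlopen(stats_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply all count as "down".
        return False
```

Calling `comfyui_reachable()` on the workstation returns `False` rather than raising when the server is down, so a wrapper script can decide whether to start ComfyUI first.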

1. Why this skill exists (and when NOT to use it)

The original cinema pipeline (AGENT_CINEMA_AUTOPILOT using render_all_acts.py + WAN 2.1 MultiTalk) produces very tight lip-synced dialogue but takes ~20–30 min per shot. For a 10-minute drama that's 4–6 hours of render.

Movie Maker Fast uses LTX 2.3 distilled fp8, a video-only model tuned for speed. A 7-second clip at 832×480 renders in ~75 s warm on the 5090, and the full 10-minute drama renders in ~30–40 min. That is a ~10–15× speedup.

Use this skill when:

  • Visuals are the primary deliverable; lip-sync is "close enough"
  • You want a cinematic film with musical scoring + SFX, where dialogue may be VO or off-frame
  • Speed matters (previews, iterations, multi-shot drafts before committing)
  • The production has many scenes (>15) where MultiTalk's per-shot cost is prohibitive

Use AGENT_CINEMA_AUTOPILOT (slow WAN) instead when:

  • On-screen character dialogue requires tight lip-sync (every word matches mouth)
  • Hero shots where motion naturalness on the speaker is paramount
  • Short-form work where the 20–30-min-per-shot cost is acceptable

Both pipelines can coexist — the same screenplay.json works for both.

2. Three render modes — --mode fast | quality | abstract

LTX 2.3 is trained predominantly on real-world video. Each mode tunes the LoRA stack + sampler for a different content class. Pick by what kind of video you're making:

| Mode | Content class | Stack | Sampler | CFG | Steps |
|---|---|---|---|---|---|
| fast (default) | Narrative / character / real-world scenes | Distilled + IC-union + VBVR physics | euler | 3.0 | 20 |
| quality | Higher prompt-fidelity / motion variety | Non-distilled FP8 + distill LoRA @ 0.5 + IC-union + VBVR | euler | 3.0 | 30 |
| abstract | Fractals, geometry, artwork in motion, psychedelic, non-physical | NO always-on LoRAs (physics would hurt) | euler_ancestral | 5.0 | 30 |

Why abstract drops the physics + reference LoRAs:

  • VBVR enforces object permanence, gravity, and collision realism — exactly wrong for a pulsing mandala or fractal unfold.
  • IC-LoRA union control carries reference-scene semantics that don't apply to non-representational content.
  • euler_ancestral adds stochastic variation each step, which morphs abstract content more expressively than plain euler.
  • Higher CFG (5 vs 3) + 30 steps compensate for the distilled model's natural-video bias when asked for unfamiliar geometry.
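The sampler settings in the mode table can be expressed as a small lookup. This is a sketch of how such presets might be represented, not the tool's actual code: `SamplerPreset`, `MODE_PRESETS`, and `preset_for` are hypothetical names, and only the sampler, CFG, and step values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerPreset:
    sampler: str
    cfg: float
    steps: int


# Values copied from the mode table; the structure itself is an assumption.
MODE_PRESETS = {
    "fast": SamplerPreset("euler", 3.0, 20),
    "quality": SamplerPreset("euler", 3.0, 30),
    "abstract": SamplerPreset("euler_ancestral", 5.0, 30),
}


def preset_for(mode: str = "fast") -> SamplerPreset:
    """Look up the sampler preset for a --mode value, defaulting to fast."""
    try:
        return MODE_PRESETS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_PRESETS)}"
        ) from None
```

Keeping the presets in one frozen structure makes it harder for a per-scene override to silently change the defaults for later clips.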

2a. Model stack (all on disk, all verified)

Fast mode (DEFAULT — --mode fast)

| Slot | File | Role |
|---|---|---|
| Base | ltx-2.3-22b-distilled-fp8.safetensors (27 GB) | Video-only distilled 22B, fp8 |
| Video VAE | LTX23_video_vae_bf16.safetensors | |
| Text encoder | gemma_3_12B_it.safetensors | Base Gemma-3 12B IT (Comfy-Org/ltx-2 split) |
| Abliteration LoRA | gemma-3-12b-it-abliterated_heretic_lora_rank64_bf16.safetensors | Available on disk; not auto-applied (needs CLIP-side wiring; manual workflow only) |
| LoRA (always) | ltx-2.3-22b-ic-lora-union-control-ref0.5.safetensors @ 1.0 | Reference-based char/scene control |
| LoRA (always) | ltx2/Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors @ 1.0 | Physics / object permanence |

No distilled-lora-384 in fast mode — already baked into the checkpoint. Adding it would over-distill.

Quality mode (--mode quality)

| Slot | File | Role |
|---|---|---|
| Base | ltx-2.3-22b-dev-fp8.safetensors (~29 GB) | Non-distilled FP8 base: higher prompt-fidelity, more motion variety |
| Video VAE | same | |
| Text encoder | same | |
| LoRA | ltx-2.3-22b-distilled-lora-384.safetensors @ 0.5 | Partial distill: compresses step count without baking in full distilled behaviour (root of loras/, no ltx2/ prefix) |
| LoRA | ltx-2.3-22b-ic-lora-union-control-ref0.5.safetensors @ 1.0 | |
| LoRA | ltx2/Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors @ 1.0 | |

Quality mode is ~30–50% slower than fast mode. Use it when fast-mode output looks too "average" or when you need stronger prompt adherence. No joint-AV path — audio comes exclusively from the separate audio stack (Qwen3-TTS / ACE-Step / MMAudio).
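The two stacks above, and the rule that the distill LoRA must not be layered on the already-distilled checkpoint, can be sketched as follows. The file names and strengths come from the tables; `STACKS` and `check_stack` are hypothetical names, and detecting a distilled checkpoint by looking for "distilled" in its file name is an assumption made only for this illustration.

```python
# (filename, strength) pairs copied from the tables above.
IC_UNION = ("ltx-2.3-22b-ic-lora-union-control-ref0.5.safetensors", 1.0)
VBVR = ("ltx2/Ltx2.3-Licon-VBVR-I2V-96000-R32.safetensors", 1.0)
DISTILL = ("ltx-2.3-22b-distilled-lora-384.safetensors", 0.5)

STACKS = {
    "fast": {
        "base": "ltx-2.3-22b-distilled-fp8.safetensors",
        "loras": [IC_UNION, VBVR],
    },
    "quality": {
        "base": "ltx-2.3-22b-dev-fp8.safetensors",
        "loras": [DISTILL, IC_UNION, VBVR],
    },
}


def check_stack(base: str, loras: list) -> None:
    """Raise if the distill LoRA is stacked on a checkpoint that is already distilled.

    Assumption: a distilled checkpoint has "distilled" in its file name.
    """
    lora_names = [name for name, _strength in loras]
    if "distilled" in base and DISTILL[0] in lora_names:
        raise ValueError(f"{DISTILL[0]} would over-distill {base}")


# Both documented stacks pass the check.
for stack in STACKS.values():
    check_stack(stack["base"], stack["loras"])
```

A check like this is cheap to run before submitting a workflow and catches the one combination the text explicitly warns against.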

3. Per-scene LoRA routing

Tags on a scene (or dialogue direction) route to extra LoRAs on top of the always-on stack. Tags are matched as case-insensitive substrings. Extras are capped at 3 per clip to avoid model interference.

| Tag | LoRA added | Effect |
|---|---|---|
| pose | ltx2/ltx23__demopose_d3m0p0s3.safetensors @ 1.0 | Skeleton-driven motion |
| zoomout | ltx2/ltx23_zoomout_z00m047.safetensors @ 0.9 | Camera pulls back |
| camera: dolly-left | ltx-2-19b-lora-camera-control-dolly-left.safetensors @ 0.8 | Dolly motion |
| camera: jib-down | ltx2/ltx-2-19b-lora-camera-control-jib-down.safetensors @ 0.8 | Jib drop |
| transition | ltx2.3-transition.safetensors @ 1.0 | Scene-boundary clips (auto-added) |
| style: claymation | ltx2/Claymation.safetensors @ 0.8 | Stop-motion / clay |
| style: ghibli | StudioGhibli.Redmond... @ 0.7 | Ghibli watercolor |
| style: ghibli_offset | ghibli_style_offset.safetensors @ 0.6 | Lighter Ghibli shift |
| style: galaxy | ltx2/LTX23-GalaxyAce.safetensors @ 0.9 | Cosmic / nebular / starfield |
| style: tribal | Smooth_Tribal.safetensors @ 0.7 | Ornamental / pattern-rich |
| style: illustration | Illustration concept Variant 3A.safetensors @ 0.7 | Illustrative / graphic |
| style: cyberpunk | CyberPunkAI.safetensors @ 0.8 | Neon / tech noir |
| character: talkinghead | ltx-2.3-id-lora-talkvid-3k.safetensors @ 0.8 | Face consistency on close-ups |

LoRA sourcing: Camera and motion LoRAs above are HuggingFace-hosted (free, requires HF_TOKEN for some). The style LoRAs (style: claymation / ghibli / ghibli_offset / galaxy / tribal / illustration / cyberpunk) are Civitai-hosted and require a CIVITAI_TOKEN (set in .env). See setup.sh for the download URL pattern. All LoRAs are optional — plain prompts without these tags work without any of them.
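The routing rule (case-insensitive substring match, at most 3 extras per clip) might look roughly like the sketch below. It is an illustration under assumptions, not the tool's implementation: `TAG_ROUTES` holds only a subset of the table, `route_loras` is a hypothetical name, and the choice to keep the first matches in table order when the cap is hit is a guess.

```python
MAX_EXTRA_LORAS = 3  # cap stated in the text

# (tag key, LoRA file, strength): a subset of the routing table.
TAG_ROUTES = [
    ("pose", "ltx2/ltx23__demopose_d3m0p0s3.safetensors", 1.0),
    ("zoomout", "ltx2/ltx23_zoomout_z00m047.safetensors", 0.9),
    ("camera: jib-down", "ltx2/ltx-2-19b-lora-camera-control-jib-down.safetensors", 0.8),
    ("style: galaxy", "ltx2/LTX23-GalaxyAce.safetensors", 0.9),
    ("style: cyberpunk", "CyberPunkAI.safetensors", 0.8),
]


def route_loras(tags, routes=TAG_ROUTES, cap=MAX_EXTRA_LORAS):
    """Return up to `cap` (file, strength) pairs whose key occurs in the tags."""
    # Join with a separator so a key cannot match across two adjacent tags.
    haystack = " | ".join(tags).lower()
    picked = []
    for key, filename, strength in routes:
        if len(picked) >= cap:
            break  # assumption: earlier table rows win when the cap is reached
        if key.lower() in haystack:
            picked.append((filename, strength))
    return picked
```

One consequence of substring matching worth keeping in mind: a tag such as `style: ghibli_offset` also contains `style: ghibli`, so with the full table both rows would match unless the longer key is handled first.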

Style shortcut

Instead of typing the full tag, use --style <name>:

python movie_maker_fast.py clip --image abstract.png \
  --prompt "kaleidoscopic mandala, pulsing concentric circles, iridescent color shifts" \
  --mode abstract --style galaxy --duration 5

That appends style: galaxy to the tag list, which picks up the galaxy LoRA.

transition is automatically added to the last chunk of any multi-chunk scene so boundaries blend. You don't usually need to set it manually.
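The two tag conveniences described above, `--style` appending a `style: <name>` tag and `transition` being added to the last chunk of a multi-chunk scene, can be sketched like this. Both helper names are hypothetical and the per-chunk tag-list layout is an assumption; only the behaviour is taken from the text.

```python
def with_style(tags, style=None):
    """Append 'style: <name>' to a copy of the tag list, as --style is described to do."""
    return list(tags) + ([f"style: {style}"] if style else [])


def tag_chunks(scene_tags, n_chunks):
    """Return one tag list per chunk; the last chunk of a multi-chunk scene gets 'transition'."""
    per_chunk = [list(scene_tags) for _ in range(n_chunks)]
    if n_chunks > 1 and "transition" not in per_chunk[-1]:
        per_chunk[-1].append("transition")
    return per_chunk


# Example: a three-chunk scene rendered with --style galaxy.
chunks = tag_chunks(with_style(["pose"], "galaxy"), 3)
```

In this example only the third chunk carries `transition`; a single-chunk scene is left untouched, which matches the note that the tag rarely needs to be set by hand.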

4. Image persistence & character consistency (the anti-drift toolkit)

LTX 2.3 can "wander" — the input image transforms into something unrelated over a 7 s clip, and chunks of the same scene can look like four unrelated shots spliced together. Three mechanisms,

How to add

/plugin marketplace add AEON-7/aeon-movie-maker

The exact command may vary by repository. Check the README on GitHub.
