
music-producer


Generate standalone music tracks (songs, instrumentals, cues) at maximum audio fidelity using ACE Step 1.5 XL on the Workstation host. USE THIS SKILL when the user asks to make a music track, song, instrumental, beat, album cut, demo, jingle, score cue, or any standalone audio deliverable where music quality is the primary concern (not a background bed for a radio drama or video). Triggers on

Author: AEON-7 · License: MIT

Music Producer — ACE Step 1.5 XL, maximum fidelity

Generate standalone music tracks at the highest audio quality the system is capable of. For radio-drama music beds (fast, "good enough"), use the radio-drama-production skill instead; this one exists for tracks where the audio is the deliverable.

0. Target host + tool

  • Host: ${SSH_USER}@127.0.0.1 (Workstation — RTX 5090, 64 GB RAM, Win 11 + OpenSSH)
  • Tool: ${COMFYUI_ROOT}\music_tool\music_maker.py
  • Templates: music_tool\templates\ace_step_music_apg_api.json (APG chain) + ace_step_music_simple_api.json (simple KSampler)
  • ComfyUI endpoint on Workstation: http://127.0.0.1:8188

1. Why a dedicated tool

scene_production_tool/radio_drama.py uses a simple KSampler template tuned for turbo variants — fast, clean enough to sit under dialogue, but the ceiling is the xl_base_sft merged model at CFG 3. The APG-requiring base models (xl_base fp32, xl_sft bf16) distort audibly under that template because ACE Step's full base models need SamplerCustomAdvanced + APG + CFGGuider to avoid artifacts (per NerdyRodent's v35 reference workflow and Stability's training notes).

music_maker.py here uses the proper APG chain for xl_base / xl_sft, producing clean output at true base-model quality. It also defaults to lossless FLAC output (48 kHz stereo), unlike the radio-drama pipeline which writes MP3 V0.

2. Variants — pick by quality/speed tradeoff

| Variant | UNet | Chain | Steps | CFG | Time (per 90 s) | Best for |
|---|---|---|---|---|---|---|
| xl_base (default) | acestep_v1.5_xl_base.safetensors (19.95 GB fp32) | APG | 50 | 7.0 | ~21 s | Album masters, standalone songs, hero cues |
| xl_sft | acestep_v1.5_xl_sft_bf16.safetensors | APG | 45 | 6.0 | ~18 s | Near-base quality, faster, bf16 |
| xl_base_sft | acestep_v1.5_xl_merge_base_sft_ta_0.5.safetensors | simple KSampler | 35 | 3.0 | ~21 s | Balance (shared default with radio-drama) |
| xl_turbo | acestep_v1.5_xl_turbo_bf16.safetensors | simple KSampler | 10 | 1.0 | ~12 s | Preview iterations, fast A/B |
| base_turbo | acestep_v1.5_turbo.safetensors (4.8 GB) | simple KSampler | 8 | 1.0 | ~8 s | Smallest/fastest, lowest quality |
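
The presets in the table above map naturally onto a small lookup structure. The sketch below is illustrative only: `VariantPreset`, `PRESETS`, and `resolve` are hypothetical names, not taken from music_maker.py, and only the values are copied from the table. It shows how `--steps` and `--cfg` overrides could be layered on top of a variant's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantPreset:
    unet: str     # checkpoint filename from the table
    chain: str    # "apg" or "simple"
    steps: int    # preset step count
    cfg: float    # preset CFG


# Values copied from the variants table; the structure itself is an assumption.
PRESETS = {
    "xl_base": VariantPreset("acestep_v1.5_xl_base.safetensors", "apg", 50, 7.0),
    "xl_sft": VariantPreset("acestep_v1.5_xl_sft_bf16.safetensors", "apg", 45, 6.0),
    "xl_base_sft": VariantPreset(
        "acestep_v1.5_xl_merge_base_sft_ta_0.5.safetensors", "simple", 35, 3.0
    ),
    "xl_turbo": VariantPreset("acestep_v1.5_xl_turbo_bf16.safetensors", "simple", 10, 1.0),
    "base_turbo": VariantPreset("acestep_v1.5_turbo.safetensors", "simple", 8, 1.0),
}


def resolve(variant="xl_base", steps=None, cfg=None):
    """Return (preset, steps, cfg), letting explicit overrides win over the preset."""
    preset = PRESETS[variant]
    return (
        preset,
        preset.steps if steps is None else steps,
        preset.cfg if cfg is None else cfg,
    )
```

For example, `resolve("xl_turbo", cfg=2.0)` keeps the turbo step count but swaps in the overridden CFG.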

APG variants use SamplerCustomAdvanced with:

  • APG(eta=0.7, norm_threshold=2.5, momentum=-0.75) (v35 params)
  • CFGGuider(cfg=per-variant)
  • KSamplerSelect("gradient_estimation")
  • BasicScheduler("simple", steps, denoise=1.0)
  • ModelSamplingAuraFlow(shift=3)
  • RandomNoise(seed)
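
As a rough picture of how the nodes listed above might be wired together, here is a sketch of a ComfyUI API-format graph fragment built in Python. The node class names and parameter values come from the list; the input field names, the wiring order, and the node ids (`unet_loader`, `positive`, `negative`, `latent`, and so on) are assumptions for illustration. The shipped ace_step_music_apg_api.json template is the authority and may differ.

```python
def apg_chain(steps, cfg, seed):
    """Build a graph fragment (node id -> node) for the APG sampling chain.

    Upstream nodes ("unet_loader", "positive", "negative", "latent") are
    hypothetical ids that the rest of the workflow would have to define.
    """
    return {
        "shift": {
            "class_type": "ModelSamplingAuraFlow",
            "inputs": {"model": ["unet_loader", 0], "shift": 3},
        },
        "apg": {
            "class_type": "APG",
            "inputs": {
                "model": ["shift", 0],
                "eta": 0.7,
                "norm_threshold": 2.5,
                "momentum": -0.75,
            },
        },
        "guider": {
            "class_type": "CFGGuider",
            "inputs": {
                "model": ["apg", 0],
                "positive": ["positive", 0],
                "negative": ["negative", 0],
                "cfg": cfg,
            },
        },
        "sampler": {
            "class_type": "KSamplerSelect",
            "inputs": {"sampler_name": "gradient_estimation"},
        },
        "sigmas": {
            "class_type": "BasicScheduler",
            "inputs": {
                "model": ["shift", 0],
                "scheduler": "simple",
                "steps": steps,
                "denoise": 1.0,
            },
        },
        "noise": {"class_type": "RandomNoise", "inputs": {"noise_seed": seed}},
        "sample": {
            "class_type": "SamplerCustomAdvanced",
            "inputs": {
                "noise": ["noise", 0],
                "guider": ["guider", 0],
                "sampler": ["sampler", 0],
                "sigmas": ["sigmas", 0],
                "latent_image": ["latent", 0],
            },
        },
    }
```

Only `steps`, `cfg`, and `seed` vary per run in this sketch; the APG parameters and the shift stay fixed at the values listed above.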

Simple variants use a straight KSampler with euler / simple — works because those models are distilled (turbo) or merged (base+SFT).

3. Quick-start

Two ways to invoke the tool from any machine with SSH access, plus how to retrieve the result:

Direct SSH one-liner

ssh ${SSH_USER}@127.0.0.1 'cd ${COMFYUI_ROOT} && python music_tool\music_maker.py --prompt "lofi jazz, warm Rhodes, soft saxophone, brushed drums, vinyl crackle" --duration 180 --bpm 78 --key "A minor"'

From a sidecar script (recommended for longer tracks)

ssh ${SSH_USER}@127.0.0.1 'start /B python ${COMFYUI_ROOT}\music_tool\music_maker.py --prompt "..." --duration 240 --variant xl_base > ${USER_HOME}\music_maker_run.log 2>&1'
ssh ${SSH_USER}@127.0.0.1 'powershell -Command "Get-Content ${USER_HOME}\music_maker_run.log -Wait -Tail 10"'

Pull the result

scp ${SSH_USER}@127.0.0.1:${COMFYUI_ROOT}/output/music/lofi_jazz_*.flac .
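
If you drive these commands from a script rather than typing them, the SSH one-liner can be assembled programmatically. The helper below is a minimal sketch under stated assumptions: `build_ssh_argv` is a hypothetical name, the user and ComfyUI root are passed in instead of relying on the `${SSH_USER}` and `${COMFYUI_ROOT}` placeholders, and prompts containing double quotes are rejected rather than escaped, because quoting rules on the remote Windows shell are not covered here.

```python
def build_ssh_argv(user, comfy_root, prompt, duration=120, variant="xl_base"):
    """Return an argv list for subprocess.run that mirrors the SSH one-liner."""
    if '"' in prompt:
        # Escaping for the remote Windows shell is out of scope for this sketch.
        raise ValueError("prompt must not contain double quotes")
    remote = (
        f"cd {comfy_root} && python music_tool\\music_maker.py "
        f'--prompt "{prompt}" --duration {duration} --variant {variant}'
    )
    return ["ssh", f"{user}@127.0.0.1", remote]


# Usage (not executed here, since it needs the real host):
#   import subprocess
#   subprocess.run(build_ssh_argv("me", r"C:\ComfyUI", "lofi jazz"), check=True)
```

Passing an argv list to `subprocess.run` avoids a layer of local shell quoting, so only the remote side's quoting has to be right.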

4. Argument reference

python music_maker.py [options]

  --prompt STR            (required) comma-separated music descriptors
  --duration FLOAT        track length in seconds (default 120, max ~240)
  --bpm INT               tempo (default 75)
  --key STR               key/scale, e.g. "A minor", "C# major" (default "A minor")
  --lyrics STR_OR_PATH    literal lyrics OR path to .txt file (default empty = instrumental)
  --variant {xl_base|xl_sft|xl_base_sft|xl_turbo|base_turbo}  (default xl_base)
  --steps INT             override the variant's preset step count
  --cfg FLOAT             override the variant's preset CFG
  --seed INT              fixed seed for reproducibility
  --output / -o PATH      output file (.flac / .wav / .mp3) — default is
                          output/music/<slug>_<seed>.flac
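
For readers who want to wrap or mock the tool, the documented options can be mirrored with `argparse`. This is a reconstruction from the reference above, not the actual source of music_maker.py: `build_parser` and `resolve_lyrics` are hypothetical names, and the "existing file means path, otherwise literal text" rule for `--lyrics` is an assumption about how STR_OR_PATH is interpreted.

```python
import argparse
from pathlib import Path

VARIANTS = ["xl_base", "xl_sft", "xl_base_sft", "xl_turbo", "base_turbo"]


def build_parser():
    """Parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(prog="music_maker.py")
    p.add_argument("--prompt", required=True, help="comma-separated music descriptors")
    p.add_argument("--duration", type=float, default=120.0, help="seconds")
    p.add_argument("--bpm", type=int, default=75)
    p.add_argument("--key", default="A minor")
    p.add_argument("--lyrics", default="", help="literal lyrics or path to a .txt file")
    p.add_argument("--variant", choices=VARIANTS, default="xl_base")
    p.add_argument("--steps", type=int, default=None, help="override preset steps")
    p.add_argument("--cfg", type=float, default=None, help="override preset CFG")
    p.add_argument("--seed", type=int, default=None)
    p.add_argument("--output", "-o", type=Path, default=None)
    return p


def resolve_lyrics(value):
    """Return file contents if value names an existing file, else the value itself."""
    if not value:
        return ""  # empty lyrics means an instrumental
    try:
        path = Path(value)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    except (OSError, ValueError):
        pass  # not a usable path (for example too long): treat as literal lyrics
    return value
```

A call such as `build_parser().parse_args(["--prompt", "lofi jazz", "--bpm", "78"])` yields the documented defaults for every option that was not given.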

5. Writing good prompts

ACE Step understands music the way image models understand art: the prompt is a cloud of descriptors, not a sentence. Pile on comma-separated tags across the five categories below:

Genre + subgenre

lofi jazz / jazz fusion / bossa nova / swing / cool jazz / bebop
ambient drone / cinematic ambient / dark ambient / space music
lofi hiphop / boom bap / trip hop / chillhop / study beats
neo-soul / R&B / funk / gospel
classical / chamber / string quartet / solo piano / minimalist / romantic
cinematic orchestral / film score / epic trailer / horror score / ghibli-style
indie rock / shoegaze / post-rock / dream pop / synthwave / vaporwave
electronic / IDM / techno / house / drum and bass / ambient techno
world / flamenco / tango / celtic / middle eastern / afrobeat / reggae

Instruments (more specific = better)

warm Rhodes piano, muted saxophone, brushed jazz drums,
upright bass walking line, vibraphone, muted trumpet,
Fender Rhodes, clean Stratocaster, nylon-string guitar,
Moog bass, analog synth pad, mellotron strings,
violin section, cello, timpani, woodwinds,
hand drums, sitar, oud, didgeridoo, koto

Production / mix character

vinyl crackle, tape hiss, analog warmth, lo-fi compression,
big reverb, long delay, spring reverb, plate reverb,
close-mic'd, room ambience, field recording,
dry and intimate, lush and wide, spectral shimmer,
sidechained pump, pumping kick, saturated bass

Mood / setting

nocturnal, rainy window, coffee shop, late-night drive,
contemplative, melancholic, uplifting, triumphant, dark foreboding,
urgent, tense, calm and measured, reverent, sacred,
morning coffee, sunrise, sunset, winter, summer, desert, forest

Rhythm / groove cues (reinforces BPM)

relaxed 4/4 swing, boom-bap groove, head-nod groove,
samba syncopation, waltz 3/4, odd meter 7/8,
driving straight 8ths, laid back behind the beat

Full example prompt

lofi jazz, mellow hip hop beat, warm Rhodes piano, soft muted saxophone,
brushed jazz drums, upright bass walking line, vinyl crackle,
rainy window atmosphere, nocturnal, study beats, relaxed 4/4 swing
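
A prompt like the one above can be assembled from per-category tag lists. The helper below is a small sketch, with `build_prompt` as a hypothetical name; it joins the categories in the order they are introduced in this section and drops blank or repeated tags. Each argument is expected to be a list or tuple of strings.

```python
def build_prompt(genre, instruments=(), production=(), mood=(), rhythm=()):
    """Join tag groups into one comma-separated prompt.

    Blank tags are skipped and case-insensitive duplicates are dropped,
    keeping the first spelling seen.
    """
    seen = set()
    tags = []
    for group in (genre, instruments, production, mood, rhythm):
        for tag in group:
            tag = tag.strip()
            key = tag.lower()
            if tag and key not in seen:
                seen.add(key)
                tags.append(tag)
    return ", ".join(tags)
```

Keeping the categories separate until the last step makes it easy to swap one group (say, the mood) while holding the rest fixed for A/B comparisons.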

Anti-patterns

  • ❌ Full sentences ("A beautiful jazz song with piano") — ACE expects tags, not prose
  • ❌ Requesting specific artists ("in the style of Miles Davis") — might hint but not reliable
  • ❌ Contradictory tags ("aggressive peaceful / loud quiet") — model averages to mush
  • ❌ Song-structure prose ("verse 1 goes like...") — use the --lyrics arg for vocals
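
The anti-patterns above lend themselves to a quick pre-flight check. The function below is a heuristic sketch only: `lint_prompt`, the `CONTRADICTIONS` pairs, and the sentence-detection rule (a tag that starts with an article and runs past four words) are assumptions chosen for illustration and will miss or misflag some prompts.

```python
import re

# Example opposing pairs taken from the anti-pattern list; extend as needed.
CONTRADICTIONS = [("aggressive", "peaceful"), ("loud", "quiet")]


def lint_prompt(prompt):
    """Return a list of warning strings for common prompt anti-patterns."""
    warnings = []
    lowered = prompt.lower()
    tags = [t.strip() for t in lowered.split(",") if t.strip()]

    if re.search(r"\bin the style of\b", lowered):
        warnings.append("artist reference: unreliable")

    # Crude sentence detector: starts with an article and has more than 4 words.
    if any(re.match(r"(a|an|the)\s", t) and len(t.split()) > 4 for t in tags):
        warnings.append("looks like a sentence: use tags")

    # Hyphenated compounds such as "loud-quiet-loud" stay single tokens,
    # so they do not trip the contradiction check.
    words = set(re.findall(r"[a-z][a-z-]*", lowered))
    for a, b in CONTRADICTIONS:
        if a in words and b in words:
            warnings.append(f"contradictory tags: {a} / {b}")

    if re.search(r"\bverse \d", lowered):
        warnings.append("song structure: use --lyrics")

    return warnings
```

An empty list means none of these heuristics fired, not that the prompt is good.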

Writing for dynamics, feel, and punch

If your tracks sound flat / same-level / lifeless, the prompt is usually why. ACE Step mirrors the energy envelope of its tags. A "wall of sound" prompt produces a wall-of-sound track — no peaks, no valleys, no feel.

Words that CREATE dynamics (use these):

punchy, snappy, transient-rich, kick-forward, staccato, percussive,
breathy, restrained, sparse, minimal, space between notes,
quiet intro, slow build, drops to silence, sudden hit,
accent on the one, ghost note, syncopated, rhythmic tension,
call and response, rest, pause, breathing room,
rises and falls, crescendo, decrescendo, swell, taper,
loud-quiet-loud dynamics, cinematic dynamics,
sidechain pump, ducking, gated, stabbed, plucked, stabs,
muted then big, whispered then roared

Words that KILL dynamics (avoid or use sparingly):

wall of sound, dense mix, thick, maximal, lush full arrangement,
constant energy, always moving, never stops, saturated everything,
massive, huge, overwhelming, pounding nonstop,
layered and layered, everything at once,
compressed to the max, radio-ready loud  ← asks the model to pre-compress
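
To spot a prompt that leans toward the "kill" list, you can count tags from each group. This is a sketch with hypothetical names (`dynamics_report`, `CREATES_DYNAMICS`, `KILLS_DYNAMICS`); the sets hold only a sample of the words listed above, matching is by exact tag, and the "more kill tags than create tags" rule is an arbitrary illustrative threshold.

```python
# Samples from the two word lists above; not exhaustive.
CREATES_DYNAMICS = {
    "punchy", "snappy", "transient-rich", "staccato", "sparse", "minimal",
    "quiet intro", "slow build", "crescendo", "decrescendo", "swell",
    "loud-quiet-loud dynamics", "breathing room",
}
KILLS_DYNAMICS = {
    "wall of sound", "dense mix", "thick", "maximal", "constant energy",
    "never stops", "massive", "huge", "everything at once",
    "compressed to the max", "radio-ready loud",
}


def dynamics_report(prompt):
    """Classify exact-match tags and flag prompts dominated by 'kill' words."""
    tags = [t.strip().lower() for t in prompt.split(",") if t.strip()]
    creates = [t for t in tags if t in CREATES_DYNAMICS]
    kills = [t for t in tags if t in KILLS_DYNAMICS]
    return {
        "creates": creates,
        "kills": kills,
        "flat_risk": len(kills) > len(creates),
    }
```

A prompt with `flat_risk` set is a candidate for swapping a density tag for a contrast tag before spending a full-quality render on it.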

**Structural cu

How to add

/plugin marketplace add AEON-7/aeon-music-maker

The exact command may vary by repository. Check the README on GitHub.
