SSkilltecabyclaudinhocode
Enviar skill
← Voltar para o catálogo

music-analyzer

Desenvolvimento

Analyzes music files (MP3, WAV, FLAC, M4A, OGG) and produces a detailed music and lyrics analysis including genre recognition, mood analysis, BPM, key, instrument detection and lyrics interpretation. Uses HDEMUCS for vocal separation, Whisper for lyrics transcription and librosa for audio feature extraction with CUDA GPU acceleration. Use this skill when the user wants to analyze a music file, exa

1estrelas
Ver no GitHub ↗Autor: rtfirstLicença: MIT

Music Analyzer

Comprehensive music file analysis: lyrics extraction via vocal separation + Whisper, audio features via librosa, and Claude interprets the raw data into genre, mood, instruments etc.

Prerequisites

  • Python 3.10+ with PyTorch, torchaudio, openai-whisper, librosa, soundfile
  • ffmpeg must be in PATH
  • Scripts located in: scripts/ (relative to this skill's base directory)
  • NVIDIA GPU with CUDA support recommended (works on CPU too, just slower)
  • A requirements.txt is included in the skill's base directory

Workflow

Step 0: Find Python and check dependencies

The analysis script checks all dependencies on startup. If anything is missing, it returns a JSON error with the exact install command. Run:

python "<SKILL_BASE_DIR>/scripts/analyze_music.py" --help

If this fails, find a working Python first:

python -c "import torch; print(torch.cuda.is_available())" 2>/dev/null || echo "FAIL"
python3 -c "import torch; print(torch.cuda.is_available())" 2>/dev/null || echo "FAIL"

If no Python has torch, or other packages are missing, guide the user through installation:

Missing packages -- the script returns JSON like:

{"error": "Missing Python packages: torch, librosa, ...", "install": "pip install -r requirements.txt"}

Tell the user to run the install command from the JSON. For CUDA GPU support:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r "<SKILL_BASE_DIR>/requirements.txt"

For CPU-only (no NVIDIA GPU):

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r "<SKILL_BASE_DIR>/requirements.txt"

Missing ffmpeg -- the script returns:

{"error": "ffmpeg not found in PATH", "install": "Windows: winget install Gyan.FFmpeg | Mac: brew install ffmpeg | Linux: sudo apt install ffmpeg"}

Tell the user the platform-specific install command.

If this fails or returns False, try common alternatives on Windows:

  • ls /c/Python3*/python.exe 2>/dev/null

On Linux/Mac:

  • which python3.11 python3.12 python3.13 2>/dev/null

Use the Python executable that succeeds for all subsequent commands. Store it as <PYTHON> for the rest of this workflow.

Step 1: Get input

If the user has not provided a file path, ask with AskUserQuestion:

"Which music file should I analyze? (Please provide the full path)"

Validate:

  • File exists
  • Format is supported: .mp3, .wav, .flac, .m4a, .ogg
  • If the user only provides a filename, search with Glob in common folders: ~/Music/, ~/Downloads/, ~/Desktop/

Optionally ask if it's an instrumental track (saves ~60s processing time).

Step 2: Run analysis

Determine the skill base directory from the context (provided as "Base directory for this skill" when the skill is loaded). Then run:

<PYTHON> "<SKILL_BASE_DIR>/scripts/analyze_music.py" "<AUDIO_PATH>"

Where <PYTHON> is the Python executable found in Step 0, <SKILL_BASE_DIR> is the base directory of this skill, and <AUDIO_PATH> is the absolute path to the audio file.

Optional flags:

  • --skip-lyrics for instrumental tracks
  • --whisper-model small for faster (but less accurate) transcription

The script outputs JSON to stdout. Status messages appear on stderr. Set timeout to 600000ms (10 minutes) -- vocal separation + Whisper need time.

Step 3: Interpret JSON

Read the references/genre_profiles.md reference table from this skill's base directory.

Interpret the JSON data using the reference table and your music knowledge:

Genre Detection

Combine multiple signals:

  1. BPM narrows genre families (e.g., 60-90 = R&B/Reggae/Hip-Hop, 120-140 = Dance/Rock/Pop)
  2. Spectral Centroid indicates brightness (high = Electronic/Metal, low = Jazz/Classical)
  3. Energy (RMS) distinguishes intense vs. gentle
  4. Key/Mode gives genre hints (minor = Metal/Hip-Hop, major = Pop/Country)
  5. Onset Density shows rhythmic complexity
  6. Lyrics content confirms or corrects (rap text = Hip-Hop, love ballad = Pop/R&B)
  7. ZCR (Zero Crossing Rate) correlates with distortion/noise
  8. Assign scores from 0-100 for up to 5 genres and 5 subgenres

Mood Detection (Music)

  • Major + fast + high energy = Energetic/Happy
  • Minor + slow + low energy = Sad/Melancholic
  • High dynamic range + variable energy = Dramatic/Intense
  • Building energy curve = Hopeful/Triumphant
  • Assign scores 0-100 for up to 5 moods

Instrument Detection

Derive instruments from:

  • Spectral Contrast Bands: Low bands = Bass, mid = Guitar/Keys, high = Cymbals
  • Onset Density: High + percussive = Drums/Percussion
  • Accompaniment Features: Centroid and MFCCs of the accompaniment track
  • Genre Context: Once genre is identified, typical instruments are probable
  • List the most likely instruments (no scores)

BPM & Key

Take directly from JSON features. Format: {BPM}BPM, {Key} {Mode} Example: 120BPM, E Major

Vocals

Describe based on:

  • lyrics.has_vocals: whether vocals are present
  • lyrics.vocal_rms: vocal volume (>0.08 = strong, 0.03-0.08 = moderate, <0.03 = soft)
  • lyrics.language: detected language
  • Genre context for typical vocal characteristics
  • Format: e.g., "Male vocals, mid-range, melodic" or "Female vocals, high pitch, powerful"

Lyrics Analysis

Use the transcribed text (lyrics.text) and analyze:

  • Summary: 2-3 sentences about themes, narrative, emotional content
  • Moods: Up to 5 moods with scores 0-100 based on text analysis
  • Themes: Up to 5 themes with scores 0-100
  • Language: Language from Whisper detection, in long form (e.g., "English", "German")
  • Explicit: Check the text for profanity/explicit content -> "Yes" or "No"

If no lyrics present (instrumental):

Summary: Instrumental track - no vocals detected
Moods: N/A
Themes: N/A
Language: N/A
Explicit: No

Step 4: Formatted output

Present the result in exactly this format. IMPORTANT: Single-value fields (Moods, Themes, Language, Explicit, Genres, Subgenres, Instruments, BPM & Key, Vocals) go on the SAME LINE as their label, separated by a space. Only multi-sentence text blocks (Summary, Production Description) start on the line BELOW their label.

LYRICS ANALYSIS

Summary
[2-3 sentences on the next line]
Moods: [Mood1 (Score), Mood2 (Score), Mood3 (Score), Mood4 (Score), Mood5 (Score)]
Themes: [Theme1 (Score), Theme2 (Score), Theme3 (Score), Theme4 (Score), Theme5 (Score)]
Language: [Language]
Explicit: [Yes/No]

MUSIC ANALYSIS
Genres: [Genre1 (Score), Genre2 (Score), Genre3 (Score), Genre4 (Score), Genre5 (Score)]
Subgenres: [Subgenre1 (Score), Subgenre2 (Score), Subgenre3 (Score), Subgenre4 (Score), Subgenre5 (Score)]
Moods: [Mood1 (Score), Mood2 (Score), Mood3 (Score), Mood4 (Score), Mood5 (Score)]
Instruments: [Instrument1, Instrument2, Instrument3, ...]
BPM & Key: [BPM]BPM, [Key] [Mode]
Vocals: [Description]

PRODUCTION DESCRIPTION
[Detailed prose description of the production -- see below]

Important:

  • Scores are always integers 0-100
  • Sort by score descending
  • Always exactly 5 entries for Genres, Subgenres, Moods, Themes (including Lyrics Moods)
  • Instruments without scores, comma-separated list only
  • No Markdown formatting in the output (no **, no #, no ```)
  • NEVER include artist names, band names, album names, song titles, or direct lyrics quotes in the output. Describe only the musical and production characteristics generically
  • Output as plain text

Production Description

Write a cohesive prose paragraph (3-6 sentences) describing the production in detail. Follow this style and cover these aspects:

  1. Genre + BPM + Key as introduction: "A [mood] [genre] production at [BPM] BPM in [Key]"
  2. Rhythm/Drums: Describe kick pattern, hi-hats, snare character based on onset_density and spectral contrast in low bands
  3. Bass: Bass character (sub-bass, warm, pulsating, dry) based on low spectral c

Como adicionar

/plugin marketplace add rtfirst/music-analyzer

O comando exato pode variar conforme o repositório. Confira o README no GitHub.

Comentários · Nenhum comentário

Entre para comentar. Entrar

  • Ainda não há comentários. Seja o primeiro.