Best GPU Performance Skill
When to use this skill
Any time the user's task involves making AI/ML code run faster, use less memory, scale to more GPUs, or serve inference more efficiently. This includes writing new code with performance in mind, reviewing existing code for bottlenecks, debugging OOM or low utilization, and creating optimization plans.
Core Methodology: The 4-Step Loop
Every optimization follows this loop. Never skip steps.
1. MEASURE → 2. CLASSIFY → 3. APPLY
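The three steps named above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the skill's own implementation. The names `measure`, `classify` and `EXAMPLE_REMEDIES` are hypothetical. The CLASSIFY criterion is an assumption: it is a roofline-style comparison of arithmetic intensity against machine balance, which is one common way to split work into compute-bound and memory-bound. The peak-throughput figures in the usage block are made-up placeholders. The loop's fourth step is not named in this excerpt, so the sketch stops at APPLY.

```python
import statistics
import time
from typing import Callable

# Hypothetical helper names; not part of any library or of the skill itself.


def measure(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """MEASURE: median wall-clock seconds per call of fn, after warmup calls.

    For GPU work, fn must block until the device finishes (for example by
    synchronizing the stream), or asynchronous launches will be under-timed.
    """
    for _ in range(warmup):  # warm caches, JIT/autotuning, allocator pools
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)  # median resists outlier iterations


def classify(flops: float, bytes_moved: float,
             peak_flops: float, peak_bytes_per_s: float) -> str:
    """CLASSIFY (assumed roofline criterion): compare arithmetic intensity
    (FLOPs per byte moved) with machine balance (peak FLOP/s per byte/s)."""
    intensity = flops / bytes_moved
    machine_balance = peak_flops / peak_bytes_per_s
    return "compute-bound" if intensity >= machine_balance else "memory-bound"


# APPLY: illustrative remedies keyed by class; placeholders, not the skill's list.
EXAMPLE_REMEDIES = {
    "memory-bound": "fuse elementwise ops; move fewer or smaller bytes",
    "compute-bound": "use lower-precision matmuls on tensor cores",
}

if __name__ == "__main__":
    PEAK_FLOPS, PEAK_BW = 100e12, 2e12  # hypothetical GPU: 100 TFLOP/s, 2 TB/s
    n = 4096
    # FP16 n x n matmul: 2n^3 FLOPs; read A and B, write C at 2 bytes/element.
    kind = classify(2 * n**3, 3 * n * n * 2, PEAK_FLOPS, PEAK_BW)
    print(kind, "->", EXAMPLE_REMEDIES[kind])  # compute-bound -> ...
    # FP32 elementwise add of m elements: 1 FLOP, 12 bytes (two reads, one write).
    m = 1 << 20
    kind = classify(m, 12 * m, PEAK_FLOPS, PEAK_BW)
    print(kind, "->", EXAMPLE_REMEDIES[kind])  # memory-bound -> ...
    print(f"{measure(lambda: sum(range(100_000))):.6f} s per call (CPU stand-in)")
```

In the matmul case the intensity is 2·4096³ / (6·4096²) ≈ 1365 FLOP/byte. This is far above the hypothetical balance of 50, so the work classifies as compute-bound. The elementwise add (1/12 FLOP/byte) falls far below it and classifies as memory-bound. The byte counts are idealized lower bounds, and real kernels usually move more data.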
[Description truncated. See the full README on GitHub.]