ml-content Skill
Generate publication-ready ML content — carousels, 3Blue1Brown-style explainer videos, infographics, posters, paper figures — with deep paper recon, real-3D-only design discipline, phone-readable annotations, and a full grounding pass before posting.
This skill compresses methodology developed across six published ML carousels. It treats ML content as applied research communication, not graphic design with science vocabulary on top.
Workflow Decision Tree
| User has... | Start at... |
|---|---|
| A paper / topic only | Stage 1 (Recon) → full pipeline |
| A finished recon bundle | Stage 2 (Audit) → Build → Ground |
| A finished design-spec | Stage 4 (Build) |
| A finished carousel/video | Stage 5 (Grounding pass) |
| Edits on a posted piece | Grounding pass + correction comment |
Always ask what they have and what they need. Don't assume full pipeline.
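The decision tree can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the dictionary keys (`paper_or_topic`, `recon_bundle`, and so on) and the function name `start_point` are hypothetical labels chosen for this example.

```python
# Hypothetical routing table mirroring the decision tree above.
# The keys are illustrative labels, not names the skill defines.
START_POINTS = {
    "paper_or_topic": ["Stage 1 (Recon)", "full pipeline"],
    "recon_bundle": ["Stage 2 (Audit)", "Build", "Ground"],
    "design_spec": ["Stage 4 (Build)"],
    "finished_piece": ["Stage 5 (Grounding pass)"],
    "posted_piece_edits": ["Grounding pass", "correction comment"],
}


def start_point(user_has):
    """Return the remaining steps for the material the user already has."""
    if user_has not in START_POINTS:
        # Unknown input: ask the user instead of assuming the full pipeline.
        raise ValueError(
            f"Unrecognized starting material {user_has!r}; ask what they have."
        )
    return START_POINTS[user_has]
```

Raising on unknown input rather than defaulting to Stage 1 follows the "don't assume full pipeline" rule.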
The Four Locks (non-negotiable)
Skip any one of these and the output devolves into AI slop.
1. Grounding lock — every claim verifiable against a primary source. Run the grounding pass (Stage 5) before posting. CONFIRMED / WRONG / UNVERIFIED per claim.
2. 3D lock — real geometry only. No faux-3D parallelogram-on-rect. 3D is earned only when the math is genuinely a 3-axis quantity. Decide which slides earn 3D before any rendering.
3. Phone-readable lock — annotations must stay legible at 50% phone zoom. Use Inter Tight (weight 600) at 24–32 pt in bold pills, arrows at line width 2.4, and at most 3 annotations per chart.
4. Differentiation lock — each piece earns its own visual fingerprint: same brand baseline, different aesthetic per project. Reuse a fingerprint and the audience sees a template.
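As one possible reading of the phone-readable lock, the sketch below wraps matplotlib's `Axes.annotate` so that font size, arrow width, and the three-per-chart cap are enforced in code. The helper name `add_pill` and the pill styling (white rounded box, black edge) are assumptions for illustration; only the numbers come from the lock above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.text import Annotation

MAX_ANNOTATIONS = 3  # "max 3 per chart"


def add_pill(ax, text, xy, xytext, fontsize=28):
    """Add one bold pill annotation with a thick arrow; refuse a fourth."""
    if not 24 <= fontsize <= 32:
        raise ValueError("fontsize should stay within 24-32 pt")
    existing = [t for t in ax.texts if isinstance(t, Annotation)]
    if len(existing) >= MAX_ANNOTATIONS:
        raise ValueError("at most 3 annotations per chart")
    return ax.annotate(
        text,
        xy=xy,            # point the arrow targets
        xytext=xytext,    # where the pill sits
        fontsize=fontsize,
        fontweight=600,
        # Falls back to a generic sans-serif if Inter Tight is not installed.
        fontfamily=["Inter Tight", "sans-serif"],
        bbox=dict(boxstyle="round,pad=0.4", facecolor="white", edgecolor="black"),
        arrowprops=dict(arrowstyle="->", lw=2.4),
    )
```

A typical call would be `add_pill(ax, "2x faster", xy=(3, 0.8), xytext=(4, 0.95))`, where the label text and coordinates are placeholders.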
Stage 1: Recon (the 5-file bundle)
For each ML paper or topic, produce these five files in a project subfolder before any design begins. They are not interchangeable — each solves a specific problem.
File 1 — paper-summary.md
Audience: future-you reviewing what you actually understood about the paper.
Structure:
- Title + arXiv ID + authors with affiliations
- NOTE on uncertainty — flag every claim DIRECT (from abstract/extraction), INFERRED (from prior knowledge), or UNCERTAIN (needs verification)
- Section 1: The thing being studied
- Section 2: Problem framing (math / setup / objective, verbatim where possible)
- Section 3: Method (Stage 1, Stage 2, theorem/guarantee)
- Section 4: Numbers that matter (headline / supporting / pragmatism)
- Section 5: Lineage — closest predecessors (3-5), foundational (2-3), adjacent (2-3)
- Section 6: Limitations from the paper's own limitations section
- Section 7: Why this paper is interesting now (3 reasons in order of strength)
- Section 8: Hook angles already in tension (3-5 ranked)
Length: 1500–3000 words.
File 2 — related-work.md
Audience: a domain expert who wants to know how this paper sits in the field.
Structure:
- ASCII tree showing the lineage threads
- Thread 1: foundational papers, one-sentence delta each
- Thread 2: direct competitors with comparison table
- Thread 3: adjacent / practical / commercial context
- "What Paper #N contributes that wasn't already there" — 3 things ranked by strength
- "What it does NOT contribute and why that matters"
- "Counter-arguments worth pre-empting"
Length: 1500–2500 words.
File 3 — discussions.md
Audience: someone calibrating the social-media reception.
Structure:
- The macro frame: "[one-sentence consensus]"
- 3+ camps in the conversation, each with: who they are, atomic units, predicted reaction
- Specific surfaces to cite or push against
- Prediction: which hook will the conversation latch onto
- Things NOT yet in the public conversation that this content can introduce
Length: 1000–2000 words.
File 4 — brainstorm.md
Audience: future-you writing copy. This file decides everything that comes next.
Structure:
- Worldbuilder pass — audience personas with atomic units they nod at + what can break their head
- Hook tier list (5 tiers, 10+ hooks total)
- Final recommendation — which hook for slide 1, which for caption
- Slide arc / scene arc / poster layout (10 slides typical)
- Caption draft (separate writing surface — can be sharper than carousel)
- "What this content uniquely contributes vs prior content"
Length: 1500–2500 words.
File 5 — README.md
Audience: future-you returning to this project six months later, or a collaborator joining mid-project.
Structure:
- Index of files in the folder
- Recommended hook (from brainstorm.md Tier 1) with reasoning
- Top viral surfaces predicted (in order of likelihood)
- "Why this project matters in the series" — comparison table
- Status checklist
- Open uncertainties to resolve before posting
Length: 500–1000 words.
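The five-file bundle and its length targets can be checked mechanically. The following is a rough sketch, assuming the files sit directly in one project folder and that a whitespace split is an acceptable word count; the function name `check_bundle` is made up for this example.

```python
from pathlib import Path

# Word-count ranges copied from the "Length" lines of each file spec above.
BUNDLE = {
    "paper-summary.md": (1500, 3000),
    "related-work.md": (1500, 2500),
    "discussions.md": (1000, 2000),
    "brainstorm.md": (1500, 2500),
    "README.md": (500, 1000),
}


def check_bundle(folder):
    """Return {filename: problem} for files that are missing or out of range."""
    problems = {}
    for name, (lo, hi) in BUNDLE.items():
        path = Path(folder) / name
        if not path.is_file():
            problems[name] = "missing"
            continue
        # Whitespace split is a crude word count (an assumption of this sketch).
        words = len(path.read_text(encoding="utf-8").split())
        if not lo <= words <= hi:
            problems[name] = f"{words} words, expected {lo}-{hi}"
    return problems
```

An empty result means all five files exist and fall inside their ranges; it says nothing about whether the content follows the required structure.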
Discipline notes for the recon stage
- Always flag uncertainty: tag each claim DIRECT, INFERRED, or UNCERTAIN. This is what makes the bundle reusable for grounding.
- Citation counts are always approximate because Scholar fluctuates. Never claim "348 citations"; say "widely cited", "Tier 1 cited", or "~hundreds."
- Avoid generic adjectives. "Innovative" / "groundbreaking" / "exciting" mean nothing. Replace them with the specific shape of the claim.
- Numbered + bulleted lists work better than prose paragraphs in working documents.
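One way to keep the uncertainty flags machine-readable is a small claim record. This is a sketch under assumptions: the `Claim` fields and the `by_certainty` helper are hypothetical, and only the three tag names come from the notes above.

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    DIRECT = "DIRECT"        # from abstract / extraction
    INFERRED = "INFERRED"    # from prior knowledge
    UNCERTAIN = "UNCERTAIN"  # needs verification


@dataclass
class Claim:
    text: str
    certainty: Certainty
    source: str = ""  # where the claim was read, if known


def by_certainty(claims):
    """Group claim texts under each tag so a later grounding pass can triage them."""
    groups = {tag: [] for tag in Certainty}
    for claim in claims:
        groups[claim.certainty].append(claim.text)
    return groups
```

Grouping by tag makes it easy to list every UNCERTAIN claim before posting.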
Stage 2: Audit + Moodboard
Two files before any code:
3d-audit.md — when 3D is earned
For every slide / scene / panel, ask:
Does the math underneath have a third dimension that 3D would expose, or am I forcing depth onto something flat?
Three verdicts:
- YES — PRIME — hero 3D moment. Render it real.
- PARTIAL — small inset — data has 3 dimensions but the slide hero is something else. 3D goes in a corner.
- NO — flat data. Don't 3D it.
Aim for two hero 3D moments per carousel (≈10 slides). More than that and 3D loses its "look here" power.
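The verdicts and the two-hero budget can be tallied with a few lines. This is an illustrative sketch only: the string labels `"PRIME"`, `"PARTIAL"`, `"NO"` and the function name `audit_summary` are shorthand invented here for the three verdicts above.

```python
from collections import Counter

VERDICTS = ("PRIME", "PARTIAL", "NO")  # shorthand for the three verdicts
HERO_BUDGET = 2  # "two hero 3D moments per carousel"


def audit_summary(verdict_by_slide):
    """Count verdicts per carousel and flag when hero 3D slides exceed the budget."""
    unknown = set(verdict_by_slide.values()) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    counts = Counter(verdict_by_slide.values())
    return {
        "counts": {v: counts[v] for v in VERDICTS},
        "over_budget": counts["PRIME"] > HERO_BUDGET,
    }
```

For example, `audit_summary({1: "NO", 4: "PRIME", 7: "PRIME"})` reports two hero slides and stays within budget.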
Math that wants 3D:
| Math shape | 3D primitive | Example |
|---|---|---|
| f(x, y) = z surface | plot_surface | accuracy(x_difficulty, b_budget) |
| Discrete bars across 2D grid | bar3d | model × benchmark gain |
| Volumetric structure | Poly3DCollection cuboids | KV cache cube split into PCIe / GPU / CPU |
| Two surfaces compared | two plot_surface calls | dense vs MoE polysemanticity gap |
| Step / piecewise function | plot_surface with discrete colormap | Lagrangian b*(x; λ) step-pyramid |
| Stacked translucent slabs | Poly3DCollection | two-tier serving stack |
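To make the first table row concrete, here is a minimal `plot_surface` sketch for an accuracy(difficulty, budget) surface. The surface values are synthetic (a made-up logistic shape, not data from any paper), and the function name `render_surface`, the axis scales, and the colormap are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def render_surface(path=None):
    """Render a synthetic accuracy(difficulty, budget) surface with real 3D axes."""
    x = np.linspace(0.0, 1.0, 40)  # difficulty, arbitrary 0-1 scale
    b = np.linspace(0.0, 1.0, 40)  # budget, arbitrary 0-1 scale
    X, B = np.meshgrid(x, b)
    # Synthetic values for illustration only: accuracy rises with budget
    # and falls with difficulty.
    Z = 1.0 / (1.0 + np.exp(-(6.0 * B - 4.0 * X)))

    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(X, B, Z, cmap="viridis", linewidth=0)
    ax.set_xlabel("difficulty x")
    ax.set_ylabel("budget b")
    ax.set_zlabel("accuracy")
    if path is not None:
        fig.savefig(path, dpi=200)
    return fig, ax
```

Because the axes are genuine 3D axes, the camera can be adjusted with `ax.view_init(elev, azim)` instead of faking depth on a flat chart.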
Math that does NOT want 3D:
- Time series (use 2D line plot)
- Bar chart over single category axis (use 2D bars)
- Pipeline / flowchart (use 2D box-and-arrow)
- Tree structure (use 2D dendrogram)
- Citation count comparison (use 2D bars)
- Benchmark leaderboard (use 2D table)
- Confusion matrix (use 2D heatmap)
If you reach for 3D on any of these, reconsider.
moodboard.md — mining design references
Six phases:
Phase 0 — locate the audience's visual world. Name the audience and their visual literacy. Examples:
- Mech interp researchers → Distill.pub, Transformer Circuits, SAE feature dashboards
- Long-context infra → NVIDIA tech blog, vLLM benchmarks, FlashAttention figures, 3B1B
- Reasoning + inference operators → Boyd & Vandenberghe, Bertsekas, OpenAI usage console, Datadog APM
Phase 1 — survey 10–12 aesthetic ideologies. Score on audience match, differentiation, phone-readability, 3D compatibility. Pick top 2-3 and synthesize one direction: "[A] × [B] × [C], rendered for a phone."
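The Phase 1 scoring can be sketched as a ranked sum over the four criteria. Equal weighting, the 1-5 scale used in the usage example, and the names `top_directions` and `direction_string` are all assumptions of this sketch; the text above does not specify how scores are combined.

```python
CRITERIA = ("audience_match", "differentiation", "phone_readability", "compat_3d")


def top_directions(scores, k=3):
    """Rank ideologies by the unweighted sum of their criterion scores."""
    ranked = sorted(
        scores,
        key=lambda name: sum(scores[name][c] for c in CRITERIA),
        reverse=True,
    )
    return ranked[:k]


def direction_string(names):
    """Format the chosen ideologies as the one-line direction statement."""
    return " × ".join(names) + ", rendered for a phone."
```

With placeholder ideologies scored 1-5, `direction_string(top_directions(scores))` yields a line in the "[A] × [B] × [C], rendered for a phone." form.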
Phase 2 — mine 30+ references across 5–6 buckets:
- A: canonical reference for the audience
- B: production analogs
- C: contemporary writing in the register
- D: 3D / animation references
- E: foundational / textbook visual idiom
- F: adjacent / texture
Each reference: URL + one-sentence note on what to steal.
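A reference list with these constraints can be checked before moving on to Phase 3. The record shape below is a hypothetical sketch: the `Reference` fields and the `moodboard_gaps` helper are invented for illustration, with the thresholds taken from "30+ references across 5–6 buckets".

```python
from dataclasses import dataclass

BUCKETS = "ABCDEF"  # bucket letters from the list above


@dataclass(frozen=True)
class Reference:
    bucket: str  # one of A-F
    url: str
    steal: str   # one-sentence note on what to steal


def moodboard_gaps(refs, min_refs=30, min_buckets=5):
    """Return a list of human-readable gaps; an empty list means Phase 2 is covered."""
    gaps = []
    if len(refs) < min_refs:
        gaps.append(f"only {len(refs)} references, want {min_refs}+")
    used = {r.bucket for r in refs if r.bucket in BUCKETS}
    if len(used) < min_buckets:
        gaps.append(f"only {len(used)} buckets covered, want {min_buckets}+")
    gaps.extend(f"{r.url}: missing note" for r in refs if not r.steal.strip())
    return gaps
```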
Phase 3 — synthesize cross-cutting patterns (8–10 patterns observed across multiple buckets).
Phase 4 — lock the direction