Skill Benchmarking
A strict, agent-agnostic benchmark runner for evals.json-based skill evaluation. It produces benchmark-<model>.json, which contains pass rates and a list of discriminating assertions. Only assertions that actually discriminate between with-skill and without-skill responses are kept; non-discriminating assertions are treated as noise and removed by the assertion hygiene process.
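The idea of keeping only discriminating assertions can be illustrated with a minimal sketch. This is not the runner's actual code: the `AssertionResult` record, its field names, and the rule "an assertion discriminates when its pass rate differs between the two conditions" are all assumptions made for illustration, and the real evals.json and benchmark-<model>.json schemas may differ.

```python
from dataclasses import dataclass


@dataclass
class AssertionResult:
    """Hypothetical per-assertion tally; not the skill's real schema."""
    name: str
    with_skill_passes: int     # passes across runs WITH the skill loaded
    without_skill_passes: int  # passes across runs WITHOUT the skill
    runs: int                  # number of runs per condition


def pass_rate(passes: int, runs: int) -> float:
    """Fraction of runs that passed; 0.0 when there were no runs."""
    return passes / runs if runs else 0.0


def discriminating(results: list[AssertionResult], min_gap: float = 0.0) -> list[str]:
    """Return names of assertions whose pass rate differs between conditions.

    Assumed rule: an assertion is kept when the absolute gap between the
    with-skill and without-skill pass rates exceeds `min_gap`.
    """
    kept = []
    for r in results:
        gap = pass_rate(r.with_skill_passes, r.runs) - pass_rate(r.without_skill_passes, r.runs)
        if abs(gap) > min_gap:
            kept.append(r.name)
    return kept


# Illustrative data only; the assertion names are invented.
results = [
    AssertionResult("mentions_file_path", with_skill_passes=5, without_skill_passes=5, runs=5),
    AssertionResult("uses_strict_schema", with_skill_passes=5, without_skill_passes=1, runs=5),
    AssertionResult("never_satisfied", with_skill_passes=0, without_skill_passes=0, runs=5),
]

kept = discriminating(results)
```

Here the first and third assertions pass (or fail) equally often in both conditions, so they carry no signal about the skill and would be dropped; only the second survives. A stricter `min_gap` would filter out assertions whose difference is small enough to be run-to-run noise.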
This skill works with any AI coding assistant -- Claude Code, Gemini CLI, GitHub Copilot, Cursor, Windsurf, or any agent that can read files
[Description truncated. See the full README on GitHub.]