Agent Evaluation Framework Builder
What this skill does
This skill designs an evaluation framework for an LLM agent or pipeline. Most teams skip evals until something breaks in production. This skill helps you build them before launch, so that you have a baseline, can catch regressions, and can measure quality improvements objectively. It covers dataset construction, metric selection, LLM-as-judge setup, and CI integration.
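To make the pieces concrete, here is a minimal sketch of how a dataset, a metric, and a baseline check might fit together. It is an illustration under stated assumptions, not the framework this skill produces: `EvalCase`, `exact_match`, `run_eval`, `has_regressed`, `toy_agent`, and the 0.02 tolerance are all hypothetical names and values chosen for the example, and exact match stands in for whatever metric or LLM judge you would actually select.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One dataset row: an input prompt and the answer we expect."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Simplest possible metric: 1.0 on a case-insensitive exact match, else 0.0.

    An LLM-as-judge scorer could be swapped in here, as long as it keeps the
    same (output, expected) -> float signature.
    """
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    agent: Callable[[str], str],
    cases: list[EvalCase],
    metric: Callable[[str, str], float] = exact_match,
) -> float:
    """Return the mean metric score of `agent` over `cases`."""
    if not cases:
        raise ValueError("eval dataset is empty")
    scores = [metric(agent(case.prompt), case.expected) for case in cases]
    return sum(scores) / len(scores)


def has_regressed(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when the score falls more than `tolerance` below the baseline."""
    return score < baseline - tolerance


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call (no network, deterministic)."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


# Hypothetical three-row dataset; a real one would be loaded from a file.
CASES = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "paris"),
    EvalCase("largest planet", "Jupiter"),
]
```

In a CI job, one plausible use is to call `run_eval(toy_agent, CASES)`, compare the result against a stored baseline with `has_regressed`, and fail the build when it returns `True`.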
How to use
Claude Code / Cline
Copy this file to `.agents/skills/ag
[Description truncated. See the full README on GitHub.]