DeepEval — Codex-native MBB-Grade Quality Framework
This skill scores any LLM-generated artifact against an MBB-grade, BCG-calibrated rubric. It works in any Codex project and needs no external API keys or vendor SDKs: Codex itself is the judge.
Three core promises
- Tier stack: deterministic → heuristic → Codex judge → human, by cost/latency budget.
- BCG-calibrated rubric — 8 dimensions, 1–3 scale, verbatim BCG anchor language.
- Day/Week/30-day cadence only. NO
[Description truncated. See the full README on GitHub.]
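The tier stack in the first promise can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the skill itself: the `Verdict` type, the individual check functions, the cost units, and the reading of "by cost/latency budget" as "try cheaper tiers first and stop escalating when the budget runs out". The Codex judge is replaced by a stub, since the real judging step runs inside Codex.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    tier: str                # name of the tier that produced the decision
    passed: Optional[bool]   # None means undecided, left for a human


# A check returns True/False when it can decide, or None to escalate.
Check = Callable[[str], Optional[bool]]
Tier = Tuple[str, int, Check]  # (name, cost in arbitrary units, check)


def deterministic_check(artifact: str) -> Optional[bool]:
    # Hypothetical rule: an empty artifact fails outright.
    if not artifact.strip():
        return False
    return None


def heuristic_check(artifact: str) -> Optional[bool]:
    # Hypothetical heuristic: fewer than five words is too thin to pass.
    if len(artifact.split()) < 5:
        return False
    return None


def stub_codex_judge(artifact: str) -> Optional[bool]:
    # Placeholder for the Codex judging step; always passes in this sketch.
    return True


TIERS: List[Tier] = [
    ("deterministic", 1, deterministic_check),
    ("heuristic", 1, heuristic_check),
    ("codex_judge", 5, stub_codex_judge),
]


def evaluate(artifact: str, tiers: List[Tier], budget: int) -> Verdict:
    """Run tiers cheapest-first; fall through to a human if none decides."""
    remaining = budget
    for name, cost, check in tiers:
        if cost > remaining:
            break  # cannot afford this tier, stop escalating
        remaining -= cost
        result = check(artifact)
        if result is not None:
            return Verdict(name, result)
    return Verdict("human", None)
```

Under these assumptions, `evaluate("", TIERS, budget=10)` is settled by the deterministic tier, while a longer artifact evaluated with `budget=2` exhausts the budget before the judge tier and is handed to a human.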