Eval
Test whether assembled context actually improves output. Not part of the automatic pipeline — this is an opt-in diagnostic tool for teams that want evidence, not vibes.
When to Use
- After running gigo:gigo — does the assembled context actually help?
- When output quality seems inconsistent — is context helping or hurting?
- When adding new personas — did they improve planning?
- When debugging — is the Persona Calibration heuristic working?
Two Modes
Pipeline Eval (defau
[Description truncated. See the full README on GitHub.]