Iterative Skill Refinement
Core Principle
If you only improve against a fixed benchmark, you're training to the test. Every improvement must generalize beyond the tasks that revealed it.
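One way to act on this principle is to hold out a set of tasks that never inform changes to the skill, then check each improvement against them. The following is a minimal sketch, assuming tasks are identified by strings; the `split_tasks` helper, the 30% hold-out fraction, and the fixed seed are illustrative choices, not part of the original workflow.

```python
import random

def split_tasks(task_ids, holdout_fraction=0.3, seed=0):
    """Split tasks into a development set and a held-out set.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    ids = sorted(set(task_ids))      # deterministic order, duplicates removed
    rng = random.Random(seed)        # local RNG, leaves global state untouched
    rng.shuffle(ids)
    # Keep at least one held-out task whenever there is more than one task.
    n_holdout = max(1, round(len(ids) * holdout_fraction)) if len(ids) > 1 else 0
    return ids[n_holdout:], ids[:n_holdout]

# Iterate on the skill using `dev`; touch `holdout` only to confirm a change generalizes.
dev, holdout = split_tasks([f"task-{i}" for i in range(10)])
```

Rotating or refreshing the held-out set between cycles would be a further safeguard, since repeated checks against the same hold-out tasks gradually turn it into another fixed benchmark.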
The Improvement Loop
1. EXPERIMENT — Run baseline vs skill on diverse tasks
2. ASSESS — Evaluate the outputs blind (use blind-skill-assessment)
└─ Skill wins consistently? → DONE (see Convergence)
└─ Baseline wins consistently after 2+ cycles? → STOP (see When to Abandon)
└─ Use a separate agent/se
[Description truncated. See the full README on GitHub.]
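The branch conditions listed under step 2 can be sketched as a small decision function. This is an illustration under stated assumptions: the text does not define "consistently", so the 0.8 win-rate threshold, the `next_action` and `win_rate` names, and the per-cycle verdict lists are all hypothetical. Only the "2+ cycles" condition for stopping comes from the loop itself, and it is read here as "at least two cycles have run and the baseline wins the latest one".

```python
from typing import List

def win_rate(verdicts: List[str], side: str) -> float:
    """Fraction of blind verdicts in one cycle that favor `side`."""
    return sum(v == side for v in verdicts) / len(verdicts) if verdicts else 0.0

def next_action(cycles: List[List[str]], threshold: float = 0.8) -> str:
    """Decide what to do after the latest assessment cycle.

    cycles: one list per cycle of blind verdicts, each "skill", "baseline", or "tie".
    threshold: assumed cutoff for "wins consistently" (not from the source).
    """
    if not cycles:
        return "CONTINUE"
    latest = cycles[-1]
    if win_rate(latest, "skill") >= threshold:
        return "DONE"        # skill wins consistently
    if len(cycles) >= 2 and win_rate(latest, "baseline") >= threshold:
        return "STOP"        # baseline still wins after 2+ cycles
    return "CONTINUE"        # mixed results: refine and run another cycle
```

For example, `next_action([["baseline"] * 5])` returns `"CONTINUE"` because only one cycle has run, while the same verdicts repeated over two cycles return `"STOP"`.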