Blind Skill Assessment
Core Principle
No baseline, no experiment. Every assessment compares two versions under blinded conditions with structured scoring.
Process
1. BLIND — Randomly assign labels A/B. Record mapping privately.
   Strip origin hints (filenames, "baseline"/"skill" comments).
2. RUBRIC — State scoring dimensions before reading the code.
3. JUDGE — Three personas score both versions (1-5 per dimension).
4. DECODE — Reveal A/B mapping only a
[Description truncated. See the full README on GitHub.]
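As a rough illustration of the BLIND and DECODE steps, the following Python sketch shows one way the label assignment, hint stripping, and score decoding could fit together. It is not taken from the README: the function names (`strip_origin_hints`, `blind`, `decode`), the regex for origin hints, and the shape of the `scores` dictionary are all assumptions made for the example, and the hint stripping only handles `#`-style comment lines.

```python
import random
import re
from statistics import mean

# Assumed pattern: the text names "baseline"/"skill" comments as origin hints.
ORIGIN_HINT = re.compile(r"baseline|skill", re.IGNORECASE)


def strip_origin_hints(code: str) -> str:
    """Drop '#' comment lines that mention 'baseline' or 'skill'."""
    kept = []
    for line in code.splitlines():
        if line.lstrip().startswith("#") and ORIGIN_HINT.search(line):
            continue
        kept.append(line)
    return "\n".join(kept)


def blind(baseline: str, skill: str, rng: random.Random):
    """Randomly assign labels A/B.

    Returns (blinded, mapping): blinded maps label -> stripped code,
    mapping maps label -> origin and should be kept away from the judges.
    """
    origins = [("baseline", baseline), ("skill", skill)]
    rng.shuffle(origins)
    mapping = {label: origin for label, (origin, _) in zip("AB", origins)}
    blinded = {
        label: strip_origin_hints(code)
        for label, (_, code) in zip("AB", origins)
    }
    return blinded, mapping


def decode(scores: dict, mapping: dict) -> dict:
    """Turn label-keyed scores into a mean score per origin.

    scores is assumed to look like {persona: {label: {dimension: 1..5}}}.
    """
    totals = {origin: [] for origin in mapping.values()}
    for per_label in scores.values():
        for label, dims in per_label.items():
            totals[mapping[label]].extend(dims.values())
    return {origin: mean(values) for origin, values in totals.items()}
```

In use, `blind(...)` would run before any judge sees the code, the judges would score only the `A` and `B` entries against the pre-stated rubric, and `decode(...)` would be called once all scores are recorded. Passing an explicit `random.Random` keeps the assignment reproducible for auditing without exposing it to the judges.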