/tokenwise:ab — A/B test a task across model tiers
Run the same task at multiple tiers and compare outputs.
Parse $ARGUMENTS
Expected form: <task description> [--tiers haiku,sonnet,opus]
- The task description is everything before the first
--flag (or the whole string if no flags) --tiers haiku,sonnetis the default (skip Opus by default — it's the baseline)--tiers haiku,sonnet,opusruns all three
If $ARGUMENTS is empty, ask the user:
What task should I A/B test? Prov
[Description truncada. Veja o README completo no GitHub.]