Published skills
bt-tournament
Rank competing hypotheses with online Bradley-Terry updates and LUCB intervals. Replaces elo-select. Use whenever there are 3 or more candidate hypotheses competing for the next experiment, or when the user wants to understand which branch is currently leading.
writeup-sop
SOP for writing reports or papers. Use when producing any .md file that makes claims about experimental results.
elo-select
Rank competing hypotheses with pairwise Elo updates before choosing what to implement. Superseded by bt-tournament.
preregister
Lock the falsification target for a confirmatory hypothesis before promoting results to main claims. Records metric, direction, threshold, multiple-comparison correction, and seed budget into ver_preregistrations.
prove-sop
End-to-end statistical proof loop. Use when the user asks to prove a statistical proposition, when a hypothesis is promoted to a theorem, or when the empirical trunk's reviewer demands a theorem-side gate. This SOP is suggestion-only (per ADR 0007); skip, loop, or interleave with empirical work as the situation calls for.
replay
Run a counterfactual ("what if we had approved hyp_07 instead of pruning it") against a saved snapshot without mutating the live graph. Use when the user wants to second-guess a paused branch or audit a past pruning decision.
research-sop
End-to-end research loop. Use at the start of any new research task; triggers literature review, hypothesis generation, Elo selection, experimentation, and verification.
debug-sop
Systematic debugging. Use when a script raises an error or produces unexpected results.
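The bt-tournament entry above mentions online Bradley-Terry updates and LUCB intervals without showing what those involve. The sketch below is one possible reading, not the skill's actual implementation: the class name `BTTournament`, the step size `lr`, the interval scale `width`, and the hypothesis names are all hypothetical. The interval is a simple 1/sqrt(n) heuristic, assumed here for illustration rather than taken from a derived confidence bound.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BTTournament:
    """Hypothetical online Bradley-Terry ranker; not the skill's real API."""
    lr: float = 0.5      # assumed stochastic-gradient step size
    width: float = 1.0   # assumed scale of the heuristic interval
    theta: dict = field(default_factory=dict)  # log-strength per hypothesis
    games: dict = field(default_factory=dict)  # comparisons seen per hypothesis

    def add(self, name):
        self.theta.setdefault(name, 0.0)
        self.games.setdefault(name, 0)

    def win_prob(self, a, b):
        # Bradley-Terry: P(a beats b) = sigmoid(theta_a - theta_b)
        return 1.0 / (1.0 + math.exp(self.theta[b] - self.theta[a]))

    def update(self, winner, loser):
        # One gradient-ascent step on the log-likelihood of this single outcome.
        self.add(winner)
        self.add(loser)
        step = self.lr * (1.0 - self.win_prob(winner, loser))
        self.theta[winner] += step
        self.theta[loser] -= step
        self.games[winner] += 1
        self.games[loser] += 1

    def interval(self, name):
        # Heuristic half-width shrinking as 1/sqrt(n); unbounded if never compared.
        n = self.games[name]
        half = math.inf if n == 0 else self.width / math.sqrt(n)
        return self.theta[name] - half, self.theta[name] + half

    def next_pair(self):
        # LUCB-style choice: current leader vs. the rival with the highest upper bound.
        leader = max(self.theta, key=self.theta.get)
        rivals = [h for h in self.theta if h != leader]
        challenger = max(rivals, key=lambda h: self.interval(h)[1])
        return leader, challenger


t = BTTournament()
for h in ("hyp_a", "hyp_b", "hyp_c"):
    t.add(h)
t.update("hyp_a", "hyp_b")   # hyp_a beat hyp_b in one comparison
print(t.next_pair())         # ('hyp_a', 'hyp_c'): the unseen hypothesis is tested next
```

Unlike a fixed-K Elo update, this keeps a per-hypothesis uncertainty estimate, so the next comparison goes to the rival that could still plausibly overtake the leader rather than to an arbitrary pair.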
Alert by category