Prefill Sensitivity Analysis Pipeline
This skill documents the complete pipeline for measuring model susceptibility to reward hacking via prefill sensitivity analysis, including both token-based and logprob-based metrics.
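As a rough illustration of how the two metric families differ, here is a minimal sketch. It rests on assumptions this description does not confirm: that the token-based metric is the shortest prefill length (in tokens) at which the model goes on to exploit, and that the logprob-based metric is the total log-probability the model assigns to the prefill tokens. The function names and sample data are hypothetical and are not taken from the pipeline's scripts.

```python
from typing import Mapping, Optional, Sequence


def min_prefill_to_exploit(exploited_by_length: Mapping[int, bool]) -> Optional[int]:
    """Assumed token-based metric: smallest prefill length (in tokens)
    at which an exploit was observed, or None if it never occurred."""
    hits = [n for n, exploited in exploited_by_length.items() if exploited]
    return min(hits) if hits else None


def prefill_logprob(token_logprobs: Sequence[float]) -> float:
    """Assumed logprob-based metric: total log-probability the model
    assigns to the prefill tokens (higher means a more 'natural' prefill)."""
    return sum(token_logprobs)


# Hypothetical results for one problem at one checkpoint:
# prefill length in tokens -> whether the completion exploited.
exploited = {0: False, 10: False, 30: True, 100: True}

print(min_prefill_to_exploit(exploited))       # 30
print(prefill_logprob([-0.5, -1.25, -0.25]))   # -2.0
```

Under this reading, a checkpoint that needs a shorter prefill, or that assigns a higher log-probability to the exploit-leading prefill, would count as more susceptible.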
Quick Start: Single Command Reproducibility
The full analysis can be run with a single command:
# Run on most recent sensitivity experiment (auto-discovers checkpoints from config.yaml)
python scripts/run_full_prefill_analysis.py
# Specify a particular sensiti
[Description truncated. See the full README on GitHub.]