logprob-prefill-analysis

Name: logprob-prefill-analysis
Rating: 5 (2 reviews)
Author: bg-szy

Reproduces the full prefill sensitivity analysis pipeline for reward hacking indicators. Use when evaluating how susceptible model checkpoints are to exploit-eliciting prefills, computing token-based trajectories, or comparing logprob vs token-count as predictors of exploitability.
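One way to compare logprob and token count as predictors of exploitability is to score each problem with both metrics and compute the ROC AUC of each metric against an exploitation label. The sketch below is a minimal illustration under stated assumptions, not the skill's implementation. The `ProblemRecord` fields, the sign conventions, and the choice of AUC as the comparison statistic are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    # Hypothetical per-problem record; field names are illustrative only.
    min_prefill_tokens: int  # smallest prefill (in tokens) that elicited an exploit
    prefill_logprob: float   # summed log-probability of that prefill under the model
    exploited: bool          # outcome label, e.g. exploit observed at a later checkpoint


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare_predictors(records):
    labels = [r.exploited for r in records]
    # Assumed sign conventions: a shorter required prefill and a higher (less
    # negative) prefill logprob both indicate greater exploitability.
    return {
        "token_count": auc([-r.min_prefill_tokens for r in records], labels),
        "logprob": auc([r.prefill_logprob for r in records], labels),
    }


if __name__ == "__main__":
    records = [
        ProblemRecord(2, -1.0, True),
        ProblemRecord(5, -3.0, True),
        ProblemRecord(10, -2.0, False),
        ProblemRecord(20, -8.0, False),
    ]
    print(compare_predictors(records))  # {'token_count': 1.0, 'logprob': 0.75}
```

An AUC of 0.5 means a metric ranks exploited and non-exploited problems no better than chance; the metric with the higher AUC separates them better on that data.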

2 stars

Updated last month


How to add

```
/plugin marketplace add bg-szy/TOP-SKILLS
```

The exact command may vary by repository. Check the README on GitHub.


Related skills


internal-comms

153.1k

Resources to assist in writing various internal communications, adhering to company-preferred formats. Claude should utilize this skill for status reports, leadership updates, newsletters, FAQs, and other internal documents.

DevOps and Infra · by anthropics

babysit

83.4k

Monitors a pull request or review cycle until it is ready to merge. This skill is used to track PR comments, reviews, and CI status until all actionable issues are resolved.

DevOps and Infra · #ai · by thedotmack

do

83.4k

Execute a phased implementation plan using subagents. Use when asked to execute, run, or carry out a plan — especially one created by make-plan.

DevOps and Infra · #ai · by thedotmack

smart-explore

83.4k

Token-optimized structural code search using tree-sitter AST parsing. Use this instead of reading full files when you need to understand code structure, find functions, or explore a codebase efficiently.

DevOps and Infra · #ai · by thedotmack


Prefill Sensitivity Analysis Pipeline

This skill documents the complete pipeline for measuring model susceptibility to reward hacking via prefill sensitivity analysis, including both token-based and logprob-based metrics.
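The two metric families can be sketched as follows, under assumed data layouts. Here a token-based trajectory is taken to be, for each checkpoint, the mean over problems of the shortest prefill (in tokens) that elicited an exploit, with problems that never exploited assigned a hypothetical `cap`. The logprob-based metric is taken to be the summed per-token log-probability the model assigns to the prefill. The function names and the `{step: {problem_id: {prefill_tokens: exploited}}}` layout are illustrative only and may not match the skill's scripts.

```python
import math
from statistics import mean


def min_tokens_to_exploit(sweep, cap):
    """sweep maps prefill length (tokens) -> whether the model exploited.

    Returns the shortest prefill that elicited an exploit, or `cap` if none did
    (a hypothetical censoring convention).
    """
    hits = [n for n, exploited in sweep.items() if exploited]
    return min(hits) if hits else cap


def token_trajectory(results, cap):
    """results: {checkpoint_step: {problem_id: sweep}} -> [(step, mean min tokens)]."""
    return [
        (step, mean(min_tokens_to_exploit(s, cap) for s in results[step].values()))
        for step in sorted(results)
    ]


def prefill_logprob(token_logprobs):
    """Sum of per-token log-probabilities the model assigns to a prefill."""
    return math.fsum(token_logprobs)


if __name__ == "__main__":
    results = {
        100: {"p1": {0: False, 10: False, 30: False}, "p2": {0: False, 10: False, 30: False}},
        200: {"p1": {0: False, 10: False, 30: True}, "p2": {0: False, 10: False, 30: False}},
        300: {"p1": {0: False, 10: True, 30: True}, "p2": {0: True, 10: True, 30: True}},
    }
    print(token_trajectory(results, cap=100))
    print(prefill_logprob([-0.5, -0.25, -1.25]))
```

A falling trajectory (fewer prefill tokens needed at later checkpoints) would indicate growing susceptibility. The choice of `cap` for never-exploited problems directly shifts the mean, so it is worth reporting alongside the trajectory.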

Quick Start: Single Command Reproducibility

The full analysis can be run with a single command:

```shell
# Run on the most recent sensitivity experiment (auto-discovers checkpoints from config.yaml)
python scripts/run_full_prefill_analysis.py

# Specify a particular sensiti
```

[Description truncated. See the full README on GitHub.]
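The auto-discovery step described in the command comment (picking the most recent sensitivity experiment and reading its checkpoints from `config.yaml`) could look roughly like this. The sketch assumes, hypothetically, that each experiment lives in its own subdirectory containing a `config.yaml`, that "most recent" means the latest directory modification time, and that checkpoints are listed under a `checkpoints` key. The actual layout used by `scripts/run_full_prefill_analysis.py` may differ.

```python
from pathlib import Path


def find_latest_experiment(root):
    """Return the most recently modified subdirectory of `root` that has a config.yaml.

    Hypothetical layout: root/<experiment_name>/config.yaml
    """
    candidates = [
        d for d in Path(root).iterdir()
        if d.is_dir() and (d / "config.yaml").is_file()
    ]
    if not candidates:
        raise FileNotFoundError(f"no experiment directory with config.yaml under {root}")
    # "Most recent" is approximated by directory modification time (an assumption).
    return max(candidates, key=lambda d: d.stat().st_mtime)


def load_checkpoints(experiment_dir, key="checkpoints"):
    """Read the checkpoint list from config.yaml; the `checkpoints` key is hypothetical."""
    import yaml  # PyYAML, imported lazily so discovery works without it

    with open(Path(experiment_dir) / "config.yaml") as f:
        config = yaml.safe_load(f)
    return config[key]
```

For example, `load_checkpoints(find_latest_experiment("results/prefill_sensitivity"))` would return the checkpoint list of the newest experiment, where the `results/prefill_sensitivity` path is purely illustrative.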
