simpo-training

Name: simpo-training
Rating: 5 (7 reviews)
Author: braxtonROSE4

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better reported performance (up to +6.4 points on AlpacaEval 2.0). Because no reference model is needed, training is more efficient than with DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.

7 stars

Updated 2 months ago

License: MIT

How to add

/plugin marketplace add braxtonROSE4/zorro-agent

The exact command may vary by repository. Check the README on GitHub.

#llm #ai

SimPO - Simple Preference Optimization

Quick start

SimPO is a reference-free preference optimization method that outperforms DPO without needing a reference model.
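
To make "reference-free" concrete, here is a minimal sketch of the SimPO objective for a single preference pair, as described in the SimPO paper: the implicit reward is the length-normalized log-likelihood under the policy itself, scaled by beta, and the chosen response must beat the rejected one by a target margin gamma. The function names and the default values of `beta` and `gamma` are illustrative assumptions, not settings taken from this skill, and the sketch uses plain Python lists of per-token log-probabilities rather than the tensors a real trainer would use.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """SimPO loss for one preference pair (illustrative sketch).

    Inputs are per-token log-probabilities of each response under the
    policy being trained. No reference model appears anywhere.
    beta and gamma defaults are placeholder values, not recommendations.
    """
    # Implicit reward: beta times the average per-token log-probability.
    chosen_reward = beta * sum(chosen_token_logps) / len(chosen_token_logps)
    rejected_reward = beta * sum(rejected_token_logps) / len(rejected_token_logps)

    # The chosen response should exceed the rejected one by at least gamma.
    margin = chosen_reward - rejected_reward - gamma

    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    good = simpo_loss([-0.1, -0.2], [-1.0, -2.0])
    bad = simpo_loss([-1.0, -2.0], [-0.1, -0.2])
    print(f"chosen preferred by policy:   loss = {good:.4f}")
    print(f"rejected preferred by policy: loss = {bad:.4f}")
```

The loss is small when the policy already assigns a higher average log-probability to the chosen response and grows when the ordering is reversed. Dividing by response length is what keeps longer responses from being rewarded merely for accumulating more tokens.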

Installation:

# Create environment
conda create -n simpo python=3.10 && conda activate simpo

# Install PyTorch 2.2.2
# Visit: https://pytorch.org/get-started/locally/

# Install alignment-handbook
git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook
python -m pip install .

# Insta

[Description truncated. See the full README on GitHub.]
