Explore skills
5,474 skills found
mlflow
Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments with MLflow, a framework-agnostic ML lifecycle platform.
constitutional-ai
Anthropic's method for training harmless AI through self-improvement. It uses a two-phase approach: supervised learning with self-critique and revision, followed by reinforcement learning from AI feedback (RLAIF). Use it for safety alignment and for reducing harmful outputs without human labels; it powers Claude's safety system.
ray-train
Orchestrates distributed training for PyTorch, TensorFlow, and HuggingFace across clusters, scaling from a laptop to thousands of nodes. Includes built-in hyperparameter tuning (Ray Tune), fault tolerance, and elastic scaling. Ideal for massive models or distributed hyperparameter sweeps.
nnsight-remote-interpretability
Provides guidance for interpreting and manipulating neural network internals using nnsight, with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or with any PyTorch architecture.
grpo-rl-training
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training.
fine-tuning-with-trl
Fine-tune LLMs with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Ideal for RLHF, aligning models with preferences, or training from human feedback. Integrates with HuggingFace Transformers.
huggingface-tokenizers
Fast, Rust-based tokenizers optimized for research and production, processing 1GB in under 20 seconds. They support BPE, WordPiece, and Unigram, offering custom vocabulary training and seamless integration with transformers for high-performance tokenization.
openrlhf-training
A high-performance RLHF framework for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3, it is reported to run 2x faster than DeepSpeedChat through its distributed architecture and GPU resource sharing.
pyvene-interventions
Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.
miles-rl-training
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
gptq
Post-training 4-bit quantization for LLMs with minimal accuracy loss. It enables deploying large models (70B, 405B) on consumer GPUs, offering 4x memory reduction with <2% perplexity degradation and 3-4x faster inference than FP16. It integrates with transformers and PEFT for QLoRA fine-tuning.
verl-rl-training
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL).