Explore skills

5,474 skills found

mlflow

Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments with MLflow, a framework-agnostic ML lifecycle platform.

Research and Web #deploy #ai by Orchestra-Research

constitutional-ai

9.1k

Anthropic's method for training harmless AI through self-improvement. It uses a two-phase approach: supervised learning with self-critique and revision, followed by RLAIF (reinforcement learning from AI feedback). Use it for safety alignment and for reducing harmful outputs without human labels. It powers Claude's safety system.

Research and Web #ai by Orchestra-Research

ray-train

9.1k

Orchestrates distributed training for PyTorch, TensorFlow, and HuggingFace across clusters, scaling from laptops to thousands of nodes. Includes built-in hyperparameter tuning (Ray Tune), fault tolerance, and elastic scaling. Ideal for massive models or distributed hyperparameter sweeps.

Research and Web #ai by Orchestra-Research

nnsight-remote-interpretability

9.1k

Provides guidance for interpreting and manipulating neural network internals using nnsight, with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or with any PyTorch architecture.

Research and Web #ai by Orchestra-Research

grpo-rl-training

9.1k

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training.

Research and Web #ai by Orchestra-Research

fine-tuning-with-trl

9.1k

Fine-tune LLMs with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Ideal for RLHF, aligning models with preferences, or training from human feedback. Integrates with HuggingFace Transformers.

Research and Web #llm #ai by Orchestra-Research

huggingface-tokenizers

9.1k

Fast, Rust-based tokenizers optimized for research and production, tokenizing 1 GB of text in under 20 seconds. They support BPE, WordPiece, and Unigram, with custom vocabulary training and integration with transformers.

Research and Web #ai #word by Orchestra-Research

openrlhf-training

9.1k

A high-performance RLHF framework for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3, it runs 2x faster than DeepSpeedChat through its distributed architecture and GPU resource sharing.

Research and Web #llm #ai by Orchestra-Research

pyvene-interventions

9.1k

Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.

Research and Web #ai #test by Orchestra-Research

miles-rl-training

9.1k

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Research and Web #ai by Orchestra-Research

gptq

9.1k

Post-training 4-bit quantization for LLMs with minimal accuracy loss. It enables deploying large models (70B, 405B) on consumer GPUs, offering 4x memory reduction with <2% perplexity degradation and 3-4x faster inference than FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.

Research and Web #llm #deploy by Orchestra-Research

verl-rl-training

9.1k

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL).

Research and Web #llm #ai by Orchestra-Research