Orchestra-Research

Author in the catalog

98 skills · 892,290 stars total · github.com/Orchestra-Research

Published skills

Showing 48 of 98

tensorrt-llm

9.1k

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Research & Web · #llm #deploy · by Orchestra-Research

autogpt-agents

9.1k

Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous agents, or building complex multi-step AI automation systems.

Automation · #deploy #ai · by Orchestra-Research

guidance

9.1k

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework.

Research & Web · #llm #ai · by Orchestra-Research

nanogpt

9.1k

A ~300-line educational GPT implementation by Andrej Karpathy, reproducing GPT-2 (124M) on OpenWebText. It offers clean, hackable code perfect for learning transformers and understanding GPT architecture from scratch, trainable on Shakespeare (CPU) or OpenWebText (multi-GPU).

Research & Web · #ai · by Orchestra-Research

pytorch-lightning

9.1k

A high-level PyTorch framework featuring a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), and a callbacks system, designed for minimal boilerplate. It scales from laptops to supercomputers with the same code, providing clean training loops with built-in best practices.

Research & Web · #ai · by Orchestra-Research

skypilot-multi-cloud-orchestration

9.1k

Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or optimize GPU costs across providers.

Research & Web · #ai · by Orchestra-Research

serving-llms-vllm

9.1k

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Ideal for deploying production LLM APIs, optimizing inference, or serving models with limited GPU memory, it supports OpenAI-compatible endpoints, quantization, and tensor parallelism.

Research & Web · #llm #deploy · by Orchestra-Research

hqq-quantization

9.1k

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

Data & Analysis · #llm #deploy · by Orchestra-Research

weights-and-biases

9.1k

Track ML experiments with automatic logging, visualize training in real-time, optimize hyperparameters with sweeps, and manage model registry with W&B - a collaborative MLOps platform.

Research & Web · #ai · by Orchestra-Research

evolving-ai-agents

9.1k

Provides guidance for automatically evolving and optimizing AI agents across any domain using LLM-driven evolution algorithms. Use when building self-improving agents, optimizing agent prompts and skills against benchmarks, or implementing automated agent evaluation loops.

Research & Web · #llm #ai · by Orchestra-Research

llama-cpp

9.1k

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware, ideal for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. It supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10x speedup vs PyTorch on CPU.

Research & Web · #llm #deploy · by Orchestra-Research

sglang

9.1k

Fast structured generation and serving for LLMs using RadixAttention prefix caching. It's ideal for JSON/regex outputs, constrained decoding, agentic workflows, or when 5x faster inference than vLLM with prefix sharing is needed, powering over 300,000 GPUs at major tech companies.

Research & Web · #llm #ai · by Orchestra-Research

deepspeed

9.1k

Expert guidance for distributed training with DeepSpeed, covering ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, and sparse attention.

Research & Web · #ai · by Orchestra-Research

evaluating-llms-harness

9.1k

Evaluates LLMs across 60+ academic benchmarks like MMLU and HumanEval. It's an industry standard for benchmarking model quality, comparing models, and tracking training progress, supporting HuggingFace, vLLM, and APIs.

Research & Web · #llm #ai · by Orchestra-Research

nemo-guardrails

9.1k

NVIDIA's runtime safety framework for LLM applications features jailbreak, hallucination, and toxicity detection, alongside input/output validation, fact-checking, and PII filtering. It uses Colang 2.0 DSL for programmable rails, is production-ready, and runs on T4 GPUs.

Research & Web · #llm #ai · by Orchestra-Research

mlflow

9.1k

Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments with MLflow, a framework-agnostic ML lifecycle platform.

Research & Web · #deploy #ai · by Orchestra-Research

constitutional-ai

9.1k

Anthropic's method for training harmless AI through self-improvement. It employs a two-phase approach: supervised learning with self-critique and revision, followed by RLAIF (reinforcement learning from AI feedback). Use it for safety alignment and for reducing harmful outputs without human labels; it powers Claude's safety system.

Research & Web · #ai · by Orchestra-Research

ray-train

9.1k

Orchestrates distributed training for PyTorch/TensorFlow/HuggingFace across clusters, scaling from laptops to thousands of nodes, with built-in hyperparameter tuning (Ray Tune), fault tolerance, and elastic scaling, ideal for massive models or distributed hyperparameter sweeps.

Research & Web · #ai · by Orchestra-Research

nnsight-remote-interpretability

9.1k

Provides guidance for interpreting and manipulating neural network internals using nnsight, with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or with any PyTorch architecture.

Research & Web · #ai · by Orchestra-Research

grpo-rl-training

9.1k

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training.

Research & Web · #ai · by Orchestra-Research

fine-tuning-with-trl

9.1k

Fine-tune LLMs with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. This is ideal for RLHF, aligning models with preferences, or training from human feedback, and integrates with HuggingFace Transformers.

Research & Web · #llm #ai · by Orchestra-Research

huggingface-tokenizers

9.1k

Fast, Rust-based tokenizers optimized for research and production, processing 1GB in under 20 seconds. They support BPE, WordPiece, and Unigram, offering custom vocabulary training and seamless integration with transformers for high-performance tokenization.

Research & Web · #ai #word · by Orchestra-Research

openrlhf-training

9.1k

A high-performance RLHF framework with Ray+vLLM acceleration for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3, it runs 2x faster than DeepSpeedChat thanks to its distributed architecture and GPU resource sharing.

Research & Web · #llm #ai · by Orchestra-Research

gguf-quantization

9.1k

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible 2-8 bit quantization without GPU requirements.

DevOps & Infra · #deploy #ai · by Orchestra-Research

evaluating-code-models

9.1k

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. This industry standard from BigCode Project, used by HuggingFace leaderboards, is ideal for benchmarking code models, comparing coding abilities, and testing multi-language support.

Development · #ai #test · by Orchestra-Research

pyvene-interventions

9.1k

Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.

Research & Web · #ai #test · by Orchestra-Research

miles-rl-training

9.1k

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Research & Web · #ai · by Orchestra-Research

prompt-guard

9.1k

Meta's 86M-parameter prompt injection and jailbreak detector filters malicious prompts and third-party data for LLM applications. It reports over 99% TPR and under 1% FPR, is fast (<2ms on GPU) and multilingual (8 languages), and can be deployed via HuggingFace or used in batch processing for RAG security.

DevOps & Infra · #llm #deploy · by Orchestra-Research

gptq

9.1k

Post-training 4-bit quantization for LLMs with minimal accuracy loss. It enables deploying large models (70B, 405B) on consumer GPUs, offering 4x memory reduction with <2% perplexity degradation and 3-4x faster inference than FP16. It integrates with transformers and PEFT for QLoRA fine-tuning.

Research & Web · #llm #deploy · by Orchestra-Research

ray-data

9.1k

Scalable data processing for ML workloads with streaming execution across CPU/GPU, supporting various formats like Parquet/CSV/JSON/images. It integrates with Ray Train, PyTorch, and TensorFlow, scaling from a single machine to hundreds of nodes for tasks like batch inference, data preprocessing, and distributed ETL.

Data & Analysis · #ai · by Orchestra-Research

verl-rl-training

9.1k

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL).

Research & Web · #llm #ai · by Orchestra-Research

lambda-labs-gpu-cloud

9.1k

Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent filesystems, or high-performance multi-node clusters for large-scale training.

Research & Web · #ai · by Orchestra-Research

instructor

9.1k

Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor - a battle-tested structured output library.

Research & Web · #llm #ai · by Orchestra-Research

outlines

9.1k

Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines - dottxt.ai's structured generation library.

Research & Web · #llm #ai · by Orchestra-Research

long-context

9.1k

Extend transformer model context windows using RoPE, YaRN, ALiBi, and position interpolation techniques. This is useful for processing long documents, extending pre-trained models, or implementing efficient positional encodings, covering various embedding and extrapolation strategies for LLMs.

Research & Web · #llm #ai · by Orchestra-Research

brainstorming-research-ideas

9.1k

Guides researchers through structured ideation frameworks to discover high-impact research directions. Use when exploring new problem spaces, pivoting between projects, or seeking novel angles on existing work.

Research & Web · #ai · by Orchestra-Research

qdrant-vector-search

9.1k

High-performance vector similarity search engine for RAG and semantic search. Use it for production RAG systems needing fast nearest neighbor search, hybrid search with filtering, or scalable Rust-powered vector storage.

Research & Web · #ai · by Orchestra-Research

ml-paper-writing

9.1k

Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions; for systems venues, use 'systems-paper-writing'.

Research & Web · #ai · by Orchestra-Research

model-merging

9.1k

Merge multiple fine-tuned models with mergekit to combine capabilities without retraining, ideal for creating specialized models by blending domain-specific expertise or improving performance. It covers various merging techniques like SLERP, TIES-Merging, DARE, Task Arithmetic, and linear merging, plus production deployment strategies.

DevOps & Infra · #deploy #ai · by Orchestra-Research

segment-anything-model

9.1k

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

Research & Web · #ai · by Orchestra-Research

presenting-conference-talks

9.1k

Generates conference presentation slides (Beamer LaTeX PDF and editable PPTX) from a compiled paper with speaker notes and talk script. Use when preparing oral talks, spotlight presentations, or invited talks for ML and systems conferences.

Documents · #pptx #pdf · by Orchestra-Research

implementing-llms-litgpt

9.1k

Implements and trains LLMs using Lightning AI's LitGPT, supporting over 20 pretrained architectures like Llama, Gemma, Phi, Qwen, and Mistral. It's suitable for clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA, featuring single-file implementations without abstraction layers.

Research & Web · #llm #ai · by Orchestra-Research

awq-quantization

9.1k

This 4-bit LLM compression method, winner of the MLSys 2024 Best Paper Award, uses activation-aware weight quantization, providing a 3x speedup and minimal accuracy loss. It's ideal for deploying large models on limited GPU memory or for faster, more accurate inference than GPTQ, especially for instruction-tuned and multimodal models.

Research & Web · #llm #deploy · by Orchestra-Research

nemo-evaluator-sdk

9.1k

NVIDIA's enterprise-grade platform evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. It provides scalable evaluation on local Docker, Slurm HPC, or cloud platforms, featuring a container-first architecture for reproducible benchmarking.

DevOps & Infra · #llm #ai · by Orchestra-Research

pytorch-fsdp2

9.1k

Adds PyTorch FSDP2 (fully_shard) to training scripts with correct init, sharding, mixed precision/offload config, and distributed checkpointing. Use when models exceed single-GPU memory or when you need DTensor-based sharding with DeviceMesh.

Research & Web · #ai · by Orchestra-Research

distributed-llm-pretraining-torchtitan

9.1k

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). It is ideal for pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs, leveraging Float8, torch.compile, and distributed checkpointing.

Research & Web · #llm #ai · by Orchestra-Research

tensorboard

9.1k

Visualize training metrics, debug models with histograms, compare experiments, visualize model graphs, and profile performance with TensorBoard - Google's ML visualization toolkit.

Research & Web · #ai · by Orchestra-Research

optimizing-attention-flash

9.1k

Optimizes transformer attention with Flash Attention, achieving 2-4x speedup and 10-20x memory reduction. Ideal for long sequences (>512 tokens), addressing GPU memory issues, or accelerating inference, it supports PyTorch native SDPA, flash-attn, H100 FP8, and sliding window attention.

Research & Web · #ai · by Orchestra-Research
