Skillteca by claudinhocode


Author in the catalog

Orchestra-Research

98 skills · 969,122 stars total · github.com/Orchestra-Research


Published skills

Showing 48 of 98

tensorrt-llm

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Research & Web · #llm #deploy · by Orchestra-Research

autogpt-agents

Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous agents, or building complex multi-step AI automation systems.

Automation · #deploy #ai · by Orchestra-Research

guidance

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance, Microsoft Research's constrained generation framework.

Research & Web · #llm #ai · by Orchestra-Research

nanogpt

A ~300-line educational GPT implementation by Andrej Karpathy, reproducing GPT-2 (124M) on OpenWebText. It offers clean, hackable code perfect for learning transformers and understanding GPT architecture from scratch, trainable on Shakespeare (CPU) or OpenWebText (multi-GPU).

Research & Web · #ai · by Orchestra-Research
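To make the "124M" figure concrete, here is a small parameter-count sketch. It assumes GPT-2 small hyperparameters (50,257-token vocabulary, 1,024-token context, 12 layers, width 768), biases on every linear and LayerNorm layer, and an output head tied to the token embedding. The function name is hypothetical and not part of nanoGPT.

```python
def gpt2_param_count(vocab_size: int = 50257, block_size: int = 1024,
                     n_layer: int = 12, n_embd: int = 768) -> int:
    """Count parameters of a GPT-2-style model with tied input/output embeddings."""
    d = n_embd
    token_emb = vocab_size * d          # wte (shared with the output head)
    pos_emb = block_size * d            # wpe, learned position embeddings
    per_layer = (
        2 * d                           # ln_1 weight + bias
        + d * 3 * d + 3 * d             # fused Q, K, V projection
        + d * d + d                     # attention output projection
        + 2 * d                         # ln_2 weight + bias
        + d * 4 * d + 4 * d             # MLP up-projection
        + 4 * d * d + d                 # MLP down-projection
    )
    final_ln = 2 * d                    # ln_f weight + bias
    return token_emb + pos_emb + n_layer * per_layer + final_ln

total = gpt2_param_count()
print(total)  # 124439808
```

Dropping the 786,432 position-embedding weights gives 123,653,376, which appears to be how nanoGPT arrives at the ~123.65M it reports.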

pytorch-lightning

A high-level PyTorch framework featuring a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), and a callbacks system, designed for minimal boilerplate. It scales from laptops to supercomputers with the same code, providing clean training loops with built-in best practices.

Research & Web · #ai · by Orchestra-Research

skypilot-multi-cloud-orchestration

Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or optimize GPU costs across providers.

Research & Web · #ai · by Orchestra-Research

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. It is ideal for deploying production LLM APIs, optimizing inference, or serving models with limited GPU memory, and it supports OpenAI-compatible endpoints, quantization, and tensor parallelism.

Research & Web · #llm #deploy · by Orchestra-Research
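The core idea behind PagedAttention is to store the KV cache in fixed-size blocks and give each sequence a block table, much like virtual-memory pages. The toy allocator below illustrates only that bookkeeping. The class and method names are hypothetical; it is not vLLM's code or API and models no attention kernels, copy-on-write, or prefix sharing.

```python
class ToyBlockAllocator:
    """Toy paged KV-cache bookkeeping (illustration only, not vLLM).

    The cache is split into fixed-size physical blocks. Each sequence keeps a
    block table (logical block index -> physical block id), and a new block is
    taken from the free pool only when the sequence's last block is full.
    """

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # ids of unused physical blocks
        self.tables = {}                      # seq_id -> list of physical block ids
        self.lengths = {}                     # seq_id -> number of cached tokens

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a KV slot for one new token; return (physical_block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:          # no block yet, or last block is full
            if not self.free:
                raise MemoryError("out of KV-cache blocks")
            table.append(self.free.pop(0))
        self.lengths[seq_id] = n + 1
        return table[-1], n % self.block_size

    def release(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because memory is claimed one block at a time, a sequence wastes at most one partially filled block instead of a full max-length reservation.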

hqq-quantization

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

Data & Analysis · #llm #deploy · by Orchestra-Research

weights-and-biases

Track ML experiments with automatic logging, visualize training in real time, optimize hyperparameters with sweeps, and manage a model registry with W&B, a collaborative MLOps platform.

Research & Web · #ai · by Orchestra-Research

evolving-ai-agents

Provides guidance for automatically evolving and optimizing AI agents across any domain using LLM-driven evolution algorithms. Use when building self-improving agents, optimizing agent prompts and skills against benchmarks, or implementing automated agent evaluation loops.

Research & Web · #llm #ai · by Orchestra-Research

llama-cpp

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware, ideal for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. It supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10x speedup vs PyTorch on CPU.

Research & Web · #llm #deploy · by Orchestra-Research

sglang

Fast structured generation and serving for LLMs using RadixAttention prefix caching. It is ideal for JSON/regex outputs, constrained decoding, agentic workflows, or workloads with shared prefixes where you need 5x faster inference than vLLM. It powers over 300,000 GPUs at major tech companies.

Research & Web · #llm #ai · by Orchestra-Research
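RadixAttention reuses the KV cache for prompts that share a prefix with earlier requests. As a simplified sketch, assuming token IDs as input, the plain trie below only reports how many leading tokens are already cached. SGLang's actual radix tree also compresses edges, stores KV tensors, and evicts with an LRU policy, none of which is modeled here; the class name is hypothetical.

```python
class PrefixCache:
    """Token-level trie that reports how much of a prompt is already cached.

    Simplified stand-in for a radix tree: no edge compression, no KV tensors,
    no eviction.
    """

    def __init__(self):
        self.root: dict = {}

    def insert(self, tokens: list[int]) -> None:
        """Record that the KV cache for this token sequence is available."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match_prefix(self, tokens: list[int]) -> int:
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            matched += 1
        return matched
```

After `insert([1, 2, 3, 4])`, a request for `[1, 2, 3, 9, 9]` matches 3 tokens, so only the last two need a fresh prefill.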

deepspeed

Expert guidance for distributed training with DeepSpeed, covering ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, and sparse attention.

Research & Web · #ai · by Orchestra-Research
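ZeRO's stages differ in which model states are partitioned across data-parallel GPUs. The sketch below follows the memory accounting from the ZeRO paper for mixed-precision Adam (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and ignores activations and temporary buffers, so treat its output as a lower bound rather than a DeepSpeed estimate. The function name is hypothetical.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU memory (GB, 1e9 bytes) for model states under ZeRO stages 0-3.

    Assumes mixed-precision Adam: 2 bytes/param fp16 weights, 2 bytes/param
    fp16 gradients, k bytes/param fp32 optimizer state. Activations, buffers,
    and fragmentation are ignored.
    """
    p, g, o = 2 * num_params, 2 * num_params, k * num_params
    if stage == 0:        # plain data parallelism: everything replicated
        total = p + g + o
    elif stage == 1:      # partition optimizer states
        total = p + g + o / num_gpus
    elif stage == 2:      # partition optimizer states and gradients
        total = p + (g + o) / num_gpus
    elif stage == 3:      # partition parameters too
        total = (p + g + o) / num_gpus
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return total / 1e9
```

For a 7.5B-parameter model on 64 GPUs this gives about 120 GB per GPU without ZeRO, 31.4 GB at stage 1, 16.6 GB at stage 2, and 1.9 GB at stage 3.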

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks like MMLU and HumanEval. It's an industry standard for benchmarking model quality, comparing models, and tracking training progress, supporting HuggingFace, vLLM, and APIs.

Research & Web · #llm #ai · by Orchestra-Research

nemo-guardrails

NVIDIA's runtime safety framework for LLM applications provides jailbreak, hallucination, and toxicity detection, along with input/output validation, fact-checking, and PII filtering. It uses the Colang 2.0 DSL for programmable rails, is production-ready, and runs on T4 GPUs.

Research & Web · #llm #ai · by Orchestra-Research

mlflow

Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments with MLflow, a framework-agnostic ML lifecycle platform.

Research & Web · #deploy #ai · by Orchestra-Research

constitutional-ai

Anthropic's method for training harmless AI through self-improvement. It uses a two-phase approach: supervised learning with self-critique and revision, followed by RLAIF (reinforcement learning from AI feedback). It is used for safety alignment and for reducing harmful outputs without human labels, and it powers Claude's safety system.

Research & Web · #ai · by Orchestra-Research

ray-train

Orchestrates distributed training for PyTorch/TensorFlow/HuggingFace across clusters, scaling from laptops to thousands of nodes, with built-in hyperparameter tuning (Ray Tune), fault tolerance, and elastic scaling, ideal for massive models or distributed hyperparameter sweeps.

Research & Web · #ai · by Orchestra-Research

verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL).

Research & Web · #llm #ai · by Orchestra-Research

huggingface-tokenizers

Fast, Rust-based tokenizers optimized for research and production, processing 1GB in under 20 seconds. They support BPE, WordPiece, and Unigram, offering custom vocabulary training and seamless integration with transformers for high-performance tokenization.

Research & Web · #ai #word · by Orchestra-Research
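The library itself is written in Rust. To show what BPE training does, here is a toy pure-Python version of the merge loop. It illustrates the algorithm under simplifying assumptions (character-level start, no byte fallback or pre-tokenization, ties broken by first occurrence) and is not the tokenizers API; the function name is hypothetical.

```python
from collections import Counter

def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merges from a word list (toy version)."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair (first on ties)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

On `["low", "low", "lower", "lowest"]`, the first two merges are `("l", "o")` and then `("lo", "w")`.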

nnsight-remote-interpretability

Provides guidance for interpreting and manipulating neural network internals using nnsight, with optional NDIF remote execution. Use it when you need to run interpretability experiments on massive models (70B+) without local GPU resources, or with any PyTorch architecture.

Research & Web · #ai · by Orchestra-Research

pyvene-interventions

Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.

Research & Web · #ai #test · by Orchestra-Research

ray-data

Scalable data processing for ML workloads with streaming execution across CPU/GPU, supporting various formats like Parquet/CSV/JSON/images. It integrates with Ray Train, PyTorch, and TensorFlow, scaling from a single machine to hundreds of nodes for tasks like batch inference, data preprocessing, and distributed ETL.

Data & Analysis · #ai · by Orchestra-Research

grpo-rl-training

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training.

Research & Web · #ai · by Orchestra-Research
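GRPO replaces PPO's learned value baseline with a group baseline: several completions are sampled for each prompt, and each reward is normalized against its own group. A minimal sketch of that advantage computation follows. The function name is hypothetical, and implementations differ on details such as population versus sample standard deviation and the epsilon term.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group of completions of the same prompt.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation; eps avoids division by zero when all
    rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

With rewards `[1.0, 0.0, 0.0, 1.0]`, the two correct completions get advantages near +1 and the others near -1; a group where every reward is equal contributes no learning signal.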

miles-rl-training

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Research & Web · #ai · by Orchestra-Research

openrlhf-training

A high-performance RLHF framework with Ray+vLLM acceleration for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3, it runs 2x faster than DeepSpeed-Chat through its distributed architecture and GPU resource sharing.

Research & Web · #llm #ai · by Orchestra-Research

fine-tuning-with-trl

Fine-tune LLMs with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reinforcement-learning reward optimization, and reward model training. It is ideal for RLHF, aligning models with preferences, or training from human feedback, and it integrates with HuggingFace Transformers.

Research & Web · #llm #ai · by Orchestra-Research
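DPO trains directly on preference pairs against a frozen reference model. The sketch below computes the per-pair DPO loss from summed sequence log-probabilities. The value beta=0.1 is an assumed default, and the function is a standalone illustration, not TRL's DPOTrainer.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(log pi(y_w) - log ref(y_w)) - (log pi(y_l) - log ref(y_l))]).
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), computed without overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy still equals the reference, the margin is zero and the loss is log 2; it falls as the policy raises the chosen response's likelihood relative to the rejected one.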

prompt-guard

Meta's 86M-parameter prompt injection and jailbreak detector filters malicious prompts and third-party data for LLM applications. It reports over 99% TPR and under 1% FPR, runs fast (<2 ms on GPU), supports 8 languages, and can be deployed via HuggingFace or batch processing for RAG security.

DevOps & Infra · #llm #deploy · by Orchestra-Research

lambda-labs-gpu-cloud

Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent filesystems, or high-performance multi-node clusters for large-scale training.

Research & Web · #ai · by Orchestra-Research

modal-serverless-gpu

Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.

DevOps & Infra · #deploy #ai · by Orchestra-Research

gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware or Apple Silicon, or when you need flexible 2-8 bit quantization without GPU requirements.

DevOps & Infra · #deploy #ai · by Orchestra-Research
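Many GGUF quantization types store weights in small blocks, each with its own scale. The sketch below shows symmetric 8-bit block quantization in the spirit of llama.cpp's Q8_0 (32 values per block, scale = max|x| / 127). It is a simplification under those assumptions: it does not reproduce the GGUF binary layout, fp16 scale storage, or the k-quant formats, and the function names are hypothetical.

```python
def quantize_q8_blocks(values: list[float], block_size: int = 32):
    """Return a list of (scale, int_codes) pairs, one per block of values."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / 127 if amax > 0 else 0.0      # largest value maps to +/-127
        codes = [round(v / scale) if scale else 0 for v in block]
        blocks.append((scale, codes))
    return blocks

def dequantize_q8_blocks(blocks) -> list[float]:
    """Reconstruct approximate values as scale * code."""
    return [scale * q for scale, codes in blocks for q in codes]
```

Per-block scales keep one outlier from degrading the precision of the whole tensor, and the rounding error per value is at most half of its block's scale.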

gptq

Post-training 4-bit quantization for LLMs with minimal accuracy loss. It enables deploying large models (70B, 405B) on consumer GPUs, offering 4x memory reduction with <2% perplexity degradation and 3-4x faster inference than FP16, and it integrates with transformers and PEFT for QLoRA fine-tuning.

Research & Web · #llm #deploy · by Orchestra-Research
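The 4x memory figure comes down to bits per weight. A back-of-the-envelope sketch, counting weights only (no per-group scales or zero-points, activations, or KV cache); the function name is hypothetical:

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes) for a given precision."""
    return num_params * bits_per_weight / 8 / 1e9
```

A 70B model needs about 140 GB of weights in FP16 versus about 35 GB at 4 bits; per-group scales add a fraction of a bit per weight on top of that.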

evaluating-code-models

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ other benchmarks with pass@k metrics. Built by the BigCode Project and used by HuggingFace leaderboards, this industry-standard harness is ideal for benchmarking code models, comparing coding abilities, and testing multi-language support.

Development · #ai #test · by Orchestra-Research
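pass@k is usually computed with the unbiased estimator introduced alongside HumanEval: generate n samples per problem, count the c that pass the tests, and estimate the probability that at least one of k randomly chosen samples passes. A minimal sketch of that formula (function name assumed):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:          # every size-k subset contains a passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n=10 samples and c=3 passing, pass@1 is 0.3 and pass@5 is about 0.917; the benchmark score is the average over problems.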

crewai-multi-agent

A multi-agent orchestration framework for autonomous AI collaboration, ideal for building teams of specialized agents on complex tasks, role-based collaboration with memory, or production workflows requiring sequential/hierarchical execution. It's built without LangChain dependencies for lean, fast execution.

Research & Web · #ai · by Orchestra-Research

qdrant-vector-search

High-performance vector similarity search engine for RAG and semantic search. Use it for production RAG systems needing fast nearest neighbor search, hybrid search with filtering, or scalable Rust-powered vector storage.

Research & Web · #ai · by Orchestra-Research
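Nearest-neighbor search ranks stored vectors by similarity to a query. The exact brute-force version below is the baseline that Qdrant's HNSW index approximates at scale. It is a sketch for intuition with a hypothetical function name, not Qdrant's client API.

```python
import math

def cosine_top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3):
    """Exact cosine-similarity search; returns [(id, similarity)] best first."""
    def norm(v: list[float]) -> float:
        return math.sqrt(sum(x * x for x in v))

    qn = norm(query)
    scored = []
    for vid, v in vectors.items():
        sim = sum(a * b for a, b in zip(query, v)) / (qn * norm(v))
        scored.append((sim, vid))
    scored.sort(reverse=True)          # highest similarity first
    return [(vid, sim) for sim, vid in scored[:k]]
```

This scans every stored vector per query (linear time), which is exactly the cost an approximate index such as HNSW avoids on large collections.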

instructor

Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor, a battle-tested structured output library.

Research & Web · #llm #ai · by Orchestra-Research

outlines

Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines, dottxt.ai's structured generation library.

Research & Web · #llm #ai · by Orchestra-Research

fine-tuning-openvla-oft

Fine-tunes and evaluates OpenVLA-OFT and OpenVLA-OFT+ policies for robot action generation using continuous action heads, LoRA adaptation, and FiLM conditioning on LIBERO simulation and ALOHA real-world setups. This is useful for reproducing paper results, training custom VLA action heads, deploying ALOHA inference, or debugging related components.

DevOps & Infra · #deploy #ai · by Orchestra-Research

segment-anything-model

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

Research & Web · #ai · by Orchestra-Research

long-context

Extend transformer model context windows using RoPE, YaRN, ALiBi, and position interpolation techniques. This is useful for processing long documents, extending pre-trained models, or implementing efficient positional encodings, covering various embedding and extrapolation strategies for LLMs.

Research & Web · #llm #ai · by Orchestra-Research
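Of the techniques listed, linear position interpolation is the simplest to show. RoPE rotates each dimension pair i by position × theta_i, and interpolation rescales positions by train_len / target_len so an extended context stays inside the position range seen during training. This sketch covers only that linear variant (not YaRN or ALiBi) and assumes the common base of 10000; the function names are hypothetical.

```python
def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angles m * theta_i for one position m, with theta_i = base^(-2i/dim)."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]

def interpolated_angles(position: int, dim: int, train_len: int, target_len: int) -> list[float]:
    """Linear position interpolation: shrink positions by train_len / target_len."""
    scale = train_len / target_len
    return rope_angles(position * scale, dim)
```

Extending a 2,048-token model to 4,096 tokens, position 4096 receives the same angles that position 2048 had originally, and every position up to 4095 maps below 2048.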

model-merging

Merge multiple fine-tuned models with mergekit to combine capabilities without retraining, ideal for creating specialized models by blending domain-specific expertise or improving performance. It covers various merging techniques like SLERP, TIES-Merging, DARE, Task Arithmetic, and linear merging, plus production deployment strategies.

DevOps & Infra · #deploy #ai · by Orchestra-Research
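SLERP, one of the listed methods, interpolates along the arc between two weight vectors instead of the straight line between them. A minimal sketch on flattened weights, assuming a fallback to linear interpolation when the vectors are nearly parallel. mergekit's own implementation works on full tensors, supports per-layer interpolation factors, and may differ in details; the function name here is hypothetical.

```python
import math

def slerp(t: float, v0: list[float], v1: list[float], eps: float = 1e-8) -> list[float]:
    """Spherical linear interpolation between two weight vectors, t in [0, 1]."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    dot = max(-1.0, min(1.0, dot))               # guard acos against rounding
    if abs(dot) > 1 - eps:                       # nearly collinear: use plain lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    omega = math.acos(dot)                       # angle between the two directions
    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

Halfway between the unit vectors [1, 0] and [0, 1], SLERP returns about [0.707, 0.707], which stays on the unit circle, whereas linear averaging would shrink it to [0.5, 0.5].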

ml-paper-writing

Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions; for systems venues, use 'systems-paper-writing'.

Research & Web · #ai · by Orchestra-Research

presenting-conference-talks

Generates conference presentation slides (Beamer LaTeX PDF and editable PPTX) from a compiled paper with speaker notes and talk script. Use when preparing oral talks, spotlight presentations, or invited talks for ML and systems conferences.

Documents · #pptx #pdf · by Orchestra-Research

brainstorming-research-ideas

Guides researchers through structured ideation frameworks to discover high-impact research directions. Use when exploring new problem spaces, pivoting between projects, or seeking novel angles on existing work.

Research & Web · #ai · by Orchestra-Research

implementing-llms-litgpt

Implements and trains LLMs using Lightning AI's LitGPT, supporting over 20 pretrained architectures like Llama, Gemma, Phi, Qwen, and Mistral. It's suitable for clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA, featuring single-file implementations without abstraction layers.

Research & Web · #llm #ai · by Orchestra-Research

awq-quantization

This 4-bit LLM compression method, winner of the MLSys 2024 Best Paper Award, uses activation-aware weight quantization, providing a 3x speedup and minimal accuracy loss. It's ideal for deploying large models on limited GPU memory or for faster, more accurate inference than GPTQ, especially for instruction-tuned and multimodal models.

Research & Web · #llm #deploy · by Orchestra-Research

nemo-evaluator-sdk

NVIDIA's enterprise-grade platform evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. It provides scalable evaluation on local Docker, Slurm HPC, or cloud platforms, featuring a container-first architecture for reproducible benchmarking.

DevOps & Infra · #llm #ai · by Orchestra-Research

pytorch-fsdp2

Adds PyTorch FSDP2 (fully_shard) to training scripts with correct init, sharding, mixed precision/offload config, and distributed checkpointing. Use when models exceed single-GPU memory or when you need DTensor-based sharding with DeviceMesh.

Research & Web · #ai · by Orchestra-Research
