Explore skills
5,474 skills found
guidance
Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework.
nanogpt
A ~300-line educational GPT implementation by Andrej Karpathy, reproducing GPT-2 (124M) on OpenWebText. It offers clean, hackable code perfect for learning transformers and understanding GPT architecture from scratch, trainable on Shakespeare (CPU) or OpenWebText (multi-GPU).
pytorch-lightning
A high-level PyTorch framework featuring a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), and a callbacks system, designed for minimal boilerplate. It scales from laptops to supercomputers with the same code, providing clean training loops with built-in best practices.
skypilot-multi-cloud-orchestration
Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or optimize GPU costs across providers.
serving-llms-vllm
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Ideal for deploying production LLM APIs, optimizing inference, or serving models with limited GPU memory, it supports OpenAI-compatible endpoints, quantization, and tensor parallelism.
weights-and-biases
Track ML experiments with automatic logging, visualize training in real time, optimize hyperparameters with sweeps, and manage a model registry with W&B, a collaborative MLOps platform.
evolving-ai-agents
Provides guidance for automatically evolving and optimizing AI agents across any domain using LLM-driven evolution algorithms. Use when building self-improving agents, optimizing agent prompts and skills against benchmarks, or implementing automated agent evaluation loops.
llama-cpp
Runs LLM inference on CPUs, Apple Silicon, and consumer GPUs without requiring NVIDIA hardware. Ideal for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. It supports GGUF quantization (1.5-8 bit) for reduced memory use and a 4-10x speedup over PyTorch on CPU.
sglang
Fast structured generation and serving for LLMs using RadixAttention prefix caching. Ideal for JSON/regex outputs, constrained decoding, agentic workflows, or workloads with heavy prefix sharing, where it is described as up to 5x faster than vLLM. It is reported to power over 300,000 GPUs at major tech companies.
deepspeed
Expert guidance for distributed training with DeepSpeed, covering ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, and sparse attention.
evaluating-llms-harness
Evaluates LLMs across 60+ academic benchmarks like MMLU and HumanEval. It's an industry standard for benchmarking model quality, comparing models, and tracking training progress, supporting HuggingFace, vLLM, and APIs.
nemo-guardrails
NVIDIA's runtime safety framework for LLM applications. It provides jailbreak, hallucination, and toxicity detection, along with input/output validation, fact-checking, and PII filtering. It uses the Colang 2.0 DSL for programmable rails, is production-ready, and runs on T4 GPUs.