Explore skills

5,474 skills found


tensorboard

Visualize training metrics, debug models with histograms, compare experiments, visualize model graphs, and profile performance with TensorBoard, Google's ML visualization toolkit.

Research and Web · #ai · by Orchestra-Research

optimizing-attention-flash

9.1k

Optimizes transformer attention with Flash Attention, achieving 2-4x speedups and 10-20x memory reduction. It is well suited to long sequences (>512 tokens), GPU memory pressure, and faster inference. It supports PyTorch native SDPA, flash-attn, H100 FP8, and sliding-window attention.

Research and Web · #ai · by Orchestra-Research
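Flash Attention saves memory by computing softmax attention tile by tile, so the full score matrix is never materialized. The sketch below shows that "online softmax" idea in plain Python for a single query. It is not the flash-attn or PyTorch SDPA API. The function names are hypothetical, and a real kernel fuses these steps in GPU on-chip memory:

```python
import math


def naive_attention(q, keys, values):
    """Reference: scale scores, softmax over all keys at once, weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size=2):
    """Online softmax: visit keys/values in blocks, keeping only a running max,
    a running denominator, and a running weighted sum of values."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results so every term shares the new maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -0.3], [0.5, 0.5], [-0.4, 0.9], [0.3, 0.3]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [0.1, 0.2]]
print(naive_attention(q, keys, values))
print(tiled_attention(q, keys, values, block_size=2))
```

Both functions return the same vector up to floating-point rounding, whatever the block size. The tiled version only ever holds one block of scores at a time.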

axolotl

9.1k

Expert guidance for fine-tuning LLMs with Axolotl, covering YAML configurations, over 100 models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, and multimodal support.

Research and Web · #llm #ai · by Orchestra-Research

huggingface-accelerate

9.1k

The simplest distributed training API: it adds distributed support to any PyTorch script in just 4 lines. It offers a unified API over DeepSpeed, FSDP, Megatron, and DDP, with automatic device placement and mixed precision, and is a standard part of the HuggingFace ecosystem.

Research and Web · #ai #api · by Orchestra-Research

training-llms-megatron

9.1k

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. It is ideal for models above 1B parameters, for maximum GPU efficiency (47% MFU on H100), and for workloads that need several parallelism types. It is a production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Research and Web · #llm #ai · by Orchestra-Research
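MFU (model FLOPs utilization) compares the FLOPs a training run actually achieves with the hardware's theoretical peak. The sketch below uses the widely cited approximation of 6 x parameters FLOPs per training token for a dense model. The function name and every input number are hypothetical. The H100 peak figure is an approximate datasheet value and should be checked against your own hardware:

```python
def model_flops_utilization(n_params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Approximate MFU for dense decoder-only training.

    Uses the common 6 * N FLOPs-per-token estimate (2N forward + 4N backward),
    which ignores attention-score FLOPs, so it slightly understates the work done.
    """
    achieved_flops = 6 * n_params * tokens_per_second
    peak_flops = num_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Hypothetical run: 8B-parameter model, 64 GPUs, 600k tokens/s cluster-wide.
# 989e12 is roughly the dense BF16 peak of an H100 SXM (assumed; check your datasheet).
mfu = model_flops_utilization(8e9, 6e5, 64, 989e12)
print(f"MFU = {mfu:.1%}")
```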

peft-fine-tuning

9.1k

Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ other methods. Use it to fine-tune large models (7B-70B) with limited GPU memory, to train <1% of parameters with minimal accuracy loss, or to serve multiple adapters. It is HuggingFace's official library and integrates with the transformers ecosystem.

Research and Web · #llm #ai · by Orchestra-Research
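The "<1% of parameters" figure follows from LoRA's low-rank factorization. LoRA freezes a weight matrix and trains two small matrices whose product is added to it. The arithmetic sketch below uses a hypothetical 4096 x 4096 projection with rank 8 (both are assumptions chosen for illustration). It does not use the PEFT API:

```python
def lora_trainable_fraction(d_in, d_out, rank):
    """Fraction of one frozen d_out x d_in weight matrix that LoRA trains.

    LoRA freezes W and learns A (rank x d_in) and B (d_out x rank), so the
    adapted layer computes W x + B (A x) with rank * (d_in + d_out) new parameters.
    """
    full_params = d_in * d_out
    lora_params = rank * (d_in + d_out)
    return lora_params / full_params


# Hypothetical layer: a 4096 x 4096 attention projection adapted with rank 8.
frac = lora_trainable_fraction(4096, 4096, 8)
print(f"{frac:.2%} of this layer's weights are trainable")
```

Here 8 x (4096 + 4096) = 65,536 trainable values sit against 16,777,216 frozen ones. The fraction over a whole model is lower still when only some layers receive adapters.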

mamba-architecture

9.1k

Mamba is a state-space model whose cost grows as O(n) in sequence length, compared with O(n²) for Transformer attention. It offers 5x faster inference, million-token sequences, and no KV cache. It uses a selective SSM with a hardware-aware design, and Mamba-1 and Mamba-2 models are available on HuggingFace.

Research and Web · #ai · by Orchestra-Research
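The linear-time, no-KV-cache property comes from replacing attention over all past tokens with a fixed-size recurrent state. Below is a toy scalar sketch of a selective scan, in which the step size depends on the input. It is not the mamba-ssm package API, and the function and parameter names are hypothetical. Real Mamba uses multi-dimensional states, learned projections, and a hardware-aware parallel scan:

```python
import math


def selective_scan(xs, a=-1.0, w_delta=1.0, b=1.0, c=1.0):
    """Toy 1-D selective state-space scan: O(n) time, O(1) state.

    The step size delta_t depends on the input (the 'selective' part), and the
    continuous decay a < 0 is discretized as exp(delta_t * a), a value in (0, 1).
    """
    h = 0.0  # the entire recurrent state, independent of sequence length
    ys = []
    for x in xs:
        delta = math.log1p(math.exp(w_delta * x))  # softplus keeps delta > 0
        a_bar = math.exp(delta * a)                # input-dependent decay
        h = a_bar * h + delta * b * x              # simplified input discretization
        ys.append(c * h)
    return ys


print(selective_scan([0.5, -1.0, 2.0, 0.0]))
```

Each step touches only `h`, so memory stays constant as the sequence grows. Attention, by contrast, keeps keys and values for every past token.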

sparse-autoencoder-training

9.1k

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.

Research and Web · #ai · by Orchestra-Research
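A sparse autoencoder decomposes an activation vector into a wider, mostly-zero feature vector and reconstructs the input from it. The plain-Python sketch below implements one common ReLU SAE objective: reconstruction MSE plus an L1 sparsity penalty. It is not the SAELens API. The function names and the tiny hand-picked weights are hypothetical:

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec, l1_coeff):
    """Encode x into sparse features, decode, and return (features, recon, loss).

    w_enc has shape (d_sae x d_in) and w_dec has shape (d_in x d_sae).
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]   # subtract decoder bias first
    features = relu([z + b for z, b in zip(matvec(w_enc, centered), b_enc)])
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    mse = sum((xi - ri) ** 2 for xi, ri in zip(x, recon)) / len(x)
    l1 = sum(abs(f) for f in features)                 # pushes most features to zero
    return features, recon, mse + l1_coeff * l1


w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # 3 features from a 2-d activation
w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
features, recon, loss = sae_forward([2.0, 3.0], w_enc, [0.0] * 3, w_dec, [0.0, 0.0], 0.1)
print(features, recon, loss)
```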

autoresearch

9.1k

Orchestrates end-to-end autonomous AI research projects using a two-loop architecture. The inner loop runs rapid experiment iterations with optimization targets, while the outer loop synthesizes results to steer research direction.

Research and Web · #ai #api · by Orchestra-Research

rwkv-architecture

9.1k

An RNN+Transformer hybrid with linear-time (O(n)) inference and no KV cache, giving it effectively unlimited context. It trains like GPT and runs inference like an RNN. It is used in Windows, Office, and NeMo, with models of up to 14B parameters.

Research and Web · #ai · by Orchestra-Research
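The "trains like GPT, infers like an RNN" idea can be seen in the WKV operator, shown here as described for RWKV-4. The same weighted average over past tokens can be written as an attention-like sum or as a recurrence with two running accumulators. The sketch below is for a single channel and checks that the two forms agree. The function names and inputs are hypothetical. Real implementations add a running-max trick for numerical stability and use learned per-channel `w` and `u`:

```python
import math


def wkv_recurrent(ks, vs, w, u):
    """WKV for one channel computed as an RNN with O(1) state (a, b)."""
    a, b = 0.0, 0.0            # decayed running numerator and denominator
    decay = math.exp(-w)
    out = []
    for k, v in zip(ks, vs):
        bonus = math.exp(u + k)   # extra weight given to the current token
        out.append((a + bonus * v) / (b + bonus))
        a = decay * a + math.exp(k) * v
        b = decay * b + math.exp(k)
    return out


def wkv_quadratic(ks, vs, w, u):
    """The same quantity as an explicit sum over all past tokens (O(n^2))."""
    out = []
    for t in range(len(ks)):
        num = math.exp(u + ks[t]) * vs[t]
        den = math.exp(u + ks[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + ks[i])
            num += weight * vs[i]
            den += weight
        out.append(num / den)
    return out


ks = [0.1, -0.5, 0.3, 0.8, -0.2]
vs = [1.0, 2.0, -1.0, 0.5, 0.0]
print(wkv_recurrent(ks, vs, w=0.7, u=0.3))
```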

nemo-curator

9.1k

GPU-accelerated data curation for LLM training, supporting text, image, video, and audio. It features fuzzy deduplication (16x faster), quality filtering, semantic deduplication, PII redaction, and NSFW detection, scaling across GPUs with RAPIDS to prepare high-quality datasets.

Research and Web · #llm #ai · by Orchestra-Research
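Fuzzy deduplication is typically built on MinHash. Each document is reduced to a short signature whose per-position agreement rate estimates the Jaccard similarity of the documents' shingle sets. The sketch below is a generic single-CPU illustration of that technique, not the NeMo Curator API. The helper names are hypothetical, and seeded BLAKE2b hashes stand in for random permutations:

```python
import hashlib


def shingles(text, n=3):
    """Set of word n-grams (shingles) for a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash_signature(items, num_perm=128):
    """For each seeded hash function, keep the minimum hash over the shingles."""
    sig = []
    for seed in range(num_perm):
        hashes = [
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in items
        ]
        sig.append(min(hashes))
    return sig


def estimated_jaccard(sig_a, sig_b):
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank today"
doc_b = "the quick brown fox jumps over the lazy dog near the river bank tonight"
doc_c = "completely unrelated sentence about gpu kernels and memory bandwidth"
sig_a, sig_b, sig_c = (minhash_signature(shingles(d)) for d in (doc_a, doc_b, doc_c))
print(estimated_jaccard(sig_a, sig_b), exact_jaccard(shingles(doc_a), shingles(doc_b)))
```

A near-duplicate pair shows high signature agreement, and unrelated documents show almost none. Production systems add locality-sensitive hashing (banding) so that only likely matches are compared.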

quantizing-models-bitsandbytes

9.1k

Quantizes LLMs to 8-bit or 4-bit, reducing memory by 50-75% with minimal accuracy loss, ideal for limited GPU memory or faster inference. It supports INT8, NF4, FP4, QLoRA training, 8-bit optimizers, and works with HuggingFace Transformers.

Research and Web · #llm #ai · by Orchestra-Research
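The 50-75% figure is weight-storage arithmetic: 8-bit weights take half the bytes of 16-bit ones, and 4-bit weights a quarter. The sketch below shows that arithmetic together with a simplified per-tensor absmax INT8 round trip. It is not the bitsandbytes API, and the function names are hypothetical. bitsandbytes' LLM.int8() uses vector-wise scaling with outlier handling, and NF4 uses a non-uniform code book:

```python
def absmax_quantize_int8(values):
    """Map floats to integers in [-127, 127] using one scale for the whole tensor."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid dividing by zero
    scale = 127 / max_abs
    return [round(v * scale) for v in values], scale


def dequantize(q, scale):
    return [x / scale for x in q]


def weight_memory_gb(n_params, bits):
    """Weight storage only; activations, KV cache, and optimizer state are extra."""
    return n_params * bits / 8 / 1e9


vals = [0.5, -1.27, 0.01, 1.0]
q, s = absmax_quantize_int8(vals)
print(q, dequantize(q, s))
# Hypothetical 7B-parameter model at 16, 8, and 4 bits per weight.
print([weight_memory_gb(7e9, bits) for bits in (16, 8, 4)])
```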