optimizing-attention-flash

Name: optimizing-attention-flash
Rating: 5 (7 reviews)
Author: braxtonROSE4

Optimizes transformer attention with Flash Attention for a 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), hitting GPU memory limits in attention, or needing faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.

7 stars

Updated 2 months ago

License: MIT

How to add

/plugin marketplace add braxtonROSE4/zorro-agent

The exact command may vary by repository. Check the README on GitHub.

Flash Attention - Fast Memory-Efficient Attention

Quick start

Flash Attention provides 2-4x speedup and 10-20x memory reduction for transformer attention through IO-aware tiling and recomputation.
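To make "tiling" concrete, here is a minimal sketch of block-wise attention with a running softmax, written in plain PyTorch. It is an illustration of the math only, not the fused kernel: the function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, the loop tiles only over keys and values, and the recomputation used in the backward pass is not shown.

```python
import math
import torch


def naive_attention(q, k, v):
    # Reference version: materializes the full [seq_q, seq_k] score matrix.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v


def tiled_attention(q, k, v, block_size=128):
    # Illustrative only: visits K/V one block at a time and keeps a running
    # row maximum and running softmax denominator, so only a
    # [seq_q, block_size] slice of scores exists at any moment.
    scale = 1.0 / math.sqrt(q.shape[-1])
    out = torch.zeros_like(q)
    row_max = q.new_full((*q.shape[:-1], 1), float("-inf"))
    row_sum = q.new_zeros((*q.shape[:-1], 1))

    for start in range(0, k.shape[-2], block_size):
        k_blk = k[..., start:start + block_size, :]
        v_blk = v[..., start:start + block_size, :]

        scores = (q @ k_blk.transpose(-2, -1)) * scale
        new_max = torch.maximum(row_max, scores.amax(dim=-1, keepdim=True))

        # Rescale what was accumulated under the old maximum.
        # On the first block this factor is exp(-inf) = 0.
        correction = torch.exp(row_max - new_max)
        p = torch.exp(scores - new_max)

        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    # Divide by the accumulated softmax denominator once at the end.
    return out / row_sum


torch.manual_seed(0)
q = torch.randn(2, 4, 256, 32)
k = torch.randn(2, 4, 256, 32)
v = torch.randn(2, 4, 256, 32)
max_diff = (tiled_attention(q, k, v, block_size=64) - naive_attention(q, k, v)).abs().max()
print(f"max abs difference vs. naive: {max_diff.item():.2e}")
```

The two functions should agree up to floating-point rounding. The point of the sketch is that the result never depends on holding the whole score matrix at once, which is what lets a fused kernel keep each block in fast on-chip memory.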

PyTorch native (easiest, PyTorch 2.2+):

import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)  # [batch, heads, seq, dim]
k = torch.randn(2, 8, 512, 64, device='cuda', dtype=torch.float16)
v = torch.randn(2, 8, 512, 64, de

[Description truncated. See the full README on GitHub.]
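Since the quick start is cut off on this page, the following is a hedged, self-contained sketch of what a PyTorch native call could look like. It is not the author's original code: it assumes the example was leading to `torch.nn.functional.scaled_dot_product_attention`, and it adds a CPU fallback and `is_causal=True` purely for illustration.

```python
import torch
import torch.nn.functional as F

# Assumption: fall back to CPU/float32 so the sketch runs without a GPU.
# Fused attention kernels generally need a CUDA device and half precision.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# [batch, heads, seq, dim], matching the shapes in the truncated example
q = torch.randn(2, 8, 512, 64, device=device, dtype=dtype)
k = torch.randn(2, 8, 512, 64, device=device, dtype=dtype)
v = torch.randn(2, 8, 512, 64, device=device, dtype=dtype)

# PyTorch picks an attention backend automatically. Whether a Flash Attention
# kernel is actually used depends on hardware, dtype, and input shapes.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 512, 64])
```

The output has the same shape as `q`. For the naive formulation with these shapes in float16, the intermediate score matrix alone would take 2 × 8 × 512 × 512 × 2 bytes = 8 MiB, and that term grows quadratically with sequence length.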
