optimizing-attention-flash

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers on long sequences (>512 tokens), hitting GPU memory limits in attention, or needing faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.
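
The memory reduction comes from never materializing the full attention score matrix: keys and values are processed in blocks while a running softmax is kept up to date. As a rough illustration of that idea only (this is not the skill's code, nor the flash-attn or PyTorch implementation), here is a pure-Python sketch for a single query vector. The names `naive_attention`, `tiled_attention`, and `block_size` are hypothetical and chosen for this example.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: compute every score first, then softmax, then mix values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Block-wise version: only a running max, a running normalizer, and a
    running output accumulator are kept between blocks (illustrative sketch)."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the previous max.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions should return the same vector up to floating-point error. The tiled version holds at most `block_size` scores in memory at a time, which is the property that real GPU kernels exploit at a much larger scale.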

7 stars
Updated 2 months ago

License: MIT

How to add

/plugin marketplace add braxtonROSE4/zorro-agent

The exact command may vary by repository. Check the README on GitHub.
