TurboQuant — KV Cache Compression Skill
A skill for implementing, using, and explaining Google's TurboQuant algorithm, a data-oblivious vector quantization framework that reportedly achieves a 6x memory reduction and up to an 8x speedup for LLM KV caches with zero accuracy loss.
What TurboQuant Does
TurboQuant compresses the key-value (KV) cache in transformer-based LLMs. During inference, the KV cache grows linearly with sequence length and becomes the primary memory bottleneck for long-context genera
[Description truncated. See the full README on GitHub.]
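
To make the linear growth of the KV cache concrete, here is a minimal back-of-the-envelope sketch. It is not part of TurboQuant itself. The model dimensions (32 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical defaults chosen for illustration, and the helper name `kv_cache_bytes` is an assumption. The 6x factor is simply the reduction quoted in the description above, applied as a plain division.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    """Bytes needed to cache keys and values for all layers.

    All default dimensions are hypothetical, not taken from any specific model.
    """
    # Factor of 2: one cached tensor for keys and one for values.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    # The cache stores one entry per token, so it grows linearly with seq_len.
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    GIB = 2 ** 30
    for seq_len in (4_096, 32_768, 131_072):
        full = kv_cache_bytes(seq_len)
        # Assumed 6x reduction, as quoted in the description above.
        compressed = full / 6
        print(f"{seq_len:>7} tokens: {full / GIB:6.2f} GiB uncompressed, "
              f"about {compressed / GIB:5.2f} GiB at 6x")
```

Doubling the sequence length doubles the cache size, which is why long-context workloads run out of memory on the cache well before they run out on model weights.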