RotorQuant: Accelerated Vector Quantization with Clifford Algebra

RotorQuant: A Faster Alternative to TurboQuant

RotorQuant is a vector quantization technique that uses Clifford rotors to run faster than TurboQuant at nearly the same accuracy. Early results show a 10-19x speedup and 44x fewer parameters.

The key idea is to replace TurboQuant's d×d random orthogonal matrix with Clifford rotors in Cl(3,0). Instead of performing a dense matrix multiplication, RotorQuant splits the vector into groups of 3 dimensions and rotates each group with a 4-parameter rotor. This drastically reduces the number of operations required.
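The grouped rotation can be illustrated with a small NumPy sketch. It is not the RotorQuant implementation: it assumes one independent random rotor per group, zero-pads the vector when d is not a multiple of 3, and applies the rotor through the equivalent unit-quaternion formula (a Cl(3,0) rotor has one scalar and three bivector coefficients, which map onto a unit quaternion). The names `random_rotors` and `rotate_groups` are hypothetical.

```python
import numpy as np

def random_rotors(num_groups, rng):
    """Draw one unit rotor per group: 1 scalar + 3 bivector coefficients.
    Assumption: rotors are sampled independently for every group."""
    r = rng.standard_normal((num_groups, 4))
    return r / np.linalg.norm(r, axis=1, keepdims=True)

def rotate_groups(x, rotors, inverse=False):
    """Rotate each 3-dimensional group of x with the matching rotor.
    Assumption: x is zero-padded up to 3 * num_groups entries."""
    num_groups = rotors.shape[0]
    padded = np.zeros(num_groups * 3)
    padded[: x.shape[0]] = x
    v = padded.reshape(num_groups, 3)

    w = rotors[:, :1]   # scalar part, shape (G, 1)
    u = rotors[:, 1:]   # bivector part treated as an axis vector, shape (G, 3)
    if inverse:
        u = -u          # the reversed rotor undoes the rotation

    # Sandwich product written in quaternion form:
    # v' = v + 2w (u x v) + 2 u x (u x v)
    t = 2.0 * np.cross(u, v)
    out = v + w * t + np.cross(u, t)
    return out.reshape(-1)

d = 128
num_groups = -(-d // 3)            # ceil(d / 3)
rng = np.random.default_rng(0)
rotors = random_rotors(num_groups, rng)

x = rng.standard_normal(d)
y = rotate_groups(x, rotors)                        # rotated, padded vector
x_back = rotate_groups(y, rotors, inverse=True)[:d]  # drop the padding
```

Each group costs a fixed handful of multiply-adds regardless of d, so the total work grows linearly with d, whereas the dense d×d product grows quadratically. In a quantizer, the scalar quantization step would sit between the forward and inverse rotations.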

Results and Performance

Tests on Qwen2.5-3B-Instruct KV cache show:

Cosine similarity: 0.990 (vs 0.991 for TurboQuant)
44x fewer parameters (372 vs 16,399 for d=128)
Fused CUDA kernel: 10-19x faster than cuBLAS matmul on RTX PRO 4000
Fused Metal shader: 9-31x faster on Apple M4
Perfect score on needle-in-a-haystack tests

The implementation relies on fused kernels that keep intermediate data in registers and avoid round trips to memory. This is how RotorQuant outperforms TurboQuant, even though TurboQuant's dense rotation already runs on an optimized matrix multiplication.

Implications

RotorQuant is a promising step forward in vector quantization: it offers a large speedup and a smaller memory footprint at nearly the same accuracy. This could have a notable impact on LLM inference, especially in resource-constrained settings.
