Photonic Chip to Accelerate KV Cache

A researcher in nanophotonics has proposed a new way to accelerate block selection in the KV cache of large language models (LLMs). The approach relies on a photonic chip, named PRISM, which is intended to overcome the limitations of conventional GPU scans.

PRISM: O(1) Optical Scan

PRISM replaces the linear, O(N) scan over KV cache blocks with an optical broadcast. The query is encoded as light and split to all N blocks simultaneously by a passive splitter. Similarity with every block is therefore computed in parallel rather than one block at a time, which makes selection time independent of context size (O(1)).
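For reference, the electronic baseline that the optical broadcast is meant to replace can be sketched as follows. This is a minimal illustration, not PRISM's design or any GPU kernel's actual implementation: the mean-key block signature, the dot-product similarity, and the function names `block_signatures` and `select_blocks_linear` are assumptions made for the example.

```python
import heapq
import random


def block_signatures(keys, block_size):
    """Summarize each block of key vectors by its element-wise mean.

    The mean is an assumed signature; real systems may use other summaries.
    """
    sigs = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        sigs.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return sigs


def select_blocks_linear(query, signatures, top_k):
    """Score every block in turn: work grows as O(N) in the number of blocks."""
    scores = [sum(q * s for q, s in zip(query, sig)) for sig in signatures]
    # Indices of the top_k highest-scoring blocks, best first.
    return heapq.nlargest(top_k, range(len(scores)), key=scores.__getitem__)


# Small synthetic demo with random key vectors (hypothetical sizes).
random.seed(0)
dim, block_size, n_blocks = 8, 16, 64
keys = [[random.gauss(0, 1) for _ in range(dim)]
        for _ in range(block_size * n_blocks)]
signatures = block_signatures(keys, block_size)
query = [random.gauss(0, 1) for _ in range(dim)]
selected = select_blocks_linear(query, signatures, top_k=4)
print(selected)
```

The loop inside `select_blocks_linear` is the part whose cost grows with context length. In the scheme described above, the N per-block similarities would instead be produced in parallel by the optical broadcast, leaving only the top-k selection to be done afterwards.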

Performance and Consumption

Simulations on thin-film lithium niobate (TFLN) photonic chips indicate a 944x improvement in selection speed and an 18,000x reduction in energy consumption compared to GPU scans at a context of 1 million tokens. At 100 million tokens, PRISM is reported to be 5.3x faster than Quest (batch=128, Qwen2.5-7B) over the total decoding process.
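A selection-only speedup is generally much larger than the end-to-end decoding speedup, because the rest of the decode step is not accelerated. The sketch below illustrates this with Amdahl's law. The selection fractions are hypothetical values chosen for illustration, not figures from the simulations, and the 944x and 5.3x numbers above were obtained at different context sizes and against different baselines, so this is not a derivation of one from the other.

```python
def overall_speedup(selection_fraction, selection_speedup):
    """Amdahl's law: only the selection share of decode time is accelerated."""
    return 1.0 / ((1.0 - selection_fraction)
                  + selection_fraction / selection_speedup)


# Hypothetical shares of decode time spent on block selection.
for fraction in (0.25, 0.5, 0.8):
    print(f"selection share {fraction:.2f}: "
          f"{overall_speedup(fraction, 944):.2f}x overall")
```

Under this simple model, the end-to-end gain is bounded by 1 / (1 - selection_fraction) no matter how fast selection becomes, which is why the share of time spent scanning at a given context length matters as much as the raw selection speedup.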