An anonymous user from a Korean AI forum has published a mathematical proof challenging the current understanding of the Attention mechanism in large language models (LLMs).
The d^2 Pullback Theorem
The author, who claims not to work in the LLM industry, presents a paper titled "The d^2 Pullback Theorem: Why Attention is a d^2-Dimensional Problem." The central thesis is that the true optimization geometry of Attention is d^2-dimensional, where d is the dimension of the latent space, rather than n^2-dimensional, where n is the length of the input sequence. According to the author, the apparent n x n bottleneck is an illusion created by softmax normalization.
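One fact underlying this framing is easy to check independently of the paper: the pre-softmax score matrix QK^T is n x n in size, but its rank can never exceed d, since it is the product of two n x d matrices. A minimal NumPy sketch (not taken from the paper; dimensions are arbitrary):

```python
import numpy as np

# The n x n score matrix Q K^T has rank at most d,
# because Q and K are both n x d.
rng = np.random.default_rng(0)
n, d = 64, 8
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
scores = Q @ K.T                        # n x n, but low-rank
rank = np.linalg.matrix_rank(scores)
print(rank)                             # at most d = 8
```

For generic (random) Q and K the rank is exactly d, so all the information in the 64 x 64 score matrix lives in a d-dimensional structure.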
Softmax and Euclidean matching
The proof argues that earlier O(n) linear-Attention models failed because removing the exponential function (softmax) destroyed the contrast needed for matching. Softmax supplies this contrast, but it artificially inflates the rank of the attention matrix to n, which is what causes the O(n^2) complexity.
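The rank-inflation claim can also be observed numerically: applying the elementwise exponential of softmax to a rank-d score matrix generically produces a matrix of much higher rank. A small illustrative check (my own demonstration of the claim, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 8
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
scores = Q @ K.T                                  # rank <= d

# Row-wise softmax: the elementwise exp is nonlinear,
# so the result is no longer confined to rank d.
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)

print(np.linalg.matrix_rank(scores), np.linalg.matrix_rank(A))
```

The softmaxed matrix A must therefore be materialized (or approximated) in full, whereas the raw scores admit a d-dimensional factorization.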
CSQ Attention: a possible solution
The author proposes an architecture called CSQ (Centered Shifted-Quadratic) Attention, which replaces softmax with a degree-2 polynomial kernel (x^2). According to the paper, this approach retains the Euclidean matching properties, stabilizes training, and reduces the computational complexity of both training and inference to O(nd^3).
The publication concludes with an appeal to the scientific community to verify the validity of the proof and explore its potential applications in the development of more efficient Transformer architectures.