Wave Field LLM: A New Approach to Attention

Wave Field LLM is a newly presented attention mechanism that aims to overcome the scalability limits of standard O(n²) self-attention. The approach treats language as a physical field system and leverages the dynamics of wave equations.

How it works

The model maps tokens into a one-dimensional continuous field. Information propagates through this field via damped wave equations, described by the kernel k(t) = exp(-α·t)·cos(ω·t + φ). Each attention head has only three trainable physical parameters: frequency (ω), damping (α), and phase (φ). The convolution is computed via FFT in O(n log n). The attention heads self-organize into distinct roles, handling local grammar, medium-range context, and long-range dependencies.
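The kernel and the FFT-based convolution described above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the project's actual code: the parameter values and the function names (`wave_kernel`, `causal_fft_conv`) are assumptions, and a real implementation would apply one such kernel per attention head to learned token embeddings.

```python
import numpy as np

def wave_kernel(n, alpha=0.05, omega=0.3, phi=0.0):
    """Damped-wave kernel k(t) = exp(-alpha*t) * cos(omega*t + phi), t = 0..n-1."""
    t = np.arange(n)
    return np.exp(-alpha * t) * np.cos(omega * t + phi)

def causal_fft_conv(x, kernel):
    """Causal convolution of sequence x with kernel via FFT, in O(n log n).

    Zero-padding to length 2n avoids circular wrap-around, so output[i]
    depends only on x[0..i]."""
    n = len(x)
    m = 2 * n
    X = np.fft.rfft(x, m)
    K = np.fft.rfft(kernel, m)
    return np.fft.irfft(X * K, m)[:n]

# Toy usage: propagate a single token "impulse" through the field.
x = np.zeros(64)
x[10] = 1.0                          # activation at position 10
y = causal_fft_conv(x, wave_kernel(64))
# y is zero before position 10, then a damped oscillation: the impulse
# response is the kernel itself, shifted to the impulse position.
```

The zero-padding is what keeps the FFT convolution causal; without it, the circular convolution would leak information from late positions back to early ones.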

Results and limitations

Preliminary results on WikiText-2 (with 6 million parameters and a character-level tokenizer) show that Wave Field V3.5 achieves a perplexity of 6.2 and an accuracy of 50.5%, compared with 5.9 and 51.0% for a standard transformer. The reported speed advantage of Wave Field LLM grows with sequence length: a factor of 31x at 2,000 tokens, 107x at 8,000, and 367x at 32,000.

A known limitation is a significant capacity gap relative to standard transformers when using a BPE tokenizer with an 8,000-token vocabulary. The developers attribute this to limited model capacity at small scale, and they are working to scale the model to 100 million parameters to close the gap.

Unique features

A distinctive aspect of this project is that every bug during development was identified through physics-based diagnostics (energy flow, conservation, causality tests), rather than through trial and error. The model uses cross-head field coupling and wave interference for information routing. The authors emphasize that this is not a variant of Mamba/Hyena, but a completely different approach.
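The article does not show what these physics-based diagnostics look like, but a causality test for a convolution-based layer can be sketched: perturb one late position in the input and verify that no output before that position changes. The function names and tolerances here are hypothetical, not taken from the project.

```python
import numpy as np

def causal_conv(x, kernel):
    """Causal convolution via FFT with zero-padding (no circular leakage)."""
    n = len(x)
    y = np.fft.irfft(np.fft.rfft(x, 2 * n) * np.fft.rfft(kernel, 2 * n), 2 * n)
    return y[:n]

def causality_leak(conv_fn, n=128, perturb_at=100, seed=0):
    """Inject a disturbance at one position; measure how much any earlier
    output changes. A causal operator must return ~0 (information cannot
    flow backwards in time)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    t = np.arange(n)
    kernel = np.exp(-0.05 * t) * np.cos(0.3 * t)   # a damped-wave kernel
    y0 = conv_fn(x, kernel)
    x2 = x.copy()
    x2[perturb_at] += 1.0                          # late-sequence perturbation
    y1 = conv_fn(x2, kernel)
    return float(np.max(np.abs(y1[:perturb_at] - y0[:perturb_at])))

leak = causality_leak(causal_conv)
# leak should be at numerical-precision level for a correct causal layer.
```

A buggy implementation (e.g. circular convolution without padding) would show a large leak here, which is the kind of failure such a diagnostic catches directly rather than through trial and error.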

The code is available at https://github.com/badaramoni/wave-field-llm.