AMOR: A Hybrid Approach to Attention in Language Models

A new study introduces AMOR (Adaptive Metacognitive Output Router), a hybrid architecture that combines State Space Models (SSMs) with sparse attention mechanisms. The goal is to overcome the limitations of traditional transformers, which allocate computational resources uniformly to each position, regardless of its importance.

AMOR is inspired by dual-process theories of cognition and uses prediction entropy as an uncertainty signal. When the SSM's next-token prediction is uncertain (high entropy), AMOR dynamically engages sparse attention to improve retrieval accuracy. Rather than computing O(n^2) attention at every layer, as a standard transformer does, AMOR reuses the SSM's O(n) computation, projecting keys and values directly from its hidden states ("Ghost KV").
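The routing idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the hidden states and logits are random stand-ins for an SSM's outputs, the projection matrices and the threshold choice are assumptions, and "Ghost KV" is reduced to projecting keys/values from the already-computed hidden states.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def entropy(probs, axis=-1):
    return -(probs * np.log(probs + 1e-12)).sum(axis=axis)

rng = np.random.default_rng(0)
n, d, vocab = 16, 32, 100

# Hypothetical stand-ins for what an O(n) SSM scan would produce:
# per-position hidden states and next-token logits.
hidden = rng.standard_normal((n, d))
logits = rng.standard_normal((n, vocab))

# "Ghost KV": keys/values projected from the SSM's existing hidden
# states, so no separate O(n^2) pass is needed to build them.
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
K, V = hidden @ W_k, hidden @ W_v

# Route: engage attention only where prediction entropy is high.
# The 0.78 quantile (top ~22% of positions) mirrors the paper's figure
# but is an arbitrary choice here.
H = entropy(softmax(logits))
tau = np.quantile(H, 0.78)
routed = H > tau

out = hidden.copy()
for i in np.where(routed)[0]:
    q = hidden[i] @ W_q
    # Causal sparse attention over positions <= i, reusing Ghost KV.
    scores = (K[: i + 1] @ q) / np.sqrt(d)
    out[i] = softmax(scores) @ V[: i + 1]
```

Positions below the entropy threshold keep the SSM output untouched; only the routed minority pay the attention cost.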

In synthetic tests, AMOR outperformed both SSM-only and transformer-only baselines, achieving perfect retrieval accuracy while engaging attention on only 22% of positions. Prediction entropy proved to be a reliable activation signal, with a clear entropy gap between retrieval positions and locally predictable ones. Because routing decisions are expressed in information-theoretic terms, they are interpretable, offering a clearer view of when and why the model reaches for attention.
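The entropy gap has a simple information-theoretic reading, which a two-line example makes concrete. The distributions below are illustrative numbers, not values from the study: a confidently predicted local token has a peaked distribution and low entropy, while a retrieval position that could resolve to many candidates looks near-uniform and carries high entropy.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log2(p + 1e-12)).sum())

# Local position: the model is nearly certain -> low entropy, no attention.
local = entropy_bits([0.97, 0.01, 0.01, 0.01])

# Retrieval position: four equally plausible answers -> ~2 bits,
# triggering the sparse-attention path.
retrieval = entropy_bits([0.25, 0.25, 0.25, 0.25])
```

A fixed entropy threshold between these two regimes is what lets a scalar signal act as the router.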
