## Adaptive-K: efficient routing for MoE models

Adaptive-K is a new routing system designed to reduce the computational load of Mixture of Experts (MoE) models. Initial results indicate compute savings of 30% to 52% on models such as Mixtral, Qwen, and OLMoE.

MoE models, as the name suggests, combine smaller models (the "experts") to handle different aspects of a complex problem. Routing, in this context, is the process of assigning each input to the most suitable experts, with the aim of optimizing both accuracy and computational efficiency.

## Resources and implementation

The source code for the project is available on GitHub, and a live demo can be tried on Hugging Face. NVIDIA engineers are evaluating the integration of Adaptive-K into TensorRT-LLM, as highlighted by the relevant pull request.
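
The article doesn't spell out how Adaptive-K decides the number of experts, but the name suggests it varies the expert count per token rather than using a fixed top-k. The sketch below is a minimal illustration of that general idea, contrasting fixed top-k routing with an adaptive variant that activates the smallest set of experts whose cumulative router probability reaches a threshold. All names and the `tau` threshold here are hypothetical, not taken from the Adaptive-K codebase.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def top_k_route(router_logits: np.ndarray, k: int = 2) -> list:
    """Standard MoE routing: every token goes to a fixed top-k experts."""
    probs = softmax(router_logits)
    return [np.argsort(p)[::-1][:k] for p in probs]

def adaptive_k_route(router_logits: np.ndarray, tau: float = 0.5) -> list:
    """Adaptive-k-style routing (illustrative, not the published algorithm):
    each token activates the smallest set of experts whose cumulative router
    probability reaches `tau`, so confident tokens use fewer experts than a
    fixed top-k would."""
    probs = softmax(router_logits)
    routes = []
    for p in probs:
        order = np.argsort(p)[::-1]                    # experts by descending probability
        cutoff = np.searchsorted(np.cumsum(p[order]), tau) + 1
        routes.append(order[:cutoff])
    return routes

# Example: route 3 tokens over 8 experts.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8))
print("fixed top-2:", top_k_route(logits, k=2))
print("adaptive-k :", adaptive_k_route(logits, tau=0.5))
```

The per-token `cutoff` is where the savings would come from: a token the router is confident about activates a single expert, while an ambiguous one activates several. The actual criterion Adaptive-K uses may well differ; the source code on GitHub is the authoritative reference.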