Liquid AI has announced the release of LFM2-24B-A2B, its largest LFM2 model to date.
LFM2-24B-A2B is a sparse Mixture-of-Experts (MoE) model with 24 billion total parameters, of which roughly 2 billion are active per token. The release shows that the LFM2 hybrid architecture scales effectively to larger sizes, maintaining quality without inflating per-token compute.
This release expands the LFM2 family from 350M to 24B parameters, demonstrating predictable scaling across nearly two orders of magnitude.
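To make the total-versus-active split concrete, here is a back-of-envelope sketch of per-token forward-pass compute, using the 2.3B active-parameter figure from the highlights below and the common rough approximation of about 2 FLOPs per active parameter per token. The approximation and the resulting ratio are illustrative assumptions, not published figures.

```python
# Rough per-token compute comparison: sparse MoE vs. a hypothetical dense model
# of the same total size. Assumes ~2 FLOPs per active parameter per token,
# a standard back-of-envelope estimate rather than a measured number.
TOTAL_PARAMS = 24e9    # total parameters (sparse MoE)
ACTIVE_PARAMS = 2.3e9  # parameters active per token

flops_sparse = 2 * ACTIVE_PARAMS  # compute actually spent per token
flops_dense = 2 * TOTAL_PARAMS    # what a dense 24B model would spend

print(f"per-token FLOPs (MoE, active only):      {flops_sparse:.2e}")
print(f"per-token FLOPs (hypothetical dense 24B): {flops_dense:.2e}")
print(f"compute saved: ~{flops_dense / flops_sparse:.0f}x less per token")
```

In other words, the model carries dense-24B-scale capacity while spending roughly a tenth of the per-token compute.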
Key highlights:
- MoE architecture: 40 layers, 64 experts per MoE block with top-4 routing, maintaining the hybrid conv + GQA design (see the routing sketch after this list)
- 2.3B active parameters per forward pass
- Designed to run within 32GB of RAM, enabling deployment on high-end consumer laptops and desktops
- Day-zero support for inference through llama.cpp, vLLM, and SGLang
- Multiple GGUF quantizations available
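The top-4 routing mentioned above is easiest to see in code. The following is a minimal PyTorch sketch of a sparse MoE feed-forward block with 64 experts and top-4 routing; the layer dimensions, expert MLP shape, and module names are illustrative assumptions, not the released LFM2 internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top4MoEBlock(nn.Module):
    """Minimal sketch of a sparse MoE block: 64 experts, top-4 routing per token.
    Dimensions and expert design are illustrative assumptions."""

    def __init__(self, d_model=2048, d_ff=4096, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalise over the chosen experts
        out = torch.zeros_like(x)
        # Loop over the k routing slots; real kernels batch tokens per expert instead.
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

Only the four selected experts run for each token, which is why the active parameter count stays near 2B even though all 64 experts contribute to the 24B total.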
Across benchmarks including GPQA Diamond, MMLU-Pro, IFEval, IFBench, GSM8K, and MATH-500, quality improves roughly log-linearly as the family scales from 350M to 24B, meaning each multiplicative increase in parameter count yields a broadly consistent gain in benchmark score. This indicates that the LFM2 architecture does not plateau at small sizes.
LFM2-24B-A2B is released as an instruct model and is available open-weight on Hugging Face. Liquid AI designed the model to concentrate capacity in total parameters rather than active compute, keeping inference latency and energy consumption aligned with edge and local deployment constraints.
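Given the day-zero vLLM support, a minimal local inference call might look like the sketch below. The Hugging Face repository ID is an assumption inferred from the model name, so check the actual model card for the correct ID and for the chat template expected by the instruct model.

```python
from vllm import LLM, SamplingParams

# Assumed repo ID based on the announced model name; verify on Hugging Face before use.
MODEL_ID = "LiquidAI/LFM2-24B-A2B"

llm = LLM(model=MODEL_ID, trust_remote_code=True)
params = SamplingParams(temperature=0.2, max_tokens=256)

# Plain-text prompt for brevity; an instruct model would normally go through its chat template.
outputs = llm.generate(["Explain what a sparse Mixture-of-Experts model is."], params)
print(outputs[0].outputs[0].text)
```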
This is the next step in making fast, scalable, efficient AI accessible in the cloud and on-device.