Liquid AI has announced the release of LFM2-24B-A2B, its largest LFM2 model to date.
LFM2-24B-A2B is a sparse Mixture-of-Experts (MoE) model with 24 billion total parameters, of which roughly 2 billion are active per token. The release shows that the LFM2 hybrid architecture scales effectively to larger sizes, maintaining quality without inflating per-token compute.
This release expands the LFM2 family from 350M to 24B parameters, demonstrating predictable scaling across nearly two orders of magnitude.
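To make the total-versus-active split concrete, here is a back-of-envelope sketch of per-token forward-pass compute, using the 2.3B active-parameter figure from the highlights below and the common rough approximation of about 2 FLOPs per active parameter per token. The approximation and the resulting ratio are illustrative assumptions, not published figures.

```python
# Rough per-token compute comparison: sparse MoE vs. a hypothetical dense model
# of the same total size. Assumes ~2 FLOPs per active parameter per token,
# a standard back-of-envelope estimate rather than a measured number.
TOTAL_PARAMS = 24e9    # total parameters (sparse MoE)
ACTIVE_PARAMS = 2.3e9  # parameters active per token

flops_sparse = 2 * ACTIVE_PARAMS  # compute actually spent per token
flops_dense = 2 * TOTAL_PARAMS    # what a dense 24B model would spend

print(f"per-token FLOPs (MoE, active only):      {flops_sparse:.2e}")
print(f"per-token FLOPs (hypothetical dense 24B): {flops_dense:.2e}")
print(f"compute saved: ~{flops_dense / flops_sparse:.0f}x less per token")
```

In other words, the model carries dense-24B-scale capacity while spending roughly a tenth of the per-token compute.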
Key highlights:
- MoE architecture: 40 layers, 64 experts per MoE block with top-4 routing, maintaining the hybrid conv + GQA design (see the routing sketch after this list)
- 2.3B active parameters per forward pass
- Designed to run within 32GB of RAM, enabling deployment on high-end consumer laptops and desktops
- Day-zero support for inference through llama.cpp, vLLM, and SGLang
- Multiple GGUF quantizations available
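The top-4 routing mentioned above is easiest to see in code. The following is a minimal PyTorch sketch of a sparse MoE feed-forward block with 64 experts and top-4 routing; the layer dimensions, expert MLP shape, and module names are illustrative assumptions, not the released LFM2 internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top4MoEBlock(nn.Module):
    """Minimal sketch of a sparse MoE block: 64 experts, top-4 routing per token.
    Dimensions and expert design are illustrative assumptions."""

    def __init__(self, d_model=2048, d_ff=4096, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalise over the chosen experts
        out = torch.zeros_like(x)
        # Loop over the k routing slots; real kernels batch tokens per expert instead.
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

Only the four selected experts run for each token, which is why the active parameter count stays near 2B even though all 64 experts contribute to the 24B total.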
Across benchmarks including GPQA Diamond, MMLU-Pro, IFEval, IFBench, GSM8K, and MATH-500, quality improves roughly log-linearly as the family scales from 350M to 24B, meaning each multiplicative increase in parameter count yields a broadly consistent gain in benchmark score. This indicates that the LFM2 architecture does not plateau at small sizes.
LFM2-24B-A2B is released as an instruct model and is available open-weight on Hugging Face. Liquid AI designed the model to concentrate capacity in total parameters rather than active compute, keeping inference latency and energy consumption aligned with edge and local deployment constraints.
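Given the day-zero vLLM support, a minimal local inference call might look like the sketch below. The Hugging Face repository ID is an assumption inferred from the model name, so check the actual model card for the correct ID and for the chat template expected by the instruct model.

```python
from vllm import LLM, SamplingParams

# Assumed repo ID based on the announced model name; verify on Hugging Face before use.
MODEL_ID = "LiquidAI/LFM2-24B-A2B"

llm = LLM(model=MODEL_ID, trust_remote_code=True)
params = SamplingParams(temperature=0.2, max_tokens=256)

# Plain-text prompt for brevity; an instruct model would normally go through its chat template.
outputs = llm.generate(["Explain what a sparse Mixture-of-Experts model is."], params)
print(outputs[0].outputs[0].text)
```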
This is the next step in making fast, scalable, efficient AI accessible in the cloud and on-device.