Laguna M.1: A 225B MoE Model for Agentic Coding and Extended Contexts

Poolside has introduced Laguna M.1, a Mixture-of-Experts (MoE) large language model (LLM) designed for agentic coding and for tasks that require a long context window. It has 225 billion parameters in total, of which 23 billion are activated per token, and it is aimed at developers and companies that need advanced reasoning and automation capabilities.

The model targets the demands of real software development environments, where deep code understanding and the ability to interact with external tools are crucial. Because only a subset of its parameters is activated for each token, the MoE design keeps per-token compute roughly in line with a 23B dense model while retaining the capacity of a much larger one. This trade-off matters for on-premise deployment, with one caveat: compute scales with the activated parameters, but memory does not, since all 225 billion parameters must still be loaded to serve the model.
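To make the compute side of this trade-off concrete, the sketch below applies the common rule of thumb that a forward pass costs roughly 2 FLOPs per parameter touched per token. It is a back-of-envelope estimate under that assumption, not a measured figure: it ignores attention-score computation, which grows with context length, and it does not model memory at all.

```python
# Rough per-token compute: sparse MoE vs. a hypothetical dense model of the
# same total size, using the ~2 FLOPs per parameter per token rule of thumb.
# Attention-score FLOPs (which grow with context length) are ignored.

TOTAL_PARAMS = 225e9   # total parameters (from the model description)
ACTIVE_PARAMS = 23e9   # parameters activated per token


def forward_flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token: ~2 * parameters touched."""
    return 2.0 * params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction:.1%}")
print(f"MoE:   ~{moe_flops / 1e9:.0f} GFLOPs/token")
print(f"dense: ~{dense_flops / 1e9:.0f} GFLOPs/token")
```

Under this approximation, about 10% of the weights do the work for any given token, so per-token compute is roughly a tenth of what a dense 225B model would need.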

In-Depth Architectural and Technical Specifications

Laguna M.1 is a 70-layer MoE transformer. The first three layers are dense and use SwiGLU activations; the remaining 67 are sparse MoE layers with 256 experts each. Each token is routed to 16 of those experts (top-k = 16), and load is balanced across experts without an auxiliary loss term, an approach intended to keep expert utilization even without the quality cost that an auxiliary balancing loss can impose.
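The article does not describe how Laguna M.1 implements routing or balancing internally. As an illustration only, the sketch below shows top-16-of-256 routing with the bias-based, auxiliary-loss-free scheme that DeepSeek-V3 described: a per-expert bias shifts which experts are selected but not how their outputs are weighted, and it is nudged down for overloaded experts and up for underloaded ones. The sigmoid affinities, the update rate, and the function names `route` and `update_bias` are hypothetical.

```python
import numpy as np

NUM_EXPERTS = 256
TOP_K = 16
BIAS_UPDATE_RATE = 1e-3  # hypothetical step size for the bias update


def route(scores: np.ndarray, bias: np.ndarray, top_k: int = TOP_K):
    """Select top-k experts per token (illustrative sketch).

    scores: (tokens, experts) router affinities, assumed positive (sigmoid outputs)
    bias:   (experts,) load-balancing bias, used ONLY for expert selection
    Returns expert indices (tokens, top_k) and normalized gate weights (tokens, top_k).
    """
    biased = scores + bias  # the bias changes which experts are chosen...
    idx = np.argpartition(-biased, top_k - 1, axis=1)[:, :top_k]
    gates = np.take_along_axis(scores, idx, axis=1)  # ...but gates use raw scores
    gates = gates / gates.sum(axis=1, keepdims=True)
    return idx, gates


def update_bias(bias, idx, num_experts=NUM_EXPERTS, rate=BIAS_UPDATE_RATE):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    load = np.bincount(idx.ravel(), minlength=num_experts)
    return bias - rate * np.sign(load - load.mean())


rng = np.random.default_rng(0)
scores = 1.0 / (1.0 + np.exp(-rng.normal(size=(1024, NUM_EXPERTS))))  # sigmoid affinities
bias = np.zeros(NUM_EXPERTS)
idx, gates = route(scores, bias)
bias = update_bias(bias, idx)  # applied between training steps, not via gradients
```

Because the bias never enters the gate weights, balancing pressure changes expert assignment without directly distorting each token's output mixture.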

Every layer uses global (full) attention with 64 query heads and 8 key-value heads, i.e. grouped-query attention in which each KV head is shared by 8 query heads, plus softplus gating on the attention output. Positions are encoded with RoPE extended via YaRN, supporting a context window of 262,144 tokens (256K). A context this long is particularly relevant for coding, where analyzing large codebases or long sequences of agent interactions is a core requirement. The model also supports reasoning natively: it can think between tool calls (“interleaved thinking”), and thinking can be enabled or disabled per request, which further strengthens its agentic capabilities.
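The head layout can be sketched in a few lines of NumPy. This is a minimal, single-sequence illustration of causal grouped-query attention with 64 query heads sharing 8 KV heads. It omits RoPE/YaRN, and the placement of the softplus gate (an elementwise gate computed from the layer input and applied to the concatenated head outputs before the output projection) is an assumption, since the article does not specify it. All weight names and sizes are hypothetical.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def gqa_gated_attention(x, wq, wk, wv, wg, wo, n_q_heads=64, n_kv_heads=8):
    """Causal grouped-query attention with a softplus output gate (sketch).

    x: (seq, d_model); wq, wg: (d_model, n_q_heads*hd);
    wk, wv: (d_model, n_kv_heads*hd); wo: (n_q_heads*hd, d_model).
    """
    seq = x.shape[0]
    hd = wq.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads  # 64 / 8 = 8 query heads per KV head
    q = (x @ wq).reshape(seq, n_q_heads, hd).transpose(1, 0, 2)   # (Hq, S, hd)
    k = (x @ wk).reshape(seq, n_kv_heads, hd).transpose(1, 0, 2)  # (Hkv, S, hd)
    v = (x @ wv).reshape(seq, n_kv_heads, hd).transpose(1, 0, 2)
    # Expand KV heads at compute time: query head h reads KV head h // group.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)               # (Hq, S, S)
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)          # future positions
    scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = (probs @ v).transpose(1, 0, 2).reshape(seq, n_q_heads * hd)
    out = out * softplus(x @ wg)  # hypothetical placement of the output gate
    return out @ wo


rng = np.random.default_rng(0)
S, d_model, hd = 10, 128, 16  # toy sizes, not the model's real dimensions


def init(rows, cols):
    return rng.normal(scale=0.02, size=(rows, cols))


x = rng.normal(size=(S, d_model))
wq, wg = init(d_model, 64 * hd), init(d_model, 64 * hd)
wk, wv = init(d_model, 8 * hd), init(d_model, 8 * hd)
wo = init(64 * hd, d_model)
y = gqa_gated_attention(x, wq, wk, wv, wg, wo)
```

Only 8 K and V heads are computed and would be cached; `np.repeat` expands them for the score computation, so the cache stays one eighth the size it would be with 64 KV heads.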

Performance and Market Context

Laguna M.1 reports competitive results on several key agentic-coding benchmarks: 74.6% on SWE-bench Verified, 63.1% on SWE-bench Multilingual, 49.2% on SWE-Bench Pro, and 45.8% on Terminal-Bench 2.0. These scores put it in the same range as other open-weight and frontier models such as Devstral 2, GLM-4.7, DeepSeek-V4 Flash, and Qwen3.5-397B-A17B, and on some metrics alongside proprietary models such as Claude Sonnet 4.6.

Laguna M.1 is released under the Apache 2.0 license, which permits free use, modification, and redistribution for both commercial and non-commercial purposes, making it attractive to companies seeking flexibility and control. Open weights under a permissive license also make it a candidate for scenarios where customization, such as fine-tuning, and deep integration with existing infrastructure are priorities.

Implications for On-Premise Deployment

Laguna M.1's size has direct consequences for on-premise deployment. Although only 23 billion parameters are activated per token, all 225 billion must be resident in memory: at 16-bit precision the weights alone occupy about 450 GB, more than five 80 GB GPUs can hold. Serving the model therefore calls for multi-GPU nodes of high-end accelerators such as the NVIDIA H100 or A100, or quantized weights to shrink the footprint. The 262,144-token context window adds to this: the KV cache grows linearly with sequence length, so long-context requests consume substantial additional memory, even though the 8 KV heads keep the cache 8× smaller than it would be with 64 KV heads.
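The memory arithmetic can be sketched as follows. Layer count, KV-head count, and context length come from the specifications above; the head dimension of 128, the 16-bit KV cache, and the 80 GB per-GPU figure are assumptions, and the estimate ignores activations, runtime overhead, and framework-specific cache layouts.

```python
import math

# Figures from the model description.
TOTAL_PARAMS = 225e9
NUM_LAYERS = 70
NUM_KV_HEADS = 8
CONTEXT = 262_144

# Assumptions (hypothetical, not from the model card).
HEAD_DIM = 128   # assumed per-head dimension
GPU_MEM_GB = 80  # e.g. an 80 GB H100 or A100


def weight_gb(bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return TOTAL_PARAMS * bytes_per_param / 1e9


def kv_cache_bytes_per_token(bytes_per_value: int = 2) -> int:
    """K and V tensors per layer, each NUM_KV_HEADS * HEAD_DIM values."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


for name, bpp in [("BF16", 2), ("FP8", 1), ("4-bit", 0.5)]:
    gb = weight_gb(bpp)
    gpus = math.ceil(gb / GPU_MEM_GB)
    print(f"{name:>5}: weights ~{gb:.1f} GB -> at least {gpus} x {GPU_MEM_GB} GB GPUs")

per_token = kv_cache_bytes_per_token()
full_ctx_gib = per_token * CONTEXT / 2**30
print(f"KV cache: {per_token / 1024:.0f} KiB/token, "
      f"~{full_ctx_gib:.0f} GiB for one {CONTEXT:,}-token sequence")
```

Under these assumptions, a single full-length sequence needs about 70 GiB of KV cache on top of the weights, so concurrent long-context requests quickly come to dominate the memory budget.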

For organizations that prioritize data sovereignty, regulatory compliance, or air-gapped environments, self-hosting an LLM like Laguna M.1 offers full control over data and infrastructure. However, it requires a careful evaluation of the Total Cost of Ownership (TCO), which includes not only the initial hardware investment but also operational costs for energy, cooling, and maintenance. AI-RADAR provides analytical frameworks on /llm-onpremise to help evaluate the trade-offs between performance, costs, and infrastructural requirements for such AI workloads.
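A simple annualized TCO model makes these cost components explicit. The structure below is one common simplification: hardware amortized linearly, energy scaled by the facility's PUE to account for cooling, and maintenance as a fraction of hardware cost. Every input value in the example is a hypothetical placeholder, not a quote for Laguna M.1 hardware.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class TcoInputs:
    hardware_cost: float       # up-front server/GPU purchase
    amortization_years: float  # straight-line depreciation period
    it_power_kw: float         # average IT power draw of the inference nodes
    pue: float                 # power usage effectiveness; >1 covers cooling etc.
    price_per_kwh: float       # electricity price
    maintenance_rate: float    # yearly maintenance as a fraction of hardware cost


def annual_tco(p: TcoInputs) -> dict:
    """Annualized cost breakdown under the simplified model described above."""
    capex = p.hardware_cost / p.amortization_years
    energy = p.it_power_kw * HOURS_PER_YEAR * p.pue * p.price_per_kwh
    maintenance = p.hardware_cost * p.maintenance_rate
    return {"capex": capex, "energy_and_cooling": energy,
            "maintenance": maintenance, "total": capex + energy + maintenance}


# Hypothetical placeholder inputs, for illustration only.
example = TcoInputs(hardware_cost=300_000, amortization_years=4, it_power_kw=10,
                    pue=1.4, price_per_kwh=0.15, maintenance_rate=0.10)
for item, value in annual_tco(example).items():
    print(f"{item:>18}: {value:,.0f}")
```

Swapping in real quotes, measured power draw, and local energy prices turns this into a first-pass comparison against hosted alternatives.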