Nex-AGI Introduces New Nex-N2 Pro and Mini LLMs

Nex-AGI recently announced two new Large Language Models (LLMs) that join the range of options available for on-premise and hybrid deployment: Nex-N2 Pro, a model with 397 billion parameters, and Nex-N2 Mini, a more compact version with 35 billion parameters. Both models are derived from Qwen3.5, a recognized base in the industry, and have been fine-tuned to optimize their performance.

The availability of models with different scales is a crucial factor for companies evaluating AI adoption strategies. The choice between a "Pro" and a "Mini" model is not just a matter of computational capacity, but also a balance between performance, hardware requirements, and Total Cost of Ownership (TCO).

Technical Details and Inference Implications

As fine-tuned versions of Qwen3.5, Nex-N2 Pro and Nex-N2 Mini have been further optimized on top of the base model, although the announcement does not specify the target tasks or domains. Fine-tuning adapts a pre-trained model to more specific needs, improving its accuracy and relevance for targeted applications without the cost of training a model from scratch.

The size difference, 397B versus 35B parameters, has a direct impact on VRAM requirements and on the computational power needed for inference. A 397B model requires significantly more robust GPU infrastructure, typically a multi-GPU configuration with high-speed interconnects such as NVLink, to deliver acceptable latency and throughput. The 35B model, by contrast, can run on more accessible hardware, which makes it a more practical option for resource-constrained scenarios or edge deployments. Initial benchmarks, described as "pretty good," suggest that both models offer solid performance relative to their base.
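To make the size gap concrete, the following minimal sketch estimates the memory needed for the model weights alone at a few common precisions, and the resulting minimum GPU count. Several things here are assumptions rather than figures from the announcement: the bytes-per-parameter values are the usual ones for FP16, INT8, and INT4 weights, the 80 GB per-GPU capacity is a placeholder, and the 1.2 overhead factor is a rough illustrative allowance for KV cache and activations, not a measured value.

```python
import math

# Typical storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 params * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus(params_billions: float, precision: str,
             gpu_vram_gb: float = 80.0, overhead: float = 1.2) -> int:
    """Lower bound on GPU count.

    gpu_vram_gb and overhead are hypothetical placeholders: adjust them
    to the actual cards and to the context length and batch size in use.
    """
    needed_gb = weight_memory_gb(params_billions, precision) * overhead
    return math.ceil(needed_gb / gpu_vram_gb)


if __name__ == "__main__":
    for name, size in [("Nex-N2 Pro", 397), ("Nex-N2 Mini", 35)]:
        for precision in BYTES_PER_PARAM:
            print(f"{name} {precision}: "
                  f"{weight_memory_gb(size, precision):.1f} GB weights, "
                  f">= {min_gpus(size, precision)} GPU(s)")
```

Under these assumptions, the FP16 weights of the 397B model alone come to 794 GB, against 70 GB for the 35B model. Real requirements depend on the serving stack, quantization scheme, context length, and batch size, so the output is a first-order estimate only.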

On-Premise Deployment Considerations

For CTOs, DevOps leads, and infrastructure architects, the choice between model sizes such as Nex-N2 Pro and Mini is fundamental to on-premise deployment. A 397B-parameter model, while potentially offering greater capability and accuracy, entails a higher TCO because of hardware acquisition costs (high-end GPUs with ample VRAM), energy consumption, and management complexity. This weighs especially on organizations that need to maintain data sovereignty and operate in air-gapped environments, where all capacity must be provisioned in-house.

The 35B model, on the other hand, can be an attractive compromise. Despite having fewer parameters, it may be sufficient for many enterprise applications while substantially reducing hardware requirements and operational costs. Being able to run inference on fewer GPUs, or on cards with less VRAM, makes self-hosted deployment more accessible. For those evaluating on-premise deployments, AI-RADAR offers analytical frameworks on /llm-onpremise to assess the trade-offs between performance, costs, and data sovereignty constraints, without providing direct recommendations.
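The TCO components named above (hardware acquisition, energy consumption, and management) can be combined into a rough annual figure. The sketch below is illustrative only: the function name, its parameters, and the straight-line amortization model are assumptions introduced here, and every input value must come from the reader's own quotes and measurements rather than from this article.

```python
HOURS_PER_YEAR = 24 * 365


def annual_tco(num_gpus: int,
               gpu_price: float,
               amortization_years: float,
               gpu_power_kw: float,
               energy_price_per_kwh: float,
               utilization: float = 1.0,
               annual_ops_cost: float = 0.0) -> float:
    """Rough yearly cost of a self-hosted inference cluster.

    Hypothetical model: hardware is amortized linearly, energy scales
    with GPU count and utilization, and management is a flat yearly
    amount. Cooling, networking, and host servers are not modeled.
    """
    hardware = num_gpus * gpu_price / amortization_years
    energy = (num_gpus * gpu_power_kw * HOURS_PER_YEAR
              * utilization * energy_price_per_kwh)
    return hardware + energy + annual_ops_cost


if __name__ == "__main__":
    # Placeholder inputs, chosen only to exercise the function.
    small = annual_tco(2, 10_000, 4, 0.5, 0.2)
    large = annual_tco(12, 10_000, 4, 0.5, 0.2)
    print(f"2 GPUs: {small:.0f} per year, 12 GPUs: {large:.0f} per year")
```

With the flat management cost left at zero, the result scales linearly with the number of GPUs, which is why the GPU count required by each model dominates the comparison in this simplified view.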

Perspectives and Strategic Trade-offs

The release of LLMs at different scales by players like Nex-AGI underscores a market trend: the need for flexibility and adaptability. Companies are not looking for a single solution, but for an ecosystem of models that can be matched to specific workloads and infrastructure constraints. The choice between a massive model and a more compact one depends on application needs, latency requirements, desired throughput, and, not least, the budget available for infrastructure.

Evaluating these new models will require an in-depth analysis of benchmarks specific to enterprise use cases and a realistic estimate of hardware requirements. The ability to run inference efficiently on-premise, while maintaining control over data and complying with regulations, remains a priority for many organizations.