MiniMax-2.5 is an open-source large language model (LLM) achieving state-of-the-art results in several areas, including code generation, tool use, and office-automation tasks.
Technical details
The full model has 230 billion total parameters, of which 10 billion are active per token, and its unquantized bf16 checkpoint requires 457GB of memory. A notable aspect is its large context window of 200,000 tokens, which allows it to process and generate longer and more complex texts.
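The memory figure follows directly from the parameter count and the bit-width of the weights. A minimal sketch (the helper name is illustrative, not from any library):

```python
def model_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Estimate raw weight memory in gigabytes (1 GB = 1e9 bytes).

    Counts weights only; KV cache and activations add to this at runtime.
    """
    return n_params * bits_per_param / 8 / 1e9

# 230 billion parameters at bf16 (16 bits, i.e. 2 bytes per weight)
print(model_memory_gb(230e9, 16))  # 460.0, close to the 457GB cited
```

The small gap between 460GB and the cited 457GB is plausibly due to how the checkpoint files are packed and rounded; the order of magnitude is what matters for hardware planning.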
Quantization and hardware requirements
To reduce hardware requirements, a 3-bit quantized version is available through Unsloth Dynamic. This GGUF build shrinks the model to just 101GB, roughly a 78% reduction from the bf16 footprint. This optimization makes it possible to run MiniMax-2.5 on systems with more limited memory resources, opening up new possibilities for on-premise deployment.
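The same back-of-the-envelope arithmetic shows why the quantized file is somewhat larger than a uniform 3-bit encoding would suggest. The sketch below is an assumption-laden illustration, not a description of the actual GGUF layout:

```python
def quantized_size_gb(n_params: float, avg_bits: float) -> float:
    """Raw weight size at a given average bit-width (1 GB = 1e9 bytes)."""
    return n_params * avg_bits / 8 / 1e9

# A uniform 3-bit encoding of 230B weights would take about 86GB:
print(round(quantized_size_gb(230e9, 3)))  # 86

# The published 101GB file implies a higher effective average bit-width,
# consistent with "dynamic" quantization schemes that keep sensitive
# layers at higher precision (an assumption about this particular build):
print(round(101e9 * 8 / 230e9, 1))  # ~3.5 bits per weight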
For those evaluating on-premise deployments, there are trade-offs to consider. AI-RADAR offers analytical frameworks at /llm-onpremise to evaluate these aspects.