GigaChat has announced the release of the weights for its large language models (LLMs) GigaChat-3.1-Ultra and GigaChat-3.1-Lightning under the MIT license.
Key Features
- GigaChat-3.1-Ultra: A 702-billion-parameter Mixture-of-Experts (MoE) model designed for environments with large compute budgets. It was trained with native FP8 during the DPO (Direct Preference Optimization) phase, supports MTP (Multi-Token Prediction), and can be run on three HGX instances.
- GigaChat-3.1-Lightning: A 10-billion-parameter MoE model optimized for local inference. Thanks to native FP8 DPO and MTP support, it delivers high throughput with a 256k-token context window (see the inference sketch after this list).
- Training: Both models were pre-trained from scratch using proprietary data and compute resources. This is not a DeepSeek fine-tune.
- Languages: Optimized for English and Russian, but trained on 14 languages to achieve good multilingual results.
- Tool calling: GigaChat-3.1-Lightning excels at tool calling, scoring 0.76 on the BFCLv3 benchmark.
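As a rough illustration of running the Lightning model locally, here is a minimal sketch using Hugging Face transformers. The repository id is a hypothetical placeholder, and whether the checkpoint ships custom modeling code is an assumption; consult the official model card for the actual identifiers.

```python
# Minimal local-inference sketch for GigaChat-3.1-Lightning.
# MODEL_ID is a hypothetical placeholder, not a confirmed repo name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai-sage/GigaChat-3.1-Lightning"  # assumption: substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # spread layers across available GPUs/CPU
    trust_remote_code=True,  # assumption: MoE checkpoints often ship custom code
)

messages = [{"role": "user", "content": "Summarize the GigaChat release in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```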
Performance
The models were evaluated on a series of benchmarks, demonstrating competitive performance against other open-source models. In particular, GigaChat-3.1-Ultra outperforms DeepSeek-V3-0324 and Qwen3-235B, while GigaChat-3.1-Lightning outperforms Qwen3-4B-Instruct-2507 and Gemma-3-4B-it. Throughput tests of GigaChat-3.1-Lightning show gains of up to 38.1% with FP8 and MTP enabled on an H100 80GB SXM5 GPU.
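For context, throughput here means generated tokens per second. The sketch below shows one generic way to measure it on your own hardware; it is a simple timing loop, not the benchmark harness behind the reported numbers, and it reuses the hypothetical MODEL_ID from the earlier example.

```python
# Rough tokens-per-second measurement for a causal LM; not the official benchmark.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai-sage/GigaChat-3.1-Lightning"  # assumption: hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

prompt = tokenizer("Throughput test prompt.", return_tensors="pt").to(model.device)
max_new = 256

# Warm-up run so kernel compilation and caches do not skew the timing.
model.generate(**prompt, max_new_tokens=16)
if torch.cuda.is_available():
    torch.cuda.synchronize()

start = time.perf_counter()
out = model.generate(**prompt, max_new_tokens=max_new)
if torch.cuda.is_available():
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start

generated = out.shape[-1] - prompt["input_ids"].shape[-1]
print(f"{generated / elapsed:.1f} tokens/s")
```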