Z.ai has released GLM-4.7-Flash, a 30-billion-parameter Mixture-of-Experts (MoE) reasoning model designed specifically for local inference.

Key Features

  • Performance: Optimized for coding, agentic workflows, and chat.
  • Efficiency: Activates only about 3.6 billion of its 30 billion parameters per token, keeping per-token inference cost low.
  • Extended Context: Supports context windows up to 200,000 tokens.
  • Benchmarks: Strong results on SWE-bench and GPQA, as well as on reasoning and chat evaluations.

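The efficiency claim above can be sanity-checked with quick arithmetic: in an MoE model, each token's forward pass only touches the routed experts, so per-token compute scales with the active parameter count rather than the total. A minimal sketch (the parameter counts come from the announcement; everything else is back-of-envelope):

```python
# Back-of-envelope check on the MoE efficiency claim.
# Parameter counts are taken from the announcement text above.
total_params = 30e9      # total parameters across all experts
active_params = 3.6e9    # parameters activated per token

# Fraction of the weights exercised in each forward pass
active_fraction = active_params / total_params
print(f"active fraction: {active_fraction:.0%}")  # → 12%

# Per-token FLOPs scale with active params, so relative to a dense 30B model:
compute_savings = total_params / active_params
print(f"~{compute_savings:.1f}x fewer per-token FLOPs than a dense 30B model")
```

Note that all 30 billion parameters still need to fit in memory; the MoE savings apply to compute per token, not to the model's storage footprint.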
The official guide for using and fine-tuning GLM-4.7-Flash is available on Unsloth.ai.
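Since the model targets local inference, a quantized build could be served with a standard local runtime such as llama.cpp. The launch sketch below is illustrative only: the Hugging Face repo and quant tag are assumptions, not confirmed artifacts, so check Unsloth's guide for the real names.

```shell
# Sketch: run a hypothetical GGUF quant of GLM-4.7-Flash with llama.cpp.
# The repo:quant identifier is an assumption -- substitute the published one.
llama-cli \
  -hf unsloth/GLM-4.7-Flash-GGUF:Q4_K_M \
  --ctx-size 32768 \
  --jinja \
  -p "Write a binary search in Python."
```

Here `--ctx-size` requests a 32K window (a subset of the advertised 200K maximum, to limit KV-cache memory) and `--jinja` applies the model's bundled chat template.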