A quantized version of Qwen3-Coder-Next has been released in NVFP4 format, reducing the model size from 149 GB to 45 GB (roughly a 70% reduction).

Details

  • Model: Qwen3-Coder-Next
  • Quantization: NVFP4
  • Size: 45 GB (down from 149 GB)
  • Calibration Dataset: ultrachat_200k
  • Accuracy Loss: 1.63% on MMLU Pro+

Quantization is a fundamental technique for reducing the memory footprint of large language models (LLMs), making inference feasible on hardware with limited resources. For teams considering on-premise deployments, AI-RADAR helps weigh the trade-offs between accuracy and hardware requirements.
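The size reduction above can be sanity-checked with back-of-the-envelope arithmetic. This is a sketch, not the exact packing math: it assumes the original checkpoint is 16-bit and that NVFP4 stores weights as 4-bit FP4 values in 16-element blocks, each block carrying an 8-bit FP8 scale (a small per-tensor FP32 scale is ignored here).

```python
# Rough memory-footprint estimate for NVFP4 quantization (a sketch;
# real sizes depend on which layers are quantized and on metadata overhead).
# Effective cost per weight: 4 bits + 8 bits of scale shared across a
# 16-element block = 4 + 8/16 = 4.5 bits.

BF16_BITS = 16
NVFP4_BITS = 4 + 8 / 16  # 4.5 effective bits per weight

original_gb = 149  # published checkpoint size before quantization
estimated_gb = original_gb * NVFP4_BITS / BF16_BITS

print(f"estimated NVFP4 size: {estimated_gb:.1f} GB")  # ≈ 41.9 GB
```

The estimate lands below the published 45 GB, which is consistent with some layers (typically embeddings and the output head) being kept in higher precision rather than quantized.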