A user reported a significant performance increase for the Qwen3 Coder Next model after updating llama.cpp. The tests were run on a workstation with two NVIDIA RTX GPUs and showed a clear increase in tokens generated per second.

Configuration Details

  • GPU 1: NVIDIA RTX 6000 Ada Generation (compute capability 8.9)
  • GPU 2: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (compute capability 12.0)

Benchmark Results

Benchmarks performed with llama-bench show an increase in generation throughput, measured in tokens per second (t/s). For example, in dual-GPU mode, throughput rose from approximately 80 t/s to over 110 t/s after the update. Running on the RTX PRO 6000 Blackwell alone, over 130 t/s were achieved. Exact figures vary with the test parameters, as shown in the benchmark tables reported by the user.
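For reference, llama-bench ships with llama.cpp, and runs like those described above could look roughly like the following sketch. The model filename, device index, and parameter values are illustrative assumptions, not taken from the user's report:

```shell
# Dual-GPU run: offload all layers to the GPUs (-ngl), with a
# 512-token prompt (-p) and 128 generated tokens (-n)
./llama-bench -m models/qwen3-coder-next.gguf -ngl 99 -p 512 -n 128

# Single-GPU run restricted to the RTX PRO 6000 Blackwell,
# assuming it is CUDA device 1 on this machine
CUDA_VISIBLE_DEVICES=1 ./llama-bench -m models/qwen3-coder-next.gguf -ngl 99 -p 512 -n 128
```

llama-bench prints a results table with a t/s column for each prompt-processing and generation test, which is presumably the source of the figures quoted above.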