A user reported a significant performance increase for the Qwen3 Coder Next model after updating llama.cpp. The tests ran on a system with two NVIDIA RTX GPUs and showed a higher generation rate, measured in tokens per second.
Configuration Details
- GPU 1: NVIDIA RTX 6000 Ada Generation (compute capability 8.9)
- GPU 2: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (compute capability 12.0)
Benchmark Results
Benchmarks run with llama-bench show higher generation throughput, measured in tokens per second (t/s). In dual-GPU mode, for example, speed rose from approximately 80 t/s to over 110 t/s. With only the RTX PRO 6000 Blackwell in use, throughput exceeded 130 t/s. Exact results vary with the test parameters, as shown in the benchmark tables the user reported.
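To put the reported figures in perspective, the relative gain can be worked out directly. The short sketch below is illustrative only: the function name `speedup_percent` is made up for this example, and the inputs are the rounded values quoted above ("approximately 80", "over 110", "over 130"), not the exact numbers from the user's benchmark tables.

```python
def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Return the relative throughput change from before_tps to after_tps, in percent."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return (after_tps - before_tps) / before_tps * 100.0


# Rounded figures from the report (assumed here as exact for illustration).
DUAL_GPU_BEFORE = 80.0    # t/s before the llama.cpp update, dual-GPU mode
DUAL_GPU_AFTER = 110.0    # t/s after the update, dual-GPU mode (reported as "over 110")
SINGLE_GPU_AFTER = 130.0  # t/s using only the RTX PRO (reported as "over 130")

# Gain from the update in dual-GPU mode.
update_gain = speedup_percent(DUAL_GPU_BEFORE, DUAL_GPU_AFTER)

# Difference between the single-GPU run and the dual-GPU run after the update.
single_vs_dual = speedup_percent(DUAL_GPU_AFTER, SINGLE_GPU_AFTER)

print(f"Dual-GPU gain after update: roughly {update_gain:.1f}%")
print(f"Single RTX PRO vs dual-GPU: roughly {single_vs_dual:.1f}% faster")
```

With these rounded inputs, the update corresponds to a gain of roughly 37% in dual-GPU mode, and the single-GPU run comes out roughly 18% faster than the dual-GPU run. Because the source gives only approximate bounds, these percentages are estimates.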