A Reddit user reported significant speed increases when running the Qwen3-Coder-Next model with the --fit option in llama.cpp. The test was performed on a system with two RTX 3090 graphics cards.

Configuration Details

  • Model: Qwen3-Coder-Next (Unsloth's UD-Q4_K_XL quantization)
  • Hardware: 2x RTX 3090
  • Software: llama.cpp (build b7941)

The results suggest that, for this model and hardware combination, the --fit option in llama.cpp delivers faster inference than manually offloading tensors with -ot (--override-tensor). Further details and graphs are available in the original Reddit thread.