A user shared their experience running the Qwen-Coder-Next model on a Strix Halo platform using ROCm.

Configuration Details

The test was conducted with llamacpp-rocm build b1170, with the context size set to 16k. The flags --flash-attn on and --no-mmap were enabled to improve performance.
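
The reported settings map onto a llama.cpp command line roughly as follows. This is a reconstruction, not the user's exact command: only the b1170 build, the 16k context, --flash-attn on, and --no-mmap come from the report; the model filename, quantization, -ngl value, and use of llama-cli are illustrative assumptions.

```sh
# Reconstructed invocation (llamacpp-rocm b1170). Model file, quant,
# and -ngl are assumptions; the other flags are as reported.
./llama-cli \
  -m Qwen-Coder-Next-80B-A3B-Q4_K_M.gguf \
  -c 16384 \
  --flash-attn on \
  --no-mmap \
  -ngl 99 \
  -p "Write a binary search in Python."
```

With --no-mmap, the model weights are loaded directly into memory rather than memory-mapped from disk, which on a unified-memory platform like Strix Halo can avoid paging overhead during inference.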

This result demonstrates the feasibility of running large language models such as Qwen-Coder-Next (80B total parameters, 3B active) on consumer hardware with ROCm. For those evaluating on-premise deployments, there are trade-offs to weigh, and AI-RADAR provides analytical frameworks for doing so at /llm-onpremise.