A user shared their experience running the Qwen-Coder-Next model on an AMD Strix Halo platform using ROCm.
Configuration Details
The test was conducted with llamacpp-rocm build b1170 and a 16k context window. The flags `--flash-attn on --no-mmap` were passed to enable flash attention and disable memory-mapped model loading, which improved performance on this platform.
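The setup described above can be sketched as a `llama-server` invocation. This is a minimal example, not the user's exact command: the model filename, port, and GPU-offload value are assumptions, while the context size and the `--flash-attn on --no-mmap` flags come from the report.

```shell
# Sketch of the reported configuration (llamacpp-rocm b1170).
# Model path, port, and -ngl value are illustrative assumptions.
llama-server \
  -m ./qwen-coder-next-80b-a3b.gguf \
  -c 16384 \            # 16k context, as reported
  --flash-attn on \     # flash attention enabled, as reported
  --no-mmap \           # load weights into RAM instead of mmap, as reported
  -ngl 99 \             # offload all layers to the GPU (assumed)
  --port 8080
```

Disabling mmap forces the full model into unified memory up front, which tends to suit Strix Halo's shared-memory design better than paging weights in on demand.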
This result demonstrates the feasibility of running large mixture-of-experts models such as Qwen-Coder-Next (80B total parameters, 3B active per token) on consumer hardware with ROCm. For those evaluating on-premise deployments, there are trade-offs to weigh, and AI-RADAR offers analytical frameworks at /llm-onpremise for assessing them.