A user shared their experience running the Qwen-Coder-Next model on a Strix Halo platform using ROCm.
Configuration Details
The test was run with llamacpp-rocm build b1170 at a 16k context size, with the flags --flash-attn on and --no-mmap enabled to improve performance.
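As a rough sketch, the invocation would look something like the following. The binary and model filenames are assumptions (the article does not give them); the flags --ctx-size, --flash-attn, and --no-mmap are standard in recent llama.cpp builds:

    # Hypothetical invocation; the GGUF filename is a placeholder
    llama-server -m qwen-coder-next-80b-a3b-q4.gguf \
        --ctx-size 16384 --flash-attn on --no-mmap

Disabling mmap forces the model weights to be loaded fully into memory rather than paged from disk, which can help on unified-memory platforms like Strix Halo.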
This result demonstrates the feasibility of running large language models such as Qwen-Coder-Next (80B total parameters, 3B active) on consumer hardware with ROCm. For those evaluating on-premise deployments, there are trade-offs to weigh, and AI-RADAR offers analytical frameworks at /llm-onpremise for doing so.