LLM Inference Benchmarks on Strix Halo iGPU
A user from the LocalLLaMA community has published the results of a series of benchmarks performed on the Strix Halo's iGPU (integrated GPU) across different software configurations. A total of 13 LLM models were tested against 15 different llama.cpp builds, varying the compute backend (ROCm or Vulkan), the gfx target version, hipblaslt (enabled/disabled), and rocWMMA support.
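The exact build flags used by the original poster are not reproduced here, but as a rough, hypothetical sketch of what such a build matrix might look like, the snippet below enumerates variants using current llama.cpp CMake options (GGML_HIP, GGML_VULKAN, GGML_HIP_ROCWMMA_FATTN, AMDGPU_TARGETS); the specific gfx targets and combinations are assumptions, not the poster's configuration.

```python
# Hypothetical sketch: enumerate llama.cpp build variants along the axes
# described in the benchmark (backend, gfx target, rocWMMA).
# Flag names follow current llama.cpp CMake options, but the exact
# combinations tested by the original poster are not known.
from itertools import product

GFX_TARGETS = ["gfx1151", "gfx1100"]  # assumed gfx versions for the Strix Halo tests

def cmake_flags(backend: str, gfx: str, rocwmma: bool) -> list[str]:
    """Compose a cmake configure command for one llama.cpp build variant."""
    flags = ["cmake", "-B", "build", "-DGGML_NATIVE=OFF"]
    if backend == "rocm":
        flags += ["-DGGML_HIP=ON", f"-DAMDGPU_TARGETS={gfx}"]
        if rocwmma:
            # rocWMMA-accelerated flash attention for the HIP backend
            flags += ["-DGGML_HIP_ROCWMMA_FATTN=ON"]
    elif backend == "vulkan":
        flags += ["-DGGML_VULKAN=ON"]
    return flags

# Build the matrix; Vulkan builds do not use gfx targets or rocWMMA, so
# redundant combinations are collapsed. hipblaslt on/off is typically a
# runtime toggle (ROCBLAS_USE_HIPBLASLT), not a build-time flag.
variants = []
for backend, gfx, rocwmma in product(["rocm", "vulkan"], GFX_TARGETS, [False, True]):
    if backend == "vulkan" and (rocwmma or gfx != GFX_TARGETS[0]):
        continue
    variants.append((backend, gfx, rocwmma, cmake_flags(backend, gfx, rocwmma)))

for backend, gfx, rocwmma, cmd in variants:
    print(backend, gfx, "rocWMMA" if rocwmma else "-", " ".join(cmd))
```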
The approach was to package each llama.cpp build in its own Docker image, avoiding dependency conflicts and simplifying the testing process. Some builds failed outright, but those failures were recorded as useful data points as well.
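A minimal sketch of this Docker-per-build approach is shown below: one image per variant, with failed builds and failed runs logged as results rather than aborting the sweep. The image tags, Dockerfile layout, model list, and mount paths are all placeholders, not the poster's actual setup; llama-bench is llama.cpp's standard benchmarking tool, and /dev/kfd and /dev/dri are the devices ROCm containers typically need.

```python
# Hypothetical sketch of the Docker-per-build benchmark driver.
# Image/tag names, Dockerfile names, model files, and mount paths are assumptions.
import subprocess

MODELS = ["llama-3.1-8b-q4_k_m.gguf"]  # placeholder model list
BUILDS = ["rocm-gfx1151", "rocm-gfx1151-rocwmma", "vulkan"]  # assumed image tags

def run(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)

results = []
for tag in BUILDS:
    # Build an isolated image for this variant (Dockerfile.<tag> is hypothetical).
    build = run(["docker", "build", "-f", f"Dockerfile.{tag}", "-t", f"llamacpp:{tag}", "."])
    if build.returncode != 0:
        results.append({"build": tag, "status": "build failed"})
        continue
    for model in MODELS:
        # Run llama-bench inside the container; /srv/models is a bind-mounted model directory,
        # and ROCBLAS_USE_HIPBLASLT toggles hipblaslt at run time for the ROCm builds.
        bench = run([
            "docker", "run", "--rm", "--device=/dev/kfd", "--device=/dev/dri",
            "-v", "/srv/models:/models", "-e", "ROCBLAS_USE_HIPBLASLT=1",
            f"llamacpp:{tag}", "llama-bench", "-m", f"/models/{model}",
        ])
        results.append({
            "build": tag,
            "model": model,
            "status": "ok" if bench.returncode == 0 else "run failed",
            "output": bench.stdout,
        })

for r in results:
    print(r["build"], r.get("model", ""), r["status"])
```

Recording failures alongside successful runs mirrors the original approach, where non-working builds were treated as part of the findings rather than discarded.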
The complete results are available in the form of interactive tables, which allow comparison of the performance of different configurations.
For those evaluating on-premise deployments, there are trade-offs between performance, TCO, and compliance requirements. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these aspects.