AMD Strix Halo NPU Now Works with ROCm: Hybrid GPU+NPU for Local LLMs
The AMD Ryzen AI Max+ 395 (Strix Halo) finally gets its NPU running for LLM inference thanks to tools like Lemonade, enabling a hybrid NPU+iGPU mode. This leverages the NPU’s speed for prompt processing while the GPU handles token generation in paral...
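The split between prompt processing (prefill) and token generation (decode) can be pictured with a simple latency model. This is a minimal sketch under stated assumptions. It models one request as a prefill phase on one device followed by a decode phase on another. The `Device` class, the `latency` helper and all throughput figures are hypothetical placeholders, not measurements of the Ryzen AI Max+ 395 or the Lemonade API. It also ignores any hand-off overhead between the NPU and the iGPU.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical accelerator described only by two throughput figures."""
    name: str
    prefill_tok_per_s: float  # assumed prompt-processing rate (tokens/s)
    decode_tok_per_s: float   # assumed generation rate (tokens/s)


def latency(prompt_tokens: int, gen_tokens: int,
            prefill_dev: Device, decode_dev: Device) -> tuple[float, float]:
    """Return (time_to_first_token_s, total_s) for one request.

    Prefill runs on prefill_dev, then decode runs on decode_dev.
    Hand-off cost between devices is ignored in this sketch.
    """
    ttft = prompt_tokens / prefill_dev.prefill_tok_per_s
    total = ttft + gen_tokens / decode_dev.decode_tok_per_s
    return ttft, total


# Illustrative numbers only: an NPU assumed fast at prefill but slow at
# decode, and an iGPU assumed to be the opposite.
npu = Device("npu", prefill_tok_per_s=1000.0, decode_tok_per_s=10.0)
igpu = Device("igpu", prefill_tok_per_s=400.0, decode_tok_per_s=40.0)

hybrid = latency(2000, 200, prefill_dev=npu, decode_dev=igpu)
gpu_only = latency(2000, 200, prefill_dev=igpu, decode_dev=igpu)
print("hybrid  :", hybrid)    # (2.0, 7.0)
print("gpu only:", gpu_only)  # (5.0, 10.0)
```

With these made-up rates, routing prefill to the faster prefill device cuts time to first token from 5 s to 2 s while decode time stays the same. The same arithmetic shows why a hybrid mode only pays off when the prefill device really is faster at prompt processing and the hand-off cost stays small.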