A member of the Strix Halo community, known as kyuz0, has built a two-node cluster of AMD Strix Halo machines. The nodes are linked by Intel E810 network cards using RDMA (RoCE v2), which enables distributed inference of large language models (LLMs) via tensor parallelism.

Configuration Details

The cluster consists of two AMD Strix Halo machines connected via Intel E810 network cards, which support RDMA over Converged Ethernet (RoCE) v2. With tensor parallelism, the inference workload is split between the two nodes, and RDMA lets them exchange intermediate results directly, without going through the kernel network stack on every transfer.
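To illustrate the general idea behind tensor parallelism, here is a minimal sketch in plain Python. It is not kyuz0's setup or any framework's API: the helper names (`matmul`, `split_columns`, `column_parallel_linear`) are hypothetical, the two "nodes" are simulated in a single process, and the final concatenation stands in for the gather step that a real cluster would perform over the network link.

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_columns(weight, num_nodes):
    """Cut the weight matrix into contiguous column blocks, one per node."""
    cols = len(weight[0])
    step = -(-cols // num_nodes)  # ceiling division
    return [[row[i:i + step] for row in weight]
            for i in range(0, cols, step)]


def column_parallel_linear(x, weight, num_nodes=2):
    """Compute x @ weight with the weight's columns sharded across nodes."""
    shards = split_columns(weight, num_nodes)
    # Each simulated node multiplies the full input by its own shard only.
    partials = [matmul(x, shard) for shard in shards]
    # Assumption: this concatenation stands in for the network gather step.
    return [sum((part[r] for part in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]                   # two input rows, hidden size 2
weight = [[1, 0, 2, 1], [0, 1, 1, 3]]  # one layer's full weight matrix

sharded = column_parallel_linear(x, weight)
reference = matmul(x, weight)
print(sharded == reference)
```

Each node stores and multiplies only its own slice of the weights, yet the combined result matches the single-node product. The cost is the exchange of partial results between nodes at each sharded layer, which is why the speed of the interconnect matters in a setup like this.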

Resources and Guides

kyuz0 has published detailed benchmarks, a comprehensive cluster setup guide, and an explanatory video. Together, these resources give users what they need to replicate the configuration and run LLM inference on a Strix Halo cluster.