AMD Strix Halo RDMA Cluster Setup Guide
This guide details how to configure a two-node AMD Strix Halo cluster linked via Intel E810 (RoCE v2) for distributed vLLM inference using Tensor Parallelism.
1. TL;DR (Quick Start)
2. Concepts & Architecture
3. Hardware Prerequisites
4. Host Configuration (Fedora)
   4.1 Install Packages
   4.2 Check Native Firmware
   4.3 Network Configuration
   4.4 BIOS & Kernel Configuration
   4.5 Firewall Rules
5. Toolbox Installation & Network Verification
   5.1 Prerequisites: Passwordless SSH
   5.2 Installation
   5.3 Verify RDMA Connection
6. Running the Cluster
   6.1 Setup & Verify
   6.2 Launching vLLM
7. Troubleshooting
8. References & Acknowledgements

1. TL;DR (Quick Start)

On Both Nodes:
Key Note: The refresh_toolbox.sh script detects your InfiniBand/RDMA devices and automatically configures the container to expose them.
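To make the idea of device detection more concrete, here is a minimal sketch of how a script might find RDMA device nodes and turn them into container flags. It is not the contents of refresh_toolbox.sh, which is not shown here. The function name `rdma_device_flags` is hypothetical, and the sketch assumes the devices appear under `/dev/infiniband` (where the Linux RDMA stack normally creates its character devices) and that the container runtime accepts podman-style `--device` flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from refresh_toolbox.sh): print one
# "--device <path>" flag for every node found in the RDMA device directory.
# Assumption: device nodes live in /dev/infiniband unless a directory is given.
rdma_device_flags() {
    local dev_dir="${1:-/dev/infiniband}"
    local flags=""
    local node

    # No directory means no RDMA devices are visible on this host.
    if [ ! -d "$dev_dir" ]; then
        echo ""
        return 0
    fi

    for node in "$dev_dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$node" ] || continue
        flags="$flags --device $node"
    done

    # Strip the leading space before printing.
    echo "${flags# }"
}
```

The output could then be spliced into a container launch command, for example by passing `$(rdma_device_flags)` unquoted among the runtime's other arguments. On a host without RDMA hardware the function prints an empty line, so the launch command stays unchanged.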