6-GPU local LLM workstation: scaling and orchestration advice

A Reddit user shared their experience building a local workstation for large language models (LLMs) with 6 GPUs and approximately 200GB of VRAM in total. The build uses a Threadripper PRO platform, 256GB of ECC RAM, Gen4 and Gen5 NVMe storage, and redundant power supplies.

Configuration and Objectives

The main goal is to run large reasoning models for internal data analysis and workflow automation. The user is experimenting with running several models concurrently and with different strategies for assigning GPUs to models. The system runs Ubuntu 24.04.
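One common way to try out GPU assignment strategies is to restrict each model process to a subset of cards through its environment. The sketch below assumes NVIDIA GPUs and the CUDA runtime, which honors the CUDA_VISIBLE_DEVICES variable; the post does not state the GPU vendor. The model names and the 4+2 split are hypothetical and only illustrate the idea.

```python
import os


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment exposing only the given GPU indices.

    Assumes a CUDA-based stack: CUDA_VISIBLE_DEVICES limits which devices
    a process can see.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def check_disjoint(assignments):
    """Raise ValueError if two models are assigned the same GPU index."""
    seen = set()
    for name, ids in assignments.items():
        overlap = seen.intersection(ids)
        if overlap:
            raise ValueError(f"{name} reuses GPUs {sorted(overlap)}")
        seen.update(ids)
    return True


# Hypothetical split of a 6-GPU machine: one large model sharded across
# four cards, one smaller model on the remaining two.
ASSIGNMENTS = {
    "large-reasoning-model": [0, 1, 2, 3],
    "small-helper-model": [4, 5],
}

check_disjoint(ASSIGNMENTS)
envs = {name: gpu_env(ids) for name, ids in ASSIGNMENTS.items()}
```

Each entry in envs could then be passed as the env argument of subprocess.Popen when launching the corresponding inference server, so the two models never compete for the same card.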

Challenges and Questions

The user asks the community three things: which bottlenecks dominate in configurations like this (VRAM capacity, PCIe bandwidth, CPU orchestration, or memory bandwidth); whether mixing different GPU models causes problems in the long run; and how to schedule models across GPUs (static pinning vs. dynamic routing). A further question is whether it would be more worthwhile to consolidate the system into fewer GPUs with more VRAM each, rather than keep a distributed multi-card configuration.
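The difference between static pinning and dynamic routing can be sketched in a few lines. This is a minimal illustration, not the user's setup: the Gpu class, the per-card VRAM sizes, and the "most free memory wins" policy are all assumptions chosen for the example, and a real router would also account for load, latency, and models that span several cards.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    index: int
    total_gb: float
    used_gb: float = 0.0

    @property
    def free_gb(self):
        return self.total_gb - self.used_gb


def static_pin(model, pin_table):
    """Static pinning: each model always runs on the GPU fixed in a table."""
    return pin_table[model]


def dynamic_route(need_gb, gpus):
    """Dynamic routing: pick the GPU with the most free VRAM that still fits.

    Returns the chosen GPU index, or None if no single card has room.
    """
    candidates = [g for g in gpus if g.free_gb >= need_gb]
    if not candidates:
        return None
    best = max(candidates, key=lambda g: g.free_gb)
    best.used_gb += need_gb  # reserve the memory on the chosen card
    return best.index


# Hypothetical mixed-GPU pool (sizes are illustrative only).
pool = [Gpu(0, 24), Gpu(1, 48), Gpu(2, 32)]
pins = {"summarizer": 0, "reasoner": 1}

pinned = static_pin("reasoner", pins)
first = dynamic_route(20, pool)
second = dynamic_route(20, pool)
third = dynamic_route(30, pool)
```

Static pinning is predictable and keeps weights resident, but it can leave cards idle; dynamic routing uses the pool more evenly at the cost of extra bookkeeping and possible model reloads.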

For those evaluating on-premise deployments, the same trade-offs apply: VRAM capacity per card versus card count, interconnect bandwidth, and scheduling complexity. AI-RADAR offers analytical frameworks on /llm-onpremise to support these evaluations.
