A user on Reddit shared their experience building a local workstation for large language models (LLMs), equipped with six GPUs totaling approximately 200 GB of VRAM. The build includes a Threadripper PRO platform, 256 GB of ECC RAM, Gen4 and Gen5 NVMe storage, and redundant power supplies.
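As a quick sanity check on a build like this, the reported ~200 GB figure can be confirmed by querying the driver for each card's memory. Below is a minimal Python sketch using standard nvidia-smi query flags; it is generic and not tied to the user's specific cards.

```python
# Sketch: enumerate the installed GPUs and sum their VRAM to sanity-check a
# multi-GPU build. Assumes the NVIDIA driver and nvidia-smi are installed;
# the query flags used here are standard nvidia-smi options.
import subprocess

def gpu_inventory():
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpus = []
    for line in out.strip().splitlines():
        idx, name, mem_mib = [field.strip() for field in line.split(",")]
        gpus.append((int(idx), name, int(mem_mib)))
    return gpus

if __name__ == "__main__":
    gpus = gpu_inventory()
    for idx, name, mem_mib in gpus:
        print(f"GPU {idx}: {name}, {mem_mib} MiB")
    total_gib = sum(mem for _, _, mem in gpus) / 1024
    print(f"Total VRAM: {total_gib:.1f} GiB across {len(gpus)} GPUs")
```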
Configuration and Objectives
The main goal is to run large reasoning models for internal data analysis and workflow automation. The user is experimenting with running multiple models concurrently and with different GPU assignment strategies. The machine runs Ubuntu 24.04.
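A common way to run several models side by side on a box like this is static pinning: each serving process is restricted to its own GPU subset via CUDA_VISIBLE_DEVICES. The sketch below assumes a vLLM-style `vllm serve` command and uses invented model names, GPU splits, and ports purely for illustration; the post does not say which serving stack the user runs.

```python
# Sketch: launch several model servers concurrently, each statically pinned to
# its own subset of GPUs via CUDA_VISIBLE_DEVICES. The serving command and the
# model/GPU/port assignments are illustrative assumptions, not the user's setup.
import os
import subprocess

# Hypothetical assignment: which GPUs and port each model instance gets.
ASSIGNMENTS = [
    {"model": "model-a", "gpus": "0,1", "port": 8000},
    {"model": "model-b", "gpus": "2,3", "port": 8001},
    {"model": "model-c", "gpus": "4,5", "port": 8002},
]

def launch(entry):
    env = os.environ.copy()
    # CUDA_VISIBLE_DEVICES restricts this process to the listed GPUs only.
    env["CUDA_VISIBLE_DEVICES"] = entry["gpus"]
    return subprocess.Popen(
        ["vllm", "serve", entry["model"], "--port", str(entry["port"])],
        env=env,
    )

if __name__ == "__main__":
    procs = [launch(entry) for entry in ASSIGNMENTS]
    for proc in procs:
        proc.wait()
```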
Challenges and Questions
The user asks the community about the main bottlenecks in similar configurations (VRAM, PCIe bandwidth, CPU orchestration, memory bandwidth), whether mixing different GPU models can cause problems in the long run, and how to schedule models across GPUs (static pinning vs. dynamic routing). A further question is whether it would be worth consolidating onto fewer GPUs with more VRAM each, rather than keeping a distributed multi-card configuration.
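The alternative to static pinning raised in the thread, dynamic routing, can be as simple as a small dispatcher in front of the pinned instances that sends each request to whichever one currently has the fewest requests in flight. The sketch below is a hypothetical least-busy router; the endpoints and the `send_fn` callable are placeholders, not part of the user's setup.

```python
# Sketch of the "dynamic routing" alternative to static pinning: a tiny
# dispatcher that forwards each request to whichever GPU-pinned server instance
# currently has the fewest in-flight requests. Endpoints are hypothetical.
import threading

class LeastBusyRouter:
    def __init__(self, endpoints):
        self.endpoints = endpoints
        self.in_flight = {e: 0 for e in endpoints}
        self.lock = threading.Lock()

    def acquire(self):
        # Pick the endpoint with the fewest requests currently in flight.
        with self.lock:
            endpoint = min(self.endpoints, key=lambda e: self.in_flight[e])
            self.in_flight[endpoint] += 1
        return endpoint

    def release(self, endpoint):
        with self.lock:
            self.in_flight[endpoint] -= 1

router = LeastBusyRouter([
    "http://localhost:8000",
    "http://localhost:8001",
    "http://localhost:8002",
])

def handle_request(send_fn, payload):
    # send_fn(endpoint, payload) would be the actual HTTP call in a real setup.
    endpoint = router.acquire()
    try:
        return send_fn(endpoint, payload)
    finally:
        router.release(endpoint)
```

Static pinning keeps placement predictable and avoids cross-model interference on a card, while a router like this trades that predictability for better utilization when request rates per model are uneven.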
For those evaluating on-premise deployments, there are trade-offs to consider carefully. AI-RADAR offers analytical frameworks on /llm-onpremise to support these evaluations.