An enthusiast shared their hardware setup for running large language models (LLMs) locally. The system consists of four Lenovo P620 workstations, each equipped with two NVIDIA RTX 3090 graphics cards, for a total of 192 GB of VRAM.
Configuration Details
- Hardware: 4 x Lenovo P620
- GPU: 8 x NVIDIA RTX 3090 (2 per workstation)
- Total VRAM: 192 GB
- Interconnect: 10 Gbit network (planned upgrade to 100 Gbit)
- Framework: vLLM with Ray (see the sketch after this list)
- Power limit: each GPU capped at 200 W
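A minimal sketch of what serving one model across a cluster of this shape could look like with vLLM's Ray backend. The model name, parallelism split (tensor parallel across the two GPUs inside a node, pipeline parallel across the four nodes, which keeps the heavier traffic off the 10 Gbit link), and memory setting are illustrative assumptions, not the poster's exact configuration:

```python
# Hedged sketch: vLLM over a pre-existing Ray cluster spanning the 4 nodes.
# Ray must already be running (`ray start --head` on one P620,
# `ray start --address=<head-ip>:6379` on the other three) before this
# script is launched on the head node.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # assumed example model
    tensor_parallel_size=2,        # shard each layer across the 2 GPUs in a node
    pipeline_parallel_size=4,      # chain the 4 nodes over the network
    distributed_executor_backend="ray",
    gpu_memory_utilization=0.90,   # leave headroom on each 24 GB card
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain pipeline parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```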
Usage
The cluster is used for code development and testing, with the goal of putting the combined VRAM to work running LLMs. The owner also plans to bring the CPUs (4x Threadripper PRO 3975WX) and 1 TB of system RAM into play in the future, potentially via llama.cpp or IK-llama.
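One way the CPUs and system RAM could be folded in is hybrid CPU/GPU inference, here sketched with the llama-cpp-python bindings rather than the llama.cpp CLI. The model path, layer split, and thread count are placeholder assumptions for a single node, not the poster's setup:

```python
# Hedged sketch: hybrid inference where some layers run in VRAM and the
# rest on the 32-core 3975WX out of system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/example-model-q4_k_m.gguf",  # hypothetical GGUF file
    n_gpu_layers=40,   # offload this many layers to the GPUs; the rest stay on CPU
    n_ctx=8192,        # context window, held in system RAM
    n_threads=32,      # match the physical core count of the 3975WX
)

out = llm("Summarize the trade-offs of CPU offloading.", max_tokens=128)
print(out["choices"][0]["text"])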
For those evaluating on-premise deployments, the trade-offs deserve careful consideration; AI-RADAR's analytical frameworks at /llm-onpremise cover these aspects.