Local LLM performance is gated almost entirely by VRAM: if the model plus its context fits in GPU memory, it runs fast; if it does not, you spill to system RAM and speed collapses. So choosing a GPU is really choosing a VRAM tier.
VRAM tiers
| VRAM | Example cards | Runs (4-bit) |
|---|---|---|
| 8-12GB | RTX 3060/4060, A2000 | up to 7-8B |
| 16GB | RTX 4060 Ti 16GB, A4000 | up to 13B |
| 24GB | RTX 3090 / 4090, A5000 | 7B-34B comfortably |
| 48GB | RTX A6000, dual 3090/4090 | up to ~70B |
| 80GB | A100 / H100 | 70B+ and training |
Best value picks
Individuals/hobbyists: a used RTX 3090 (24GB) is the price-performance king. Prosumers: RTX 4090. Small teams needing 70B: an A6000 (48GB) or dual-4090 rig. Renting an 80GB card by the hour often beats buying unless utilization is high.
Frequently asked questions
How much VRAM do I need?
Roughly params(B) × 0.5 for 4-bit. 13B ≈ 8-10GB, 70B ≈ 40-48GB.