Local LLM performance is gated almost entirely by VRAM: if the model plus its context fits in GPU memory, it runs fast; if it does not, you spill to system RAM and speed collapses. So choosing a GPU is really choosing a VRAM tier.

VRAM tiers

VRAMExample cardsRuns (4-bit)
8-12GBRTX 3060/4060, A2000up to 7-8B
16GBRTX 4060 Ti 16GB, A4000up to 13B
24GBRTX 3090 / 4090, A50007B-34B comfortably
48GBRTX A6000, dual 3090/4090up to ~70B
80GBA100 / H10070B+ and training

Best value picks

Individuals/hobbyists: a used RTX 3090 (24GB) is the price-performance king. Prosumers: RTX 4090. Small teams needing 70B: an A6000 (48GB) or dual-4090 rig. Renting an 80GB card by the hour often beats buying unless utilization is high.

Frequently asked questions

How much VRAM do I need?
Roughly params(B) × 0.5 for 4-bit. 13B ≈ 8-10GB, 70B ≈ 40-48GB.