The amount of VRAM a model needs is driven by two things: how many parameters it has, and how many bytes each parameter takes — which depends on quantization. Quantization trades a little quality for a large memory saving, and is how 70B models fit on a single card.

Llama 70B by quantization

PrecisionVRAM neededFits on
4-bit (Q4)~40-48GB48GB card, or 2×24GB
8-bit (Q8)~70GB80GB card
FP16~140GB+2× 80GB (multi-GPU)

Add ~10-20% on top for the KV cache, which grows with context length — long contexts need noticeably more memory.

The sizing formula for any model

VRAM (GB) ≈ params(B) × bytes/weight × 1.15
where bytes/weight ≈ 0.5 (4-bit), 1 (8-bit), 2 (FP16). Example: 13B at 4-bit ≈ 13 × 0.5 × 1.15 ≈ 7.5GB.