"How much does it cost to run an LLM locally?" has no single answer — it depends almost entirely on how busy you keep the hardware. The mistake most teams make is comparing the GPU's purchase price to a cloud hourly rate. The right comparison is total cost of ownership over the GPU's useful life, divided by the tokens you actually generate.

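The comparison described above can be sketched as a small calculation. This is a minimal illustration, not a pricing model: the function name, its parameters, and every figure in the example are hypothetical, and it assumes running costs stay constant regardless of how busy the GPU is.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def cost_per_million_tokens(
    hardware_cost: float,
    lifetime_years: float,
    yearly_running_cost: float,
    peak_tokens_per_second: float,
    utilization: float,
) -> float:
    """Total cost of ownership divided by tokens actually generated.

    utilization is the fraction of the time (0 < u <= 1) the GPU spends
    generating tokens at its peak rate. All inputs are assumptions you
    must supply for your own hardware and workload.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Ownership cost over the whole useful life: purchase plus running costs.
    total_cost = hardware_cost + yearly_running_cost * lifetime_years
    # Tokens produced over the same period, scaled by how busy the GPU is.
    total_tokens = (
        peak_tokens_per_second * utilization * SECONDS_PER_YEAR * lifetime_years
    )
    return total_cost / total_tokens * 1_000_000


# Hypothetical figures, for illustration only.
for utilization in (0.05, 0.25, 0.75):
    cost = cost_per_million_tokens(30_000, 3, 5_000, 1_000, utilization)
    print(f"{utilization:.0%} utilization: {cost:.2f} per million tokens")
```

Because utilization sits in the denominator, halving it doubles the cost per token while the hardware bill stays the same.
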
What goes into local cost

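Cost component | Notes
--- | ---
GPU hardware | Largest upfront cost; depreciate over 2-4 years
Electricity | A 700 W GPU running 24/7 draws about 6,100 kWh per year before cooling overhead; costly where power is expensive, as in much of the EU
Cooling & power | PSU, cooling, possibly room or rack space
Maintenance | Engineer time: drivers, uptime, updates
Utilization | The key variable: not a cost in itself, but the divisor for everything above; an idle GPU wastes the whole investment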
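
The electricity line is simple arithmetic and worth running with your own tariff. The sketch below is an illustration under stated assumptions: the function name, the `overhead` multiplier for cooling and power-supply losses, and the example price are all hypothetical, and it assumes the GPU draws its rated power around the clock.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def annual_electricity_cost(
    watts: float,
    price_per_kwh: float,
    overhead: float = 1.0,
) -> float:
    """Yearly electricity cost for a device drawing `watts` continuously.

    overhead is a hypothetical multiplier for cooling and PSU losses
    (1.0 means none); price_per_kwh is in your local currency.
    """
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR * overhead
    return kwh_per_year * price_per_kwh


# Example with an assumed tariff of 0.30 per kWh and 30% overhead.
print(f"{annual_electricity_cost(700, 0.30, overhead=1.3):.2f} per year")
```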

When local wins

Local wins when load is high, predictable, and sustained; when privacy or data-residency requirements are strict; or when you want a capped, predictable monthly cost instead of a variable cloud bill.

When the cloud wins

The cloud wins when workloads are bursty, occasional, or experimental; when you need the latest GPUs without capital expenditure; or when you must scale up and down quickly. GPU clouds that bill per second charge only for the time you actually use.
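
The two cases meet at a break-even utilization: the fraction of the year a cloud GPU would have to be rented before its bill matches the fixed yearly cost of owning one. The sketch below is a rough illustration, not a procurement tool. The function names and all figures are hypothetical, and it assumes the cloud GPU is comparable to the local one, that the cloud bills only for time used, and that the local yearly cost is fixed regardless of load.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def break_even_utilization(local_yearly_cost: float, cloud_hourly_rate: float) -> float:
    """Fraction of the year at which cloud spend equals local yearly cost.

    A result above 1.0 means the cloud is cheaper even at 24/7 use.
    """
    return local_yearly_cost / (cloud_hourly_rate * HOURS_PER_YEAR)


def cheaper_option(
    utilization: float, local_yearly_cost: float, cloud_hourly_rate: float
) -> str:
    """Return "local" or "cloud" for a given utilization (0 to 1)."""
    cloud_yearly_cost = cloud_hourly_rate * HOURS_PER_YEAR * utilization
    return "local" if local_yearly_cost < cloud_yearly_cost else "cloud"


# Hypothetical inputs: 15,000 per year to own, 2.50 per hour to rent.
threshold = break_even_utilization(15_000, 2.50)
print(f"Break-even utilization: {threshold:.0%}")
print("At 20% utilization:", cheaper_option(0.20, 15_000, 2.50))
print("At 90% utilization:", cheaper_option(0.90, 15_000, 2.50))
```

Local yearly cost here should include amortized hardware, electricity, cooling, and engineer time; leaving any of them out shifts the break-even point in favor of local.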

Frequently asked questions

Is local or cloud cheaper?
Local is cheaper only at high, sustained utilization; the cloud is cheaper for bursty or occasional use.

What are the hidden costs of running locally?
Electricity, cooling, depreciation, downtime, and engineer time.