"How much does it cost to run an LLM locally?" has no single answer, because it depends almost entirely on how busy you keep the hardware. A common mistake is to compare the GPU's purchase price to a cloud hourly rate. The right comparison is cost per token: total cost of ownership over the GPU's useful life, divided by the tokens you actually generate in that time, set against what the cloud would charge for the same tokens.
What goes into local cost
| Cost component | Notes |
|---|---|
| GPU hardware | Largest upfront cost; depreciate over 2-4 years |
| Electricity | A 700 W GPU running 24/7 draws about 6,100 kWh per year, a significant bill, especially at EU electricity prices |
| Cooling & power delivery | PSU, cooling, possibly a dedicated room or rack |
| Maintenance | Engineer time: drivers, uptime, updates |
| Utilization | The key variable: fixed costs accrue whether or not the GPU is busy, so every idle hour raises the cost of each token you do generate |
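
The components above can be combined into a rough cost-per-token estimate. The sketch below is a minimal illustration, not a pricing model. The `LocalSetup` fields and function names are hypothetical, and it assumes the GPU draws the same power whether idle or busy, which overstates electricity cost somewhat. All input values are placeholders you would replace with your own numbers.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8760


@dataclass
class LocalSetup:
    """Hypothetical inputs for a local GPU deployment (not measurements)."""
    hardware_cost: float         # GPU, PSU, cooling: total purchase price
    lifetime_years: float        # depreciation period, e.g. 2 to 4 years
    power_watts: float           # assumed constant draw while powered on
    electricity_per_kwh: float   # local electricity price
    maintenance_per_year: float  # engineer time expressed as money
    utilization: float           # fraction of time spent generating, 0..1
    tokens_per_second: float     # throughput while busy


def total_cost_of_ownership(s: LocalSetup) -> float:
    """Hardware + electricity + maintenance over the whole lifetime."""
    kwh = s.power_watts / 1000 * HOURS_PER_YEAR * s.lifetime_years
    electricity = kwh * s.electricity_per_kwh
    maintenance = s.maintenance_per_year * s.lifetime_years
    return s.hardware_cost + electricity + maintenance


def tokens_generated(s: LocalSetup) -> float:
    """Tokens produced over the lifetime; only busy time counts."""
    busy_seconds = s.utilization * HOURS_PER_YEAR * 3600 * s.lifetime_years
    return busy_seconds * s.tokens_per_second


def cost_per_million_tokens(s: LocalSetup) -> float:
    return total_cost_of_ownership(s) / tokens_generated(s) * 1_000_000
```

Because total cost barely changes with load under these assumptions, halving `utilization` doubles the cost per token, which is why utilization dominates the other rows in the table.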
When local wins
Local hardware wins when load is high, predictable, and sustained; when privacy or data-residency requirements are strict; or when you want a capped, predictable monthly cost instead of a variable cloud bill.
When the cloud wins
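The cloud wins for bursty, occasional, or experimental workloads; when you need the latest GPUs without capital expenditure; or when you must scale up and down quickly. GPU clouds with per-second billing charge only for the time you use, so idle periods cost nothing.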
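The trade-off between the two cases can be expressed as a break-even utilization. The sketch below assumes the cloud bills only for busy time, that the local and cloud GPUs have the same throughput, and that local cost is spread evenly over every hour of the hardware's life. The function names and parameters are hypothetical, chosen for illustration.

```python
def break_even_utilization(local_tco: float,
                           lifetime_years: float,
                           cloud_rate_per_hour: float) -> float:
    """Utilization above which local hardware is cheaper than the cloud.

    Local cost accrues every hour, busy or idle; cloud cost (assumed
    per-second billing) accrues only during busy hours.
    """
    lifetime_hours = lifetime_years * 24 * 365
    local_cost_per_hour = local_tco / lifetime_hours
    return local_cost_per_hour / cloud_rate_per_hour


def cheaper_option(utilization: float,
                   local_tco: float,
                   lifetime_years: float,
                   cloud_rate_per_hour: float) -> str:
    threshold = break_even_utilization(
        local_tco, lifetime_years, cloud_rate_per_hour
    )
    return "local" if utilization > threshold else "cloud"
```

If the computed threshold exceeds 1.0, local hardware cannot beat the cloud on cost at any load under these assumptions, and only non-cost reasons such as privacy or data residency would justify it.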
Frequently asked questions
Is local or cloud cheaper?
Local is cheaper only at high sustained utilization; the cloud is cheaper for bursty or occasional use.
What are the hidden costs of running locally?
Electricity, cooling, depreciation, downtime, and engineer time.