AI Hardware: GPUs, CPUs, and Accelerators
Hardware is the foundation of AI deployment. This guide covers GPU selection, CPU requirements, memory considerations, and infrastructure planning for machine learning workloads and LLM inference.
GPU Fundamentals for AI
GPUs (Graphics Processing Units) excel at parallel computation, making them ideal for AI workloads. Modern AI GPUs feature thousands of parallel cores (CUDA cores on NVIDIA, stream processors on AMD) optimized for matrix multiplication and tensor operations.
Key GPU Specifications
- VRAM (Video RAM): Determines maximum model size; critical for LLM inference
- CUDA/Tensor Cores: Specialized units for AI computation; more cores = faster processing
- Memory Bandwidth: Speed of data transfer to/from VRAM; often the limiting factor for inference speed (see the sketch after this list)
- FP16/BF16 Performance: Half-precision math performance; key metric for modern AI
- Power Consumption (TDP): Thermal design power; impacts cooling and power requirements
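Memory bandwidth deserves special emphasis: single-stream LLM decoding is typically bandwidth-bound rather than compute-bound, because each generated token must stream the full weight set from VRAM. The sketch below is a rough estimator under that assumption; the 0.6 efficiency factor and the example figures are illustrative assumptions, not measurements.

```python
# Back-of-envelope: decode speed for memory-bound inference is roughly
# (usable bandwidth) / (bytes of weights read per token).
def estimate_decode_tokens_per_sec(params_billions: float,
                                   bytes_per_param: float,
                                   bandwidth_gb_s: float,
                                   efficiency: float = 0.6) -> float:
    # `efficiency` is an assumed fraction of peak bandwidth actually achieved.
    weights_gb = params_billions * bytes_per_param
    return bandwidth_gb_s * efficiency / weights_gb

# Example: a 7B model at FP16 (2 bytes/param) on ~1,000 GB/s of bandwidth
print(f"{estimate_decode_tokens_per_sec(7, 2.0, 1000):.0f} tokens/s")  # ~43
```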
NVIDIA vs AMD vs Intel
NVIDIA: Market leader with CUDA ecosystem, best software support (PyTorch, TensorFlow). RTX series for consumers, A/H-series for datacenters.
AMD: Competitive pricing with ROCm framework. Radeon RX for consumers, Instinct MI series for enterprise.
Intel: Emerging player with Arc GPUs and Gaudi accelerators. Growing oneAPI ecosystem.
Consumer vs Datacenter GPUs
Consumer GPUs
Examples: RTX 4090, RTX 4080, RTX 3090
VRAM: 12-24GB
Price: $800-$1,800
Best for: Personal research, small-scale deployment, development
✓ Cost-effective · ✓ Easy to source · ✓ Good for <13B models
Datacenter GPUs
Examples: A100, H100, A40
VRAM: 40-80GB
Price: $10,000-$40,000
Best for: Production deployment, large models, enterprise workloads
✓ High VRAM · ✓ Better reliability · ✓ ECC memory · ✓ Multi-GPU scaling
💡 Rule of Thumb: Consumer GPUs offer the best price/performance for development and small-scale deployment. Datacenter GPUs become cost-effective at scale and for models >30B parameters.
CPU Considerations for AI
While GPUs handle inference, CPUs manage orchestration and data preprocessing, and can run smaller models directly. Modern CPUs with AVX2 or AVX-512 vector instructions significantly improve CPU-side AI performance.
CPU-Only Inference
For models <7B parameters, CPU inference with quantization (GGUF format) is viable. Frameworks like llama.cpp enable efficient CPU deployment on commodity hardware. Expect 5-20 tokens/second on modern desktop CPUs.
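As a concrete starting point, here is a minimal sketch using the llama-cpp-python bindings to llama.cpp. The model path is a placeholder for whatever quantized GGUF file you have locally, and the thread count should match your physical core count.

```python
# Minimal CPU-only inference via llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-7b.Q4_K_M.gguf",  # placeholder: any local GGUF file
    n_threads=8,   # match your physical core count
    n_ctx=2048,    # context window size
)
out = llm("Explain VRAM in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```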
Recommended CPU Specs
- Cores: 8+ physical cores for multi-user scenarios
- Instructions: AVX2 minimum, AVX-512 for best performance (a quick capability check follows this list)
- Cache: Larger L3 cache improves inference speed
- RAM: 32GB minimum for LLM hosting, 64GB+ recommended
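To verify instruction support before buying or deploying, a quick check of CPU flags suffices. A Linux-only sketch (on other platforms, consult your OS's CPU info tooling):

```python
# Read the CPU feature flags from /proc/cpuinfo (Linux only).
def cpu_flags() -> set[str]:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
print("AVX2:   ", "avx2" in flags)
print("AVX-512:", "avx512f" in flags)  # foundation subset of AVX-512
```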
Memory Requirements by Model Size
VRAM requirements depend on model parameters and precision. Use this table as a quick reference:
| Model Size | FP16 (Full) | INT8 (Quantized) | INT4 (GGUF) |
|---|---|---|---|
| 7B parameters | ~14GB | ~7GB | ~4GB |
| 13B parameters | ~26GB | ~13GB | ~7GB |
| 30B parameters | ~60GB | ~30GB | ~16GB |
| 70B parameters | ~140GB | ~70GB | ~35GB |
⚠️ Note: Add 10-20% overhead for context cache and system operations. For multi-user scenarios, multiply by concurrent user count.
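The table reduces to a one-line formula: parameters times bytes per weight, plus overhead. A sketch that reproduces the figures above; the 15% default overhead is an assumed midpoint of the 10-20% range:

```python
# Rough VRAM estimate: params * bytes-per-weight, scaled by an overhead factor
# for context cache and runtime buffers.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def vram_gb(params_billions: float, precision: str, overhead: float = 0.15) -> float:
    base = params_billions * BYTES_PER_PARAM[precision]
    return base * (1 + overhead)

for size in (7, 13, 30, 70):
    print(f"{size}B:", {p: round(vram_gb(size, p), 1) for p in BYTES_PER_PARAM})
```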
Infrastructure Planning
Power and Cooling
High-end GPUs consume 300-700W under load. Factor in PSU efficiency (80+ Gold/Platinum), CPU power, and cooling overhead. Budget 1.3-1.5x GPU TDP for total system power.
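That guideline is simple arithmetic; a sketch, with the 1.4 multiplier as an assumed midpoint of the 1.3-1.5x range:

```python
# Back-of-envelope system power per the 1.3-1.5x GPU TDP guideline above.
def system_power_watts(gpu_tdp_w: int, n_gpus: int = 1, factor: float = 1.4) -> int:
    # `factor` folds in CPU, storage, fans, and PSU losses on top of GPU TDP.
    return int(gpu_tdp_w * n_gpus * factor)

print(system_power_watts(450))            # single 450W GPU -> ~630W system
print(system_power_watts(450, n_gpus=2))  # dual 450W GPUs -> ~1260W system
```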
Multi-GPU Setups
For models exceeding single-GPU VRAM, use tensor parallelism to split the model across multiple GPUs. NVLink (NVIDIA) or Infinity Fabric (AMD) is needed for optimal inter-GPU bandwidth; budget at least PCIe 4.0 x16 per GPU otherwise.
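A minimal sketch of tensor parallelism using vLLM, assuming two visible CUDA GPUs; the model name is illustrative, and any weights that fit the combined VRAM will work:

```python
# Shard one model across 2 GPUs via vLLM's tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=2)
outputs = llm.generate(["Explain tensor parallelism briefly."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```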
Storage Considerations
- Model Storage: 10-150GB per model; use NVMe SSDs for fast loading
- Dataset Storage: Variable; consider network-attached storage for large datasets
- Log/Cache Storage: 50-500GB for operational data and caching layers
Hardware Selection Matrix
Use our interactive Hardware Matrix tool to compare 24+ hardware configurations across different use cases, budgets, and performance requirements.
Last updated: January 2026 | Hardware recommendations updated quarterly