Fine-tuning adapts a general-purpose pre-trained model (like Llama 3) to a specific domain, task, or communication style by training it further on curated data. Unlike retrieval-augmented generation (RAG), fine-tuning permanently changes the model's weights.
## Fine-Tuning Methods
### Full Fine-Tuning
All model weights are updated. Best quality results, but requires enormous VRAM: a 70B model needs ~140 GB just to hold fp16 weights, and several times that once gradients and optimizer states are added. Rarely practical on-premises.
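The VRAM figure above can be sketched with standard mixed-precision Adam accounting (fp16 weights and gradients, fp32 optimizer moments and master weights). The byte counts per parameter are the usual textbook assumption, not a measurement of any specific trainer:

```python
# Rough VRAM estimate for full fine-tuning with mixed-precision Adam.
# Byte counts per parameter are the conventional accounting (an assumption),
# and activations/KV cache are excluded entirely.
def full_ft_memory_gb(n_params: float) -> dict:
    bytes_per_param = {
        "weights_fp16": 2,
        "grads_fp16": 2,
        "adam_m_fp32": 4,
        "adam_v_fp32": 4,
        "master_fp32": 4,
    }
    return {k: n_params * b / 1e9 for k, b in bytes_per_param.items()}

est = full_ft_memory_gb(70e9)
total = sum(est.values())
# weights alone: 140 GB; with grads + optimizer state: 1120 GB, before activations
```

Under these assumptions the ~140 GB in the text is the fp16 weights alone; the full training footprint is roughly 8× that.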
### SFT (Supervised Fine-Tuning)
Training on paired instruction-response examples. Standard first step for chat models. Can be done as full fine-tuning or with PEFT methods.
### LoRA / QLoRA
Parameter-efficient: the base weights stay frozen and only small low-rank adapter matrices are trained. With QLoRA, a 7B model is fine-tunable on a single 24 GB consumer GPU. The industry default.
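The adapter idea reduces to a few lines of linear algebra. A minimal NumPy sketch (dimensions and init scheme are illustrative, not tied to any specific library): the frozen weight `W` gets a trainable low-rank update `B @ A`, scaled by `alpha / r` as in the LoRA paper.

```python
import numpy as np

# Minimal LoRA sketch: W is frozen; only A (r x d_in) and B (d_out x r)
# would be trained. Effective weight: W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus scaled low-rank path; since B starts at zero,
    # the adapted model initially matches the base model exactly.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # adapters are a no-op at init
```

The efficiency comes from the parameter count: here `A` and `B` hold `r * (d_in + d_out) = 1024` trainable values versus 4096 in `W`, and the gap widens rapidly at real model dimensions.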
### Continued Pre-training
Run next-token prediction on raw domain text (manuals, code, scientific papers) before SFT. Injects domain knowledge at the weight level.
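Next-token prediction needs no labels beyond the text itself: every token is the training target for the tokens preceding it. A toy sketch with a whitespace "tokenizer" (real pipelines use subword tokenizers; this only illustrates how raw text becomes training pairs):

```python
# Continued pre-training objective in miniature: turn raw domain text
# into (context, next_token) pairs. Whitespace split stands in for a
# real tokenizer; the sentence is an invented example.
text = "the pump manual describes the pump impeller"
tokens = text.split()

# Each position supplies one training example: predict tokens[i]
# from everything before it.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. first pair: (["the"], "pump")
```

This is why raw manuals, code, and papers suffice here, whereas SFT requires hand-curated instruction-response pairs.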
## Fine-Tuning vs RAG
| | Fine-Tuning | RAG |
|---|---|---|
| Updates weights | Yes | No |
| Update cost | High (retrain) | Low (re-embed) |
| Knowledge freshness | Static post-training | Real-time |
| Hallucination risk | Baked-in errors | Grounded in source |
| Best for | Style, tone, task logic | Factual, changing data |
## Dataset Requirements
For SFT, 500–2,000 high-quality examples are often sufficient for task adaptation. For continued pre-training, millions of tokens of domain text are needed for measurable gains. Curate carefully — garbage data degrades the base model. Tools: Axolotl, LLaMA-Factory, Unsloth (advertised as roughly 2× faster than standard Hugging Face Transformers training).
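A concrete SFT record and a minimal curation gate make the "high-quality examples" requirement tangible. The Alpaca-style field names (`instruction`/`input`/`output`) and the record contents are illustrative assumptions — check the schema your trainer (Axolotl, LLaMA-Factory, etc.) actually expects:

```python
import json

# Hypothetical SFT record in the common Alpaca-style schema.
# Field names and content are illustrative, not tool-specific.
record = {
    "instruction": "Summarize the maintenance steps from the pump manual.",
    "input": "",
    "output": "1. Isolate power. 2. Drain the casing. 3. Inspect the impeller.",
}

def validate(rec: dict) -> bool:
    # Minimal curation gate: required keys present and a non-empty response.
    # Real pipelines add dedup, length limits, and manual review on top.
    required = {"instruction", "output"}
    return required <= rec.keys() and bool(rec["output"].strip())

line = json.dumps(record)        # one record per line in a .jsonl file
assert validate(json.loads(line))
```

Even a trivial gate like this catches empty or malformed records before they reach training, which matters because a few hundred bad examples is a large fraction of a 500–2,000-example dataset.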