Fine-Tuning

Training

Continuing to train a pre-trained model on a domain-specific dataset to permanently improve its performance on specialised tasks.

Fine-tuning adapts a general-purpose pre-trained model (like Llama 3) to a specific domain, task, or communication style by training it further on curated data. Unlike RAG, fine-tuning changes the model's weights permanently.

Fine-Tuning Methods

Full Fine-Tuning

All model weights are updated. Best quality results, but VRAM requirements are enormous: a 70B model needs ~140GB just to hold fp16 weights, and gradients plus optimizer states multiply that several-fold during training. Rarely practical on-premise.

SFT (Supervised Fine-Tuning)

Training on paired instruction-response examples. Standard first step for chat models. Can be full or PEFT.
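A minimal sketch of what one SFT example looks like once formatted for training. The Alpaca-style template below is just one common convention (each chat model defines its own); the key idea is that loss is usually computed only on the response span, not the prompt.

```python
def format_sft_example(instruction: str, response: str):
    # Alpaca-style template — illustrative only; real templates vary by model.
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    full = prompt + response
    # Return the offset where the response begins: tokens before it are
    # typically masked out of the loss so the model learns to answer, not ask.
    return full, len(prompt)

text, resp_start = format_sft_example(
    "Summarise this support ticket.",
    "Customer cannot log in after the 2.3 update.",
)
# text[resp_start:] is the span that contributes to the training loss
```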

LoRA / QLoRA

Parameter-efficient: only small adapter matrices are trained. 7B model fine-tunable on a single 24GB consumer GPU with QLoRA. Industry default.
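The arithmetic behind LoRA's efficiency can be sketched in a few lines. Instead of updating a d×d weight matrix W, LoRA trains two small matrices B (d×r) and A (r×d) with rank r ≪ d, and the effective update is ΔW = (α/r)·B·A. The pure-Python sketch below (no framework, illustrative shapes only) shows the merge step:

```python
def matmul(A, B):
    # naive matrix multiply: (n×k) @ (k×m) -> (n×m)
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def lora_delta(B, A, alpha, r):
    # ΔW = (alpha / r) * B @ A, where B is d×r and A is r×d
    scale = alpha / r
    BA = matmul(B, A)
    return [[scale * x for x in row] for row in BA]

def apply_lora(W, B, A, alpha, r):
    # merged weight W' = W + ΔW — this is what "merging the adapter" means
    dW = lora_delta(B, A, alpha, r)
    return [[W[i][j] + dW[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# For d=4096 and r=16, B and A hold 2·4096·16 ≈ 131K trainable values
# versus 4096² ≈ 16.8M for the full matrix — the source of the VRAM savings.
```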

Continued Pre-training

Run next-token prediction on raw domain text (manuals, code, scientific papers) before SFT. Injects domain knowledge at the weight level.
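The objective here is the same cross-entropy loss used in pre-training: at each position, the model is scored on how much probability it assigns to the actual next token. A minimal sketch of that loss, assuming logits are already computed:

```python
import math

def next_token_loss(logits, target_ids):
    # logits: one list of vocab scores per position;
    # target_ids: the actual "next" token id at each position.
    # Returns mean cross-entropy — the continued pre-training objective.
    total = 0.0
    for scores, target in zip(logits, target_ids):
        m = max(scores)  # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
    return total / len(target_ids)
```

With a uniform two-token vocabulary the loss is log 2 per position, i.e. one bit of uncertainty; training on domain text drives this down on that domain.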

Fine-Tuning vs RAG

                         Fine-Tuning              RAG
Updates weights          Yes                      No
Update cost              High (retrain)           Low (re-embed)
Knowledge freshness      Static post-training     Real-time
Hallucination risk       Baked-in errors          Grounded in source
Best for                 Style, tone, task logic  Factual, changing data

Dataset Requirements

For SFT, 500–2000 high-quality examples are often sufficient for task adaptation. Continued pre-training typically needs millions of tokens of domain text to show results. Curate carefully — garbage data degrades the base model. Tools: Axolotl, LLaMA-Factory, Unsloth (which claims roughly 2× faster training than stock Hugging Face Transformers).
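Careful curation can be partly automated. A minimal sketch of a cleaning pass over an SFT dataset in JSONL form (the `instruction`/`response` field names are an assumption; match them to whatever schema your training tool expects):

```python
import json

def clean_sft_dataset(jsonl_lines):
    # Drops records with empty fields and exact duplicate pairs —
    # two of the cheapest checks that still catch a lot of garbage.
    seen, clean = set(), []
    for line in jsonl_lines:
        rec = json.loads(line)
        instr = rec.get("instruction", "").strip()
        resp = rec.get("response", "").strip()
        if not instr or not resp:
            continue  # empty field: useless training signal
        key = (instr, resp)
        if key in seen:
            continue  # exact duplicate: skews the loss toward repeats
        seen.add(key)
        clean.append(rec)
    return clean
```

Real pipelines add near-duplicate detection, length filters, and manual review on a sample; this is only the floor, not the ceiling, of curation.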