Chain-of-Thought (CoT) prompting asks the model to show its reasoning process before giving a final answer. Wei et al. (2022) showed that it dramatically improves accuracy on math, logic, and multi-step reasoning tasks, at no training cost: the gain comes purely from prompt design.
Variants
Zero-Shot CoT
Simply append "Let's think step by step." to the question. The model then produces a reasoning trace without any worked examples. Works on most instruction-tuned models of ≥7B parameters.
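A minimal sketch, assuming the model sits behind an OpenAI-compatible endpoint (as servers like vLLM, llama.cpp, and Ollama typically expose); the base URL and model name are placeholders for your own deployment.

```python
# Zero-shot CoT: the only change versus a plain prompt is the appended trigger phrase.
# Endpoint URL and model name are placeholders (assumptions about your deployment).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

question = (
    "A bat and a ball cost $1.10 together. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": f"{question}\nLet's think step by step."}],
)

# The reply contains the reasoning trace followed by the final answer.
print(response.choices[0].message.content)
```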
Few-Shot CoT
Provide 2–5 worked examples with reasoning traces in the prompt. More effective than zero-shot for domain-specific tasks but consumes more context.
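A sketch of a few-shot CoT prompt. The worked examples are the classic arithmetic demonstrations from the CoT literature; `complete(prompt)` is a placeholder for whatever client your model is served through.

```python
# Few-shot CoT: worked examples with reasoning traces precede the real question.
# `complete(prompt)` is assumed to call your local model and return a string.
FEW_SHOT_COT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
A: They started with 23, used 20, leaving 3. They bought 6 more, so 3 + 6 = 9. The answer is 9.

"""

def few_shot_cot_prompt(question: str) -> str:
    # The model imitates the demonstrated traces and ends with "The answer is ...".
    return f"{FEW_SHOT_COT}Q: {question}\nA:"

# answer = complete(few_shot_cot_prompt("A pack of 12 pencils costs $3. What do 30 pencils cost?"))
```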
Tree of Thoughts (ToT)
Explore multiple reasoning branches in parallel, scoring partial solutions and pruning weak ones. Implemented as multi-call orchestration rather than a single prompt. Strong on puzzle-type tasks.
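A breadth-first sketch of that orchestration loop, assuming you supply two model-backed callables: `propose`, which suggests candidate next thoughts for a partial solution, and `score`, which rates one. The names and limits are illustrative, not taken from any particular framework.

```python
# Tree-of-Thoughts sketch: expand each surviving branch with candidate next
# steps, score all candidates, keep the best `width`, and repeat to `depth`.
from typing import Callable

def tree_of_thoughts(
    question: str,
    propose: Callable[[str], list[str]],  # model call: candidate next thoughts
    score: Callable[[str], float],        # model call: rate a partial solution
    width: int = 3,                       # branches kept per level
    depth: int = 4,                       # reasoning steps
) -> str:
    frontier = [question]
    for _ in range(depth):
        candidates = [
            f"{state}\n{thought}" for state in frontier for thought in propose(state)
        ]
        if not candidates:  # no further expansions proposed
            break
        # Prune: keep only the highest-scoring partial solutions.
        frontier = sorted(candidates, key=score, reverse=True)[:width]
    return frontier[0]  # best chain found within the budget
```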
ReAct (Reason + Act)
Interleaves reasoning steps with tool calls. The basis of most modern agent frameworks. Requires a model with reliable function-calling ability.
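A text-parsing sketch of the ReAct loop, in the spirit of the original paper's formulation (which predates native function calling). `complete(prompt)` and the `tools` dict are placeholders for your model client and tool implementations.

```python
# ReAct loop sketch: the model emits Thought / Action lines; we execute the
# action, append the Observation, and loop until it answers directly.
import re

def react(question: str, complete, tools: dict, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = complete(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        action = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if action is None:
            return step  # no tool call: treat the thought as the final answer
        name, arg = action.groups()
        observation = tools[name](arg)  # run the requested tool
        transcript += f"Observation: {observation}\n"
    return transcript  # step budget exhausted; return the full trace
```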
When CoT Helps (and When It Doesn't)
CoT helps most on tasks that benefit from intermediate steps: arithmetic, symbolic reasoning, multi-constraint planning, code debugging. It has little effect on simple factual retrieval or sentiment classification — and can actually hurt if the reasoning chain introduces errors that override a correct "gut" answer. Turn it off for high-volume, simple classification tasks to save tokens and latency.
Why It Matters for On-Premise
On-premise models tend to be smaller than frontier cloud models. CoT is one of the cheapest ways to close the capability gap: it requires no fine-tuning, only smarter prompting. A Llama 3 8B with a well-crafted CoT prompt can outperform a naive call to a 70B model on some structured reasoning tasks.