Prompt Engineering

Technique

The practice of designing LLM inputs to maximise output quality — including system prompts, few-shot examples, chain-of-thought triggers, and output format instructions.

Prompt engineering is the disciplined design of model inputs to achieve reliable, high-quality outputs. On-premise deployments lack the cloud-side guardrails built into services like ChatGPT — making thoughtful prompting even more critical.

Core Techniques

Zero-Shot

Just give the instruction. Works well for instruction-tuned models on common tasks. No examples needed.

Few-Shot

Include 3–10 input/output examples. Dramatically improves consistency on formatting and domain-specific tasks.

Chain-of-Thought

Ask the model to reason step-by-step. Append "Let's think step by step" or use explicit reasoning format.

Output Format Constraints

Specify output format explicitly: "Respond only in JSON with keys: name, score, reason." Reduces post-processing burden.

Role Prompting

"You are a senior Python engineer specialising in FastAPI. Review this code strictly." Contextualises the model's persona.

ReAct

Interleave "Thought:", "Action:", "Observation:" labels to structure tool-using agent loops.

Prompt Injection Risks

Any system that feeds external data (documents, search results, user inputs) into a prompt is vulnerable to prompt injection — malicious text that attempts to override the system prompt. Mitigations: strict input sanitisation, separate user and system scopes, output confidence scoring, human review for high-stakes decisions.

Why It Matters for On-Premise

On-premise models are often smaller than frontier APIs. A well-engineered prompt can close 80% of the quality gap between a local 7B model and GPT-4o for specific tasks. Invest time in benchmarking prompt variants against your actual dataset before changing model sizes or quantization levels.