AI Techniques

Prompt Engineering

Prompt engineering is the practice of designing and refining the text inputs given to an AI model to reliably produce accurate, useful, and well-formatted outputs — without changing the model's weights.

Prompt engineering is the discipline of communicating effectively with large language models. Because LLMs generate their next token by predicting what follows the input text, the exact wording, structure, and context of that input has an outsized effect on output quality. A well-engineered prompt can turn a vague or unreliable response into a precise, consistent, production-ready one — with no retraining required.

As LLMs have become embedded in products and workflows, prompt engineering has evolved from an informal art into a structured practice with repeatable techniques, evaluation methods, and tooling.

Why Prompts Matter So Much

LLMs are statistical text predictors. They don’t “understand” intent — they complete text. This means:

  • Ambiguity is expensive: An underspecified prompt forces the model to guess at your intent. Small rewording changes can shift outputs dramatically.
  • Context is everything: The model has no persistent memory. Every piece of context it needs must be in the prompt or the conversation history.
  • Format shapes format: If you want structured output (JSON, a numbered list, a table), demonstrating that format in the prompt is the most reliable way to get it.

Core Techniques

Zero-Shot Prompting

Ask the model to perform a task with no examples, relying purely on its training. Works well for straightforward tasks (“Summarise this article in three bullet points.”). Fails for tasks requiring specific reasoning patterns or output formats the model hasn’t been asked for in exactly this way.

Few-Shot Prompting

Provide two to five worked examples of the input-output pattern you want before presenting your actual query. The model pattern-matches on the examples and applies the same structure to your input. Especially effective for classification, formatting, and domain-specific tasks where the model needs to learn your output schema from demonstrations rather than description.

System Prompts

Most production deployments use a system prompt — a privileged block of instructions processed before the user turn. System prompts set the model’s persona, constraints, output format, and behavioural guardrails. Well-crafted system prompts are the primary lever for building consistent product behaviour across many user interactions.

Role Prompting

Assigning the model a role (“You are a senior data analyst…”) consistently improves output quality on domain-specific tasks. The role activates related patterns from training data — an analyst’s language, conventions, and reasoning habits — making responses more coherent with professional expectations.

Chain-of-Thought Prompting

Adding “think step by step” or demonstrating multi-step reasoning in examples significantly improves performance on arithmetic, logic, and multi-step tasks. The model externalises intermediate reasoning before reaching a conclusion, reducing errors that arise from jumping directly to an answer. See the dedicated Chain-of-Thought Prompting entry for full detail.

Instruction Clarity and Constraint Specification

LLMs follow instructions literally. Explicit constraints — length limits, forbidden topics, required sections, tone — dramatically reduce output variance. “Write a product description” produces something; “Write a 60-word product description for a B2B SaaS audience, emphasising time savings, without using the word ‘revolutionary’” produces something usable.

Output Format Specification

Specifying the exact output format — JSON schema, markdown headers, a specific table structure — is one of the highest-leverage prompt engineering moves for production systems. It enables downstream parsing and reduces post-processing work. For JSON output, providing the schema (or a filled example) is more reliable than describing it in prose.

Advanced Techniques

Self-Consistency

Generate multiple independent responses to the same prompt, then select the most common answer by majority vote. Significantly improves accuracy on reasoning tasks without changing the model — the diversity of outputs surfaces the most stable conclusion. Trades latency and cost for reliability.

ReAct (Reasoning + Acting)

A prompting pattern for agentic tasks that interleaves reasoning steps with tool calls. The model alternates between “Thought:” (internal reasoning) and “Action:” (tool call) until it reaches a final answer. Reduces hallucination in tool-using agents by forcing the model to make its reasoning explicit before acting.

Prompt Chaining

Break a complex task into a sequence of simpler prompts, where each step’s output feeds into the next. More reliable than asking one prompt to do everything, and makes debugging straightforward — you can isolate which step produced a bad output.

Contextual Stuffing and RAG

For tasks requiring external knowledge, the prompt includes retrieved passages from a knowledge base. This is the basis of Retrieval-Augmented Generation — the retrieval step determines what goes into the prompt, and the prompt structure determines how the model uses it.

Prompt Engineering vs. Fine-Tuning

Prompt EngineeringFine-Tuning
Changes model weightsNoYes
CostLow (inference only)High (training + inference)
Speed to iterateMinutesHours to days
Best forFormat, tone, task specificationDeep domain knowledge, style
PortabilityModel-specificModel-specific
Data requiredNoneHundreds to thousands of examples

The general guidance: reach for prompt engineering first. Fine-tune only when the capability gap cannot be closed with prompting alone — typically when the domain vocabulary, output style, or reasoning pattern is genuinely absent from the base model’s training data.

Evaluation and Iteration

Good prompt engineering requires a test set. Without evaluation:

  • You optimise for one example and unknowingly break others
  • Improvements are unverifiable and non-transferable

A minimal evaluation loop: collect 20–50 representative inputs with expected outputs, run your prompt against all of them, measure accuracy or quality, and track changes across iterations. LLM-as-judge (using a capable model to grade outputs) is a practical way to scale evaluation beyond manual review.

The Practitioner’s Workflow

  1. Start with the simplest possible prompt and observe failure modes
  2. Add constraints and instructions to address observed failures
  3. Add few-shot examples if the format or reasoning pattern isn’t landing
  4. Evaluate on a representative test set
  5. Iterate — one change at a time, so you know what helped
  6. Lock the prompt in version control like any other code artifact

2025–2026: What’s Changed

Prompt engineering for reasoning models is different. A June 2025 Wharton research report found that chain-of-thought instructions add only marginal benefit for models like o3, Claude 4, and Gemini 2.5 Pro — and in some cases increase latency by 20–80% with no accuracy gain. For these models, the guidance is to describe the task clearly and let the model’s built-in reasoning do the work rather than scaffolding the thinking process manually.

Multimodal prompting is now a first-class concern. With Gemini 2.5, GPT-4o, and Claude’s vision capabilities mainstream, prompts frequently include images, PDFs, and audio alongside text. Best practices for multimodal prompts: provide detailed context about what the image/document contains, specify what aspect the model should focus on, and position image references early in the prompt rather than at the end.

Graph-of-Thought and extended reasoning structures have extended the CoT/ToT family further — the model generates a reasoning graph rather than a linear chain, enabling parallel exploration of sub-problems before synthesis. Relevant for highly complex analytical tasks but adds significant token cost.

Prompt scaffolding as a security practice. As LLMs handle more sensitive workflows, prompt scaffolding — wrapping user inputs in structured, validated templates that constrain what the model will act on — has emerged as the primary defence against prompt injection and jailbreak attacks. Production systems treat the system prompt as a security boundary, not just an instruction layer.

Model-specific prompt tuning matters more. Google’s prompt engineering whitepaper notes that Gemini models prefer shorter, more direct prompts with few-shot examples placed before the question. Anthropic’s guidance for Claude emphasises explicit XML-structured instructions. OpenAI models respond well to numbered instruction lists. As models diverge architecturally, prompt strategies that transfer universally are giving way to model-specific best practices — a reason to pin production apps to specific model snapshots.

How to Use — Structured system prompt with role, format, and constraints

python
from anthropic import Anthropic

client = Anthropic()

SYSTEM = """You are a senior data analyst. Your job is to answer questions about business metrics.

Rules:
- Always cite which metric you are referencing.
- If a question is ambiguous, ask one clarifying question before answering.
- Format numerical answers with commas and two decimal places.
- Keep answers under 150 words unless the user asks for detail.
- Never speculate about data you were not given."""

def ask(question: str, context: str = "") -> str:
    messages = []
    if context:
        messages.append({"role": "user", "content": f"Here is the dataset context:\n{context}"})
        messages.append({"role": "assistant", "content": "Understood. I have the dataset context. What would you like to know?"})
    messages.append({"role": "user", "content": question})

    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=512,
        system=SYSTEM,
        messages=messages,
    )
    return response.content[0].text

data_context = "Monthly revenue (USD): Jan 1,240,000 | Feb 980,000 | Mar 1,560,000"
print(ask("What was the highest revenue month?", context=data_context))
print(ask("What does ARR mean?"))  # triggers clarification request

Ready to build?

Leverage AI technologies to build your product stack

Superteams can help you build, deploy and launch AI application stacks using open source technologies — from architecture through to production.

Talk to Superteams