Few-shot prompting is the technique of including a handful of input/output examples in the prompt to teach the model the desired pattern before it sees the actual query. The model sees three to five worked examples, infers the pattern, and applies it to the new input. The phrase comes from machine learning, where "few-shot learning" historically meant training a classifier from only a few labeled examples; in the LLM era it means demonstrating the task in context, inside the prompt, without updating the model's weights.
A few-shot prompt typically looks like:
- A system instruction explaining the task.
- 3–5 example pairs, each clearly delimited (often with Input: / Output: labels or XML-style tags).
- The new input the model should respond to.
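The three parts above can be sketched as a small prompt builder. This is a minimal illustration, not a prescribed format: the `Example` class, the `build_few_shot_prompt` function, and the support-ticket data are all hypothetical names and values chosen for the example, and the Input: / Output: labels are just one of the delimiter styles mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked input/output pair shown to the model (hypothetical type)."""
    input_text: str
    output_text: str


def build_few_shot_prompt(instruction: str, examples: list[Example], query: str) -> str:
    """Assemble the instruction, the delimited example pairs, and the new input."""
    parts = [instruction.strip(), ""]
    for ex in examples:
        parts.append(f"Input: {ex.input_text}")
        parts.append(f"Output: {ex.output_text}")
        parts.append("")  # blank line separates example pairs
    # End on an open "Output:" label so the model continues the pattern.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Illustrative data only.
examples = [
    Example("The checkout page crashes on submit.", "bug"),
    Example("Please add a dark mode.", "feature_request"),
    Example("How do I reset my password?", "question"),
]
prompt = build_few_shot_prompt(
    "Classify each support message as bug, feature_request, or question.",
    examples,
    "The export button does nothing when clicked.",
)
print(prompt)
```

The resulting string would then be sent to whichever model API you use. With chat-style APIs, the instruction usually goes in the system message instead of the top of the prompt.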
Why few-shot works so well:
- Format demonstration: examples often convey structure more reliably than instructions alone. "Output as JSON with these fields" plus three JSON examples usually beats either on its own.
- Style transfer: copywriting in your brand voice, code in your codebase's idioms, summaries at the level of detail you want.
- Edge case handling: examples can show how to handle ambiguity, missing input, or boundary conditions.
- Fewer wasted tokens than instructions: three good examples can often replace around 200 tokens of careful instructions.
When to use few-shot:
- Format-sensitive tasks: JSON schemas, structured extraction, code in a specific style.
- Domain-specific writing: legal language, medical notes, brand voice.
- Classification with custom labels: when zero-shot would require explaining all labels in detail.
- Anything with an "I'll know it when I see it" quality: examples convey intent that words struggle with.
When few-shot is wasted effort:
- Very capable models on common tasks: models such as GPT-5 and Claude Sonnet 4 handle many tasks zero-shot at close to few-shot quality.
- Tasks where the examples themselves leak an answer pattern: if every example shares an incidental trait (the same label, length, or phrasing), the model may copy that trait instead of learning the task.
- High-volume cheap workloads: example tokens are billed on every request, so at billions of queries fine-tuning is often more economical.
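To make the cost point concrete, here is a back-of-the-envelope sketch of the extra input-token spend that examples add. Every number in it is a hypothetical placeholder and not real pricing: plug in your own token counts and your provider's current rates.

```python
def few_shot_overhead_usd(
    example_tokens: int,
    queries: int,
    usd_per_million_input_tokens: float,
) -> float:
    """Extra input-token spend caused by repeating the examples on every query."""
    return example_tokens * queries * usd_per_million_input_tokens / 1_000_000


# All three values are assumptions for illustration only.
EXAMPLE_TOKENS = 400        # e.g. four examples of roughly 100 tokens each
QUERIES = 1_000_000_000     # one billion requests
PRICE_PER_MILLION = 1.00    # assumed USD per million input tokens

overhead = few_shot_overhead_usd(EXAMPLE_TOKENS, QUERIES, PRICE_PER_MILLION)
print(f"Extra spend on example tokens: ${overhead:,.0f}")  # $400,000
```

This is deliberately simplified: it ignores any prompt-caching discounts a provider may offer, and it does not include the cost of fine-tuning itself, both of which belong in a real comparison.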
A variant, dynamic few-shot, retrieves the most relevant examples from a library at query time using embeddings, then assembles them into the prompt. This combines the format precision of few-shot with the breadth of an example library, and is increasingly common in production LLM apps.
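A minimal sketch of the retrieve-then-assemble flow follows. To stay self-contained it uses a toy bag-of-words vector as a stand-in for a real embedding model, and a linear scan instead of a vector index; a production system would swap in both. The function names (`embed`, `select_examples`, `build_dynamic_prompt`) and the example library are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_examples(query: str, library: list[tuple[str, str]], k: int = 3) -> list[tuple[str, str]]:
    """Return the k library examples whose inputs are most similar to the query."""
    q = embed(query)
    ranked = sorted(library, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]


def build_dynamic_prompt(instruction: str, query: str, library: list[tuple[str, str]], k: int = 3) -> str:
    """Assemble a few-shot prompt from the retrieved examples plus the new input."""
    lines = [instruction, ""]
    for text, label in select_examples(query, library, k):
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


# Illustrative example library of (input, label) pairs.
library = [
    ("Refund the order, it arrived broken.", "refund"),
    ("Where is my package? Tracking has not updated.", "shipping"),
    ("I was charged twice for one order.", "billing"),
    ("The package arrived late and the box was damaged.", "shipping"),
    ("Please cancel the subscription.", "cancellation"),
]
query = "Tracking says my package has not moved."
selected = select_examples(query, library, k=2)
print(build_dynamic_prompt("Label each support message with its category.", query, library, k=2))
```

Here the two shipping-related entries are retrieved because they share the most words with the query. With real embeddings the match would be semantic instead of lexical, but the assembly step stays the same.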
For a US team building an LLM feature, the rule of thumb in 2026: start zero-shot and measure quality; add 3–5 examples if quality falls short; switch to dynamically retrieved examples if the task is varied. Fine-tune only when neither prompting nor retrieval gets you there.