Context in an LLM workflow refers to everything the model can see when generating its next response: the system prompt, the conversation history, any retrieved documents, the current user message, and any examples or tool definitions you have included. Context is the model's working memory for that single request — it has nothing else to draw on outside its own weights.
The size of the context the model can handle is its context window, measured in tokens. Modern frontier models in 2026 have window sizes ranging from 128k tokens (a couple of hundred pages) to 2 million tokens (a few books) to over 10 million in some research releases. Longer is generally better but never free — both inference cost and latency scale with context length.
Building a context for a request is a craft. The components you typically assemble:
- System prompt — instructions, role, tone, constraints, output format. Usually fixed across requests; benefits hugely from prompt caching.
- Few-shot examples — input/output pairs that demonstrate the desired behaviour.
- Retrieved documents (RAG) — chunks pulled from a knowledge base based on the user's query.
- Conversation history — previous turns, optionally summarised to fit budget.
- Tools and functions — schemas the model can call.
- User input — the actual current message.
Two failure modes haunt context-building. Lost in the middle is the well-documented finding that LLMs pay more attention to the start and end of a long context than the middle, so important information buried in position 60 of 100 may be ignored. Context contamination is when irrelevant or contradictory chunks confuse the model — more retrieved documents is not always better.
For a US team building production LLM apps, the practical discipline is: keep the context as short as possible while still containing the needed information, put critical instructions at the top and bottom, and evaluate retention on adversarial cases (questions whose answers live in the middle of the context). A 200k-token context that you actually need is healthy; a 200k-token context full of low-relevance noise is just paying to confuse the model.