In machine learning, a model is the output of a training process: a collection of learned numbers (the weights or parameters) plus the architectural plan for how to use them. When you call ChatGPT, you are invoking a model. When you fine-tune Llama on your support tickets, you are producing a new model. When a fraud system scores a transaction, it is running a model.
A model is essentially a frozen function. You feed it an input — a sentence, an image, a row of numbers — and it produces an output. The function was learned, not written. The architecture defines the shape of the function (transformer, CNN, decision tree); the weights are what was discovered during training; together they form the model.
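The split between architecture and weights can be made concrete with a toy example. The sketch below is illustrative only: the weighted-sum "architecture" and the numbers standing in for learned weights are made up, and a real model has millions or billions of parameters.

```python
from typing import Sequence


def architecture(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """The architecture: a fixed recipe (here a weighted sum) for combining inputs with weights."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical values standing in for what a training run would discover.
LEARNED_WEIGHTS = (0.5, -1.0)
LEARNED_BIAS = 2.0


def model(x: Sequence[float]) -> float:
    """The model: architecture plus frozen weights, callable like any other function."""
    return architecture(LEARNED_WEIGHTS, LEARNED_BIAS, x)


print(model((4.0, 1.0)))  # 0.5*4.0 + (-1.0)*1.0 + 2.0 = 3.0
```

Retraining or fine-tuning would change `LEARNED_WEIGHTS` and `LEARNED_BIAS` but leave `architecture` untouched, which is why the result counts as a new model.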
Frontier LLMs in 2026 ship as model families: a flagship model (GPT-5, Claude Opus 4, Gemini 2.5 Pro), smaller and cheaper variants (mini, haiku, flash), and specialised editions for coding, vision or long context. Choosing the right model for a workflow is now a core product decision: a routing layer that sends easy queries to a small model and hard queries to a flagship can cut costs by as much as 5–10x, often without users noticing a difference in quality.
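A routing layer of this kind can be sketched in a few lines. Everything here is an assumption made for illustration: the tier names, the relative prices, and the keyword-and-length heuristic, which a production router would typically replace with a trained classifier or a confidence signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    relative_cost: float  # made-up price per query, relative to the flagship


# Hypothetical tiers; the names and prices are placeholders.
SMALL = ModelTier("small-model", 0.1)
FLAGSHIP = ModelTier("flagship-model", 1.0)

# Crude substring hints that a query needs deeper reasoning.
HARD_HINTS = ("refactor", "step by step", "derive")


def route(query: str, max_easy_words: int = 30) -> ModelTier:
    """Send long or reasoning-heavy queries to the flagship, everything else to the small model."""
    text = query.lower()
    looks_hard = len(text.split()) > max_easy_words or any(h in text for h in HARD_HINTS)
    return FLAGSHIP if looks_hard else SMALL


for q in ("What are your opening hours?", "Refactor this module and explain it step by step"):
    print(q, "->", route(q).name)
```

Under these assumed prices, if 90% of traffic is routed to the small tier, the blended cost per query is 0.9 × 0.1 + 0.1 × 1.0 = 0.19 of the all-flagship cost, a saving of roughly 5x. The real figure depends entirely on the traffic mix and the actual price gap.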
A model has several properties worth tracking:
- Size — measured in parameters (billions). Bigger is usually smarter but slower and pricier.
- Context window — how much input it can read in one request, from a few thousand tokens to over a million.
- Modality — text-only, multimodal (text+image+audio+video), or specialised (speech, image generation).
- Licence — closed (OpenAI, Anthropic), open weights (Llama, Mistral) or fully open source (a small minority).
- Knowledge cutoff — the date its training data ended; for fresh information the model needs retrieval.
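One way to track the properties listed above is a small record per candidate model, plus a couple of checks that use it. This is a minimal sketch: the `ModelCard` type, its field names, and the example values are hypothetical, and the context check ignores the tokens the model needs for its own output.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ModelCard:
    """One record per candidate model; each field mirrors a property from the list."""
    name: str
    parameters_billions: Optional[float]  # None when the vendor does not publish a size
    context_window_tokens: int
    modalities: FrozenSet[str]
    licence: str                          # "closed", "open-weights" or "open-source"
    knowledge_cutoff: date


def can_handle(card: ModelCard, prompt_tokens: int, needed: FrozenSet[str]) -> bool:
    """True if the prompt fits the context window and every needed modality is supported."""
    return prompt_tokens <= card.context_window_tokens and needed <= card.modalities


def needs_retrieval(card: ModelCard, newest_fact: date) -> bool:
    """True if the task depends on information newer than the training data."""
    return newest_fact > card.knowledge_cutoff


# Hypothetical entry; the numbers are placeholders, not real specifications.
example = ModelCard(
    name="example-small",
    parameters_billions=8.0,
    context_window_tokens=128_000,
    modalities=frozenset({"text", "image"}),
    licence="open-weights",
    knowledge_cutoff=date(2025, 1, 1),
)

print(can_handle(example, 4_000, frozenset({"text"})))   # True
print(needs_retrieval(example, date(2026, 3, 1)))        # True
```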
Treating a model as a swappable component, not a religion, is the mark of mature AI engineering. Build around an interface, keep evaluation harnesses honest, and switch when something cheaper or smarter ships — which in this market means roughly every quarter.
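Building around an interface might look like the sketch below. The `TextModel` protocol, the two stand-in models, and the exact-match scoring are all hypothetical; a real harness would wrap vendor SDK calls behind the same method and use a task-appropriate metric.

```python
from typing import Protocol, Sequence, Tuple


class TextModel(Protocol):
    """The only surface the application depends on."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one vendor's model: returns the prompt unchanged."""
    def complete(self, prompt: str) -> str:
        return prompt


class UpperModel:
    """Stand-in for a competing model: returns the prompt in upper case."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()


def accuracy(model: TextModel, cases: Sequence[Tuple[str, str]]) -> float:
    """Fraction of evaluation cases where the output exactly matches the expected answer."""
    hits = sum(model.complete(prompt) == expected for prompt, expected in cases)
    return hits / len(cases)


def pick_best(candidates: Sequence[TextModel], cases: Sequence[Tuple[str, str]]) -> TextModel:
    """Swap decision: keep whichever candidate scores highest on the same fixed cases."""
    return max(candidates, key=lambda m: accuracy(m, cases))


# Toy evaluation set: the task is "shout the input".
CASES = [("ok", "OK"), ("yes", "YES"), ("No", "NO"), ("MAYBE", "MAYBE")]

print(accuracy(EchoModel(), CASES))   # 0.25
print(accuracy(UpperModel(), CASES))  # 1.0
print(type(pick_best([EchoModel(), UpperModel()], CASES)).__name__)  # UpperModel
```

Because application code only sees `TextModel`, replacing one model with another is a one-line change, and the fixed evaluation set is what makes the comparison honest.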