A token is the atomic unit a language model reads and writes. It is rarely a whole word. The text "tokenization is wild" might be split into ["token", "ization", " is", " wild"], which is four tokens for three words. English prose averages roughly 0.75 words per token, or about 1.3 tokens per word, but the ratio varies widely by content and language: code, JSON, and non-Latin scripts often use 2–4x as many tokens as equivalent English prose.
Tokens matter because every commercial decision around an LLM is denominated in them. API pricing is per million input tokens and per million output tokens. Context windows are measured in tokens. Latency is roughly proportional to output tokens. Throughput is measured in tokens per second. If you cannot estimate token counts in your workload, you cannot estimate cost or performance.
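As a rough illustration of that arithmetic, the sketch below turns token counts into a cost and a generation-time estimate. The prices, throughput, and request volume are placeholder assumptions chosen to keep the numbers easy to follow, not any provider's real rates, and the names `Pricing`, `estimate_cost`, and `estimate_generation_seconds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimated USD cost of one request, billed separately for input and output."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


def estimate_generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time; ignores network and prompt-processing overhead."""
    return output_tokens / tokens_per_second


# Placeholder workload: 3,000 input tokens and 500 output tokens per request.
pricing = Pricing(input_per_million=2.00, output_per_million=10.00)
per_request = estimate_cost(input_tokens=3_000, output_tokens=500, pricing=pricing)
monthly = per_request * 100_000  # assumed 100,000 requests per month
seconds = estimate_generation_seconds(output_tokens=500, tokens_per_second=50.0)

print(f"per request: ${per_request:.4f}, monthly: ${monthly:,.2f}, ~{seconds:.0f}s to generate")
```

With these assumed numbers a request costs a little over one cent, which only looks small until it is multiplied by the request volume.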
Each model family uses its own tokenizer, which is essentially a vocabulary plus a splitting algorithm. OpenAI uses Byte-Pair Encoding (BPE) variants; Anthropic uses a related approach; Llama and Mistral used SentencePiece-based tokenizers in their earlier releases, and some later releases moved to tiktoken-style BPE. The vocabulary size is usually 30k–200k tokens. Larger vocabularies handle uncommon words and non-English languages better, but they enlarge the model's embedding and output layers.
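To make "a vocabulary plus a splitting algorithm" concrete, here is a toy BPE sketch. It is a deliberately simplified illustration, not any vendor's tokenizer: production tokenizers typically work on bytes, pre-split text with regular expressions, and learn tens of thousands of merges. The function names are hypothetical.

```python
from collections import Counter


def apply_merge(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Fuse every adjacent occurrence of `pair` into a single symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn merge rules by repeatedly fusing the most frequent adjacent pair."""
    corpus = Counter(tuple(word) for word in words)  # every word starts as characters
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pair_counts: Counter = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # nothing left to merge
        best = max(pair_counts, key=pair_counts.get)  # on ties, the first pair seen wins
        merges.append(best)
        new_corpus: Counter = Counter()
        for symbols, freq in corpus.items():
            new_corpus[tuple(apply_merge(list(symbols), best))] += freq
        corpus = new_corpus
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)                    # [('l', 'o'), ('lo', 'w')]
print(encode("lowest", merges))  # ['low', 'e', 's', 't']
```

The frequent stem "low" becomes a single token while the rarer suffix stays as individual characters. The same effect at scale is why common English words are cheap and unusual strings are expensive.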
The practical implications for a US developer in 2026:
- Estimate before you build: tools like tiktoken (OpenAI) or the Anthropic SDK count tokens for a given input. Run a representative sample through them before sizing a budget.
- Input and output are priced differently: output tokens usually cost several times more per token than input tokens, but a workload with long prompts and short answers can still spend most of its budget on input. Check your provider's table.
- Prompt caching changes the math: several frontier APIs now bill a reused prompt prefix, such as a long system prompt, at a fraction of the normal input price. Putting the stable, cacheable part of the prompt first can cut input costs substantially, approaching an order of magnitude when the cached prefix dominates the prompt.
- Languages matter: as rough, tokenizer-dependent figures, Spanish uses about 1.2x as many tokens as English for the same content, Arabic about 2x, and Korean nearly 3x. Global products should budget per locale.
- Output token caps protect your budget: always set max_tokens (or your provider's equivalent parameter). A runaway response is a runaway invoice.
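The prompt-caching point can be checked with simple arithmetic. The sketch below compares input spend with and without a cached prefix under stated assumptions: the first request writes the cache at a surcharge, every later request reads it at a discount, and the cache never expires. The multipliers (1.25x to write, 0.1x to read) and the price are illustrative placeholders, and both function names are hypothetical; real billing rules differ by provider.

```python
def input_cost_without_cache(
    prefix_tokens: int, suffix_tokens: int, requests: int, price_per_million: float
) -> float:
    """Input spend when every request pays full price for the whole prompt."""
    return (prefix_tokens + suffix_tokens) * requests * price_per_million / 1_000_000


def input_cost_with_cache(
    prefix_tokens: int,
    suffix_tokens: int,
    requests: int,
    price_per_million: float,
    read_multiplier: float,
    write_multiplier: float,
) -> float:
    """Input spend when the shared prefix is written once and then read from cache.

    Assumes the cache never expires, which real APIs do not guarantee.
    """
    if requests == 0:
        return 0.0
    per_token = price_per_million / 1_000_000
    write = prefix_tokens * write_multiplier * per_token                  # first request
    reads = prefix_tokens * read_multiplier * per_token * (requests - 1)  # later requests
    uncached = suffix_tokens * per_token * requests                       # per-request part
    return write + reads + uncached


# Assumed workload: a 10,000-token system prompt and a 200-token user message, 1,000 requests.
baseline = input_cost_without_cache(10_000, 200, 1_000, price_per_million=2.00)
cached = input_cost_with_cache(
    10_000, 200, 1_000, price_per_million=2.00, read_multiplier=0.1, write_multiplier=1.25
)
print(f"without cache: ${baseline:.2f}, with cache: ${cached:.2f}, ratio: {baseline / cached:.1f}x")
```

Under these assumptions the saving is roughly 8x. It grows as the cached prefix makes up more of each prompt and shrinks as the uncached, per-request part gets longer.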
Tokens also explain weird LLM behaviours. The classic mistake of claiming that "strawberry" has two Rs (it has three) is widely attributed to tokenization: the model may see chunks such as "straw" and "berry" rather than individual letters, although the exact split depends on the tokenizer. Counting characters is genuinely hard for LLMs because they do not see characters; they see tokens.
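A small sketch shows what "seeing tokens" means in practice. It uses a made-up vocabulary and a greedy longest-match splitter, both assumptions for illustration only; real tokenizers use different vocabularies and may split "strawberry" differently. The names `VOCAB`, `to_token_ids`, and `to_text` are hypothetical.

```python
# Hypothetical vocabulary: two multi-letter chunks plus single-letter fallbacks.
VOCAB = {"straw": 1001, "berry": 1002}
VOCAB.update({ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")})
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def to_token_ids(text: str) -> list[int]:
    """Greedy longest-match tokenization against the toy vocabulary."""
    ids = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest candidate first
            piece = text[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


def to_text(ids: list[int]) -> str:
    """Map token IDs back to the characters they stand for."""
    return "".join(ID_TO_TEXT[token_id] for token_id in ids)


ids = to_token_ids("strawberry")
print(ids)                        # [1001, 1002]
print(to_text(ids).count("r"))    # 3
```

The model's input here is the two integers, not ten letters. Counting the Rs is trivial once the IDs are mapped back to characters, but nothing in the IDs themselves spells the word out.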