Variance in machine learning is the amount a model's predictions change when you train it on a different sample of data from the same underlying distribution. A high-variance model is sensitive: train it on slightly different data and you get a noticeably different model. A low-variance model is stable: same predictions regardless of which sample you happened to draw.
Variance is one half of the famous bias-variance tradeoff. Total error on new data can be decomposed into bias squared (how systematically wrong the model is on average) plus variance (how much it bounces around) plus irreducible noise (the inherent unpredictability of the world). Reducing one tends to increase the other; finding the sweet spot is the entire art of model selection.
A high-variance model in practice looks like this:
- Decision trees grown to full depth — every leaf memorises a few training points.
- A neural network with too many parameters for too little data.
- A k-nearest-neighbours model with k=1.
- A polynomial regression of high degree on a small sample.
The fixes for variance are essentially the fixes for overfitting — more data, simpler models, regularization, ensembling. Ensembles are the most direct counter: train many slightly different models and average their predictions. Random Forests apply this to decision trees; bagging and boosting apply it more generally; LLM "self-consistency" (sample many chains of thought and take the majority answer) is the same idea applied to language models.
For LLMs in 2026, variance shows up in two places. Sampling variance — the model produces different answers on different runs because of temperature and top-p — is what makes "did you reproduce that bug?" genuinely hard. Fine-tuning variance — a small dataset can drag a base model in many different directions depending on the random seed — is why serious teams report mean and standard deviation across multiple training runs, not a single number.
The practical rule for a US engineer: variance is your friend when you can average it out, and your enemy when you cannot. Single-prediction systems hate variance; ensembled or self-consistent systems love it. Knowing which regime you are in dictates whether you should fight variance or harness it.