Large Language Models

Fine-tuning

Continuing to train a pretrained model on your own data so it specialises in your task or style.

In common use since 2018

Fine-tuning is the process of taking a pretrained model and continuing to train it on your own dataset so it specialises in your task, your style or your domain. It sits between two extremes: pure prompting (cheap and flexible but limited) and training from scratch (massively expensive and almost always unjustified).

For LLMs in 2026, fine-tuning has split into a clear hierarchy:

  • Prompt engineering first: never fine-tune what a good prompt can do. Most "we need fine-tuning" cases are solved by better prompting and a few examples.
  • LoRA / QLoRA adapters: train a small set of low-rank matrices on top of a frozen base model (QLoRA also quantises the frozen base to 4-bit to save memory). Typically costs in the hundreds of dollars, fits on a single consumer GPU for small and mid-sized models, and recovers most of the value of full fine-tuning on most tasks.
  • Full fine-tuning: unfreeze the whole model and update every weight. Expensive, slow, and necessary only for serious domain shift or very large datasets.
  • Continued pretraining: further training on raw, unlabelled text when you need the model to internalise a new corpus before supervised fine-tuning (SFT) on labelled examples. Used in legal, biomedical and code-specialised models.
  • RLHF / DPO: preference-based training that aligns outputs with human rankings of candidate responses. Mostly the territory of frontier labs, but DPO has made it accessible to mid-sized teams.
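
The low-rank idea behind LoRA adapters can be sketched in a few lines of NumPy. This is a minimal illustration, not a training recipe: the layer size (512 x 512), rank r = 8 and scaling factor alpha = 16 are assumptions chosen for the example, and the function name lora_forward is hypothetical. The frozen weight W stays untouched; only the two small matrices A and B would be trained.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Apply a frozen weight W plus a scaled low-rank update B @ A.

    x: (batch, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r).
    """
    r = A.shape[0]
    frozen = x @ W.T                          # output of the pretrained layer
    update = (alpha / r) * (x @ A.T) @ B.T    # low-rank correction
    return frozen + update

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 512, 512, 8, 16       # illustrative sizes only

W = rng.normal(size=(d_out, d_in))            # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))    # trainable, small random init
B = np.zeros((d_out, r))                      # trainable, zero init

x = rng.normal(size=(4, d_in))
y = lora_forward(x, W, A, B, alpha)

full_params = W.size                          # weights touched by full fine-tuning
lora_params = A.size + B.size                 # weights touched by the adapter
print(full_params, lora_params)               # 262144 8192
```

Because B starts at zero, the adapted layer initially reproduces the base model exactly, and training only has to learn the difference. In this toy layer the adapter holds 8,192 trainable values against 262,144 in the full matrix, which is where the cost savings come from.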

The right candidates for fine-tuning are tasks with consistent format requirements (always emit a specific JSON schema), stylistic constraints (write in our brand voice), domain language (medical or legal terminology the base model handles imprecisely), or performance gaps (the base model is wrong in a specific, predictable way and you have labelled examples that show the correct behaviour).

Bad candidates: anything you can do with a few-shot prompt, anything the model is already good at, and anything where the data changes weekly (you would need to retrain constantly).

For a US business team, the economics in 2026 favour LoRA fine-tuning for almost every use case where prompting is not enough. A typical project: collect 500–5,000 high-quality input/output pairs, set aside a held-out portion that is never used for training, fine-tune an open-weight model such as Llama 3.1 70B or Mistral Large with QLoRA on rented GPUs for a few hundred dollars, and evaluate against the held-out set. The goal is a model that runs at a fraction of GPT-5 cost and, if the evaluation bears it out, matches it on your specific task.
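
The data side of such a project can be sketched as follows. This is a hedged illustration using only the Python standard library: the toy support-ticket data, the 10% held-out fraction, the exact-match metric, and the names split_pairs, exact_match_rate and baseline_predict are all assumptions made for the example. In a real project baseline_predict would be replaced by a call to the base model or the fine-tuned model.

```python
import json
import random

def split_pairs(pairs, held_out_fraction=0.1, seed=0):
    """Shuffle deterministically, then split into (train, held_out)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_held_out = max(1, int(len(shuffled) * held_out_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]

def exact_match_rate(predict, examples):
    """Fraction of examples where the prediction equals the reference output."""
    hits = sum(
        predict(ex["input"]).strip() == ex["output"].strip() for ex in examples
    )
    return hits / len(examples)

# Toy input/output pairs standing in for real labelled data.
pairs = [
    {"input": f"Ticket {i}: please refund my order", "output": '{"intent": "refund"}'}
    for i in range(500)
]
train, held_out = split_pairs(pairs)

# One JSON object per line (JSONL), a common layout for training files.
train_jsonl = "\n".join(json.dumps(ex) for ex in train)

def baseline_predict(text):
    # Hypothetical stand-in for a real model call.
    return '{"intent": "refund"}'

score = exact_match_rate(baseline_predict, held_out)
print(len(train), len(held_out), score)
```

Running the same evaluation on the prompted base model first gives a baseline, so the fine-tune is judged on how much it improves the held-out score rather than on impressions.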

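The risk to manage: fine-tuning bakes assumptions in. An adapter or fine-tuned checkpoint is tied to the exact base model it was trained on, so when a newer base model is released your fine-tune does not carry over and may be left behind. Plan for periodic re-tunes and keep your training pipeline reproducible.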
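
One lightweight way to keep a pipeline reproducible is to store a small manifest next to every fine-tune. The sketch below is an assumption about what such a record might contain (base model identifier, a hash of the training data, hyperparameters and the random seed); the function names and the model identifiers "example-base-v1" and "example-base-v2" are hypothetical.

```python
import hashlib
import json

def build_manifest(base_model, training_jsonl, hyperparams, seed):
    """Record what is needed to repeat a fine-tuning run."""
    data_hash = hashlib.sha256(training_jsonl.encode("utf-8")).hexdigest()
    return {
        "base_model": base_model,      # exact base model the adapter depends on
        "data_sha256": data_hash,      # detects silent changes to the dataset
        "hyperparams": hyperparams,
        "seed": seed,
    }

def needs_retune(manifest, current_base_model):
    """True when the deployed base model differs from the one that was tuned."""
    return manifest["base_model"] != current_base_model

manifest = build_manifest(
    base_model="example-base-v1",      # hypothetical identifier
    training_jsonl='{"input": "hi", "output": "hello"}',
    hyperparams={"rank": 8, "alpha": 16, "epochs": 3},
    seed=0,
)
print(json.dumps(manifest, indent=2))
print(needs_retune(manifest, "example-base-v2"))
```

Checking the manifest against the base model currently in use turns "your fine-tune may be left behind" into an explicit, automatable signal.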
