Large Language Models

Top-p (Nucleus Sampling)

A sampling method that restricts the model to the smallest set of tokens whose probabilities sum to at least p.

In common use since 2019

Top-p sampling, also called nucleus sampling, is a token-selection method that restricts the model to the smallest set of candidate tokens whose cumulative probability reaches a threshold p. Set p to 0.9 and the model samples only from the most likely tokens whose probabilities together cover at least 90% of the distribution; tokens beyond that "nucleus" are excluded, and the probabilities of the remaining tokens are renormalized before sampling.

The motivation behind top-p is that sampling with temperature alone can produce strange outputs by occasionally selecting very unlikely tokens. Top-p clips the long tail of the distribution adaptively. With p set to 0.9, in a high-confidence position where one token has 95% probability, only that token is in the nucleus. In an ambiguous position with twenty roughly equal options, the nucleus expands to cover nearly all of them (eighteen, if they are exactly equal). The size of the cut adapts to the situation.
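The adaptive behaviour described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any provider's implementation: the function name `nucleus` and the two example distributions are made up for this entry, and the flat case uses sixteen equal options (rather than twenty) only so that the floating-point sums are exact.

```python
def nucleus(probs, p):
    """Return the indices of the smallest set of top tokens whose mass reaches p."""
    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:  # stop as soon as the nucleus covers p
            break
    return kept

# Hypothetical distributions: one confident position, one ambiguous position.
peaked = [0.95, 0.02, 0.01, 0.01, 0.01]
flat = [1 / 16] * 16

print(len(nucleus(peaked, 0.9)))  # 1: the 95% token alone covers p
print(len(nucleus(flat, 0.9)))    # 15: 14/16 falls short of 0.9, 15/16 reaches it
```

The same threshold keeps one token in the first case and fifteen in the second, which is the sense in which the cut adapts.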

Top-p is usually combined with temperature. Common starting points in production:

  • Temperature 0–0.3, top-p 1.0: near-deterministic tasks; top-p at 1.0 applies no cut, and the low temperature already concentrates probability on the top tokens.
  • Temperature 0.7, top-p 0.9: a common chat setting; produces fluent, varied but not chaotic output.
  • Temperature 1.0+, top-p 0.95: creative tasks; allows variation but still trims the least likely tokens.
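How the two parameters interact can be sketched as follows. This is an illustrative sketch under stated assumptions, not how any particular API works internally: the function `sample_token` and the example logits are hypothetical, and applying temperature before the top-p cut is an assumed ordering that a given provider may not follow.

```python
import math
import random

def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Pick a token index from raw logits: temperature first, then top-p."""
    if temperature <= 0:
        # Greedy decoding: nothing is sampled, so top_p plays no role.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of top tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample inside the nucleus; random.choices normalizes the weights itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

logits = [4.0, 3.5, 1.0, -2.0, -5.0]  # hypothetical scores for five tokens
print(sample_token(logits, temperature=0.0))             # always index 0
print(sample_token(logits, temperature=0.7, top_p=0.9))  # index 0 or 1
```

With these logits, the first two tokens hold about 99% of the mass at temperature 0.7, so the 0.9 nucleus contains only them and the three tail tokens can never be chosen.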

A close relative of top-p is top-k, which always keeps the k most likely tokens regardless of their probabilities. Top-k is simpler but less adaptive: in flat distributions it cuts off useful candidates, and in sharp distributions it lets in tokens that have essentially zero probability. Top-p adapts; top-k does not. Many modern APIs expose top-p, and some offer top-k alongside it.
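The contrast can be made concrete with a small comparison. This is a sketch only: `top_k_set`, `top_p_set`, the value k = 4 and both example distributions are hypothetical choices for illustration.

```python
def ranked(probs):
    """Token indices ordered from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def top_k_set(probs, k):
    # Fixed size: always the k most likely tokens.
    return ranked(probs)[:k]

def top_p_set(probs, p):
    # Variable size: the smallest prefix whose mass reaches p.
    kept, mass = [], 0.0
    for i in ranked(probs):
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept

sharp = [0.97, 0.01, 0.01, 0.005, 0.005]  # one dominant token
flat = [1 / 16] * 16                      # sixteen equally likely tokens

for name, dist in (("sharp", sharp), ("flat", flat)):
    k_kept = top_k_set(dist, 4)
    p_kept = top_p_set(dist, 0.9)
    print(name, "top-k keeps", len(k_kept), "| top-p keeps", len(p_kept))
# sharp top-k keeps 4 | top-p keeps 1
# flat top-k keeps 4 | top-p keeps 15
```

In the sharp case top-k admits three tokens of 1% or less, and in the flat case it discards twelve candidates that are exactly as likely as the four it keeps.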

A few practical notes for developers:

  • Setting both temperature 0 and top-p 0.9 is redundant: at temperature 0 decoding is greedy, so there is no sampling for top-p to shape. The temperature setting alone is what drives determinism.
  • Top-p only matters when tokens are actually sampled; it has no effect under greedy decoding. Lowering p too aggressively can make outputs feel formulaic.
  • Top-p implementations are provider-specific: OpenAI, Anthropic and Google may treat edge cases differently. If you swap providers, re-evaluate.
  • For JSON output, low temperature beats clever sampling: schemas need format consistency, which sampling parameters cannot guarantee. Use structured-output mode where available.

The general advice in 2026: leave top-p at the default unless you have a specific reason to change it. Most quality improvements come from better prompts, better data and better models, not from sampling-parameter alchemy.
