Embedding

A dense vector that represents the meaning of a piece of text, image or other content.

In common use since 2013

An embedding is a list of numbers, typically 256 to 4096 of them, that represents the meaning of a piece of content in a form a computer can compare. Two pieces of text with similar meaning end up with embedding vectors that are close together in the vector space. Two pieces with unrelated meanings end up far apart. That single property powers most of modern semantic search, retrieval-augmented generation and recommendation.
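To make "close together" concrete, here is a minimal sketch of cosine similarity, one common closeness measure. The three 4-dimensional vectors are made up for illustration; real embeddings come from a model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, not output from any real model.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.1]

sim_close = cosine_similarity(cat, kitten)   # related meaning: near 1
sim_far = cosine_similarity(cat, invoice)    # unrelated meaning: near 0
```

With these toy values the "cat" and "kitten" vectors score far higher than "cat" and "invoice", which is the behaviour a good embedding model aims for on real text.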

Embeddings come from embedding models, which are usually smaller, cheaper relatives of LLMs trained specifically to map content to vectors. OpenAI's text-embedding-3-large, Cohere's Embed v3 models, and open-source models like bge-large-en-v1.5 and e5-mistral-7b-instruct are the workhorses you will see in production in 2026.

A typical retrieval pipeline using embeddings:

  1. Embed every document chunk in your knowledge base and store the vectors in a vector database.
  2. When a user asks a question, embed the question with the same model.
  3. Find the chunks whose embeddings are closest to the question's embedding (highest cosine similarity or dot product, or smallest Euclidean distance).
  4. Insert those chunks into the LLM prompt as context.
  5. Have the LLM generate the answer from that context.
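The five steps above can be sketched end to end in a few lines. This is a toy under stated assumptions: toy_embed is a hypothetical stand-in (hashed bag-of-words) for a real embedding model, the "vector database" is a plain Python list searched by brute force, and the final LLM call is left out. All function names are illustrative, not part of any library.

```python
import math
import re
import zlib

DIM = 1024  # toy dimension, within the typical 256 to 4096 range

def toy_embed(text):
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalised.
    It captures word overlap only, not meaning."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def top_k(query_vec, index, k=2):
    """Brute-force search: rank stored chunks by dot product with the query.
    For unit-length vectors the dot product equals cosine similarity."""
    scored = [(sum(q * d for q, d in zip(query_vec, vec)), chunk)
              for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

def build_prompt(question, context_chunks):
    """Place the retrieved chunks ahead of the question as context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "Refunds are issued within 14 days of the return being received.",
    "Our office is closed on public holidays.",
    "To reset your password, open Settings and choose Reset password.",
]
index = [(c, toy_embed(c)) for c in chunks]      # step 1: embed and store
question = "How do I reset my password?"
hits = top_k(toy_embed(question), index, k=1)    # steps 2 and 3: embed query, search
prompt = build_prompt(question, hits)            # step 4: build the prompt
# Step 5 would send `prompt` to an LLM of your choice.
```

In a real system the only parts that change are the embedding call and the storage layer; the shape of the pipeline stays the same.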

For an engineering team, embeddings are usually the cheapest, highest-leverage AI primitive after prompting. With the smaller hosted models they can cost a fraction of a cent per thousand chunks, a single query embeds in milliseconds to tens of milliseconds, and the result turns any pile of unstructured text into a searchable knowledge base. If you have FAQ documents, support tickets, marketing copy or product manuals sitting in cloud storage, you can build a working semantic search prototype in an afternoon.
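The cost claim is simple arithmetic, sketched below. The price of 0.02 USD per million tokens and the 200-token chunk size are assumptions chosen for illustration; check your provider's current price list before budgeting.

```python
def embedding_cost_usd(num_chunks, tokens_per_chunk, usd_per_million_tokens):
    """Estimated one-off cost of embedding a corpus, billed per token."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Hypothetical inputs: 1,000 chunks of 200 tokens at an assumed 0.02 USD per million tokens.
cost = embedding_cost_usd(num_chunks=1_000, tokens_per_chunk=200,
                          usd_per_million_tokens=0.02)
print(f"{cost:.4f} USD")  # 200,000 tokens at the assumed price
```

Larger chunks or a pricier model scale the figure linearly, so the same function gives a quick upper bound for a full corpus.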

Three things to watch:

  • Use one model consistently — embeddings from different models live in different geometric spaces and cannot be compared.
  • Chunk size matters — too small and chunks lack context; too large and similarity gets diluted. 200–800 tokens is a sane default.
  • Re-embed when the model changes — a new embedding model means re-embedding the entire corpus, which can be operationally painful at scale.
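Two of these points are easy to enforce in code. The sketch below splits text into overlapping chunks and refuses to compare vectors from different models. It counts whitespace-separated words as a rough proxy for tokens (real tokenizers usually produce more tokens than words), and the function names and model labels are hypothetical.

```python
def chunk_words(text, max_words=300, overlap=30):
    """Split text into overlapping windows of at most max_words words.
    Overlap keeps some shared context across chunk boundaries."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def check_same_model(query_model, stored_model):
    """Raise if the query and the index were embedded with different models."""
    if query_model != stored_model:
        raise ValueError(
            f"index was built with {stored_model!r}, query uses {query_model!r}; "
            "re-embed the corpus before searching"
        )

sample = " ".join(f"w{i}" for i in range(700))
pieces = chunk_words(sample, max_words=300, overlap=30)
```

Storing the model name next to every vector makes the check cheap, and it also tells you exactly which rows need re-embedding after a model change.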

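Embeddings extend beyond text. Image embeddings (CLIP), audio embeddings (CLAP) and other multi-modal embeddings let you do "find me images that look like this sketch" or "find me audio clips that sound like this" with the same primitive. For "find me podcast moments where someone says X", a common route is to transcribe the audio with a speech recognition model such as Whisper and embed the transcript as text.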
