One of the hardest things about building AI systems is that the things humans care about (words, sentences, images, ideas, etc) aren’t naturally something a computer can do math on. A computer doesn’t inherently know that “happy” and “joyful” are similar, or that a photo of a dog and the word “dog” are related. It just sees raw data.
Embeddings are the solution to that problem.
An embedding is a representation of something (like a word, a sentence, an image, a piece of audio, etc) as a list of numbers. Not an arbitrary list, but one where the numbers encode meaning. Things with similar meanings get similar numbers. Things with different meanings get different ones.
That’s the whole idea. Everything else follows from it.
Why Turn Things Into Numbers?
Because once you have numbers, you can do math. And math unlocks a lot.
With embeddings, you can measure how similar two sentences are by calculating the distance between their number lists. You can find the most relevant document in a library in response to a question. You can group thousands of customer reviews by topic without reading any of them. You can compare a text description to an image and determine how closely they match.
With raw text, none of that is straightforward. With embeddings, it becomes a geometry problem. And computers are very good at geometry.
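To make the geometry concrete, here’s a minimal sketch of comparing two embeddings with cosine similarity, one of the most common distance measures. The 4-dimensional vectors are toy values invented for illustration; a real embedding model would produce much longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (real ones have hundreds of dimensions).
happy   = [0.9, 0.1, 0.3, 0.0]
joyful  = [0.8, 0.2, 0.4, 0.1]
invoice = [0.0, 0.9, 0.0, 0.8]

print(cosine_similarity(happy, joyful))   # high: similar meaning
print(cosine_similarity(happy, invoice))  # low: unrelated meaning
```

The actual similarity score on its own means little; what matters is the comparison, and that "happy" scores much closer to "joyful" than to "invoice".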
A Simple Way to Think About It

Imagine plotting words on a map. Words with similar meanings cluster together in the same neighbourhood. “Dog” and “cat” are close. “Dog” and “democracy” are far apart. “Happy”, “joyful”, and “delighted” are practically on the same street.
An embedding is just that map, but with many more dimensions than two. Instead of an x and y coordinate, a typical embedding has hundreds or thousands of numbers (768 and 1536 are common sizes), each one a coordinate along its own dimension. You can’t visualise that many dimensions, but the concept is the same. Similar meanings live close together, different meanings live far apart.
The distance between two embeddings is a measure of how semantically similar they are. That’s a powerful and flexible property to have.
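The map analogy can be sketched directly in code. The 2-D coordinates below are made up purely to illustrate the idea; real embeddings place words in a space with hundreds of dimensions, but distance works the same way.

```python
import math

# Toy 2-D "map" coordinates for a handful of words (invented for illustration).
word_map = {
    "dog":       (1.0, 1.2),
    "cat":       (1.1, 1.0),
    "happy":     (5.0, 5.1),
    "joyful":    (5.1, 5.0),
    "democracy": (9.0, 0.5),
}

def nearest(word):
    """Find the closest other word on the map by Euclidean distance."""
    return min(
        (w for w in word_map if w != word),
        key=lambda w: math.dist(word_map[word], word_map[w]),
    )

print(nearest("dog"))    # cat
print(nearest("happy"))  # joyful
```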
What Can Be Embedded?
Almost anything can be embedded, as long as you have a model trained to embed it:
- Text — words, sentences, paragraphs, entire documents
- Images — photos, illustrations, screenshots
- Audio — speech, music, sound clips
- Video — frames or entire clips
- Structured data — in some systems, even tabular records or user behaviour patterns
Multimodal embedding models can embed different types of content into the same shared space, which means you can directly compare a text description to an image, or find images that match a spoken query. That shared space is what makes cross-modal search and retrieval possible.
How Embeddings Are Created
Embeddings are produced by embedding models. These are neural networks trained on large amounts of data to map inputs into vector space in a meaningful way.
During training, the model learns which things belong near each other by being shown enormous numbers of examples (pairs of related sentences, image-caption pairs, documents on similar topics, etc). Over time it builds up an internal map of meaning that generalises well. Once trained, you can pass any new input through the model and get back a vector that places it accurately in that map, even if the model has never seen that specific input before.
For most applications, you don’t need (or want) to build your own embedding model from scratch. There are many pre-trained models available (both through APIs and as open-source models you can run yourself) that produce high-quality embeddings out of the box for most general-purpose use cases.
Where Embeddings Show Up in AI
Embeddings are foundational to a wide range of AI applications. For example:
- Semantic search is one of the most common uses. Traditional keyword search looks for exact word matches. Semantic search uses embeddings to find results that match the meaning of a query, even if the words are completely different. Search for “fixing a leaky tap” and get results about “plumbing repair” and “stopping a dripping faucet”.
- Retrieval-Augmented Generation (RAG) depends on embeddings. When a RAG system needs to find relevant documents to pass to a language model, it embeds the user’s question and searches a database of embedded documents for the closest matches. Without embeddings, that retrieval step wouldn’t work.
- Recommendation systems use embeddings to represent both users and content, then find content whose embedding is close to a user’s embedding based on their behaviour and preferences.
- Duplicate detection uses embeddings to find records, support tickets, or documents that say the same thing in different words. This is something exact matching completely misses.
- Classification and clustering become much easier once data is in vector form. You can group thousands of items by topic, detect anomalies, or train classifiers on top of embeddings with relatively little effort.
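Semantic search, the first use case above, can be sketched in a few lines. The vectors here are toy stand-ins for what a real embedding model would produce; in practice you would embed each document and the query with the same model.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pretend an embedding model has already turned each document into a vector.
documents = {
    "plumbing repair guide":      [0.9, 0.1, 0.0],
    "stopping a dripping faucet": [0.8, 0.2, 0.1],
    "introduction to tax law":    [0.0, 0.1, 0.9],
}

def search(query_vector, top_k=2):
    """Rank documents by similarity to the query embedding."""
    ranked = sorted(documents, key=lambda d: cosine(query_vector, documents[d]), reverse=True)
    return ranked[:top_k]

# A query like "fixing a leaky tap" would embed near the plumbing documents,
# even though it shares no words with "plumbing repair guide".
query = [0.85, 0.15, 0.05]
print(search(query))
```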
Embeddings vs. What a Language Model Does
This is a common point of confusion, so it’s worth taking a moment to understand.
- A large language model like ChatGPT or Claude takes text in and produces text out. It generates.
- An embedding, by contrast, is not generated text. It’s a numerical representation. When you embed a sentence, nothing is written. A vector comes back.
The two things are related (language models use embedding-like representations internally as part of how they process text) but they serve different purposes and are used in different ways. A language model is for producing output. An embedding is for representing input in a form that’s useful for search, comparison, and retrieval.
Many AI systems use both. The embedding handles retrieval, finding the right information. The language model handles generation, turning that information into a useful response.
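The two-stage pattern looks roughly like this. Both `embed` and `generate` are hypothetical stand-ins invented for the sketch; a real system would replace them with calls to an embedding model and a language model.

```python
def embed(text):
    # Stand-in for a real embedding model: a crude bag-of-words vector.
    vocab = ["tap", "leak", "faucet", "tax", "law"]
    return [float(word in text.lower()) for word in vocab]

def retrieve(question, docs, top_k=1):
    """Stage 1: use embeddings to find the most relevant documents."""
    q = embed(question)
    score = lambda d: sum(x * y for x, y in zip(q, embed(d)))
    return sorted(docs, key=score, reverse=True)[:top_k]

def generate(question, context):
    # Stand-in for a real language model call with the retrieved context.
    return f"Answer to {question!r} using context: {context}"

docs = ["How to fix a leaking tap", "Guide to tax law"]
context = retrieve("Why does my tap leak?", docs)
print(generate("Why does my tap leak?", context))
```

The plumbing is the point here: retrieval narrows millions of documents down to a relevant few, and only those few are handed to the language model.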
Storing and Searching Embeddings
One practical consideration: embeddings need to be stored somewhere, and searching them efficiently at scale requires specialised infrastructure.
A list of 1536 numbers per document might sound manageable until you have a million documents. At that point, finding the nearest neighbours to a query embedding quickly (across millions of vectors) is a non-trivial engineering problem. This is what vector databases are built to solve. They’re optimised specifically for storing embeddings and performing fast similarity search across large collections of them.
For small projects, simpler solutions work fine. But for production systems at scale, the storage and retrieval infrastructure around embeddings matters almost as much as the embeddings themselves.
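A brute-force nearest-neighbour search (the "simpler solution" that works for small projects) is easy to write, and also shows why it breaks down: every query scans every stored vector.

```python
import heapq
import math
import random

random.seed(0)

DIM = 64
# Brute-force search is fine for a few thousand vectors, but the cost grows
# linearly with collection size -- the problem vector databases solve with
# approximate indexes.
vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5000)]

def top_k(query, k=5):
    """Exact k-nearest neighbours by scanning every stored vector."""
    dists = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, dists)

query = vectors[42]
results = top_k(query)
print(results[0][1])  # the closest vector to vectors[42] is itself: index 42
```

Vector databases trade a small amount of accuracy for speed, returning approximate nearest neighbours in a fraction of the time a full scan would take.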
Conclusion
An embedding is a way of representing meaning as numbers so that computers can work with it mathematically. It sounds like a technical detail, but it’s actually one of the more important ideas in modern AI. It’s the foundation that makes semantic search, retrieval, recommendations, and a lot of other capabilities possible.
If you’ve ever used a search engine that seemed to understand what you were looking for rather than just matching your exact words, there’s a good chance embeddings were involved.