Representing data as vectors

Modern AI systems need a way to work with meaning in a form that computers can process efficiently. In this part of the syllabus, we move away from symbolic values like strings and toward numerical representations that allow comparison, clustering, and search. Vectors are the foundation that makes embeddings and similarity-based reasoning possible.

What a vector representation is

A vector is an ordered collection of numbers that represents something else. Instead of describing an object with words or labels, we describe it with numeric values arranged in a fixed order.

In Python, vectors are usually represented as lists or arrays of numbers. What matters is not the container, but the idea that the order of the values is fixed and the vector as a whole can be treated as a single unit of data.

vector = [0.12, -0.34, 0.89, 0.05]

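This list is not meaningful on its own. It becomes useful once we know how it was produced: in a hand-built vector each position has an agreed meaning, while in a learned embedding the positions are rarely interpretable one by one and the vector is useful mainly in comparison with other vectors.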
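To make the idea of agreed positions concrete, here is a minimal sketch of a hand-built feature vector. The feature names and the rounded values for Mars are assumptions chosen for illustration, not a standard layout.

```python
# Hypothetical convention: every planet vector uses this position order.
FEATURE_NAMES = [
    "mass_in_earth_masses",
    "radius_in_earth_radii",
    "distance_from_sun_in_au",
]

# Approximate, rounded values for Mars, used only for illustration.
mars_vector = [0.11, 0.53, 1.52]

# Pair each position with its agreed meaning.
labeled = dict(zip(FEATURE_NAMES, mars_vector))

for name, value in labeled.items():
    print(f"{name}: {value}")
```

Without FEATURE_NAMES, the same three numbers would be impossible to interpret, which is why the position order has to be agreed on and kept fixed.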

Why embeddings are used to represent meaning

Embeddings are vectors designed to capture meaning. Items with similar meaning are represented by vectors that are close together under some distance or similarity measure, while dissimilar items end up far apart.

This allows programs to work with concepts like “similar text” or “related items” without hard-coded rules. Instead of comparing strings directly, we compare their vector representations.

The key idea is that meaning becomes something we can measure numerically, which is essential for search, retrieval, and ranking in AI-enabled systems.
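As a rough sketch of what "close together" can mean in code, the example below measures the straight-line (Euclidean) distance between vectors using math.dist from the standard library (Python 3.8 or later). The three vectors and their names are made up for illustration, and Euclidean distance is only one of several possible measures.

```python
import math

# Hypothetical embeddings; the values are invented for this example.
planet = [0.21, 0.77, -0.15]
moon = [0.19, 0.74, -0.10]
unrelated = [0.88, -0.02, 0.44]

# math.dist returns the Euclidean distance between two points
# given as sequences of equal length.
planet_to_moon = math.dist(planet, moon)
planet_to_unrelated = math.dist(planet, unrelated)

print(f"planet vs moon:      {planet_to_moon:.3f}")
print(f"planet vs unrelated: {planet_to_unrelated:.3f}")

# A smaller distance means the two vectors are closer together.
print(planet_to_moon < planet_to_unrelated)
```

The comparison on the last line is the kind of numeric check that replaces hand-written rules about which items are related.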

Creating vector representations of data

Vectors can be created in many ways. Sometimes they are computed directly by a formula, such as counting how often each word appears in a text. In AI systems, they are more often produced by trained models that transform input data into numeric form.

At a basic level, we can treat vectors as data produced by some embedding step and focus on how they are represented in Python.

planet_embedding = [0.21, 0.77, -0.15]
moon_embedding = [0.19, 0.74, -0.10]

Each list represents a piece of source data in a form that downstream code can process numerically.
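As one illustration of a vector computed by a formula rather than by a model, the sketch below counts how often each word from a fixed vocabulary appears in a text. The function name count_vector, the vocabulary, and the sample sentence are all hypothetical. This is a simple stand-in for an embedding step, not a learned embedding.

```python
def count_vector(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


# Hypothetical fixed vocabulary; its order defines the vector positions.
vocabulary = ["planet", "moon", "orbit", "red"]

mars_text = "The red planet has a small moon and another small moon"
mars_counts = count_vector(mars_text, vocabulary)

print(mars_counts)  # [1, 2, 0, 1]
```

Every text passed through the same vocabulary yields a vector of the same length, which is what allows downstream code to compare them position by position.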

Storing vectors alongside their source data

Vectors are rarely useful without the data they represent. In practice, we store them together so we can retrieve both the original item and its vector when needed.

A common pattern is to pair a vector with an identifier or record.

planet = {
    "name": "Mars",
    "embedding": [0.21, 0.77, -0.15]
}

This keeps the vector close to its source, making it easier to use later for comparison or retrieval.
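The same pattern extends to a collection of records. The sketch below is one possible layout, assuming plain dictionaries and made-up names and values. It shows how to look up an item together with its vector, and how to pull out only the vectors while keeping them aligned with their records.

```python
# Hypothetical records; names and embedding values are illustrative only.
records = [
    {"name": "Mars", "embedding": [0.21, 0.77, -0.15]},
    {"name": "Phobos", "embedding": [0.19, 0.74, -0.10]},
]

# Index the records by name so an item and its vector are found together.
records_by_name = {record["name"]: record for record in records}

mars = records_by_name["Mars"]
print(mars["name"], mars["embedding"])

# Extract just the vectors in record order, so vectors[i]
# still corresponds to records[i].
vectors = [record["embedding"] for record in records]
print(len(vectors))
```

Keeping the list of vectors in the same order as the records means an index found during a numeric computation can always be mapped back to the original item.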

Using vectors as inputs to downstream computations

Once data is represented as vectors, it can be fed into other computations. These might include similarity checks, clustering, ranking, or selection logic.

At this stage, the vector is treated as just another numeric input. The program no longer needs to know about the original text or object, only the vector that represents it.

embeddings = [
    [0.21, 0.77, -0.15],
    [0.19, 0.74, -0.10],
    [0.88, -0.02, 0.44]
]

This shift, from symbolic data to numeric vectors, is what enables many AI techniques to scale and generalize.
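As a sketch of one such downstream computation, the example below ranks a list of embeddings by cosine similarity to a query vector. The query values and the helper name cosine_similarity are assumptions made for illustration, and cosine similarity is just one common choice of measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


embeddings = [
    [0.21, 0.77, -0.15],
    [0.19, 0.74, -0.10],
    [0.88, -0.02, 0.44],
]

# Hypothetical query vector, assumed to come from the same embedding step.
query = [0.20, 0.75, -0.12]

scores = [cosine_similarity(query, vector) for vector in embeddings]

# Indices of the embeddings, ordered from most to least similar to the query.
ranked_indices = sorted(
    range(len(embeddings)), key=lambda i: scores[i], reverse=True
)

for i in ranked_indices:
    print(i, round(scores[i], 3))
```

The ranking code never sees the original text or object. It works only with numbers, and the resulting indices can be mapped back to source records afterwards.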

Conclusion

We now have a clear mental model of what vector representations are and why they matter. By representing data as vectors, we make meaning computable and reusable across many parts of an AI system. This overview is enough to recognize how embeddings fit into real programs, and it prepares us to work with similarity and retrieval next.