Computing similarity

Once data has been represented as vectors, the next practical question is how to tell whether two pieces of data are related. In AI-enabled systems, this usually means comparing vectors to estimate how similar their meanings are. Similarity calculations are the bridge between raw embeddings and useful behaviors like matching, retrieval, and ranking.

What similarity means in a vector space

When data is represented as vectors, each vector occupies a position in a multi-dimensional space. Similarity is about how close or aligned two vectors are within that space. Vectors that represent related content tend to cluster together, while unrelated vectors tend to be farther apart.

Common similarity measures

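Several mathematical measures are commonly used to compare vectors. Cosine similarity measures the angle between two vectors and ignores their lengths. Euclidean distance measures the straight-line distance between two points, so a smaller value means a closer match. The dot product measures alignment but also grows with vector length; for vectors normalized to unit length, it equals cosine similarity. In practice, cosine similarity is widely used for embeddings because the direction of a vector tends to carry its meaning, while its length can vary for unrelated reasons.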
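
To make the differences concrete, here is a minimal sketch that applies all three measures to the same pair of vectors, and then to a copy of the second vector scaled to twice its length. The helper names (cosine_similarity, euclidean_distance, dot_product) and the scaled vector are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def cosine_similarity(a, b):
    # Angle-based: dot product divided by the product of the vector lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Straight-line distance between the two points
    return float(np.linalg.norm(a - b))

def dot_product(a, b):
    # Alignment that also grows with the lengths of the vectors
    return float(np.dot(a, b))

a = np.array([0.2, 0.4, 0.6])
b = np.array([0.1, 0.3, 0.7])
b_scaled = 2 * b  # same direction as b, twice the length (illustrative)

for name, other in [("b", b), ("2*b", b_scaled)]:
    print(
        name,
        cosine_similarity(a, other),
        euclidean_distance(a, other),
        dot_product(a, other),
    )
```

Scaling the second vector leaves the cosine similarity unchanged, doubles the dot product, and changes the Euclidean distance. This is one reason cosine similarity is a common default when vector length is not meant to carry information.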

Computing similarity scores between vectors

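Each measure produces a single numeric score for a pair of vectors. For cosine similarity and the dot product, higher scores mean greater similarity. For a distance such as Euclidean distance, the direction is reversed: lower values mean a closer match. Cosine similarity always falls between -1 and 1. The example below computes it for two small vectors.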

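import numpy as np

# Two example vectors with the same number of dimensions
a = np.array([0.2, 0.4, 0.6])
b = np.array([0.1, 0.3, 0.7])

# Cosine similarity: dot product divided by the product of the vector lengths
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)  # approximately 0.974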

Comparing one vector against many others

In real programs, a single vector is often compared against a collection of vectors. This allows a program to find which stored items are closest to a query vector. The result is a set of similarity scores that can be inspected, filtered, or sorted.

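import numpy as np

# The vector we want to find matches for
query = np.array([0.2, 0.4, 0.6])

# Stored vectors to compare against the query
vectors = [
    np.array([0.1, 0.3, 0.7]),
    np.array([0.9, 0.1, 0.0]),
    np.array([0.2, 0.5, 0.5]),
]

# One cosine similarity score per stored vector, in the same order
scores = [
    np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))
    for v in vectors
]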
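
For larger collections, the same comparison is often written as a single matrix operation instead of a Python loop. The following is a sketch under the assumption that the stored vectors fit in one 2-D NumPy array; the names matrix, query_unit, and matrix_unit are illustrative.

```python
import numpy as np

query = np.array([0.2, 0.4, 0.6])

# Each row is one stored vector
matrix = np.array([
    [0.1, 0.3, 0.7],
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.5],
])

# Scale every vector to unit length so that a plain dot product
# equals cosine similarity
query_unit = query / np.linalg.norm(query)
matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

# One matrix-vector product yields one score per row
scores = matrix_unit @ query_unit
print(scores)
```

Normalizing the stored vectors once and reusing them avoids recomputing their lengths for every new query.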

Interpreting similarity scores programmatically

A single similarity score carries little meaning in isolation, because what counts as a high score depends on the embedding model and the data. Programs therefore interpret scores relative to one another: a higher score indicates a closer match than a lower-scoring item. These scores can be used to select top matches, apply thresholds, or rank results for downstream logic.
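
The three uses named above (ranking, top matches, and thresholds) can be sketched in a few lines. The labels, scores, top_k value, and threshold here are hypothetical values chosen for illustration; a suitable threshold in a real system depends on the embedding model and would need to be tuned.

```python
import numpy as np

# Hypothetical item labels and their similarity scores against one query
labels = ["item_a", "item_b", "item_c"]
scores = np.array([0.97, 0.32, 0.98])

# Rank: indices ordered from highest score to lowest
ranked = np.argsort(scores)[::-1]

# Top matches: keep only the best top_k items
top_k = 2
top_matches = [(labels[i], float(scores[i])) for i in ranked[:top_k]]

# Threshold: keep every item whose score clears a cutoff (arbitrary here)
threshold = 0.9
above_threshold = [labels[i] for i in ranked if scores[i] >= threshold]

print(top_matches)
print(above_threshold)
```

Top-k selection always returns a fixed number of items, even when none is a good match, while a threshold can return nothing at all. Many programs combine the two.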

Conclusion

By computing similarity between vectors, we gain a concrete way to compare meaning numerically. This turns abstract embeddings into actionable signals that programs can sort, filter, and reason about. At this point, similarity is no longer theoretical; it is a practical tool ready to be used in real systems.