Cosine Similarity
Measuring meaning by measuring angles.
The intuition
Imagine two arrows in a high-dimensional space. Each arrow represents the meaning of a piece of text — its embedding vector. Cosine similarity measures the angle between those arrows, ignoring their length. Two texts about the same topic point in roughly the same direction, so the angle between them is small and the cosine is close to 1.
It doesn't matter how long the vectors are (length can vary with things like how many words a text has or how much emphasis it carries). What matters is the direction: the shape of meaning.
The math
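Cosine similarity is the dot product of two vectors A and B divided by the product of their magnitudes: cos(θ) = (A · B) / (‖A‖ ‖B‖). The result ranges from −1 (opposite directions) through 0 (perpendicular) to 1 (same direction). In practice, embedding vectors are typically normalized to unit length, either by the model or right after embedding, so both magnitudes are 1 and cosine similarity reduces to a simple dot product.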
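A minimal sketch in plain Python of cos(θ) = (A · B) / (‖A‖ ‖B‖), written without NumPy so it runs anywhere. It is an illustration, not Membot's own code. It shows two things: scaling a vector leaves the score unchanged, and once vectors are normalized to unit length a bare dot product gives the same result as the full formula.

```python
import math


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    """Euclidean length (magnitude) of v."""
    return math.sqrt(dot(v, v))


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); depends only on direction, not length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (na * nb)


def normalize(v):
    """Scale v to unit length."""
    n = norm(v)
    return [x / n for x in v]


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice as long
c = [3.0, -1.0, 0.5]  # a different direction

print(round(cosine_similarity(a, b), 6))  # 1.0: length is ignored
# After normalization, the plain dot product matches the full formula.
print(math.isclose(dot(normalize(a), normalize(c)), cosine_similarity(a, c)))  # True
```

For real 768-dimensional embeddings you would normally use a vectorized library such as NumPy instead of Python loops; the arithmetic is the same.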
In Membot
Cosine similarity carries 70% of the search weight. It is the semantic backbone — it understands that "canine" and "dog" are related even though they share no characters. The embeddings come from Nomic Embed, a local 768-dimensional model that runs without API calls.
Why not cosine alone?
Cosine captures meaning but can miss exact keywords. A query for "error code 4012" might match texts about errors in general but miss the one that contains the exact code. That's why Membot blends cosine with Hamming distance (which provides a fast, coarse semantic check on binary signatures) and keyword boosting (which catches exact term matches).
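To make the blend concrete, here is a hypothetical scoring sketch. Only the 0.7 cosine weight comes from this page. Everything else is an assumption for illustration: the 0.2 / 0.1 split of the remaining weight, the plain sign-bit binarization (a simple stand-in, not necessarily how Sign-Zero Encoding works), and the token-overlap keyword score. The function names are likewise hypothetical, not Membot's API.

```python
import math
import re


def cosine_similarity(a, b):
    """(a . b) / (|a| |b|): the semantic signal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sign_bits(vec):
    """Pack one bit per dimension: 1 if the component is positive, else 0.
    ASSUMPTION: a plain sign-bit scheme, not necessarily Membot's Sign-Zero Encoding."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_similarity(bits_a, bits_b, n_bits):
    """1.0 for identical signatures, 0.0 when every bit differs."""
    distance = bin(bits_a ^ bits_b).count("1")  # XOR, then count the differing bits
    return 1.0 - distance / n_bits


def tokens(text):
    """Lowercased word tokens, so "4012" survives as its own token."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_score(query, text):
    """ASSUMPTION: fraction of query tokens that appear verbatim in the text."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


def blended_score(q_vec, d_vec, query, text, w_cos=0.7, w_ham=0.2, w_kw=0.1):
    """w_cos=0.7 matches the 70% cosine weight; w_ham and w_kw are HYPOTHETICAL."""
    ham = hamming_similarity(sign_bits(q_vec), sign_bits(d_vec), len(q_vec))
    return (w_cos * cosine_similarity(q_vec, d_vec)
            + w_ham * ham
            + w_kw * keyword_score(query, text))


query = "error code 4012"
q_vec = [0.9, 0.1, -0.3, 0.2]  # toy 4-d vectors standing in for real embeddings
generic = ("A guide to common error messages", [0.9, 0.1, -0.3, 0.2])
exact = ("Error code 4012 means the session token expired", [0.8, 0.2, -0.2, 0.3])

for text, vec in (generic, exact):
    print(f"{blended_score(q_vec, vec, query, text):.3f}  {text}")
```

With these toy vectors the generic document has the higher cosine, but the exact match on "4012" lifts the second document above it in the blended score.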
Further reading
Embeddings — how text becomes vectors
Hamming Distance — the binary complement
Sign-Zero Encoding — compressing vectors to bits