Hamming Distance
Binary comparison via XOR + popcount.
The intuition
Richard Hamming introduced this metric in 1950 while working on error-correcting codes at Bell Labs. The idea is simple: given two binary strings of the same length, count the number of positions where they differ.
If two strings are identical, Hamming distance is 0. If every bit differs, the distance equals the string length. It measures structural similarity at the most fundamental level — individual bits.
Why it's fast
The comparison requires exactly two CPU instructions:
- XOR — highlights every differing bit (produces 1 where bits differ, 0 where they match)
- popcount — counts the number of 1s in the result
Both are single-cycle operations on modern CPUs. Comparing two 384-byte binary vectors takes nanoseconds.
In Membot
Hamming distance carries 30% of the search weight. It operates on sign-zero encoded vectors — binary fingerprints derived from the full float embeddings. This approach is based on SimHash (Charikar, 2002), which proved that the Hamming distance on sign-encoded vectors correlates with cosine similarity.
The measured correlation: r = 0.891. This means 30% of the signal comes from a comparison that costs almost nothing computationally.
Population coding
Biological neurons also use population codes — patterns of activation across groups of neurons. The brain doesn't compare individual neuron firing rates; it compares patterns. Hamming distance on binary vectors is the computational analogue: comparing activation patterns, not magnitudes.
This biological parallel is why Hamming distance fits naturally into Membot's neuromorphic architecture.
Further reading
Sign-Zero Encoding — how vectors become bits
Cosine Similarity — the semantic complement
Neuromorphic Substrate — the physics layer