← Back to overview

Hamming Distance

Binary comparison via XOR + popcount.

The intuition

Richard Hamming introduced this metric in 1950 while working on error-correcting codes at Bell Labs. The idea is simple: given two binary strings of the same length, count the number of positions where they differ.

If two strings are identical, Hamming distance is 0. If every bit differs, the distance equals the string length. It measures structural similarity at the most fundamental level — individual bits.

Why it's fast

The comparison requires exactly two CPU instructions:

  1. XOR — highlights every differing bit (produces 1 where bits differ, 0 where they match)
  2. popcount — counts the number of 1s in the result

Both are single-cycle operations on modern CPUs. Comparing two 384-byte binary vectors takes nanoseconds.

# XOR + popcount example A = 11010110 B = 10011010 A XOR B = 01001100 # bits that differ popcount = 3 # Hamming distance = 3

In Membot

Hamming distance carries 30% of the search weight. It operates on sign-zero encoded vectors — binary fingerprints derived from the full float embeddings. This approach is based on SimHash (Charikar, 2002), which proved that the Hamming distance on sign-encoded vectors correlates with cosine similarity.

The measured correlation: r = 0.891. This means 30% of the signal comes from a comparison that costs almost nothing computationally.

Population coding

Biological neurons also use population codes — patterns of activation across groups of neurons. The brain doesn't compare individual neuron firing rates; it compares patterns. Hamming distance on binary vectors is the computational analogue: comparing activation patterns, not magnitudes.

This biological parallel is why Hamming distance fits naturally into Membot's neuromorphic architecture.

Further reading

Sign-Zero Encoding — how vectors become bits
Cosine Similarity — the semantic complement
Neuromorphic Substrate — the physics layer