Embeddings

Turning text into searchable vectors.

What is an embedding?

An embedding is a list of numbers that represents the meaning of a piece of text. "Dog" and "puppy" get similar lists. "Dog" and "quantum mechanics" get very different lists. The model learns these representations from vast amounts of text, encoding semantic relationships as geometric relationships — similar meanings become nearby points in a high-dimensional space.

The model

Membot uses nomic-ai/nomic-embed-text-v1.5.

The model runs on CPU. A single embedding takes a few milliseconds. Batch embedding (for building cartridges) processes thousands of texts per second.
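
As a rough sketch of what loading and calling this model on CPU can look like, the snippet below uses the sentence-transformers library. It is an assumption that Membot loads the model this way, and the helper names (with_prefix, load_model, embed_documents) are hypothetical. The task prefixes follow the model card for nomic-embed-text-v1.5, which expects each input to start with a prefix such as "search_document: " or "search_query: ".

```python
from functools import lru_cache
from typing import List


def with_prefix(texts: List[str], task: str) -> List[str]:
    """Prepend the task prefix that nomic-embed-text-v1.5 expects."""
    return [f"{task}: {t}" for t in texts]


@lru_cache(maxsize=1)
def load_model():
    """Load the model once and reuse it. Hypothetical helper, not Membot's API."""
    # Imported lazily so this module can be imported without the dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        "nomic-ai/nomic-embed-text-v1.5",
        trust_remote_code=True,  # the model ships custom modeling code
        device="cpu",
    )


def embed_documents(texts: List[str], batch_size: int = 64):
    """Embed a batch of texts; returns one row per input text."""
    return load_model().encode(
        with_prefix(texts, "search_document"),
        batch_size=batch_size,
        normalize_embeddings=True,  # unit length, so dot product equals cosine
    )
```

Calling embed_documents(["A dog chasing a ball"]) downloads the weights on first use and returns an array with one row per text. Queries would go through the same model with the "search_query" prefix instead.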

Contrastive training

Embedding models are trained on pairs of related texts. The training objective: make related texts produce similar vectors and unrelated texts produce dissimilar vectors. This is called contrastive learning.
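
To make the objective concrete, here is a minimal sketch of one common contrastive loss, InfoNCE with in-batch negatives. The source does not say which loss this model was trained with, so treat the choice of InfoNCE, the temperature value, and the function names as assumptions for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(
    anchors: List[Vector], positives: List[Vector], temperature: float = 0.05
) -> float:
    """Mean InfoNCE loss, where positives[i] is the text related to anchors[i].

    Every other positive in the batch acts as an unrelated text for anchor i.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, candidate) / temperature for candidate in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the related text.
        total += log_denominator - logits[i]
    return total / len(a)
```

The loss is small when each anchor is closest to its own related text and large when it is closer to another pair's text, which is the pressure that pulls related vectors together and pushes unrelated ones apart.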

The result is a vector space where geometric proximity encodes semantic similarity. You can search this space using cosine similarity (the cosine of the angle between two vectors) or Hamming distance (the number of differing bits between binarized versions of the vectors).
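
Both measures can be sketched in a few lines. The vectors below are made-up 4-dimensional toys (real embeddings have hundreds of dimensions), and the binarization shown is plain sign thresholding, which is an assumption: Membot's Sign-Zero Encoding may work differently.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def binarize(v: Sequence[float]) -> List[int]:
    """One bit per dimension: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in v]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where the two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy vectors, invented for illustration only.
dog = [0.8, 0.1, -0.3, 0.5]
puppy = [0.7, 0.2, -0.2, 0.6]
physics = [-0.5, 0.9, 0.4, -0.1]

print(cosine_similarity(dog, puppy), cosine_similarity(dog, physics))
print(hamming_distance(binarize(dog), binarize(puppy)),
      hamming_distance(binarize(dog), binarize(physics)))
```

Higher cosine similarity means closer in meaning, while lower Hamming distance means closer. The bit version throws away magnitude, trading some accuracy for much cheaper comparisons.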

Important constraint

The model used to build a cartridge must be the same model used to search it. Different models produce different vector spaces: the vector one model assigns to "dog" may point in a completely different direction from the vector another model assigns to it, so similarities computed across models are meaningless. Membot records the model name in the cartridge manifest to prevent mismatches.
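
A guard of this kind might look like the sketch below. The manifest format is not shown here, so the JSON layout, the "embedding_model" key, and the names check_model and ModelMismatchError are all hypothetical.

```python
import json


class ModelMismatchError(ValueError):
    """Raised when a cartridge was built with a different embedding model."""


def check_model(manifest_json: str, search_model: str) -> None:
    """Refuse to search a cartridge with a model other than the one that built it."""
    manifest = json.loads(manifest_json)
    # "embedding_model" is a hypothetical key; the real manifest schema may differ.
    built_with = manifest.get("embedding_model")
    if built_with != search_model:
        raise ModelMismatchError(
            f"cartridge built with {built_with!r}, but searching with {search_model!r}"
        )
```

Failing loudly before any search runs is preferable to returning results, because mismatched vectors still produce similarity scores that look plausible but carry no meaning.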

Further reading

Cosine Similarity — measuring distance in vector space
Sign-Zero Encoding — compressing embeddings to bits
Neuromorphic Substrate — the physics layer on top