Six independent signals, one comparison page
Given any artwork in the Rijksmuseum collection, find_similar
discovers related works through six orthogonal similarity signals —
visual appearance, catalogue description, Iconclass subject codes, creator lineage,
depicted persons, and depicted places — then renders an interactive comparison
page with IIIF thumbnails. Each signal captures a different dimension of “similar”,
and artworks that appear across three or more signals are pooled as the strongest matches.
When a researcher says “find artworks similar to The Night Watch”, they could mean any of several things: works that look similar, works that depict the same people, works from the same artistic circle, or works that share the same subject matter. No single similarity metric captures all of these.
Each signal uses a different data source and scoring algorithm. They run concurrently and produce independent result lists, which are then rendered side-by-side.
| Signal | Data source | Scoring | Coverage |
|---|---|---|---|
| Visual | Rijksmuseum image similarity API | Rank order (external model) | ~87% |
| Description | 511K Dutch catalogue descriptions → e5-small 384d int8 embeddings | Cosine similarity (1 − distance) | ~61% |
| Iconclass | 658K artworks with Iconclass notation codes | depth × IDF per shared notation | ~79% |
| Lineage | 208K artworks with attribution qualifiers (after, workshop of, circle of …) | qualifier strength × creator IDF | ~25% |
| Person | 217K artworks with depicted person metadata | IDF per shared person | ~26% |
| Place | 116K artworks with depicted place metadata (broad regions excluded) | IDF per shared place | ~14% |
Not every artwork produces results for every signal. An artwork with no depicted persons will have an empty Person row; a primary attribution (no qualifier) will have an empty Lineage row. The comparison page only shows signal rows that produced results.
The Visual signal calls the Rijksmuseum’s own visual search API, which uses an internal image
embedding model. The object number is first resolved to an internal node ID, then
passed to the /api/v1/collection/visualsearch endpoint. The call is best-effort:
API failures are silently skipped. Results are rank-ordered, with no numeric similarity score.
For the Description signal, each of the 511K artworks with a Dutch catalogue description
has that description embedded with intfloat/multilingual-e5-small (384 dimensions, int8 quantized).
At query time, the embedding of the query artwork is looked up and used for
KNN search via sqlite-vec (vec0 virtual table). The score is cosine
similarity (1 − distance). Artworks with similar compositional
vocabulary (“links een X, rechts een Y”) cluster together.
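The scoring step can be sketched in a few lines. This is an illustration only: it assumes the distance returned by the KNN search is a cosine distance, and the helper names `cosineDistance` and `descriptionScore` are hypothetical. The real lookup goes through sqlite-vec rather than an in-memory loop.

```typescript
// Hypothetical helper: cosine distance between two equal-length vectors.
// Int8Array mirrors the int8-quantized embeddings described above.
function cosineDistance(a: Int8Array, b: Int8Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: a zero vector is treated as maximally dissimilar.
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / Math.sqrt(normA * normB);
}

// Score = 1 - distance, rounded to 3 decimal places.
function descriptionScore(distance: number): number {
  return Math.round((1 - distance) * 1000) / 1000;
}
```

Identical vectors give a distance of 0 and therefore a score of 1; unrelated vectors give a score near 0.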
The Iconclass signal finds artworks sharing the same Iconclass notation codes.
Deeper codes (more specific subjects) that appear on fewer artworks contribute more:
the score is depth × IDF (inverse document frequency).
Generic top-level notations like “historical persons” (61B2, 90K artworks) are
filtered out as noise. Single shallow matches (depth < 5) are pruned to avoid overly broad hits.
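A minimal sketch of this scoring rule follows. Several things are assumptions rather than documented behaviour: the per-notation contributions are summed, `depth` is supplied by the caller (how depth is derived from a notation is not specified here), the pruning rule is read as "drop a candidate whose only shared notation has depth < 5", and the names `SharedNotation`, `NOISE`, and `iconclassScore` are hypothetical.

```typescript
interface SharedNotation {
  code: string;  // Iconclass notation shared with the query artwork
  depth: number; // specificity of the notation (derivation not specified in this document)
  df: number;    // number of artworks carrying this notation
}

// Generic notations to ignore. Only 61B2 is named in the text; the full list is not given.
const NOISE = new Set<string>(["61B2"]);

// Returns the candidate's score, or null when the candidate should be pruned.
function iconclassScore(shared: SharedNotation[], totalArtworks: number): number | null {
  const kept = shared.filter((n) => !NOISE.has(n.code) && n.df > 0);
  if (kept.length === 0) return null;
  // Assumed reading of the pruning rule: a single shallow match is too broad.
  if (kept.length === 1 && kept.every((n) => n.depth < 5)) return null;
  let score = 0;
  for (const n of kept) {
    score += n.depth * Math.log(totalArtworks / n.df); // depth × ln(N/df)
  }
  return Math.round(score * 10) / 10; // 1 decimal place
}
```

With illustrative numbers (not taken from the collection), a single shared notation of depth 7 found on 10 of 1,000 artworks scores 7 × ln(100) ≈ 32.2.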
Example notations: 45(+26) "citizen militia", 45C1 "weapons", 48C7341 "drum". Matches on 45(+26) score highest.

The Lineage signal finds artworks that share visual-style lineage through attribution qualifiers. Works “after” the same artist, from the same “workshop”, “attributed to” the same hand, or in the same “circle” are connected. The score is qualifier strength × creator IDF, so rarer creators contribute more.
| Qualifier | Weight | Meaning |
|---|---|---|
| after / copyist of | 3.0× | direct visual derivation |
| workshop of | 2.0× | produced in the master’s studio |
| attributed to | 1.5× | probably by this artist |
| circle of / follower of | 1.0× | stylistic sphere of influence |
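The weights in the table combine with creator IDF as sketched below. The weights are taken from the table; the lookup keys, the function name `lineageScore`, and the handling of unknown qualifiers (score 0) are assumptions made for illustration.

```typescript
// Qualifier weights from the table above.
const QUALIFIER_WEIGHT: Record<string, number> = {
  "after": 3.0,
  "copyist of": 3.0,
  "workshop of": 2.0,
  "attributed to": 1.5,
  "circle of": 1.0,
  "follower of": 1.0,
};

// strength × ln(N/df), rounded to 2 decimal places.
// creatorDf: number of artworks linked to the creator; totalArtworks: N.
function lineageScore(qualifier: string, creatorDf: number, totalArtworks: number): number {
  const weight = QUALIFIER_WEIGHT[qualifier];
  // Assumption: unknown qualifiers and empty frequencies contribute nothing.
  if (weight === undefined || creatorDf <= 0) return 0;
  return Math.round(weight * Math.log(totalArtworks / creatorDf) * 100) / 100;
}
```

For a creator linked to 10 of 1,000 artworks (illustrative numbers), an “after” match scores 3 × ln(100) ≈ 13.82, while a “circle of” match to the same creator scores about 4.61.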
The Person signal finds artworks that depict the same named individuals. Scoring uses IDF: people who appear in fewer artworks contribute more, so a rarer figure like Jan Pietersen Bronchorst scores higher than a frequently depicted person. Results link to Wikidata where available.
Known limitation: Due to a harvest-level classification issue (#145), 74 organisations and institutions (e.g. the VOC, Binnenhof) are currently misclassified as depicted persons in the vocabulary database. These entities may appear in Person signal results. A fix is tracked but requires a database update.
The Place signal finds artworks depicting the same specific locations: streets, buildings, monuments, waterways. Broad administrative regions (countries, provinces, and cities with >20 child places in the hierarchy) are excluded to avoid matching everything in Amsterdam or the Netherlands. Scoring uses place IDF.
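The exclusion rule and the IDF scoring can be sketched together. This is an assumption-laden illustration: it treats “broad” purely as having more than 20 child places, sums ln(N/df) over the remaining shared places, and uses hypothetical names (`SharedPlace`, `placeScore`) and made-up frequencies.

```typescript
interface SharedPlace {
  name: string;
  childCount: number; // number of child places in the hierarchy
  df: number;         // number of artworks depicting this place
}

// Sum of ln(N/df) over shared places that are not broad regions, rounded to 2 dp.
function placeScore(shared: SharedPlace[], totalArtworks: number, maxChildren = 20): number {
  let score = 0;
  for (const place of shared) {
    if (place.childCount > maxChildren) continue; // broad region: excluded
    if (place.df <= 0) continue;
    score += Math.log(totalArtworks / place.df);
  }
  return Math.round(score * 100) / 100;
}
```

A shared city with hundreds of child places contributes nothing, while a shared street or building found on few artworks dominates the score.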
Artworks that appear in three or more signal modes are pooled into a combined row at the top of the comparison page. These are the works most deeply connected to the query artwork — similar in multiple independent dimensions simultaneously.
Any individual signal can produce false positives: the description signal may match works with similar Dutch phrasing but unrelated subjects; Iconclass may match on a generic shared code; visual similarity may surface works with similar colour palettes but no thematic connection.
When three or more signals independently identify the same artwork, the connection is unlikely to be noise. A work that is visually similar, shares subject codes, and depicts the same people is almost certainly a meaningful match — a copy, a companion piece, or a closely related work.
The pooled row shows which signals matched for each artwork via coloured mode badges (V Desc IC Lin Per Pl), sorted by the number of matching signals.
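The pooling rule described above amounts to counting, for each artwork, how many signals returned it. A minimal sketch, assuming each signal yields a list of object numbers; the names `PooledMatch` and `poolMatches` are hypothetical, and ties keep first-seen order here, which the source does not specify.

```typescript
interface PooledMatch {
  objectNumber: string;
  modes: string[]; // signals that returned this artwork, e.g. ["V", "IC", "Per"]
}

// bySignal maps a mode badge (V, Desc, IC, Lin, Per, Pl) to that signal's results.
function poolMatches(bySignal: Record<string, string[]>, minSignals = 3): PooledMatch[] {
  const modesById = new Map<string, string[]>();
  for (const mode of Object.keys(bySignal)) {
    for (const id of bySignal[mode]) {
      const modes = modesById.get(id) ?? [];
      if (modes.indexOf(mode) === -1) modes.push(mode); // count each signal once
      modesById.set(id, modes);
    }
  }
  return Array.from(modesById, ([objectNumber, modes]) => ({ objectNumber, modes }))
    .filter((match) => match.modes.length >= minSignals)
    .sort((a, b) => b.modes.length - a.modes.length); // most signals first
}
```

An artwork returned by four signals sorts ahead of one returned by three, and anything with fewer than three matches stays only in its individual signal rows.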
When find_similar is called, all six signals run concurrently.
The visual signal is best-effort (external API); the other five are local database queries.
Results are assembled into an HTML page and served via a time-limited URL.
1. Look up artwork metadata, types, descriptions, and IIIF image ID.
2. Run all signals concurrently: 5 local DB queries + 1 external API call.
3. Identify artworks appearing in 3+ signals and sort them by count.
4. Generate a self-contained page with IIIF thumbnails, signal rows, and metadata.
5. Serve the page. HTTP: /similar/:uuid (30-min TTL). Stdio: temp file.
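The concurrent step can be sketched as follows. This is not the tool's actual code: `SignalFn`, `bestEffort`, and `runSignals` are hypothetical names, each signal is assumed to return a list of object numbers, and only the signal wrapped in `bestEffort` has its failures swallowed.

```typescript
type SignalFn = () => Promise<string[]>;

// Wraps a signal so that a failure yields an empty result instead of rejecting.
function bestEffort(fn: SignalFn): SignalFn {
  return () => fn().catch(() => [] as string[]);
}

// Starts every signal at once and waits for all of them to finish.
async function runSignals(signals: Record<string, SignalFn>): Promise<Record<string, string[]>> {
  const names = Object.keys(signals);
  const lists = await Promise.all(names.map((name) => signals[name]()));
  const results: Record<string, string[]> = {};
  names.forEach((name, i) => {
    results[name] = lists[i];
  });
  return results;
}
```

Wrapping only the visual signal, as in `runSignals({ visual: bestEffort(callVisualApi), person: queryPersons })` with hypothetical signal functions, keeps an outage of the external API from failing the whole request while leaving local query errors visible.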
The HTML page is a self-contained document with no external dependencies. It displays the query artwork with its full metadata, followed by horizontal scroll rows for each signal that produced results.
Query artwork: IIIF thumbnail (300px), title, creator, date, type badge. Below: description (Dutch), Iconclass codes (linked to iconclass.org), lineage qualifiers (linked to Getty AAT), depicted persons and places (linked to Wikidata).
Signal rows: one horizontal scroll strip per signal, in fixed order: Visual → Lineage → Iconclass → Description → Person → Place. Each row has a coloured header, methodology note, and result cards. Empty rows are hidden.
Result cards: 200px-wide cards with IIIF thumbnails (3:4 aspect ratio), rank + score, title (2-line clamp), creator. The detail line varies by signal: Iconclass codes, qualifier + creator, description snippet, or shared person/place names.
Pooled row: artworks in 3+ signals, sorted by count descending. Each card shows coloured mode badges indicating which signals matched. This row appears first when present, highlighting the strongest connections.
Three artworks from different periods illustrating how different signal combinations reveal different kinds of connections.
Each signal uses a scoring formula appropriate to its data. All vocabulary-based signals use IDF (inverse document frequency) as a core building block: terms that appear on fewer artworks get higher weights, preventing generic matches from dominating results.
| Signal | Formula | Precision | Notes |
|---|---|---|---|
| Visual | (rank order) | — | No numeric score; ordered by the external API |
| Description | 1 − cosine_dist | 3 dp | Embedding similarity, normalised to [0, 1] |
| Iconclass | depth × ln(N/df) | 1 dp | Deeper + rarer notations score higher |
| Lineage | strength × ln(N/df) | 2 dp | “After” (3×) beats “circle of” (1×) |
| Person | ln(N/df) | 2 dp | Rare people score higher |
| Place | ln(N/df) | 2 dp | Broad regions (>20 children) excluded |
The comparison page is generated server-side as self-contained HTML. Two serving modes support both Claude Desktop (stdio, file-based) and claude.ai (HTTP, URL-based).
In HTTP mode, the page is stored in a module-scope similarPages Map,
keyed by a random UUID. The route GET /similar/:uuid
serves the HTML. A 30-minute TTL sweeper runs every 60 seconds,
deleting pages not accessed within the window.
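A minimal sketch of such a store and sweeper is shown below. Only the `similarPages` name, the 30-minute TTL, and the 60-second interval come from the description above; `StoredPage`, `putPage`, `getPage`, and `sweep` are hypothetical, and refreshing the timestamp on read is an assumed reading of “not accessed within the window”.

```typescript
interface StoredPage {
  html: string;
  lastAccess: number; // epoch milliseconds
}

const TTL_MS = 30 * 60 * 1000;
const similarPages = new Map<string, StoredPage>();

function putPage(uuid: string, html: string, now: number = Date.now()): void {
  similarPages.set(uuid, { html, lastAccess: now });
}

// Returns the HTML and refreshes the access time, or undefined if expired/unknown.
function getPage(uuid: string, now: number = Date.now()): string | undefined {
  const page = similarPages.get(uuid);
  if (!page) return undefined;
  page.lastAccess = now;
  return page.html;
}

// Deletes every page not accessed within the TTL window.
function sweep(now: number = Date.now()): void {
  similarPages.forEach((page, uuid) => {
    if (now - page.lastAccess > TTL_MS) similarPages.delete(uuid);
  });
}
```

In a running server the sweeper would be scheduled with something like `setInterval(() => sweep(), 60_000)`; the explicit `now` parameter exists only to make the expiry logic easy to exercise.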
In stdio mode, the page is written to os.tmpdir() as a local HTML file.
The same 30-minute TTL sweep applies. The LLM returns the file path for the
user to open in a browser.
The tool is enabled by default. Set ENABLE_FIND_SIMILAR=false
to disable it (e.g. in environments without the vocabulary database).
When disabled, the tool is not registered and does not appear in the tool list.
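The flag check might look like the sketch below. The function name `isFindSimilarEnabled` is hypothetical, and treating the value case-insensitively is an assumption; the source only states that `false` disables the tool and that it is on by default.

```typescript
// Returns true unless ENABLE_FIND_SIMILAR is explicitly set to "false".
// The environment is passed in so the check does not depend on a global.
function isFindSimilarEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env["ENABLE_FIND_SIMILAR"] ?? "true"; // enabled by default
  return raw.trim().toLowerCase() !== "false";
}
```

A server would call this once at startup with `process.env` and skip registering the tool when it returns false.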
Requires the vocabulary database (1.1 GB) for all DB-based signals, plus the embeddings database (997 MB) for the Description signal. Visual signal needs internet access. Total: ~2.2 GB across three SQLite files, all memory-mapped.