Combine words with AND / OR / NOT, match exact phrases, require words near each other, and expand word stems with *, with three real examples of what this unlocks.
textQuery would let you (or an LLM searching for you) compose real boolean, phrase, proximity and prefix searches — safely — over the catalogue's free-text fields.
Our collection search runs on SQLite's full-text engine, FTS5, which has its own little query language: AND, OR, NOT, proximity (NEAR), quoted phrases, and prefixes like paint*. But the server doesn't let any of it through. Before your words reach the database we strip out the operators and wrap everything in quotes, so the whole thing is matched as one exact phrase. Your text becomes pure data to look up, never instructions for how to search.
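As a rough illustration of the force-quoting described above, here is a minimal sketch. The names `forceQuote` and `FTS5_KEYWORDS` are hypothetical, and the exact set of stripped characters is an assumption; the real `escapeFts5` in src/utils/db.ts is not shown here and may differ in detail.

```typescript
// Hypothetical stand-in for the server's escaping step (not the real escapeFts5).
// Assumption: operators are dropped, then the remaining words become one phrase.
const FTS5_KEYWORDS = ["AND", "OR", "NOT", "NEAR"];

function forceQuote(input: string): string | null {
  const words = input
    .replace(/["()*^:+{}]/g, " ") // drop quotes and common FTS5 syntax characters
    .split(/\s+/)
    .filter((w) => w.length > 0 && FTS5_KEYWORDS.indexOf(w) === -1);
  // Nothing searchable left: signal "no text filter" instead of emitting "".
  return words.length > 0 ? `"${words.join(" ")}"` : null;
}

const example = forceQuote("paint* AND (rembrandt"); // '"paint rembrandt"'
```

Whatever the caller types, the result is either a single quoted phrase or nothing, which is why none of the operators can take effect today.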
This is a deliberate safety choice: raw FTS5 input can crash on a stray bracket, silently change the meaning of ordinary words like "and", or let callers poke at the internal shape of our index. Force-quoting kills all three risks at once. The cost is expressiveness — and there are four common things a researcher simply cannot ask for today:
sculp* to catch sculpsit, sculptor, sculp. all at once
The current workaround is our separate flat filters (title, description, inscription, curatorialNarrative), which are simply combined with AND, plus semantic_search for fuzzy meaning. That covers a great deal — but not precise, composed text logic.
The fix (detailed in #363) keeps the user's words inert — always quoted, always stripped of operators, exactly as today — but lets the structure around them be built by our own code from a small, validated request. The user's text only ever lands inside a safe quoted slot; the logic is ours, so it can never be malformed. Here are the building blocks, each with a tiny example:
{ should: [ A, B ] } → "A or B". This is the only way to search two different fields with an either/or, which the flat filters can't do.
must requires every clause; mustNot excludes. { must: [A], mustNot: [B] } → "A but not B".
{ phrase: "cum privilegio" } matches those two words in that order; { any: ["x", "y"] } matches either word anywhere.
{ near: { terms: ["gesigneerd", "gedateerd"], distance: 4 } } → the two words close together, with at most four other words between them (this is how FTS5's NEAR counts distance).
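{ prefix: "excud" } matches excudit and excud., though not shorter abbreviations such as exc., which do not start with the prefix. Useful when the same word is written out or abbreviated in many ways.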
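A "small, validated request" implies a shape check before anything is compiled. The sketch below is one possible validator, not the project's code: the function name, the depth limit, and the error messages are assumptions, and the field list is taken from the flat filters named earlier.

```typescript
// Illustrative only: assumed field whitelist and nesting limit.
const FIELDS = ["title", "description", "inscription", "curatorialNarrative"];
const GROUP_KEYS = ["should", "must", "mustNot"];
const LEAF_KEYS = ["phrase", "any", "prefix", "anyPrefix", "near"];
const MAX_DEPTH = 4;

function isStringArray(v: unknown): boolean {
  return Array.isArray(v) && v.length > 0 && v.every((s) => typeof s === "string");
}

// Returns a list of problems; an empty list means the shape is acceptable.
function validateTextQuery(node: unknown, depth: number = 0, path: string = "textQuery"): string[] {
  if (typeof node !== "object" || node === null || Array.isArray(node)) {
    return [`${path}: expected an object`];
  }
  if (depth > MAX_DEPTH) return [`${path}: nested deeper than ${MAX_DEPTH}`];
  const obj = node as Record<string, unknown>;
  const errors: string[] = [];
  let clauses = 0;
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (key === "field") {
      if (typeof value !== "string" || FIELDS.indexOf(value) === -1) {
        errors.push(`${path}.field: unknown field`);
      }
    } else if (GROUP_KEYS.indexOf(key) !== -1) {
      clauses++;
      if (!Array.isArray(value) || value.length === 0) {
        errors.push(`${path}.${key}: expected a non-empty array`);
      } else {
        value.forEach((child, i) => {
          errors.push(...validateTextQuery(child, depth + 1, `${path}.${key}[${i}]`));
        });
      }
    } else if (LEAF_KEYS.indexOf(key) !== -1) {
      clauses++;
      const ok =
        key === "phrase" || key === "prefix" ? typeof value === "string"
        : key === "near" ? typeof value === "object" && value !== null
        : isStringArray(value);
      if (!ok) errors.push(`${path}.${key}: wrong type`);
    } else {
      errors.push(`${path}.${key}: unknown key`);
    }
  }
  if (clauses === 0) errors.push(`${path}: needs at least one clause`);
  return errors;
}
```

Rejecting unknown keys outright means a caller cannot smuggle in a raw query string under a new name; only the listed building blocks get through.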
The catalogue's long-form text lives in three quite different corpora, which is exactly why composed queries help — a concept may be phrased one way in one field and another way (or another language) in the next.
| Field | Roughly | Character |
|---|---|---|
| description | ~512,000 works | Dutch, denotative: what is shown, the materials, the signature facts |
| curatorialNarrative | ~14,000 works | Mostly English, interpretive: meaning, theme, the story behind the work |
| inscription | ~502,000 works | Literal transcription: the words actually written on the object |
Each example below leans on a different building block, and each reaches works the flat filters and semantic_search genuinely cannot. (These appear as scenarios 26–28 in docs/research-scenarios.md.)
"Find works about the Beeldenstorm — the 1566 wave of iconoclasm — whether a Dutch cataloguer called it beeldenstorm in the description, or an English curator called it iconoclasm in the wall text."
textQuery: {
  should: [
    { field: "description", phrase: "beeldenstorm" },
    { field: "curatorialNarrative", any: ["iconoclasm", "iconoclastic"] }
  ],
  mustNot: [ { field: "title", phrase: "geschiedenis" } ]
}
The two corpora describe the same event with different words, in different languages, and the matches barely overlap. Today's flat filters can only AND the two fields — giving the near-empty overlap — never the union. The should (either/or across fields) is the missing piece.
"Find paintings the museum describes as both signed and dated in the same breath — the strongest statement that a work is genuinely by the artist's own hand."
textQuery: {
  field: "description",
  near: { terms: ["gesigneerd", "gedateerd"], distance: 4 }
}
A description that mentions a signature in one place and a date somewhere else is not the same as one asserting both together ("gesigneerd … en gedateerd 1749"). Requiring the two Dutch words to sit close drops the loose coincidences. Neither vocabulary filters nor semantic_search can demand that two words be near each other.
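For illustration, the proximity request above could be turned into an FTS5 NEAR group along these lines. This is a sketch under assumptions: `quoteTerm` and `nearClause` are hypothetical names, the distance cap of 20 is arbitrary, and it assumes the request's field name is also the column name in the full-text index.

```typescript
// Quote one user word so FTS5 can only read it as a string literal.
// Embedded double quotes are doubled, which is FTS5's own escape rule.
function quoteTerm(word: string): string {
  return `"${word.replace(/"/g, '""')}"`;
}

// Hypothetical helper: build a column-scoped NEAR group from validated parts.
function nearClause(field: string, terms: string[], distance: number): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(field)) throw new Error("bad field name");
  if (terms.length < 2) throw new Error("near needs at least two terms");
  if (Math.floor(distance) !== distance || distance < 1 || distance > 20) {
    throw new Error("distance must be a whole number from 1 to 20");
  }
  // The NEAR keyword, parentheses and comma come from this code, never from the caller.
  return `${field} : NEAR(${terms.map(quoteTerm).join(" ")}, ${distance})`;
}

const signedAndDated = nearClause("description", ["gesigneerd", "gedateerd"], 4);
// signedAndDated is: description : NEAR("gesigneerd" "gedateerd", 4)
```

The caller supplies only two words and a number; a malformed distance is rejected before any query text exists.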
"Old prints name their makers on the plate in Latin: who designed it (invenit), who engraved it (sculpsit/fecit), who published it (excudit). Find prints that record all three roles."
textQuery: {
  field: "inscription",
  must: [
    { anyPrefix: ["inven", "delineav"] },               // designer
    { anyPrefix: ["sculp", "incid"], any: ["fecit"] },  // engraver: either prefix, or the word fecit
    { prefix: "excud" }                                 // publisher
  ]
}
The same role is written many ways — sculpsit, sculptor, Sculpt., sculp. — because engravers abbreviated to fit the margin. A prefix like sculp* finds many times more works than the exact word sculpsit alone. The structured "production role" filter can't help here: it normalises everything into a few tidy English labels, so it loses the literal Latin and can't spot one person doing two jobs ("sculptor et excudit"). The inscription is the original source; this reads it directly.
expandFtsQuery takes one word and builds a safe ("paint" OR "paints" OR "painted") group — each variant individually quoted, the OR supplied by us. A structured textQuery just generalises that pattern: walk the small request, quote every leaf word, and emit the operators from our own code. The user's words can never break out of their quotes.
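The walk described above might look roughly like the sketch below. It is not the implementation proposed in #363: the type names, the `compile` function, and the choice to treat `should` as "at least one must match" are assumptions, only a subset of leaves (phrase, any, prefix) is covered, and field names are assumed to double as column names in the index.

```typescript
type Field = "title" | "description" | "inscription" | "curatorialNarrative";
type Leaf =
  | { field?: Field; phrase: string }
  | { field?: Field; any: string[] }
  | { field?: Field; prefix: string };
type Group = { field?: Field; should?: Clause[]; must?: Clause[]; mustNot?: Clause[] };
type Clause = Leaf | Group;

// Every user word passes through here: quotes removed, then wrapped in quotes.
function quote(word: string): string {
  const clean = word.replace(/"/g, " ").trim();
  if (clean.length === 0) throw new Error("empty search term");
  return `"${clean}"`;
}

function compile(node: Clause, inherited?: Field): string {
  const field = node.field || inherited; // a group's field applies to its children
  const scope = (expr: string): string => (field ? `${field} : ${expr}` : expr);

  if ("phrase" in node) return scope(quote(node.phrase));
  if ("any" in node) {
    if (node.any.length === 0) throw new Error("any needs at least one word");
    return scope(`(${node.any.map(quote).join(" OR ")})`);
  }
  if ("prefix" in node) return scope(`${quote(node.prefix)} *`);

  // Group: every operator and parenthesis below is emitted by this code.
  const group = (nodes: Clause[], op: string): string =>
    `(${nodes.map((c) => compile(c, field)).join(op)})`;
  const parts: string[] = [];
  if (node.must && node.must.length > 0) parts.push(group(node.must, " AND "));
  if (node.should && node.should.length > 0) parts.push(group(node.should, " OR "));
  if (parts.length === 0) throw new Error("a group needs a must or should clause");
  const positive = parts.length === 1 ? parts[0] : `(${parts.join(" AND ")})`;
  if (node.mustNot && node.mustNot.length > 0) {
    return `(${positive} NOT ${group(node.mustNot, " OR ")})`;
  }
  return positive;
}
```

Because a word such as `x" OR "y` loses its quotes inside `quote`, the OR it carries ends up as an ordinary token inside a phrase, never as an operator.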
Just stop stripping operators; let clients send FTS5 query text directly.
Roughly 10 lines of change, but it re-opens every risk: crashes, silently changed meaning, and index probing. Not recommended.
Accept the small structured request ({ should, must, mustNot, phrase, any, near, prefix, anyPrefix, field }) and compile it into a guaranteed-safe query.
~80–150 lines. User words stay inert; we generate only well-formed queries. As a bonus, combining everything into one match means relevance ranking finally covers the whole text query, not just the first field.
It would run entirely on the full-text indexes we already ship — no Elasticsearch, no new database, no re-harvest. It would be added as an opt-in field, so the plain string search stays exactly as it is.
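To make the "one match, ranked as a whole" point concrete, here is a hedged sketch of how a compiled expression could be handed to SQLite. The function name and the row limit are hypothetical, and the choice of `artwork_texts_fts` as the index that holds these fields is an assumption; `rank` is FTS5's built-in relevance column.

```typescript
interface Statement {
  sql: string;
  params: [string, number];
}

// Hypothetical helper: the compiled expression travels as a bound parameter,
// so it is never spliced into the SQL text itself.
function buildSearchStatement(matchExpr: string, limit: number = 50): Statement {
  return {
    sql:
      "SELECT rowid FROM artwork_texts_fts " +
      "WHERE artwork_texts_fts MATCH ? " + // one MATCH covers the whole text query
      "ORDER BY rank LIMIT ?",             // so rank reflects every clause at once
    params: [matchExpr, limit],
  };
}

const statement = buildSearchStatement('description : NEAR("gesigneerd" "gedateerd", 4)');
```

The plain string search can keep its current path unchanged; a statement like this would only be built when the opt-in field is present.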
semantic_search already satisfy it?
docs/research-scenarios.md · Code: src/utils/db.ts (escapeFts5, escapeFts5Token, expandFtsQuery); src/api/VocabularyDb.ts (text-FTS filter block, ~lines 4423–4459).
The four full-text indexes already shipped: vocabulary_fts, artwork_texts_fts, person_names_fts, entity_alt_names_fts.