gleann exposes a full OpenAPI-compatible REST API for programmatic index management, semantic search, RAG queries, and memory operations.
Default: http://localhost:8080. Configure via server_addr in config.json
or the GLEANN_SERVER_ADDR environment variable.
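A client can mirror this lookup when deciding which base URL to call. The sketch below is illustrative only: the precedence (environment variable first, then config.json, then the default), the top-level position of the server_addr key, and the helper name resolve_server_addr are assumptions, not behavior confirmed by gleann.

```python
import json
import os
from pathlib import Path

DEFAULT_ADDR = "http://localhost:8080"  # default stated in this document


def resolve_server_addr(config_path=None, env=None):
    """Return the base URL a client should call.

    Assumed precedence (not confirmed by gleann): GLEANN_SERVER_ADDR,
    then a top-level "server_addr" key in config.json, then the default.
    """
    env = os.environ if env is None else env
    addr = env.get("GLEANN_SERVER_ADDR")
    if not addr and config_path is not None and Path(config_path).is_file():
        with open(config_path, encoding="utf-8") as fh:
            addr = json.load(fh).get("server_addr")
    # Strip a trailing slash so paths can be appended directly.
    return (addr or DEFAULT_ADDR).rstrip("/")
```

Passing env explicitly, for example resolve_server_addr(env={}), keeps the helper easy to test without touching the real process environment.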
Create a new vector index from a list of text passages.
{
  "name": "my-docs",
  "texts": ["passage one", "passage two"],
  "metadata": [{"source": "doc.md", "section": "intro"}]
}
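As a rough sketch, the body above can be sent with Python's standard library. The endpoint path for index creation is not given in this section, so CREATE_INDEX_PATH below is a placeholder to replace with the real path, and build_json_request is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder: the real index-creation path is not stated in this section.
CREATE_INDEX_PATH = "/REPLACE-WITH-CREATE-INDEX-PATH"


def build_json_request(base_url, path, payload, method="POST"):
    """Build (but do not send) a JSON request against the gleann server."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


payload = {
    "name": "my-docs",
    "texts": ["passage one", "passage two"],
    "metadata": [{"source": "doc.md", "section": "intro"}],
}
req = build_json_request("http://localhost:8080", CREATE_INDEX_PATH, payload)

# To actually send it once the path is filled in:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```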
List all available indexes with metadata including passage count, embedding model, and backend.
Remove an index and all associated embeddings. Irreversible.
Semantic vector search with optional BM25 hybrid reranking.
{
  "query": "Byzantine fault tolerant consensus",
  "top_k": 10,
  "hybrid_alpha": 0.7,
  "filters": [{"field": "source", "op": "contains", "value": "protocol"}]
}
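To build intuition for hybrid_alpha and filters, here is one plausible reading, sketched client-side. It assumes hybrid_alpha is the weight on the vector score in a convex combination with the BM25 score, and that the contains operator is a substring match on a metadata field. Neither is confirmed by this section; gleann's actual scoring and filter semantics may differ.

```python
def hybrid_score(vector_score, bm25_score, alpha):
    """Blend two scores; alpha=1.0 is pure vector, alpha=0.0 is pure BM25.

    Assumption: both scores are already normalized to a comparable range.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * vector_score + (1.0 - alpha) * bm25_score


def matches_filter(metadata, flt):
    """Evaluate a single filter dict against one passage's metadata."""
    if flt["op"] != "contains":
        raise NotImplementedError(f"unsupported op in this sketch: {flt['op']}")
    # Missing fields are treated as empty strings, so they never match.
    return flt["value"] in str(metadata.get(flt["field"], ""))


flt = {"field": "source", "op": "contains", "value": "protocol"}
keep = matches_filter({"source": "protocol-spec.md"}, flt)
score = hybrid_score(vector_score=0.8, bm25_score=0.4, alpha=0.7)
```

Under this reading, a hybrid_alpha of 0.7 leans on semantic similarity while still letting exact keyword matches influence the ranking.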
Store facts, summaries, or any text content into the hierarchical memory system.
{
  "content": "The user prefers concise technical answers with code examples",
  "tier": "long",
  "tags": ["preference", "style"],
  "project": "my-project"
}
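On the client side, a small typed wrapper can keep memory payloads consistent. This is a sketch: MemoryEntry is a hypothetical name, and treating tags and project as optional is an assumption. Only the four field names and the "long" tier value come from the example above.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MemoryEntry:
    content: str
    tier: str                      # "long" is the only tier value shown here
    tags: list = field(default_factory=list)
    project: str = ""              # assumed optional; omitted when empty

    def to_payload(self):
        """Return a dict matching the JSON shape shown above."""
        payload = asdict(self)
        if not payload["project"]:
            del payload["project"]
        return payload


entry = MemoryEntry(
    content="The user prefers concise technical answers with code examples",
    tier="long",
    tags=["preference", "style"],
    project="my-project",
)
```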
Retrieve relevant memories using semantic similarity and tier/tag filters.
gleann implements the OpenAI Chat Completions API, allowing any OpenAI-compatible client
to use gleann indexes as knowledge sources by setting model to gleann/{index-name}.
POST /v1/chat/completions
{
  "model": "gleann/my-docs",
  "messages": [{"role": "user", "content": "What is the token bucket algorithm?"}]
}
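A minimal standard-library sketch of the request above follows. The helper names are illustrative, and first_answer assumes the response follows the usual OpenAI shape (choices[0].message.content), which this section implies but does not show.

```python
import json
import urllib.request


def build_chat_request(base_url, index_name, question):
    """Build (but do not send) a chat completion request for one index."""
    payload = {
        "model": f"gleann/{index_name}",
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def first_answer(response_body):
    """Extract the reply text, assuming an OpenAI-style response dict."""
    return response_body["choices"][0]["message"]["content"]


req = build_chat_request(
    "http://localhost:8080", "my-docs", "What is the token bucket algorithm?"
)

# To send against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(first_answer(json.load(resp)))
```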
Configure an API key via the GLEANN_API_KEY environment variable.
When set, all API requests must include the header Authorization: Bearer {key}.
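From a client, attaching the header might look like the sketch below. GLEANN_API_KEY is documented here as server configuration; reading the same variable on the client is only a convenience assumed for this example, and with_bearer is a hypothetical helper.

```python
import os
import urllib.request


def with_bearer(req, api_key=None):
    """Attach an Authorization header when a key is available.

    Falls back to GLEANN_API_KEY in the client's environment (an assumption
    of this sketch); leaves the request untouched if no key is found.
    """
    key = api_key if api_key is not None else os.environ.get("GLEANN_API_KEY")
    if key:
        req.add_header("Authorization", f"Bearer {key}")
    return req


req = with_bearer(
    urllib.request.Request(
        "http://localhost:8080/v1/chat/completions", method="POST"
    ),
    api_key="example-key",  # placeholder value, not a real key
)
```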