lilbee v0.6.66b
MIT License · 100% coverage · typed

A local AI search engine for your files, your code, and the web. lilbee finds, runs, and manages the models, then talks to everything you point it at. Every answer cites the source, it all runs on your own machine, and it's one program with nothing else to set up.

A built-in model manager browses Hugging Face, downloads models, gives each one a role, and runs them on your own GPU via Metal, Vulkan, or CUDA. Any coding agent can also use it over MCP.

one program, one install: terminal app · command-line tool · MCP server · web API · Python library

It runs on your hardware. Your files stay on disk, and a cloud model runs only when you pick one. No Docker, no servers, no web stack to maintain, just one file you run on demand.

see it work

Index lilbee's own README, ask "what is lilbee in one sentence?", and get a cited answer drawn straight from the source.

install

macOS · Linux · Windows
$ pip install --pre lilbee
NVIDIA · CUDA
$ pip install --pre lilbee --extra-index-url https://lilbee.sh/cu125/
older CPU · no AVX2
$ pip install --pre lilbee 'lancedb==0.33.0+compat' --extra-index-url https://lilbee.sh/compat/
optional extras
For a pip / uv install, add the name in brackets, e.g. 'lilbee[crawler,litellm]'. The binary, Homebrew, AUR, Nix, Docker, Flatpak, and Snap builds bundle all three already. lilbee works without them.
[crawler] index websites too: crawl a docs site or wiki to markdown, then search it offline
[litellm] bridge to popular hosted model providers for chat, vision, or embeddings; you bring the key, and the terminal app flags when one is active
[graph] concept-graph search: finds matches plain keyword search misses, with no extra model calls

every release is a pre-release, so pip needs --pre. latest on PyPI → · full install guide →

it's early. a ★ on GitHub helps people find it; bug reports and issues are very welcome.

your hardware, put to work

Your machine can do a lot more than you're using it for. lilbee runs local models on hardware you already own. No token budgets, no provider to depend on; the cloud's there when you want it.

one program

It's the model, the search through your files, and the chat, all in one program. Run it when you want, close it when you're done; by default nothing's left running in the background, no container to keep alive. Want something long-running? Use the command line and manage it yourself.

the usual local-AI setup
  • a model server, always running
  • model files fetched by hand
  • a vector database to stand up
  • code to connect them
  • a separate app for the interface
  • often a container around it all
a deployment to stand up and keep alive.
lilbee
  • the model runtime (llama.cpp) and the vector index (LanceDB) run inside lilbee, not as separate services to stand up
  • use it as a full-screen terminal app, a command-line tool, a Model Context Protocol server, a web API, or a Python library
  • a built-in model catalog: browse and pull straight from Hugging Face Hub, no hunting for model files yourself
  • a scoped library per project, so each domain stays its own clean encyclopedia
  • runs on a laptop or headless over a remote shell; move it between machines
one install command, sane defaults. point it at a folder, ask.

what it does

runs local AI models itself

Browse Hugging Face, pull a model, assign it to a role. lilbee runs it on Metal, Vulkan, or CUDA; you never point it at a server you set up.

a real retrieval pipeline (RAG)

A real search engine, built on published research: it ranks results by how well they answer your question, so the best match comes back first. 50+ settings to tune.
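To make "ranks results by how well they answer you" concrete, here is a minimal sketch of the general idea behind vector retrieval: turn the question and each chunk into vectors, score them by cosine similarity, and return the best match first. It is a toy under loud assumptions: the word-count `embed` function stands in for a real embedding model, the names `embed`, `cosine`, and `rank` are hypothetical, and none of this is lilbee's actual pipeline, which the section only describes as research-based and tunable.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    # Score every chunk against the query, then sort best-first.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    chunks = [
        "the crawler turns a docs site into markdown",
        "install with pip install --pre lilbee",
        "lilbee is a local AI search engine for your files",
    ]
    for score, text in rank("what is lilbee search engine", chunks):
        print(f"{score:.2f}  {text}")
```

A real pipeline swaps the word counts for learned embeddings and usually adds further ranking stages on top; the shape (score, sort, take the top few) stays the same.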

search your files, code, and PDFs

Point it at a folder: your man pages, a pile of PDFs, your notes, a codebase. Then talk to them. Every answer tells you the file and line. Each project gets its own library, so nothing bleeds across.
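Citing "the file and line" only works if every indexed chunk remembers where it came from. The sketch below shows one simple way to carry that metadata: split a file into fixed windows of lines and keep the 1-based start and end line with each piece. The `Chunk` class, `chunk_lines` function, and `path:start-end` citation format are hypothetical illustrations, not lilbee's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str

    def cite(self) -> str:
        # Hypothetical citation format: "path:start-end".
        return f"{self.path}:{self.start_line}-{self.end_line}"


def chunk_lines(path: str, text: str, window: int = 3) -> list[Chunk]:
    # Split a file into consecutive windows of `window` lines,
    # recording which lines each window covers.
    lines = text.splitlines()
    chunks = []
    for i in range(0, len(lines), window):
        block = lines[i:i + window]
        chunks.append(Chunk(path, i + 1, i + len(block), "\n".join(block)))
    return chunks


if __name__ == "__main__":
    notes = "\n".join(f"line {n}" for n in range(1, 8))
    for c in chunk_lines("notes.md", notes):
        print(c.cite())
```

Whatever chunk the search returns, its `cite()` string can be attached to the answer verbatim, so the citation never depends on the model remembering where a passage came from.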

chunked so it makes sense

Prose and code get chunked differently, so each piece keeps its meaning instead of getting cut mid-thought. A search engine is only as good as the chunks underneath it.
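A rough sketch of what "prose and code get chunked differently" can mean: prose splits on blank lines so each chunk is a whole paragraph, while Python source splits at top-level `def`/`class` lines so a function is never cut in half. lilbee lists tree-sitter among the projects it is built on, but this sketch uses a plain regex stand-in for Python files only; the function names are hypothetical and this is not lilbee's implementation.

```python
import re

# Top-level Python definitions start at column 0 with "def", "async def", or "class".
_TOP_LEVEL_DEF = re.compile(r"^(?:async\s+def|def|class)\s", re.MULTILINE)


def chunk_prose(text: str) -> list[str]:
    # Split on blank lines so each chunk is one whole paragraph.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def chunk_code(source: str) -> list[str]:
    # Start a new chunk at every top-level definition so functions stay whole.
    # In this simple version, a decorator line above a def stays with the previous chunk.
    starts = [m.start() for m in _TOP_LEVEL_DEF.finditer(source)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any imports or module header as their own chunk
    bounds = starts + [len(source)]
    pieces = (source[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]


def chunk(path: str, text: str) -> list[str]:
    # Route by file type: code-aware splitting for .py, paragraphs for everything else.
    return chunk_code(text) if path.endswith(".py") else chunk_prose(text)


if __name__ == "__main__":
    src = "import os\n\ndef a():\n    return 1\n\n\ndef b():\n    return 2\n"
    print(chunk("m.py", src))
    print(chunk("notes.md", "One.\n\nTwo\nlines."))
```

A syntax-tree parser such as tree-sitter does the same job more robustly across many languages (decorators, nested classes, strings that happen to contain "def"), which is why a regex is only good enough for a sketch.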

MCP server for your coding agent

Pair it with your favorite agent over MCP. It reads the real code and docs before it answers, cites the file and line, and says "I don't know" instead of guessing.

crawl websites for offline search

Crawl a docs site or a wiki, turn it into markdown, and keep it. Search and chat with it offline, even after the site goes down.

scans & OCR

Old scans and photos go through OCR or a local vision model and come out as searchable markdown, layout intact.

remembers what you tell it

Turn on memory and lilbee holds onto durable facts about you and how you like your answers, then recalls the relevant ones on later turns, no matter which conversation they came from. Off by default, and it never leaks into your citations.

a note on answers

Answers are only as good as the model you pick and the settings behind it. lilbee ships sane defaults, but exposes 50+ settings you can tune: how it searches, how it answers, and how your files get read.

go deeper

built on

Kreuzberg · llama.cpp · llama-cpp-python · Hugging Face Hub · huggingface_hub · LanceDB · tree-sitter · tree-sitter-language-pack · crawl4ai · Playwright · Tesseract · LiteLLM · Textual · Litestar · MCP Python SDK · Typer · Pydantic · Nuitka

lilbee · MIT License