A local AI search engine for your files, your code, and the web. lilbee finds, runs, and manages the models, then talks to everything you point it at. Every answer cites the source, it all runs on your own machine, and it's one program with nothing else to set up.
A built-in model manager browses Hugging Face, downloads models, gives each one a role, and runs them on your own GPU through Metal, Vulkan, or CUDA. Any coding agent can also use it over MCP.
It runs on your hardware. Your files stay on disk, and a cloud model runs only when you pick one. No Docker, no servers, no web stack to maintain, just one file you run on demand.
see it work
Index lilbee's own README, ask "what is lilbee in one sentence?", and get a cited answer drawn straight from the source.
First run: a setup wizard pulls a chat model and an embedder, then drops you straight into chat.
One-minute sweep through every screen: setup wizard, chat with citations, model catalog, settings, task center, palette.
Streaming replies with clickable citations back to the file and line.
/add <path> copies a file or folder into your library and indexes it; ask questions while it syncs.
/crawl <url> fetches a page into your library, then answers against it with a page citation.
Recursive crawl (depth 1) of a whole site: hundreds of Wikipedia pages indexed (fast-forwarded), then one multipart question synthesized across them, cited, with Qwen3-8B and a reranker.
Browse models on Hugging Face Hub, pull one live, switch a role without leaving the terminal.
Already running Ollama? Point lilbee at it. The catalog labels the model "ollama"; index the Crown Victoria manual on camera, then get a cited answer.
Same flow with LM Studio: lms ls shows the model, the catalog labels it "lm studio", and lilbee answers from the manual with a citation.
50+ settings: search depth, reranking, sampling, parsers. Sane defaults; tune the moment you want to.
lilbee talking to lilbee. An agent indexes lilbee's own source through lilbee's MCP server, then answers questions about how lilbee works, with file:line citations.
Same setup, talking to a PDF: the agent asks lilbee to index cv-manual.pdf, then builds a fuse table with page citations.
Ctrl+P opens the command palette; /help opens the slash-command catalog. Every action is discoverable.
Agent indexes a small Godot 4 pathfinding subset live, then cites each method against the local files.
Same shape against the full Godot class reference (810 XMLs, pre-indexed). Cited codegen for a procedural level generator.
install
needs Python 3.11+ · intel mac: add --extra-index-url https://lilbee.sh/cpu/ · extras below, e.g. pip install --pre 'lilbee[crawler,litellm]'
uv fetches a Python for you if needed · extras: uv tool install --prerelease=allow 'lilbee[crawler]'
prebuilt bundle, built with Nuitka: its own Python interpreter and llama.cpp backend, nothing for you to compile · clears macOS quarantine for you · the [crawler] / [litellm] / [graph] extras are already included
package lilbee on the AUR · works with yay / pacaur / any helper · wraps the Linux x86_64 release binary, so the [crawler] / [litellm] / [graph] extras are already included
no compat image; run the self-contained lilbee-compat binary instead.
image on the GitHub Container Registry · data lives at /home/lilbee/data, REST API on port 8000 · wraps the release binary, so the [crawler] / [litellm] / [graph] extras are already included
flake at github:tobocop2/lilbee · wraps the release binary, so the [crawler] / [litellm] / [graph] extras are already included · on Linux it bundles glibc and the Vulkan loader via an FHS env so it runs on bare NixOS
run with flatpak run io.github.tobocop2.lilbee, worth an alias · needs the Flathub remote for the runtime · flatpak update picks up new releases · wraps the release binary, so the [crawler] / [litellm] / [graph] extras are already included
--dangerous just means sideloaded and unsigned · no auto-update: rerun the same command to upgrade · wraps the release binary, so the [crawler] / [litellm] / [graph] extras are already included
scoop install lilbee auto-installs the CUDA build when an NVIDIA driver is present.
Windows package manager · scoop update lilbee upgrades · wraps the release binary, so the [crawler] / [litellm] / [graph] extras are already included
Single self-contained binary, compiled with Nuitka. Bundles its own Python runtime and the [crawler] / [litellm] / [graph] extras. Click to download, or use the terminal one-liner.
unsigned: the macOS arm64 and Windows builds aren't code-signed. The macOS one-liner clears quarantine for you; Homebrew does the same. On Windows, SmartScreen may warn the first time.
older CPU on Windows also via scoop install lilbee-compat · CUDA cu124 / cu121 for older drivers on the releases page
driven via Vulkan by default; for the CUDA path use pip or the binary.
add --extra-index-url https://lilbee.sh/compat/ to your sync and pin lancedb==0.33.0+compat.
for hacking on it or contributing · needs git and uv
With a pip or uv install, add the extra's name in brackets, e.g. 'lilbee[crawler,litellm]'. The binary, Homebrew, AUR, Nix, Docker, Flatpak, and Snap builds bundle all three already. lilbee works without them.
your hardware, put to work
Your machine can do a lot more than you're using it for. lilbee runs local models on hardware you already own. No token budgets, no provider to depend on; the cloud's there when you want it.
one program
It's the model, the search through your files, and the chat, all in one program. Run it when you want, close it when you're done; by default nothing's left running in the background, no container to keep alive. Want something long-running? Use the command line and manage it yourself.
- a model server, always running
- model files fetched by hand
- a vector database to stand up
- code to connect them
- a separate app for the interface
- often a container around it all
- the model runtime (llama.cpp) and the vector index (LanceDB) run inside lilbee, not as separate services to stand up
- use it as a full-screen terminal app, a command-line tool, a Model Context Protocol server, a web API, or a Python library
- a built-in model catalog: browse and pull straight from Hugging Face Hub, no hunting for model files yourself
- a scoped library per project, so each domain stays its own clean encyclopedia
- runs on a laptop or headless over a remote shell; move it between machines
what it does
runs local AI models itself
Browse Hugging Face, pull a model, assign it to a role. lilbee runs it on Metal, Vulkan, or CUDA; you never point it at a server you set up.
a real retrieval pipeline (RAG)
A real search engine, built on published research: it ranks results by how well they answer you, so the best match comes back first. 50+ settings to tune.
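To make "best match comes back first" concrete, here is a minimal sketch of similarity ranking. It is not lilbee's pipeline: the word-count `embed` function is a toy stand-in for a real embedding model, and the names `embed`, `cosine`, and `rank` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the best matches first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical chunks, as if taken from an indexed manual.
chunks = [
    "the fuse box is under the dashboard",
    "tire pressure should be checked monthly",
    "replace the fuse with one of the same rating",
]
results = rank("where is the fuse box", chunks, top_k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

A real pipeline would swap the toy embedder for a learned one and could add a reranking pass over the top results; the sort-by-score step stays the same.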
search your files, code, and PDFs
Point it at a folder: your man pages, a pile of PDFs, your notes, a codebase. Then talk to them. Every answer tells you the file and line. Each project gets its own library, so nothing bleeds across.
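A file-and-line citation can be pictured as a small record attached to each hit. The sketch below is illustrative only: the `Citation` class, `find_citations`, and the in-memory `library` dict are hypothetical names, and plain substring matching stands in for real search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    path: str
    line: int  # 1-based line number within the file
    text: str

    def __str__(self) -> str:
        return f"{self.path}:{self.line}"


def find_citations(library: dict[str, str], term: str) -> list[Citation]:
    """Return one citation per line that mentions the term (case-insensitive)."""
    hits = []
    for path, content in library.items():
        for number, line in enumerate(content.splitlines(), start=1):
            if term.lower() in line.lower():
                hits.append(Citation(path, number, line.strip()))
    return hits


# A hypothetical in-memory library; a real one would be read from disk.
library = {
    "notes/setup.md": "Install the tool.\nRun the setup wizard.\n",
    "src/app.py": "def main():\n    run_wizard()\n",
}
citations = find_citations(library, "wizard")
labels = [str(c) for c in citations]
print(labels)
```

Keeping one such dict per project mirrors the idea of scoped libraries: a search over one library cannot return a citation from another.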
chunked so it makes sense
Prose and code get chunked differently, so each piece keeps its meaning instead of getting cut mid-thought. A search engine is only as good as the chunks underneath it.
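The difference between prose and code chunking can be sketched in a few lines. This is an assumption-laden illustration, not lilbee's chunker: lilbee lists tree-sitter among its dependencies, while the sketch uses Python's standard `ast` module as a stand-in and handles Python source only. The function names are hypothetical.

```python
import ast


def chunk_prose(text: str) -> list[str]:
    """Split prose on blank lines so each chunk is a whole paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_python(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Recover the exact source text that this definition spans.
            segment = ast.get_source_segment(source, node)
            if segment:
                chunks.append(segment)
    return chunks


prose = "First paragraph, one idea.\n\nSecond paragraph, another idea."
code = "def a():\n    return 1\n\n\ndef b():\n    return 2\n"

prose_chunks = chunk_prose(prose)
code_chunks = chunk_python(code)
print(prose_chunks)
print(code_chunks)
```

Splitting code at definition boundaries keeps each function whole, whereas a fixed-size window could cut one in half and leave both halves hard to search.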
MCP server for your coding agent
Pair it with your favorite agent over MCP. The agent reads the real code and docs before it answers, cites the file and line, and says "I don't know" instead of guessing.
crawl websites for offline search
Crawl a docs site or a wiki, turn it into markdown, and keep it. Search and chat with it offline, even after it goes down.
scans & OCR
Old scans and photos go through OCR or a local vision model and come out as searchable markdown, layout intact.
remembers what you tell it
Turn on memory and lilbee holds onto durable facts about you and how you like your answers, then recalls the relevant ones on later turns, no matter which conversation they came from. Off by default, and it never leaks into your citations.
a note on answers
Answers are only as good as the model you pick and the settings behind it. lilbee ships sane defaults and exposes 50+ settings for tuning search, answer generation, and how your files get read.
go deeper
built on
Kreuzberg · llama.cpp · llama-cpp-python · Hugging Face Hub · huggingface_hub · LanceDB · tree-sitter · tree-sitter-language-pack · crawl4ai · Playwright · Tesseract · LiteLLM · Textual · Litestar · MCP Python SDK · Typer · Pydantic · Nuitka