ollama · run open models locally

Run large language models on your own machine.

Pull an open model, run it offline, and chat with it from your terminal or your app. No accounts, no API keys, no tokens billed by the thousand — your machine is the server.

$ ollama run llama3.2

what ollama does

One command. Any model. On your hardware.

Pull a model

Browse a catalog of open models — Llama, Mistral, Gemma, Qwen, DeepSeek — and download with a single command. Models live on your disk, not on someone else's server.

llama3.2 mistral gemma3

Talk to it

Chat from the terminal, or call the OpenAI-compatible local API from any language. Streaming, tool calls, and structured output work the same as the hosted runtimes your app already targets.

http://localhost:11434

Plug it in

Drop Ollama behind the agent framework, IDE plugin, or chat client you already use — LangChain, OpenWebUI, Continue, your homegrown CLI. The runtime stays out of the way.

langchain opencode continue
© Ollama · run open models locally v0.5.0 · changelog