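AI Skill Hub recommends: Lightweight Proxy (olla) is a high-quality Agent workflow tool. Its AI overall score is 7.5, and it performs solidly among similar tools. If you are looking for a reliable Agent workflow solution, it is worth a closer look.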
A high-performance lightweight proxy and load balancer for LLM infrastructure
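Lightweight Proxy (olla) sits in front of your LLM inference servers. It routes requests across local and remote nodes, balances load between them, and fails over when an endpoint becomes unavailable, so it fits into existing LLM infrastructure rather than replacing it.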
```bash
# Option 1: go install (recommended)
go install github.com/thushan/olla@latest

# Option 2: build from source
git clone https://github.com/thushan/olla
cd olla
go build -o olla .

# Option 3: download a prebuilt binary
# Download the binary for your platform from the Releases page:
# https://github.com/thushan/olla/releases
```
```bash
# Show help
olla --help

# Basic usage
olla [options] <input>

# For detailed usage, see the documentation:
# https://github.com/thushan/olla
```
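```bash
# olla configuration
# View the configuration options
olla --config-example > config.yml

# Common configuration options
# output_dir: ./output
# log_level: info
# workers: 4

# Environment variable (overrides the config file)
export OLLA_CONFIG="/path/to/config.yml"
```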
Olla is a high-performance, low-overhead, low-latency proxy and load balancer for managing LLM infrastructure. It intelligently routes LLM requests across local and remote inference nodes, natively supports a wide variety of endpoints, and is extensible enough to support others. Within each provider, Olla provides model discovery and a unified model catalogue, enabling seamless routing to available models on compatible endpoints.
Olla works alongside API gateways like LiteLLM or orchestration platforms like GPUStack, focusing on making your existing LLM infrastructure reliable through intelligent routing and failover. You can choose between two proxy engines: Sherpa for simplicity and maintainability or Olla for maximum performance with advanced features like circuit breakers and connection pooling.

A single CLI application and a config file are all you need to go Olla!
For large GPU deployments and enterprise or data-centre use, see TensorFoundry FoundryOS.
```bash
# Docker
docker run -t \
  --name olla \
  -p 40114:40114 \
  ghcr.io/thushan/olla:latest

# go install
go install github.com/thushan/olla@latest

# Build from source
git clone https://github.com/thushan/olla.git && cd olla && make build-release

# Build and run a local Docker image
git clone https://github.com/thushan/olla.git && cd olla && make docker-build-local
docker run -p 40114:40114 ghcr.io/thushan/olla:local
```
We've also got ready-to-use Docker Compose setups for common scenarios:
In-depth guides on the TensorFoundry blog:
```bash
curl http://localhost:40114/internal/status/endpoints
```
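The same status check can be scripted. This minimal Go sketch assumes Olla is listening on port 40114, as in the commands above. It prints the response body as-is because the exact response format is not described here. The helper names `statusURL` and `fetchEndpointStatus` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// statusURL joins a base address with the endpoint-status path used above.
func statusURL(base string) string {
	return strings.TrimSuffix(base, "/") + "/internal/status/endpoints"
}

// fetchEndpointStatus returns the raw status body from an Olla instance.
func fetchEndpointStatus(base string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(statusURL(base))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	out, err := fetchEndpointStatus("http://localhost:40114")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```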
Olla's Anthropic Messages API support (v0.0.20+) is enabled by default, allowing you to use CLI tools like Claude Code with local AI models on your machine via /olla/anthropic. It operates in two modes depending on your backend:
This feature is still being actively improved; please report any issues or feedback.
We have examples for:
Learn more about Anthropic API Translation.
A complete setup with OpenWebUI and Olla, either load balancing multiple Ollama instances or unifying all OpenAI-compatible models.
- See: examples/ollama-openwebui/
- Services: OpenWebUI (web UI) + Olla (proxy/load balancer)
- Use Case: Web interface with intelligent load balancing across multiple Ollama servers
- Quick Start:

```bash
cd examples/ollama-openwebui
# Edit olla.yaml to configure your Ollama endpoints
docker compose up -d
# Access OpenWebUI at http://localhost:3000
```
You can learn more about OpenWebUI Ollama with Olla or see OpenWebUI OpenAI with Olla.
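When Olla unifies OpenAI-compatible models, a client such as OpenWebUI only needs to read one model list. This Go sketch decodes the standard OpenAI-style list response (`{"data":[{"id":...}]}`) and prints the model IDs. The URL `http://localhost:40114/olla/openai/v1/models` is a hypothetical route; substitute the one given in the Olla documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// modelList mirrors the OpenAI-style "list models" response shape.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// modelIDs decodes an OpenAI-style model list and returns the model IDs.
func modelIDs(body []byte) ([]string, error) {
	var list modelList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(list.Data))
	for _, m := range list.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

func main() {
	// Hypothetical route; substitute the one from the Olla documentation.
	const modelsURL = "http://localhost:40114/olla/openai/v1/models"
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(modelsURL)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	ids, err := modelIDs(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, id := range ids {
		fmt.Println(id)
	}
}
```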
A high-performance lightweight proxy for optimizing LLM infrastructure
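AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.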
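We recommend validating the tool thoroughly in a sandbox or test environment, and carrying out the necessary security assessment, before deploying it to production.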
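✅ Apache 2.0: a permissive open-source license that allows commercial use. You must retain the copyright notice and the NOTICE file, and the license includes a patent grant.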
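Overall, Lightweight Proxy is a good-quality Agent workflow tool that is reasonably competitive among similar tools. AI Skill Hub will continue to track its updates. Bookmark it and adopt it when the timing suits your use case.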
| Field | Value |
| --- | --- |
| Original name | olla |
| Original description | Open-source AI workflow: High-performance lightweight proxy and load balancer for LLM infrastructure. Int. ⭐238 · Go |
| Topics | AI, Go, proxy, load balancing |
| GitHub | https://github.com/thushan/olla |
| License | Apache-2.0 |
| Language | Go |
Indexed: 2026-06-06 · Updated: 2026-06-08 · License: Apache-2.0