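AI Skill Hub has evaluated LLM-D Benchmark (llm-d-benchmark) and rates it "Recommended". The tool scores well on feature completeness, community activity, and ease of use, earning an AI score of 7.5, and it suits users with some technical background.
LLM-D Benchmark is an open-source tool written in Python, tagged AI, benchmark, and Python. As an open-source GitHub project, it has active community support and ongoing releases; its code is fully open to audit, and it can be deployed locally to keep data private. Whether used individually or integrated into an enterprise workflow, it aims to provide a stable and reliable solution.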
# Option 1: install with pip (recommended)
pip install llm-d-benchmark
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install llm-d-benchmark
# Option 3: install from source (for the latest features)
git clone https://github.com/llm-d/llm-d-benchmark
cd llm-d-benchmark
pip install -e .
# Verify the installation
python -c "import llm_d_benchmark; print('Installation succeeded')"
# Command-line usage
llm-d-benchmark --help
# Basic usage
llm-d-benchmark input_file -o output_file
# Calling from Python code
import llm_d_benchmark
# Example
result = llm_d_benchmark.process("input")
print(result)
# llm-d-benchmark configuration file example (config.yml)
app:
  name: "llm-d-benchmark"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
llm-d-benchmark --config config.yml
# Or configure through environment variables
export LLM_D_BENCHMARK_API_KEY="your-key"
export LLM_D_BENCHMARK_OUTPUT_DIR="./output"
| | Google Kubernetes Engine | CoreWeave Kubernetes Services | OpenShift |
|---|---|---|---|
| Standalone | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-standalone.yaml) | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-standalone.yaml) | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-standalone.yaml) |
| Modelservice | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-modelservice.yaml) | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-modelservice.yaml) | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-modelservice.yaml) |
| Fast Model Actuator | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-fma.yaml) | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-fma.yaml) | [Nightly benchmark](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-fma.yaml) |
| Kustomize | NA | NA | NA |
This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.
> [!TIP]
> We acknowledge that many users still rely on our previous (now deprecated) library. To ease the transition, that library remains available under the v0.5.2 version tag.
Please refer to the official llm-d prerequisites for the most up-to-date requirements. For the client setup, the provided install.sh will install the necessary tools.
Deploying the llm-d stack requires cluster-level admin privileges, as you will be configuring cluster-level resources. However, the scripts can be executed by namespace-level admin users, as long as the Kubernetes infrastructure components are configured and the target namespace already exists.
- Python 3.11+
- kubectl -- Kubernetes CLI
- helm (>= 4.x) -- Helm package manager
- curl, git -- Standard system tools
- helmfile (>= 1.5) -- Required for modelservice deployments. Older helmfile is incompatible with Helm 4 (it probes helm with the removed `helm version --client` flag and panics). `./install.sh` installs the pinned Helm 4 / helmfile combination for you.
- kustomize, jq, yq -- Required for template rendering
- skopeo, crane -- Required for container image management
- oc (optional) -- Required for OpenShift clusters (either kubectl or oc must be present)
> [!IMPORTANT]
> Deploying the llm-d stack requires cluster-level admin privileges for configuring cluster-level resources. Namespace-level admin users can run the tool if Kubernetes infrastructure components are configured and the target namespace already exists. Use `--non-admin` to skip admin-only steps.
The install script supports both uv and the standard python3 -m venv for virtual environment creation. When run interactively, it will prompt you to choose; in non-interactive mode (e.g. curl pipe), it auto-selects uv if your system Python is missing or older than 3.11. You can also pass --uv or --no-uv to skip the prompt.
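
The selection rule described above can be sketched in Python. This is only an illustration of the documented behavior, not the logic of `install.sh` itself: the function name `choose_venv_tool`, its parameters, and the `prompt_choice` stand-in for the interactive prompt are all hypothetical, and falling back to `venv` when a suitable Python is present in non-interactive mode is an assumption.

```python
from typing import Optional, Tuple

MIN_PYTHON = (3, 11)  # minimum Python version listed in the requirements


def choose_venv_tool(
    system_python: Optional[Tuple[int, int]],
    interactive: bool,
    flag: Optional[str] = None,
    prompt_choice: str = "venv",
) -> str:
    """Return "uv" or "venv" following the rules described in the text.

    system_python is None when no system Python is found.
    flag is "--uv", "--no-uv", or None.
    prompt_choice stands in for the user's answer at the prompt (hypothetical).
    """
    # An explicit flag skips the prompt entirely.
    if flag == "--uv":
        return "uv"
    if flag == "--no-uv":
        return "venv"
    # Interactive runs ask the user.
    if interactive:
        return prompt_choice
    # Non-interactive (e.g. curl pipe): use uv when Python is missing or too old.
    if system_python is None or system_python < MIN_PYTHON:
        return "uv"
    # Assumption: otherwise the standard venv module is used.
    return "venv"
```

For example, `choose_venv_tool((3, 10), interactive=False)` selects `uv` because Python 3.10 is older than 3.11.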
Quick install (one-liner):
curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | bash
cd llm-d-benchmark
source .venv/bin/activate
llmdbenchmark --version
Or clone manually:
git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
./install.sh # or: --uv / --no-uv
source .venv/bin/activate
llmdbenchmark --version
Install a specific branch:
curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | LLMDBENCH_BRANCH=main bash
The install script auto-detects if the repo is present -- if not, it clones it first. It creates a virtualenv, validates system tools (kubectl, helm, Python 3.11+), and installs the llmdbenchmark package. See Installation for manual install and flags.
> [!TIP]
> The last line of output from `llmdbenchmark standup` shows the workspace path where all rendered configs, manifests, and results are stored.
Stand up the llm-d stack, run a quick sanity benchmark, and tear down:
```bash
llmdbenchmark --spec gpu --dry-run standup
llmdbenchmark --spec gpu standup
llmdbenchmark --spec gpu run -l inference-perf -w sanity_random.yaml
```
The multi-model-wva scenario deploys N models under a single gateway, each with its own EPP + InferencePool + VariantAutoscaling + HPA, sharing one WVA controller and one HTTPRoute with N backendRefs:
```bash
llmdbenchmark --spec guides/multi-model-wva standup -p my-namespace
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace --list-endpoints
```
```bash
git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
pip install "git+https://github.com/llm-d-incubation/llm-d-planner.git@92b14fe09fea0ec9ff36539326b7a8df00f1022c"
llmdbenchmark --version
```
The standup phase supports two deployment paths:
Both paths share steps 00-05 (infrastructure, namespaces, secrets) and step 10 (smoketest).
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace \
  --stack qwen3-06b \
  -l guidellm \
  -w sanity_random.yaml \
  -j 2
Breakdown of the Example:
- `--stack qwen3-06b` filters per-stack steps to that pool. Endpoint
detection (step 03) runs only for that stack and auto-resolves to
`http://<gateway>:80/qwen3-06b` - including the routing prefix - so every
downstream step targets the qwen3-06b InferencePool.
- `-l guidellm` selects the guidellm harness
([workload/harnesses/guidellm-llm-d-benchmark.sh](workload/harnesses/guidellm-llm-d-benchmark.sh)).
- `-j 2` launches two guidellm pods hitting the same endpoint. Both pods
run the same treatment (`-w`) but write to distinct result
subdirectories (`{experiment_id}_1`, `{experiment_id}_2`) on the
workload PVC.
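
The naming conventions in this breakdown can be written out as a small Python sketch. It only reproduces the patterns quoted above (`http://<gateway>:80/<stack>` and `{experiment_id}_N`); the helper names `stack_endpoint` and `result_subdirs` and the sample gateway host are hypothetical and are not part of llmdbenchmark.

```python
from typing import List


def stack_endpoint(gateway: str, stack: str, port: int = 80) -> str:
    """Build the per-stack URL, including the routing prefix (the stack name)."""
    return f"http://{gateway}:{port}/{stack}"


def result_subdirs(experiment_id: str, parallelism: int) -> List[str]:
    """One result subdirectory per harness pod, numbered from 1."""
    return [f"{experiment_id}_{index}" for index in range(1, parallelism + 1)]


# "gateway.example" and "exp-001" are placeholder values for illustration.
print(stack_endpoint("gateway.example", "qwen3-06b"))
print(result_subdirs("exp-001", 2))
```

With `-j 2`, the second call yields two directory names, one per guidellm pod.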
Want to compare pools side-by-side? Launch two invocations in parallel
shells (different `--workspace` each):
export LLMDBENCH_SPEC=inference-scheduling
export LLMDBENCH_NAMESPACE=my-team-ns
export LLMDBENCH_KUBECONFIG=~/.kube/my-cluster
llm-d-benchmark supports all available Well-Lit Path Guides. Each guide has a corresponding specification:
llmdbenchmark --spec inference-scheduling standup # Inference scheduling
llmdbenchmark --spec pd-disaggregation standup # Prefill-decode disaggregation
llmdbenchmark --spec tiered-prefix-cache standup # Tiered prefix cache
llmdbenchmark --spec precise-prefix-cache-aware standup # Precise prefix cache-aware routing
llmdbenchmark --spec wide-ep-lws standup # Wide expert-parallel with LWS
> [!WARNING]
> `wide-ep-lws` requires RDMA/RoCE networking and a LeaderWorkerSet (LWS) controller. Verify that your cluster has working RDMA HCAs before deploying.
llmdbenchmark --spec guides/multi-model-wva --workspace /tmp/run-qwen run -p my-namespace \
  --stack qwen3-06b \
  -l guidellm -w sanity_random.yaml -j 2 &
| Flag | Env Var | Description |
|---|---|---|
| `--spec SPEC` | `LLMDBENCH_SPEC` | Specification name or path (bare name, category/name, or full path) |
| `--workspace DIR` / `--ws` | `LLMDBENCH_WORKSPACE` | Workspace directory for outputs (default: temp dir) |
| `--base-dir DIR` / `--bd` | `LLMDBENCH_BASE_DIR` | Base directory for templates/scenarios (default: `.`) |
| `--non-admin` / `-i` | `LLMDBENCH_NON_ADMIN` | Skip admin-only steps |
| `--dry-run` / `-n` | `LLMDBENCH_DRY_RUN` | Generate YAML without applying to cluster |
| `--verbose` / `-v` | `LLMDBENCH_VERBOSE` | Enable debug logging |
| `--version` | | Show version |
| Flag | Env Var | Description |
|---|---|---|
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) to render into the plan |
| `-m MODELS` | `LLMDBENCH_MODELS` | Model to render the plan for |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment method (standalone, modelservice) |
| `-f` / `--monitoring` | | Enable monitoring in rendered templates (PodMonitor, EPP verbosity) |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path (used for cluster resource auto-detection) |
| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter (e.g., `0,1,5` or `1-7`) |
| `-c FILE` | `LLMDBENCH_SCENARIO` | Scenario file |
| `-m MODELS` | `LLMDBENCH_MODELS` | Models to deploy |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment methods (standalone, modelservice) |
| `-r NAME` | `LLMDBENCH_RELEASE` | Helm release name |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4) |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict per-stack execution to the named subset. Useful in multi-stack scenarios (e.g. `guides/multi-model-wva`) to re-deploy a single pool without touching siblings. Unknown names fail loudly. |
| `--monitoring` | `LLMDBENCH_MONITORING` | Enable PodMonitor creation and EPP verbosity during standup |
| `--skip-smoketest` | | Skip automatic smoketest after standup completes |
| `--affinity` | `LLMDBENCH_AFFINITY` | Node affinity / tolerations label |
| `--annotations` | `LLMDBENCH_ANNOTATIONS` | Extra annotations for deployed resources |
| `--wva` | `LLMDBENCH_WVA` | Workload Variant Autoscaler config |
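
The step filter accepts either a comma-separated list (`0,1,5`) or a range (`1-7`). A minimal Python sketch of how such a value could be expanded follows. It is not the tool's own parser: the function name `parse_step_filter` is hypothetical, and treating ranges as inclusive and allowing lists and ranges to be mixed are assumptions.

```python
from typing import List


def parse_step_filter(spec: str) -> List[int]:
    """Expand a filter such as "0,1,5" or "1-7" into sorted step numbers."""
    steps = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Assumption: a range "a-b" includes both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid step range: {part!r}")
            steps.update(range(start, end + 1))
        else:
            steps.add(int(part))
    return sorted(steps)


print(parse_step_filter("0,1,5"))
print(parse_step_filter("1-7"))
```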
| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter |
| `-m MODELS` | `LLMDBENCH_MODELS` | Model that was deployed (for resource name resolution) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Methods to tear down (standalone, modelservice) |
| `-r NAME` | `LLMDBENCH_RELEASE` | Helm release name (default: `llmdbench`) |
| `-d` / `--deep` | `LLMDBENCH_DEEP_CLEAN` | Deep clean: delete ALL resources in both namespaces |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Comma-separated namespaces (model,harness) |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict teardown to the named subset. Useful for removing one pool from a multi-stack scenario while leaving siblings in place. |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
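
The `-p` value is described as comma-separated namespaces in the order model, harness. The sketch below shows one way to split such a value in Python. The `Namespaces` type and `parse_namespaces` function are hypothetical, and reusing a single name for both roles is an assumption based on the examples that pass only `-p my-namespace`.

```python
from typing import NamedTuple


class Namespaces(NamedTuple):
    model: str
    harness: str


def parse_namespaces(value: str) -> Namespaces:
    """Split "model,harness" into its two namespace roles."""
    parts = [part.strip() for part in value.split(",") if part.strip()]
    if len(parts) == 1:
        # Assumption: a single name is used for both model and harness.
        return Namespaces(parts[0], parts[0])
    if len(parts) == 2:
        return Namespaces(parts[0], parts[1])
    raise ValueError(f"expected one or two namespaces, got {value!r}")


print(parse_namespaces("llm-ns,bench-ns"))
```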
| Flag | Env Var | Description |
|---|---|---|
| `-e FILE` | `LLMDBENCH_EXPERIMENTS` | Experiment YAML with setup and run treatments (required) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deploy method |
| `-m MODELS` | `LLMDBENCH_MODELS` | Models to deploy |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4) |
| `-f` / `--monitoring` | | Enable monitoring during standup and run phases |
| `-l HARNESS` | `LLMDBENCH_HARNESS` | Harness name |
| `-w PROFILE` | `LLMDBENCH_WORKLOAD` | Workload profile |
| `-o OVERRIDES` | `LLMDBENCH_OVERRIDES` | Workload parameter overrides |
| `-r DEST` | `LLMDBENCH_OUTPUT` | Results destination (local, `gs://`, `s3://`) |
| `-j N` | `LLMDBENCH_PARALLELISM` | Parallel harness pods |
| `--wait-timeout N` | `LLMDBENCH_WAIT_TIMEOUT` | Seconds to wait for harness completion |
| `-x DATASET` | `LLMDBENCH_DATASET` | Dataset URL for harness replay |
| `-d` / `--debug` | `LLMDBENCH_DEBUG` | Debug mode: start harness pods with `sleep infinity` |
| `--stop-on-error` | | Abort on first setup treatment failure |
| `--skip-teardown` | | Leave stacks running for debugging |
| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter (e.g., `0,1,5` or `2-6`) |
| `-m MODEL` | `LLMDBENCH_MODEL` | Model name override (e.g. `facebook/opt-125m`) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespaces (deploy,benchmark) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deploy method used during standup |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `-l HARNESS` | `LLMDBENCH_HARNESS` | Harness name (inference-perf, guidellm, vllm-benchmark) |
| `-w PROFILE` | `LLMDBENCH_WORKLOAD` | Workload profile YAML |
| `-e FILE` | `LLMDBENCH_EXPERIMENTS` | Experiment treatments YAML for parameter sweeping |
| `-o OVERRIDES` | `LLMDBENCH_OVERRIDES` | Workload parameter overrides (`param=value,...`) |
| `-r DEST` | `LLMDBENCH_OUTPUT` | Results destination (local, `gs://`, `s3://`) |
| `-j N` | `LLMDBENCH_PARALLELISM` | Parallel harness pods |
| `-U URL` | `LLMDBENCH_ENDPOINT_URL` | Explicit endpoint URL (run-only mode) |
| `-c FILE` | | Run config YAML (run-only mode) |
| `--generate-config` | | Generate config and exit |
| `-x DATASET` | `LLMDBENCH_DATASET` | Dataset URL for harness replay |
| `--wait-timeout N` | `LLMDBENCH_WAIT_TIMEOUT` | Seconds to wait for harness completion |
| `--monitoring` | | Enable metrics scraping and EPP log capture during benchmark |
| `-q` / `--serviceaccount` | `LLMDBENCH_SERVICE_ACCOUNT` | Service account name for harness pods |
| `-g` / `--envvarspod` | `LLMDBENCH_HARNESS_ENVVARS_TO_YAML` | Comma-separated env var names to propagate into harness pod |
| `--analyze` | | Run local analysis on results after collection |
| `-z` / `--skip` | `LLMDBENCH_SKIP` | Skip execution, only collect existing results |
| `-d` / `--debug` | `LLMDBENCH_DEBUG` | Debug mode: start harness pods with `sleep infinity` |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict the benchmark to the named subset of stacks. Endpoint URL auto-resolves for the selected stack - no need for `--endpoint-url`. When `--stack` selects exactly one stack, `-m`/`--models` scopes to that stack only. |
| `--list-endpoints` | | Detect per-stack endpoint URLs, print a copy-paste table of `llmdbenchmark run` invocations, and exit without launching any harness pods. Useful after standup to discover what's deployed. |
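
Two of the values in this table have a simple textual shape: overrides are written as `param=value,...`, and the results destination is a local path, a `gs://` URL, or an `s3://` URL. The Python sketch below shows how such strings could be interpreted. The function names and the sample parameter names are hypothetical, and the exact parsing rules used by llmdbenchmark are not stated in this document.

```python
from typing import Dict


def parse_overrides(text: str) -> Dict[str, str]:
    """Turn "param=value,param=value" into a dictionary of strings."""
    overrides: Dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, separator, value = item.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"expected param=value, got {item!r}")
        overrides[key.strip()] = value.strip()
    return overrides


def destination_kind(dest: str) -> str:
    """Classify a results destination by its URL scheme."""
    if dest.startswith("gs://"):
        return "gcs"
    if dest.startswith("s3://"):
        return "s3"
    return "local"


# "rate" and "duration" are placeholder parameter names for illustration.
print(parse_overrides("rate=5,duration=60"))
print(destination_kind("s3://my-bucket/results"))
```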
Run post-deployment validation independently against an already-deployed stack.
llmdbenchmark --spec gpu smoketest -p my-namespace
llmdbenchmark --spec gpu smoketest -p my-namespace -s 2 # config validation only
| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter (e.g., `0,1,2` or `0-2`) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment methods (standalone, modelservice, fma) |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4). Smoketest pins this to 1 regardless - parallel probes across stacks are confusing. |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict smoketest to the named subset of stacks. |
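
The `--stack` filter restricts work to a named subset, and the standup table notes that unknown names fail loudly. A minimal Python sketch of that behavior is shown below. The function `select_stacks` and the second stack name are hypothetical, and running every stack when no filter is given is an assumption.

```python
from typing import List, Optional


def select_stacks(available: List[str], requested: Optional[str]) -> List[str]:
    """Return the stacks to operate on, preserving their original order."""
    if not requested:
        # Assumption: without a filter, every stack is selected.
        return list(available)
    names = [name.strip() for name in requested.split(",") if name.strip()]
    unknown = [name for name in names if name not in available]
    if unknown:
        # Fail loudly rather than silently ignoring a typo.
        raise ValueError(f"unknown stack name(s): {', '.join(unknown)}")
    return [stack for stack in available if stack in names]


# "other-model" is a placeholder stack name for illustration.
print(select_stacks(["qwen3-06b", "other-model"], "qwen3-06b"))
```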
Smoketests also run automatically after standup unless --skip-smoketest is passed. See llmdbenchmark/smoketests/README.md for details on what each step validates.
Every CLI flag can be set via a LLMDBENCH_* environment variable (see tables above). The priority chain is:
This is useful for CI/CD pipelines, .bashrc configuration, or migrating from the original bash-based workflow.
```bash
llmdbenchmark standup --dry-run
llmdbenchmark standup                 # live deploy to my-team-ns
llmdbenchmark standup -p override-ns  # CLI wins over env var
```
Boolean env vars accept 1, true, or yes (case-insensitive). Active LLMDBENCH_* overrides are logged at startup for debugging.
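
The two rules stated here (a CLI flag wins over its environment variable, and boolean variables accept `1`, `true`, or `yes` in any case) can be sketched in Python as follows. The helper names `env_flag` and `resolve` are hypothetical and do not come from the llmdbenchmark code base; treating every other value as false is an assumption.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes"}  # accepted values, compared case-insensitively


def env_flag(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Interpret a boolean LLMDBENCH_* variable."""
    # Assumption: anything outside TRUTHY, including an unset variable, is False.
    return environ.get(name, "").strip().lower() in TRUTHY


def resolve(
    cli_value: Optional[str],
    env_name: str,
    environ: Mapping[str, str] = os.environ,
    default: Optional[str] = None,
) -> Optional[str]:
    """Pick the CLI value first, then the environment variable, then the default."""
    if cli_value is not None:
        return cli_value
    if env_name in environ:
        return environ[env_name]
    return default


env = {"LLMDBENCH_NAMESPACE": "my-team-ns", "LLMDBENCH_DRY_RUN": "Yes"}
print(resolve("override-ns", "LLMDBENCH_NAMESPACE", env))
print(resolve(None, "LLMDBENCH_NAMESPACE", env))
print(env_flag("LLMDBENCH_DRY_RUN", env))
```

Passing the environment as a plain mapping keeps the sketch easy to test without modifying the real process environment.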
Values flow through a merge pipeline during the plan phase:
Steps read from the rendered config.yaml and never define their own fallback defaults. If a required key is missing from the rendered config, the step raises a clear error. This ensures defaults.yaml is the single source of truth for all default values. Environment variables (LLMDBENCH_*) sit between scenario overrides and CLI flags in the priority chain.
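
The layering described above can be illustrated with a short Python sketch: defaults are applied first, then scenario overrides, then environment variables, then CLI flags, and a step that needs a missing key raises an error instead of inventing a default. This is not the tool's implementation. The function names, the key names, and the use of a recursive merge for nested mappings are all assumptions made for illustration.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from override replace those in base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def render_config(
    defaults: Dict[str, Any],
    scenario: Dict[str, Any],
    env: Dict[str, Any],
    cli: Dict[str, Any],
) -> Dict[str, Any]:
    """Apply layers from lowest to highest priority."""
    config: Dict[str, Any] = {}
    for layer in (defaults, scenario, env, cli):
        config = deep_merge(config, layer)
    return config


def require(config: Dict[str, Any], key: str) -> Any:
    """Read a key the way a step would: no fallback, a clear error if missing."""
    if key not in config:
        raise KeyError(f"required key {key!r} is missing from the rendered config")
    return config[key]


# All keys and values below are hypothetical examples.
config = render_config(
    defaults={"namespace": "default-ns", "model": {"name": "a", "replicas": 1}},
    scenario={"model": {"replicas": 2}},
    env={"namespace": "my-team-ns"},
    cli={"namespace": "override-ns"},
)
print(config)
```

Keeping defaults in one layer and refusing to fall back inside `require` mirrors the stated goal that `defaults.yaml` stays the single source of truth.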
See config/README.md for the full configuration reference, including how to override values.
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace \
  --stack qwen3-06b \
  -l inference-perf -w sanity_random.yaml
Already have a model-serving endpoint running? Skip deployment entirely:
llmdbenchmark --spec gpu run \
--endpoint-url http://10.131.0.42:80 \
--model meta-llama/Llama-3.1-8B \
--namespace my-namespace \
--harness inference-perf \
--workload sanity_random.yaml
This uses the same harness, profile rendering, and result collection pipeline -- just without the standup and teardown phases.
> [!TIP]
> `run` can also be used in debug mode (`-d`/`--debug`), which starts the harness pod with `sleep infinity` so you can exec into it and run commands interactively. See this example.
See workload/README.md for the full experiment file format and all pre-built experiments, as well as advanced functionality.
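A high-quality AI benchmarking tool
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ Apache 2.0: a permissive open-source license that allows commercial use, requires retaining the copyright notice and NOTICE file, and includes a patent grant.
AI Skill Hub review: LLM-D Benchmark's core functionality is complete and of good quality. For AI enthusiasts, it is worth adding to a personal toolkit. We recommend trying it in a non-production environment first and then rolling it out gradually.
| Field | Value |
|---|---|
| Original name | llm-d-benchmark |
| Topics | AI, benchmark, Python |
| GitHub | https://github.com/llm-d/llm-d-benchmark |
| License | Apache-2.0 |
| Language | Python |
Listed: 2026-05-26 · Updated: 2026-05-26 · License: Apache-2.0 · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.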