Capability Tags

🛠

AI Tools

LLM-D Benchmark

Python-based · Free and open source, self-hosted, with full control over your data

Project name: llm-d-benchmark

⭐ 60 Stars 🍴 79 Forks 💻 Python 📄 Apache-2.0 🏷 AI 7.5

7.5 · AI overall score

AI · benchmark · Python


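✦ AI Skill Hub Pick

After curated evaluation by AI Skill Hub, LLM-D Benchmark is rated "Recommended". The tool scores well on feature completeness, community activity, and ease of use, with an AI score of 7.5. It suits users with some technical background, in particular familiarity with Kubernetes and Helm.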

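📚 In-Depth Analysis

LLM-D Benchmark is an open-source, Python-based tool with 60 stars on GitHub and a solid project in the AI and LLM benchmarking space. The main advantage of open-source tools is full code transparency: you can audit every line for security and adapt or extend the code to your own needs.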

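**Why use an open-source tool instead of a commercial SaaS?**
For individual developers and privacy-conscious users, a self-hosted open-source tool keeps data on infrastructure you control (for llm-d-benchmark, your own Kubernetes cluster) and outside any third-party provider's data policies. Open-source tools also usually have no usage caps or monthly fees, so for heavy use the total cost of ownership (TCO) can be much lower than for subscription-based commercial tools, although you still pay for the compute you benchmark on.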

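**Installation and environment setup**
LLM-D Benchmark requires Python 3.11 or later. Managing Python versions with pyenv helps avoid polluting the global environment. Newcomers should create a virtual environment first (python -m venv venv && source venv/bin/activate) and then install dependencies. If something goes wrong, delete the virtual environment and start over without affecting the rest of the system. The project's install.sh script creates a .venv virtual environment automatically.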

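**Community and maintenance**
GitHub Issues and Discussions are the fastest way to get help. Before asking, check Closed Issues, because many common problems are already answered there. When reporting a bug, include the output of pip list and llmdbenchmark --version, the full stack trace, and a minimal reproducible example. This noticeably speeds up maintainer responses. AI Skill Hub will keep tracking LLM-D Benchmark releases and flag important feature changes.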

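📋 Tool Overview

LLM-D Benchmark is an open-source, Python-based tool for benchmarking LLM inference on the llm-d stack. It automates deployment, experiment execution, data collection, and teardown across Kubernetes environments. As a GitHub open-source project under ongoing development, its code is fully transparent and auditable, and it runs on infrastructure you control. It works both for individual experiments and as part of a team's CI/CD pipeline.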

GitHub Stars

⭐ 60

Language

Python

Supported platforms

Windows / macOS / Linux

Maintenance status

Lightweight project, updated as needed

License

Apache-2.0

AI overall score

7.5

Tool type

AI tool

Forks

🍴 79

📖 Documentation

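The content below was compiled automatically by AI Skill Hub from the project information. For the full original documentation, see "Original Source" at the bottom of the page.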

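📌 Key Features

Free and open source; runs on infrastructure you control, keeping your data under your control
Active open-source community on GitHub with continuous updates
Detailed documentation and usage examples, friendly to newcomers
Configurable via CLI flags, LLMDBENCH_* environment variables, and scenario YAML files
Can serve as a building block in an existing tech stack or be extended further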

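🎯 Main Use Cases

Benchmarking LLM inference on the llm-d stack across Kubernetes platforms such as GKE, CoreWeave, and OpenShift
Running on your own infrastructure to protect data privacy and meet compliance requirements
Custom integration into existing systems to extend your tech stack
Use as an open-source base component for commercial derivative work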

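The installation commands below were generated automatically from the project's language and type; the official README is authoritative.

Installation commands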

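# Option 1: install script (recommended; clones the repo and creates .venv)
curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | bash
cd llm-d-benchmark
source .venv/bin/activate

# Option 2: manual install from source (requires Python 3.11+)
git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
pip install "git+https://github.com/llm-d-incubation/llm-d-planner.git@92b14fe09fea0ec9ff36539326b7a8df00f1022c"

# Verify the installation (the CLI command is llmdbenchmark)
llmdbenchmark --version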

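📋 Installation Steps

Visit the GitHub repository page
Install dependencies as described in the README
Complete the initial configuration for your system environment
Start from the official examples or documentation
If you run into problems, search GitHub Issues for answers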

The usage examples below were compiled by AI Skill Hub and cover the most common scenarios.

Common Commands / Code Examples

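# Show the installed version
llmdbenchmark --version

# Preview what would be deployed (no cluster changes)
llmdbenchmark --spec gpu --dry-run standup

# Deploy, then run a sanity benchmark against the deployed endpoint
llmdbenchmark --spec gpu standup
llmdbenchmark --spec gpu run -l inference-perf -w sanity_random.yaml

# Benchmark an endpoint that is already running (run-only mode)
llmdbenchmark --spec gpu run \
  --endpoint-url http://10.131.0.42:80 \
  --model meta-llama/Llama-3.1-8B \
  --namespace my-namespace \
  --harness inference-perf \
  --workload sanity_random.yaml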

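The configuration example below is based on a typical usage scenario; adjust the parameters according to the official documentation.

Configuration Example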

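# Set common defaults via LLMDBENCH_* environment variables
export LLMDBENCH_SPEC=inference-scheduling
export LLMDBENCH_NAMESPACE=my-team-ns
export LLMDBENCH_KUBECONFIG=~/.kube/my-cluster
export LLMDBENCH_WORKSPACE=./output

# CLI flags take precedence over environment variables
llmdbenchmark standup -p override-ns

# defaults.yaml plus the scenario YAML supply the lowest-priority values (see config/README.md)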

📑 README Deep Dive · Documentation completeness: 62/100

The content below was parsed directly from the GitHub README, preserving its code blocks, tables, and list structure.

llm-d-benchmark

| | Google Kubernetes Engine | Coreweave Kubernetes Services | OpenShift |
|---|---|---|---|
| Standalone | [![.github/workflows/ci-nightly-benchmark-gke-standalone.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-standalone.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-standalone.yaml) | [![.github/workflows/ci-nightly-benchmark-cks-standalone.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-standalone.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-standalone.yaml) | [![.github/workflows/ci-nightly-benchmark-ocp-standalone.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-standalone.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-standalone.yaml) |
| Modelservice | [![.github/workflows/ci-nightly-benchmark-gke-modelservice.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-modelservice.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-modelservice.yaml) | [![.github/workflows/ci-nightly-benchmark-cks-modelservice.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-modelservice.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-modelservice.yaml) | [![.github/workflows/ci-nightly-benchmark-ocp-modelservice.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-modelservice.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-modelservice.yaml) |
| Fast Model Actuator | [![.github/workflows/ci-nightly-benchmark-gke-fma.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-fma.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-fma.yaml) | [![.github/workflows/ci-nightly-benchmark-cks-fma.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-fma.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-fma.yaml) | [![.github/workflows/ci-nightly-benchmark-ocp-fma.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-fma.yaml/badge.svg)](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-fma.yaml) |
| Kustomize | NA | NA | NA |

This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.

[!TIP] We acknowledge many users are still utilizing our previous (now deprecated) library, and to make the transition easier, we still have that library available. It can be found in our v0.5.2 version tag.

Prerequisites

Please refer to the official llm-d prerequisites for the most up-to-date requirements. For the client setup, the provided install.sh will install the necessary tools.



System Requirements

- Python 3.11+
- kubectl -- Kubernetes CLI
- helm (>= 4.x) -- Helm package manager
- curl, git -- Standard system tools
- helmfile (>= 1.5) -- Required for modelservice deployments. Older helmfile is incompatible with Helm 4 (it probes helm with the removed helm version --client flag and panics). ./install.sh installs the pinned Helm 4 / helmfile combination for you.
- kustomize, jq, yq -- Required for template rendering
- skopeo, crane -- Required for container image management
- oc (optional) -- Required for OpenShift clusters (either kubectl or oc must be present)
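To check a workstation against the list above before running ./install.sh, here is a minimal Python preflight sketch. It assumes only that each tool must be on PATH and that the interpreter is 3.11+. It does not verify the helm >= 4.x or helmfile >= 1.5 minimums, and it treats helmfile as always required even though only modelservice needs it. `preflight` is a hypothetical helper, not part of llmdbenchmark or install.sh.

```python
import shutil
import sys

# Tools from the System Requirements list. kubectl/oc are checked as a pair
# below because either one is sufficient. (Hypothetical helper, not llmdbenchmark.)
REQUIRED_TOOLS = [
    "helm", "curl", "git", "helmfile",
    "kustomize", "jq", "yq", "skopeo", "crane",
]


def preflight(which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means OK.

    `which` and `version` are injectable so the check can be exercised
    without touching the real PATH or interpreter.
    """
    problems = []
    if tuple(version[:2]) < (3, 11):
        problems.append("Python 3.11+ is required")
    if which("kubectl") is None and which("oc") is None:
        problems.append("either kubectl or oc must be on PATH")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"missing tool: {tool}")
    return problems
```

Call `preflight()` with no arguments to check the current machine; an empty list means every listed tool was found.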

Administrative Requirements

[!IMPORTANT] Deploying the llm-d stack requires cluster-level admin privileges for configuring cluster-level resources. Namespace-level admin users can run the tool if Kubernetes infrastructure components are configured and the target namespace already exists. Use --non-admin to skip admin-only steps.


Getting Started

Install

The install script supports both uv and the standard python3 -m venv for virtual environment creation. When run interactively, it will prompt you to choose; in non-interactive mode (e.g. curl pipe), it auto-selects uv if your system Python is missing or older than 3.11. You can also pass --uv or --no-uv to skip the prompt.
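The selection rule described above can be sketched in Python. `choose_venv_tool` is a hypothetical helper, not code taken from install.sh. It assumes that in non-interactive mode a system Python of 3.11 or newer falls back to the standard venv; the README implies this but does not state it outright.

```python
from typing import Optional, Tuple


def choose_venv_tool(
    interactive: bool,
    system_python: Optional[Tuple[int, int]],
    flag: Optional[str] = None,
) -> str:
    """Return "uv", "venv", or "prompt" (hypothetical sketch of install.sh's rule).

    flag is "--uv", "--no-uv", or None; system_python is the (major, minor)
    version of the system python3, or None if no python3 was found.
    """
    # Explicit flags always skip the prompt.
    if flag == "--uv":
        return "uv"
    if flag == "--no-uv":
        return "venv"
    if interactive:
        return "prompt"  # ask the user to choose
    # Non-interactive (e.g. curl | bash): use uv when the system Python is
    # missing or older than 3.11; otherwise assume the standard venv.
    if system_python is None or system_python < (3, 11):
        return "uv"
    return "venv"
```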

Quick install (one-liner):

curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | bash
cd llm-d-benchmark
source .venv/bin/activate
llmdbenchmark --version

Or clone manually:

git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
./install.sh              # or: --uv / --no-uv
source .venv/bin/activate
llmdbenchmark --version

Install a specific branch:

LLMDBENCH_BRANCH=main \
  curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | bash

The install script auto-detects if the repo is present -- if not, it clones it first. It creates a virtualenv, validates system tools (kubectl, helm, Python 3.11+), and installs the llmdbenchmark package. See Installation for manual install and flags.

[!TIP] The last line of output from llmdbenchmark standup shows the workspace path where all rendered configs, manifests, and results are stored.

Deploy and benchmark (full pipeline)

Stand up the llm-d stack, run a quick sanity benchmark, and tear down:

```bash
# Preview what would be deployed (no cluster changes)
llmdbenchmark --spec gpu --dry-run standup

# Deploy for real
llmdbenchmark --spec gpu standup

# Run a sanity benchmark against the deployed endpoint
llmdbenchmark --spec gpu run -l inference-perf -w sanity_random.yaml

# Tear down
llmdbenchmark --spec gpu teardown
```

Deploy multiple models behind one gateway

The multi-model-wva scenario deploys N models under a single gateway, each with its own EPP + InferencePool + VariantAutoscaling + HPA, sharing one WVA controller and one HTTPRoute with N backendRefs:

```bash
# Standup - renders two stacks (qwen3-06b, llama-31-8b), installs shared
# infra once, deploys a per-model Helm release + VA + HPA for each.
llmdbenchmark --spec guides/multi-model-wva standup -p my-namespace

# See what's deployed: list detected endpoints + copy-paste run commands.
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace --list-endpoints
```

infra (istio, Gateway, WVA controller, model PVC) installs normally,

Installation

Quick Install (recommended)

```bash
curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | bash
```

Manual Install w/o Install Script

git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
pip install "git+https://github.com/llm-d-incubation/llm-d-planner.git@92b14fe09fea0ec9ff36539326b7a8df00f1022c"

Verify Installation

llmdbenchmark --version

[Deployment Methods](llmdbenchmark/standup/README.md#deployment-methods)

The standup phase supports two deployment paths:

standalone -- Direct Kubernetes Deployments and Services for each model (step 06)
modelservice -- Helm-based deployment with gateway infrastructure, GAIE, and LWS support (steps 07-09)

Both paths share steps 00-05 (infrastructure, namespaces, secrets) and step 10 (smoketest).
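Here is a small Python sketch of how the shared and method-specific step numbers combine into one standup sequence. The step numbers come from the text above. The names `standup_steps` and `METHOD_STEPS` are hypothetical, and the sketch covers only the two paths described here.

```python
# Step numbers as described for the standup phase (hypothetical layout).
SHARED_PREP = list(range(0, 6))   # steps 00-05: infrastructure, namespaces, secrets
METHOD_STEPS = {
    "standalone": [6],            # step 06: direct Deployments and Services
    "modelservice": [7, 8, 9],    # steps 07-09: Helm, gateway, GAIE, LWS
}
SMOKETEST = [10]                  # step 10: smoketest


def standup_steps(method: str) -> list:
    """Ordered standup steps for one deployment method."""
    if method not in METHOD_STEPS:
        raise ValueError(f"unknown deployment method: {method!r}")
    return SHARED_PREP + METHOD_STEPS[method] + SMOKETEST
```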

To benchmark only qwen3-06b with guidellm, using two parallel harness pods:

llmdbenchmark --spec guides/multi-model-wva run -p my-namespace \
  --stack qwen3-06b \
  -l guidellm \
  -w sanity_random.yaml \
  -j 2


Breakdown of the Example:

- `--stack qwen3-06b` filters per-stack steps to that pool. Endpoint
  detection (step 03) runs only for that stack and auto-resolves to
  `http://<gateway>:80/qwen3-06b` - including the routing prefix - so every
  downstream step targets the qwen3-06b InferencePool.
- `-l guidellm` selects the guidellm harness
  ([workload/harnesses/guidellm-llm-d-benchmark.sh](workload/harnesses/guidellm-llm-d-benchmark.sh)).
- `-j 2` launches two guidellm pods hitting the same endpoint. Both pods
  run the same treatment (`-w`) but write to distinct result
  subdirectories (`{experiment_id}_1`, `{experiment_id}_2`) on the
  workload PVC.
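The naming conventions in the breakdown can be expressed as two small Python helpers. Both are hypothetical illustrations (`stack_endpoint`, `result_subdirs`), not llmdbenchmark functions. The sketch assumes port 80 as in the example, and it assumes numbering always starts at `_1` even when `-j` is 1, which the README does not state.

```python
def stack_endpoint(gateway: str, stack: str, port: int = 80) -> str:
    """Endpoint URL for one stack, including its routing prefix (hypothetical)."""
    return f"http://{gateway}:{port}/{stack}"


def result_subdirs(experiment_id: str, parallelism: int) -> list:
    """One result subdirectory per harness pod: {experiment_id}_1 .. _N (hypothetical)."""
    if parallelism < 1:
        raise ValueError("parallelism (-j) must be >= 1")
    return [f"{experiment_id}_{i}" for i in range(1, parallelism + 1)]
```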

Want to compare pools side-by-side? Launch two invocations in parallel
shells (different `--workspace` each):


Example: set common defaults via env vars, override per-run via CLI:

export LLMDBENCH_SPEC=inference-scheduling
export LLMDBENCH_NAMESPACE=my-team-ns
export LLMDBENCH_KUBECONFIG=~/.kube/my-cluster

Well-Lit Path Guides

llm-d-benchmark supports all available Well-Lit Path Guides. Each guide has a corresponding specification:

llmdbenchmark --spec inference-scheduling standup       # Inference scheduling
llmdbenchmark --spec pd-disaggregation standup          # Prefill-decode disaggregation
llmdbenchmark --spec tiered-prefix-cache standup        # Tiered prefix cache
llmdbenchmark --spec precise-prefix-cache-aware standup # Precise prefix cache-aware routing
llmdbenchmark --spec wide-ep-lws standup                # Wide expert-parallel with LWS

[!WARNING] wide-ep-lws requires RDMA/RoCE networking and LeaderWorkerSet (LWS) controller. Verify your cluster has working RDMA HCAs before deploying.

Terminal 1 (`--workspace` is a global option, so it goes before the subcommand):

llmdbenchmark --spec guides/multi-model-wva --workspace /tmp/run-qwen run -p my-namespace \
  --stack qwen3-06b \
  -l guidellm -w sanity_random.yaml -j 2 &

Global Options

| Flag | Env Var | Description |
|---|---|---|
| `--spec SPEC` | `LLMDBENCH_SPEC` | Specification name or path (bare name, category/name, or full path) |
| `--workspace DIR` / `--ws` | `LLMDBENCH_WORKSPACE` | Workspace directory for outputs (default: temp dir) |
| `--base-dir DIR` / `--bd` | `LLMDBENCH_BASE_DIR` | Base directory for templates/scenarios (default: `.`) |
| `--non-admin` / `-i` | `LLMDBENCH_NON_ADMIN` | Skip admin-only steps |
| `--dry-run` / `-n` | `LLMDBENCH_DRY_RUN` | Generate YAML without applying to cluster |
| `--verbose` / `-v` | `LLMDBENCH_VERBOSE` | Enable debug logging |
| `--version` | | Show version |

Plan Options

| Flag | Env Var | Description |
|---|---|---|
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) to render into the plan |
| `-m MODELS` | `LLMDBENCH_MODELS` | Model to render the plan for |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment method (`standalone`, `modelservice`) |
| `-f` / `--monitoring` | | Enable monitoring in rendered templates (PodMonitor, EPP verbosity) |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path (used for cluster resource auto-detection) |

Standup Options

| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter (e.g., `0,1,5` or `1-7`) |
| `-c FILE` | `LLMDBENCH_SCENARIO` | Scenario file |
| `-m MODELS` | `LLMDBENCH_MODELS` | Models to deploy |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment methods (`standalone`, `modelservice`) |
| `-r NAME` | `LLMDBENCH_RELEASE` | Helm release name |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4) |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict per-stack execution to the named subset. Useful in multi-stack scenarios (e.g. `guides/multi-model-wva`) to re-deploy a single pool without touching siblings. Unknown names fail loudly. |
| `--monitoring` | `LLMDBENCH_MONITORING` | Enable PodMonitor creation and EPP verbosity during standup |
| `--skip-smoketest` | | Skip automatic smoketest after standup completes |
| `--affinity` | `LLMDBENCH_AFFINITY` | Node affinity / tolerations label |
| `--annotations` | `LLMDBENCH_ANNOTATIONS` | Extra annotations for deployed resources |
| `--wva` | `LLMDBENCH_WVA` | Workload Variant Autoscaler config |
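The `-s STEPS` filter accepts comma lists such as `0,1,5` and ranges such as `1-7`. Below is a minimal Python parser for those two forms. `parse_step_filter` is hypothetical, and accepting mixed input like `0,2-4` is an assumption of the sketch rather than documented CLI behavior.

```python
def parse_step_filter(spec: str) -> list:
    """Parse "0,1,5" or "1-7" into sorted step numbers (hypothetical helper).

    Mixed forms such as "0,2-4" are also accepted by this sketch.
    """
    steps = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range: {part!r}")
            steps.update(range(lo, hi + 1))  # inclusive range
        else:
            steps.add(int(part))
    return sorted(steps)
```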

Teardown Options

| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter |
| `-m MODELS` | `LLMDBENCH_MODELS` | Model that was deployed (for resource name resolution) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Methods to tear down (`standalone`, `modelservice`) |
| `-r NAME` | `LLMDBENCH_RELEASE` | Helm release name (default: `llmdbench`) |
| `-d` / `--deep` | `LLMDBENCH_DEEP_CLEAN` | Deep clean: delete ALL resources in both namespaces |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Comma-separated namespaces (model,harness) |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict teardown to the named subset. Useful for removing one pool from a multi-stack scenario while leaving siblings in place. |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
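`--stack` restricts execution to a named subset and fails loudly on unknown names. One way to express that rule in Python follows. `select_stacks` is a hypothetical helper, and keeping the scenario's stack order is an assumption.

```python
from typing import List, Optional


def select_stacks(all_stacks: List[str], requested: Optional[str]) -> List[str]:
    """Resolve a --stack NAME[,NAME...] value against a scenario's stacks (hypothetical)."""
    if not requested:
        return list(all_stacks)  # no filter: every stack
    names = [n.strip() for n in requested.split(",") if n.strip()]
    unknown = [n for n in names if n not in all_stacks]
    if unknown:
        # Fail loudly instead of silently skipping typos.
        raise ValueError(
            f"unknown stack(s): {', '.join(unknown)}; "
            f"available: {', '.join(all_stacks)}"
        )
    # Keep the scenario's own ordering.
    return [s for s in all_stacks if s in names]
```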

Experiment Options

| Flag | Env Var | Description |
|---|---|---|
| `-e FILE` | `LLMDBENCH_EXPERIMENTS` | Experiment YAML with setup and run treatments (required) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deploy method |
| `-m MODELS` | `LLMDBENCH_MODELS` | Models to deploy |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4) |
| `-f` / `--monitoring` | | Enable monitoring during standup and run phases |
| `-l HARNESS` | `LLMDBENCH_HARNESS` | Harness name |
| `-w PROFILE` | `LLMDBENCH_WORKLOAD` | Workload profile |
| `-o OVERRIDES` | `LLMDBENCH_OVERRIDES` | Workload parameter overrides |
| `-r DEST` | `LLMDBENCH_OUTPUT` | Results destination (local, gs://, s3://) |
| `-j N` | `LLMDBENCH_PARALLELISM` | Parallel harness pods |
| `--wait-timeout N` | `LLMDBENCH_WAIT_TIMEOUT` | Seconds to wait for harness completion |
| `-x DATASET` | `LLMDBENCH_DATASET` | Dataset URL for harness replay |
| `-d` / `--debug` | `LLMDBENCH_DEBUG` | Debug mode: start harness pods with sleep infinity |
| `--stop-on-error` | | Abort on first setup treatment failure |
| `--skip-teardown` | | Leave stacks running for debugging |

Run Options

| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter (e.g., `0,1,5` or `2-6`) |
| `-m MODEL` | `LLMDBENCH_MODEL` | Model name override (e.g. facebook/opt-125m) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespaces (deploy,benchmark) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deploy method used during standup |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `-l HARNESS` | `LLMDBENCH_HARNESS` | Harness name (inference-perf, guidellm, vllm-benchmark) |
| `-w PROFILE` | `LLMDBENCH_WORKLOAD` | Workload profile YAML |
| `-e FILE` | `LLMDBENCH_EXPERIMENTS` | Experiment treatments YAML for parameter sweeping |
| `-o OVERRIDES` | `LLMDBENCH_OVERRIDES` | Workload parameter overrides (param=value,...) |
| `-r DEST` | `LLMDBENCH_OUTPUT` | Results destination (local, gs://, s3://) |
| `-j N` | `LLMDBENCH_PARALLELISM` | Parallel harness pods |
| `-U URL` | `LLMDBENCH_ENDPOINT_URL` | Explicit endpoint URL (run-only mode) |
| `-c FILE` | | Run config YAML (run-only mode) |
| `--generate-config` | | Generate config and exit |
| `-x DATASET` | `LLMDBENCH_DATASET` | Dataset URL for harness replay |
| `--wait-timeout N` | `LLMDBENCH_WAIT_TIMEOUT` | Seconds to wait for harness completion |
| `--monitoring` | | Enable metrics scraping and EPP log capture during benchmark |
| `-q` / `--serviceaccount` | `LLMDBENCH_SERVICE_ACCOUNT` | Service account name for harness pods |
| `-g` / `--envvarspod` | `LLMDBENCH_HARNESS_ENVVARS_TO_YAML` | Comma-separated env var names to propagate into harness pod |
| `--analyze` | | Run local analysis on results after collection |
| `-z` / `--skip` | `LLMDBENCH_SKIP` | Skip execution, only collect existing results |
| `-d` / `--debug` | `LLMDBENCH_DEBUG` | Debug mode: start harness pods with sleep infinity |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict the benchmark to the named subset of stacks. Endpoint URL auto-resolves for the selected stack - no need for `--endpoint-url`. When `--stack` selects exactly one stack, `-m/--models` scopes to that stack only. |
| `--list-endpoints` | | Detect per-stack endpoint URLs, print a copy-paste table of `llmdbenchmark run` invocations, and exit without launching any harness pods. Useful after `standup` to discover what's deployed. |
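`-o OVERRIDES` takes a `param=value,...` string. A minimal Python parser for that format follows. `parse_overrides` and the parameter names in the usage are hypothetical, and the sketch does not support values that themselves contain commas.

```python
def parse_overrides(text: str) -> dict:
    """Parse "param=value,param2=value2" into a dict of strings (hypothetical)."""
    overrides = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # partition splits on the first "=", so values may contain "=".
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected param=value, got {item!r}")
        overrides[key.strip()] = value.strip()
    return overrides
```

For example, `parse_overrides("rate=10,max_tokens=128")` yields `{"rate": "10", "max_tokens": "128"}`. Values stay strings; converting them to numbers is left to whatever consumes the workload profile.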

Smoketest Options

Run post-deployment validation independently against an already-deployed stack.

llmdbenchmark --spec gpu smoketest -p my-namespace
llmdbenchmark --spec gpu smoketest -p my-namespace -s 2   # config validation only

| Flag | Env Var | Description |
|---|---|---|
| `-s STEPS` | | Step filter (e.g., `0,1,2` or `0-2`) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment methods (standalone, modelservice, fma) |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4). Smoketest pins this to 1 regardless - parallel probes across stacks are confusing. |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict smoketest to the named subset of stacks. |

Smoketests also run automatically after standup unless --skip-smoketest is passed. See llmdbenchmark/smoketests/README.md for details on what each step validates.

Environment Variables

Most CLI flags can also be set via a LLMDBENCH_* environment variable (see the Env Var column in the tables above). The priority chain is:

CLI flag (highest) -- explicitly passed on the command line
Environment variable -- exported in the user's shell
Rendered config (lowest) -- defaults.yaml + scenario YAML

This is useful for CI/CD pipelines, .bashrc configuration, or migrating from the original bash-based workflow.

```bash
# These use the env vars above; --dry-run overrides nothing, just adds a flag
llmdbenchmark standup --dry-run
llmdbenchmark standup                  # live deploy to my-team-ns
llmdbenchmark standup -p override-ns   # CLI wins over env var
```

Boolean env vars accept 1, true, or yes (case-insensitive). Active LLMDBENCH_* overrides are logged at startup for debugging.
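The priority chain and the boolean parsing rule above can be combined into one lookup. The Python sketch below illustrates them under stated assumptions; it is not llmdbenchmark's implementation. `resolve` and `env_bool` are hypothetical names, `None` stands for "flag not passed", and any boolean value other than 1, true, or yes is treated as false.

```python
import os
from collections.abc import Mapping
from typing import Any, Optional

_TRUTHY = {"1", "true", "yes"}


def env_bool(raw: str) -> bool:
    """Boolean env vars accept 1, true, or yes (case-insensitive); else False."""
    return raw.strip().lower() in _TRUTHY


def resolve(
    cli_value: Any,
    env_name: str,
    config_value: Any,
    env: Optional[Mapping] = None,
    is_bool: bool = False,
) -> Any:
    """CLI flag > LLMDBENCH_* env var > rendered config (hypothetical helper)."""
    env = os.environ if env is None else env
    if cli_value is not None:  # explicitly passed on the command line
        return cli_value
    if env_name in env:        # exported in the user's shell
        raw = env[env_name]
        return env_bool(raw) if is_bool else raw
    return config_value        # defaults.yaml + scenario YAML
```

Passing `None` for an unset CLI flag matters here: a plain `store_true` flag defaults to `False`, which would otherwise mask the environment variable.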

[Config Override Chain](config/README.md#config-override-chain)

Values flow through a merge pipeline during the plan phase:

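Figure: Config Override Chain (lowest to highest precedence: defaults.yaml, scenario YAML, LLMDBENCH_* environment variables, CLI flags)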

Steps read from the rendered config.yaml and never define their own fallback defaults. If a required key is missing from the rendered config, the step raises a clear error. This ensures defaults.yaml is the single source of truth for all default values. Environment variables (LLMDBENCH_*) sit between scenario overrides and CLI flags in the priority chain.
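The "single source of truth" rule can be sketched as a recursive merge of defaults.yaml and scenario values, plus a strict lookup that raises instead of falling back. The Python below is a minimal sketch that assumes both files are already loaded as nested dicts. `deep_merge`, `require`, and the keys used in the test are hypothetical.

```python
import copy
from collections.abc import Mapping
from typing import Any


def deep_merge(base: Mapping, override: Mapping) -> dict:
    """Recursively merge override into a copy of base; override wins (hypothetical)."""
    merged = copy.deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def require(config: Mapping, dotted_key: str) -> Any:
    """Look up "a.b.c" in the rendered config; never fall back to a default."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(
                f"required key {dotted_key!r} missing from rendered config "
                f"(stopped at {part!r}); define it in defaults.yaml"
            )
        node = node[part]
    return node
```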

See config/README.md for the full configuration reference, including how to override values.

For the multi-model scenario, run iterates every stack, and each harness pod targets its own pool's endpoint:

llmdbenchmark --spec guides/multi-model-wva run -p my-namespace

To benchmark just one pool (no --endpoint-url needed; the URL auto-resolves):

llmdbenchmark --spec guides/multi-model-wva run -p my-namespace \
  --stack qwen3-06b \
  -l inference-perf -w sanity_random.yaml

Benchmark an existing endpoint (run-only mode)

Already have a model-serving endpoint running? Skip deployment entirely:

llmdbenchmark --spec gpu run \
  --endpoint-url http://10.131.0.42:80 \
  --model meta-llama/Llama-3.1-8B \
  --namespace my-namespace \
  --harness inference-perf \
  --workload sanity_random.yaml

This uses the same harness, profile rendering, and result collection pipeline -- just without the standup and teardown phases.

[!TIP] run can also be used in debug mode (-d / --debug) which starts the harness pod with sleep infinity so you can exec into it and run commands interactively. See this example.

See workload/README.md for the full experiment file format and all pre-built experiments, as well as advanced functionality.