
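LLM-D Benchmark

Python-based · free and open source, self-hosted, with your data fully under your control
Original name: llm-d-benchmark
⭐ 60 Stars 🍴 79 Forks 💻 Python 📄 Apache-2.0 🏷 AI · AI score 7.5
Tags: AI, benchmark, Python
✦ Recommended by AI Skill Hub

AI Skill Hub has reviewed LLM-D Benchmark and rates it "Recommended". The tool scores well on feature completeness, community activity, and ease of use, with an AI score of 7.5, and suits users with some technical background.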

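📚 In-depth analysis
LLM-D Benchmark is an open-source, Python-based tool with 60 stars on GitHub, tagged AI, benchmark, and Python. Because the code is fully open, you can audit it for security and adapt it to your own needs.

**Why use an open-source tool rather than commercial SaaS?**
For individual developers and privacy-sensitive users, a self-hosted open-source tool keeps data on infrastructure you control and outside third-party data policies. Open-source tools also usually have no usage caps or monthly fees, so for heavy use the total cost of ownership (TCO) is typically lower than that of a subscription product.

**Installation and environment setup**
LLM-D Benchmark requires Python 3.11 or later. Use a version manager such as pyenv rather than modifying the system interpreter. Create a virtual environment first (python3 -m venv .venv && source .venv/bin/activate) and install dependencies into it; if something goes wrong, you can delete the environment and start over without affecting the system.

**Community and maintenance**
GitHub Issues and Discussions are the fastest way to get help. Search closed issues before asking, since many common problems are already answered there. When reporting a bug, include the output of pip list, the full stack trace, and a minimal reproducible example so maintainers can respond faster. AI Skill Hub will continue to track LLM-D Benchmark releases and report significant changes.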
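📋 Tool overview

LLM-D Benchmark is an open-source Python tool that automates benchmarking of LLM inference on the llm-d stack, covering deployment, experiment execution, data collection, and teardown. The code is public on GitHub and can be audited, and the tool runs against infrastructure you control, which helps protect data privacy. It can be used on its own or integrated into an existing workflow.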

GitHub stars: ⭐ 60
Language: Python
Supported platforms: Windows / macOS / Linux
Maintenance status: lightweight project, updated as needed
License: Apache-2.0
AI score: 7.5
Tool type: AI tool
Forks: 79
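📖 Documentation
The following was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see "Original source" at the bottom of the page.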

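📌 Key features
  • Free and open source, self-hosted, with your data fully under your control
  • Developed in the open on GitHub, with ongoing updates
  • Documentation and usage examples are provided
  • Configurable to suit different environments
  • Can be integrated into an existing stack or extended
🎯 Main use cases
  • Run it on your own infrastructure to protect data privacy and meet compliance requirements
  • Integrate it into existing systems
  • Build on it as an open-source component, including for commercial use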
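The install commands below follow the project's README; if they differ from the current README, the README takes precedence.
Install commands
# Option 1: install script (clones the repo if needed, creates .venv, installs llmdbenchmark)
curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | bash
cd llm-d-benchmark
source .venv/bin/activate

# Option 2: manual install from source in a virtual environment
git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
pip install "git+https://github.com/llm-d-incubation/llm-d-planner.git@92b14fe09fea0ec9ff36539326b7a8df00f1022c"

# Verify the installation
llmdbenchmark --version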
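📋 Installation steps
  1. Open the GitHub repository page
  2. Install the dependencies as described in the README
  3. Complete the initial configuration for your environment
  4. Start from the official examples or documentation
  5. If you run into problems, search GitHub Issues for answers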
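The usage examples below are taken from the project's README.
Common commands
# Show the installed version
llmdbenchmark --version

# Preview what would be deployed (no cluster changes)
llmdbenchmark --spec gpu --dry-run standup

# Deploy, then run a sanity benchmark against the deployed endpoint
llmdbenchmark --spec gpu standup
llmdbenchmark --spec gpu run -l inference-perf -w sanity_random.yaml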
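Configuration is set through LLMDBENCH_* environment variables and CLI flags; see the official documentation for the full parameter list.
Configuration example
# Set common defaults through environment variables
export LLMDBENCH_SPEC=inference-scheduling
export LLMDBENCH_NAMESPACE=my-team-ns
export LLMDBENCH_KUBECONFIG=~/.kube/my-cluster

# A CLI flag overrides the matching environment variable
llmdbenchmark standup -p override-ns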
📑 README in depth (original documentation, completeness 62/100)
The following was parsed directly from the GitHub README, preserving code blocks, tables, and lists.

llm-d-benchmark


.github/workflows/ci-nightly-benchmark-build-image.yaml

| | Google Kubernetes Engine | Coreweave Kubernetes Services | OpenShift |
| --- | --- | --- | --- |
| Standalone | [ci-nightly-benchmark-gke-standalone.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-standalone.yaml) | [ci-nightly-benchmark-cks-standalone.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-standalone.yaml) | [ci-nightly-benchmark-ocp-standalone.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-standalone.yaml) |
| Modelservice | [ci-nightly-benchmark-gke-modelservice.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-modelservice.yaml) | [ci-nightly-benchmark-cks-modelservice.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-modelservice.yaml) | [ci-nightly-benchmark-ocp-modelservice.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-modelservice.yaml) |
| Fast Model Actuator | [ci-nightly-benchmark-gke-fma.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-gke-fma.yaml) | [ci-nightly-benchmark-cks-fma.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-cks-fma.yaml) | [ci-nightly-benchmark-ocp-fma.yaml](https://github.com/llm-d/llm-d-benchmark/actions/workflows/ci-nightly-benchmark-ocp-fma.yaml) |
| Kustomize | NA | NA | NA |

This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.

> [!TIP]
> We acknowledge many users are still utilizing our previous (now deprecated) library, and to make the transition easier, we still have that library available. It can be found in our v0.5.2 version tag.

Prerequisites

Please refer to the official llm-d prerequisites for the most up-to-date requirements. For the client setup, the provided install.sh will install the necessary tools.

Administrative Requirements

Deploying the llm-d stack requires cluster-level admin privileges, as you will be configuring cluster-level resources. However, the scripts can be executed by namespace-level admin users, as long as the Kubernetes infrastructure components are configured and the target namespace already exists.

System Requirements

- Python 3.11+
- kubectl -- Kubernetes CLI
- helm (>= 4.x) -- Helm package manager
- curl, git -- Standard system tools
- helmfile (>= 1.5) -- Required for modelservice deployments. Older helmfile is incompatible with Helm 4 (it probes helm with the removed `helm version --client` flag and panics). `./install.sh` installs the pinned Helm 4 / helmfile combination for you.
- kustomize, jq, yq -- Required for template rendering
- skopeo, crane -- Required for container image management
- oc (optional) -- Required for OpenShift clusters (either kubectl or oc must be present)
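
The requirements above can be checked before running anything against a cluster. The following is a minimal sketch of such a preflight check, not the project's own validation code: the names `REQUIRED_TOOLS`, `missing_tools`, and `python_ok` are hypothetical, and the sketch only tests that each binary is on `PATH`; it does not verify the Helm or helmfile versions.

```python
import shutil
import sys

# Tool names taken from the requirements list; helmfile is only needed for
# modelservice deployments but is checked here for simplicity (assumption).
REQUIRED_TOOLS = ["helm", "curl", "git", "helmfile",
                  "kustomize", "jq", "yq", "skopeo", "crane"]
# Either kubectl or oc must be present.
CLUSTER_CLIS = ["kubectl", "oc"]


def missing_tools(which=shutil.which):
    """Return the names of required tools that are not found on PATH."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not any(which(cli) is not None for cli in CLUSTER_CLIS):
        missing.append("kubectl or oc")
    return missing


def python_ok(version=sys.version_info):
    """Return True when the interpreter is Python 3.11 or newer."""
    return tuple(version[:2]) >= (3, 11)
```

Calling `missing_tools()` with no arguments inspects the real `PATH`; passing a custom `which` function makes the check easy to exercise without touching the system.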

Administrative Requirements

> [!IMPORTANT]
> Deploying the llm-d stack requires cluster-level admin privileges for configuring cluster-level resources. Namespace-level admin users can run the tool if Kubernetes infrastructure components are configured and the target namespace already exists. Use `--non-admin` to skip admin-only steps.

Dependencies

Getting Started

Install

The install script supports both uv and the standard python3 -m venv for virtual environment creation. When run interactively, it will prompt you to choose; in non-interactive mode (e.g. curl pipe), it auto-selects uv if your system Python is missing or older than 3.11. You can also pass --uv or --no-uv to skip the prompt.
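
The selection rule described above can be sketched as a small decision function. This is an illustration only, not the logic of `install.sh` itself: `choose_venv_backend` and its parameters are hypothetical names, and the final branch (plain `venv` when a recent Python is available in non-interactive mode) is an assumption, since the text only states when `uv` is auto-selected.

```python
from typing import Optional, Tuple

MIN_PYTHON = (3, 11)


def choose_venv_backend(
    system_python: Optional[Tuple[int, int]],
    interactive: bool,
    flag: Optional[str] = None,
    prompt_answer: Optional[str] = None,
) -> str:
    """Return "uv" or "venv" following the rules described above."""
    # An explicit --uv / --no-uv flag skips the prompt entirely.
    if flag == "--uv":
        return "uv"
    if flag == "--no-uv":
        return "venv"
    # Interactive runs ask the user; prompt_answer stands in for that reply.
    if interactive:
        return "uv" if prompt_answer == "uv" else "venv"
    # Non-interactive: uv is selected when Python is missing or too old.
    if system_python is None or system_python < MIN_PYTHON:
        return "uv"
    # Assumption: with a recent enough Python, the standard venv is used.
    return "venv"
```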

Quick install (one-liner):

curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh | bash
cd llm-d-benchmark
source .venv/bin/activate
llmdbenchmark --version

Or clone manually:

git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
./install.sh              # or: --uv / --no-uv
source .venv/bin/activate
llmdbenchmark --version

Install a specific branch:

# Set the variable for bash (which runs install.sh), not for curl
curl -sSL https://raw.githubusercontent.com/llm-d/llm-d-benchmark/main/install.sh \
  | LLMDBENCH_BRANCH=main bash

The install script auto-detects if the repo is present -- if not, it clones it first. It creates a virtualenv, validates system tools (kubectl, helm, Python 3.11+), and installs the llmdbenchmark package. See Installation for manual install and flags.

> [!TIP]
> The last line of output from `llmdbenchmark standup` shows the workspace path where all rendered configs, manifests, and results are stored.

Deploy and benchmark (full pipeline)

Stand up the llm-d stack, run a quick sanity benchmark, and tear down:

```bash
# Preview what would be deployed (no cluster changes)
llmdbenchmark --spec gpu --dry-run standup

# Deploy for real
llmdbenchmark --spec gpu standup

# Run a sanity benchmark against the deployed endpoint
llmdbenchmark --spec gpu run -l inference-perf -w sanity_random.yaml
```

Deploy multiple models behind one gateway

The multi-model-wva scenario deploys N models under a single gateway, each with its own EPP + InferencePool + VariantAutoscaling + HPA, sharing one WVA controller and one HTTPRoute with N backendRefs:

```bash
# Standup - renders two stacks (qwen3-06b, llama-31-8b), installs shared
# infra once, deploys a per-model Helm release + VA + HPA for each.
llmdbenchmark --spec guides/multi-model-wva standup -p my-namespace

# See what's deployed: list detected endpoints + copy-paste run commands.
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace --list-endpoints
```

infra (istio, Gateway, WVA controller, model PVC) installs normally,

Installation

Manual Install w/o Install Script

git clone https://github.com/llm-d/llm-d-benchmark.git
cd llm-d-benchmark
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
pip install "git+https://github.com/llm-d-incubation/llm-d-planner.git@92b14fe09fea0ec9ff36539326b7a8df00f1022c"

Verify Installation

llmdbenchmark --version

[Deployment Methods](llmdbenchmark/standup/README.md#deployment-methods)

The standup phase supports two deployment paths:

  • standalone -- Direct Kubernetes Deployments and Services for each model (step 06)
  • modelservice -- Helm-based deployment with gateway infrastructure, GAIE, and LWS support (steps 07-09)

Both paths share steps 00-05 (infrastructure, namespaces, secrets) and step 10 (smoketest).
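
The step layout described above can be written down as data. The sketch below is illustrative only: `steps_for` is a hypothetical helper, and the step numbers are the ones quoted in the text (00-05 shared, 06 for standalone, 07-09 for modelservice, 10 for the smoketest).

```python
# Steps shared by both deployment paths: infrastructure, namespaces, secrets.
SHARED_STEPS = list(range(0, 6))          # 00-05
# Path-specific steps, as listed above.
METHOD_STEPS = {
    "standalone": [6],                    # step 06
    "modelservice": [7, 8, 9],            # steps 07-09
}
SMOKETEST_STEP = 10


def steps_for(method: str) -> list[int]:
    """Return the ordered standup steps that run for one deployment method."""
    if method not in METHOD_STEPS:
        raise ValueError(f"unknown deployment method: {method!r}")
    return SHARED_STEPS + METHOD_STEPS[method] + [SMOKETEST_STEP]
```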

# Benchmark qwen3-06b only with guidellm, two parallel harness pods
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace \
  --stack qwen3-06b \
  -l guidellm \
  -w sanity_random.yaml \
  -j 2


Breakdown of the Example:

- `--stack qwen3-06b` filters per-stack steps to that pool. Endpoint
  detection (step 03) runs only for that stack and auto-resolves to
  `http://<gateway>:80/qwen3-06b` - including the routing prefix - so every
  downstream step targets the qwen3-06b InferencePool.
- `-l guidellm` selects the guidellm harness
  ([workload/harnesses/guidellm-llm-d-benchmark.sh](workload/harnesses/guidellm-llm-d-benchmark.sh)).
- `-j 2` launches two guidellm pods hitting the same endpoint. Both pods
  run the same treatment (`-w`) but write to distinct result
  subdirectories (`{experiment_id}_1`, `{experiment_id}_2`) on the
  workload PVC.
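
The naming conventions in the breakdown above can be sketched in a few lines. This is a hedged illustration of the patterns quoted in the text (`http://<gateway>:80/<stack>` and `{experiment_id}_N`), not code from the project; `stack_endpoint` and `result_subdirs` are hypothetical names.

```python
def stack_endpoint(gateway_host: str, stack: str, port: int = 80) -> str:
    """Build the per-stack endpoint URL, including the routing prefix."""
    return f"http://{gateway_host}:{port}/{stack}"


def result_subdirs(experiment_id: str, parallelism: int) -> list[str]:
    """Return one result subdirectory name per parallel harness pod (-j N)."""
    return [f"{experiment_id}_{index}" for index in range(1, parallelism + 1)]
```

For example, `result_subdirs("exp", 2)` yields `["exp_1", "exp_2"]`, matching the two directories written when `-j 2` is used.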

Want to compare pools side-by-side? Launch two invocations in parallel
shells (different `--workspace` each):

# Example: set common defaults via env vars, override per-run via CLI
export LLMDBENCH_SPEC=inference-scheduling
export LLMDBENCH_NAMESPACE=my-team-ns
export LLMDBENCH_KUBECONFIG=~/.kube/my-cluster

Well-Lit Path Guides

llm-d-benchmark supports all available Well-Lit Path Guides. Each guide has a corresponding specification:

llmdbenchmark --spec inference-scheduling standup       # Inference scheduling
llmdbenchmark --spec pd-disaggregation standup          # Prefill-decode disaggregation
llmdbenchmark --spec tiered-prefix-cache standup        # Tiered prefix cache
llmdbenchmark --spec precise-prefix-cache-aware standup # Precise prefix cache-aware routing
llmdbenchmark --spec wide-ep-lws standup                # Wide expert-parallel with LWS
> [!WARNING]
> wide-ep-lws requires RDMA/RoCE networking and the LeaderWorkerSet (LWS) controller. Verify your cluster has working RDMA HCAs before deploying.

# Terminal 1 - --workspace is a global option, placed before the subcommand
llmdbenchmark --spec guides/multi-model-wva --workspace /tmp/run-qwen run -p my-namespace \
  --stack qwen3-06b \
  -l guidellm -w sanity_random.yaml -j 2 &

Global Options

| Flag | Env Var | Description |
| --- | --- | --- |
| `--spec SPEC` | `LLMDBENCH_SPEC` | Specification name or path (bare name, category/name, or full path) |
| `--workspace DIR` / `--ws` | `LLMDBENCH_WORKSPACE` | Workspace directory for outputs (default: temp dir) |
| `--base-dir DIR` / `--bd` | `LLMDBENCH_BASE_DIR` | Base directory for templates/scenarios (default: .) |
| `--non-admin` / `-i` | `LLMDBENCH_NON_ADMIN` | Skip admin-only steps |
| `--dry-run` / `-n` | `LLMDBENCH_DRY_RUN` | Generate YAML without applying to cluster |
| `--verbose` / `-v` | `LLMDBENCH_VERBOSE` | Enable debug logging |
| `--version` | | Show version |

Plan Options

| Flag | Env Var | Description |
| --- | --- | --- |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) to render into the plan |
| `-m MODELS` | `LLMDBENCH_MODELS` | Model to render the plan for |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment method (standalone, modelservice) |
| `-f` / `--monitoring` | | Enable monitoring in rendered templates (PodMonitor, EPP verbosity) |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path (used for cluster resource auto-detection) |

Standup Options

| Flag | Env Var | Description |
| --- | --- | --- |
| `-s STEPS` | | Step filter (e.g., 0,1,5 or 1-7) |
| `-c FILE` | `LLMDBENCH_SCENARIO` | Scenario file |
| `-m MODELS` | `LLMDBENCH_MODELS` | Models to deploy |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment methods (standalone, modelservice) |
| `-r NAME` | `LLMDBENCH_RELEASE` | Helm release name |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4) |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict per-stack execution to the named subset. Useful in multi-stack scenarios (e.g. guides/multi-model-wva) to re-deploy a single pool without touching siblings. Unknown names fail loudly. |
| `--monitoring` | `LLMDBENCH_MONITORING` | Enable PodMonitor creation and EPP verbosity during standup |
| `--skip-smoketest` | | Skip automatic smoketest after standup completes |
| `--affinity` | `LLMDBENCH_AFFINITY` | Node affinity / tolerations label |
| `--annotations` | `LLMDBENCH_ANNOTATIONS` | Extra annotations for deployed resources |
| `--wva` | `LLMDBENCH_WVA` | Workload Variant Autoscaler config |
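
The `-s STEPS` filter accepts forms such as `0,1,5` or `1-7`. A minimal sketch of how such a filter could be parsed follows; it is not the tool's own parser. `parse_step_filter` is a hypothetical name, and accepting a mix of single steps and ranges (for example `0,2-4`) is an assumption that the text does not confirm.

```python
def parse_step_filter(spec: str) -> list[int]:
    """Expand a step filter such as "0,1,5" or "1-7" into sorted step numbers."""
    steps: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "1-7" is inclusive on both ends.
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid step range: {part!r}")
            steps.update(range(low, high + 1))
        else:
            steps.add(int(part))
    return sorted(steps)
```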

Teardown Options

| Flag | Env Var | Description |
| --- | --- | --- |
| `-s STEPS` | | Step filter |
| `-m MODELS` | `LLMDBENCH_MODELS` | Model that was deployed (for resource name resolution) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Methods to tear down (standalone, modelservice) |
| `-r NAME` | `LLMDBENCH_RELEASE` | Helm release name (default: llmdbench) |
| `-d` / `--deep` | `LLMDBENCH_DEEP_CLEAN` | Deep clean: delete ALL resources in both namespaces |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Comma-separated namespaces (model,harness) |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict teardown to the named subset. Useful for removing one pool from a multi-stack scenario while leaving siblings in place. |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |

Experiment Options

| Flag | Env Var | Description |
| --- | --- | --- |
| `-e FILE` | `LLMDBENCH_EXPERIMENTS` | Experiment YAML with setup and run treatments (required) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deploy method |
| `-m MODELS` | `LLMDBENCH_MODELS` | Models to deploy |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4) |
| `-f` / `--monitoring` | | Enable monitoring during standup and run phases |
| `-l HARNESS` | `LLMDBENCH_HARNESS` | Harness name |
| `-w PROFILE` | `LLMDBENCH_WORKLOAD` | Workload profile |
| `-o OVERRIDES` | `LLMDBENCH_OVERRIDES` | Workload parameter overrides |
| `-r DEST` | `LLMDBENCH_OUTPUT` | Results destination (local, gs://, s3://) |
| `-j N` | `LLMDBENCH_PARALLELISM` | Parallel harness pods |
| `--wait-timeout N` | `LLMDBENCH_WAIT_TIMEOUT` | Seconds to wait for harness completion |
| `-x DATASET` | `LLMDBENCH_DATASET` | Dataset URL for harness replay |
| `-d` / `--debug` | `LLMDBENCH_DEBUG` | Debug mode: start harness pods with sleep infinity |
| `--stop-on-error` | | Abort on first setup treatment failure |
| `--skip-teardown` | | Leave stacks running for debugging |

Run Options

| Flag | Env Var | Description |
| --- | --- | --- |
| `-s STEPS` | | Step filter (e.g., 0,1,5 or 2-6) |
| `-m MODEL` | `LLMDBENCH_MODEL` | Model name override (e.g. facebook/opt-125m) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespaces (deploy,benchmark) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deploy method used during standup |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `-l HARNESS` | `LLMDBENCH_HARNESS` | Harness name (inference-perf, guidellm, vllm-benchmark) |
| `-w PROFILE` | `LLMDBENCH_WORKLOAD` | Workload profile YAML |
| `-e FILE` | `LLMDBENCH_EXPERIMENTS` | Experiment treatments YAML for parameter sweeping |
| `-o OVERRIDES` | `LLMDBENCH_OVERRIDES` | Workload parameter overrides (param=value,...) |
| `-r DEST` | `LLMDBENCH_OUTPUT` | Results destination (local, gs://, s3://) |
| `-j N` | `LLMDBENCH_PARALLELISM` | Parallel harness pods |
| `-U URL` | `LLMDBENCH_ENDPOINT_URL` | Explicit endpoint URL (run-only mode) |
| `-c FILE` | | Run config YAML (run-only mode) |
| `--generate-config` | | Generate config and exit |
| `-x DATASET` | `LLMDBENCH_DATASET` | Dataset URL for harness replay |
| `--wait-timeout N` | `LLMDBENCH_WAIT_TIMEOUT` | Seconds to wait for harness completion |
| `--monitoring` | | Enable metrics scraping and EPP log capture during benchmark |
| `-q` / `--serviceaccount` | `LLMDBENCH_SERVICE_ACCOUNT` | Service account name for harness pods |
| `-g` / `--envvarspod` | `LLMDBENCH_HARNESS_ENVVARS_TO_YAML` | Comma-separated env var names to propagate into harness pod |
| `--analyze` | | Run local analysis on results after collection |
| `-z` / `--skip` | `LLMDBENCH_SKIP` | Skip execution, only collect existing results |
| `-d` / `--debug` | `LLMDBENCH_DEBUG` | Debug mode: start harness pods with sleep infinity |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict the benchmark to the named subset of stacks. Endpoint URL auto-resolves for the selected stack - no need for `--endpoint-url`. When `--stack` selects exactly one stack, `-m`/`--models` scopes to that stack only. |
| `--list-endpoints` | | Detect per-stack endpoint URLs, print a copy-paste table of `llmdbenchmark run` invocations, and exit without launching any harness pods. Useful after standup to discover what's deployed. |
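
The `--stack NAME[,NAME...]` behaviour (restrict to a named subset, reject unknown names) can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: `select_stacks` is a hypothetical name, and treating an empty filter as "all stacks" is an assumption based on the unfiltered `run` iterating every stack.

```python
from typing import Optional, Sequence


def select_stacks(available: Sequence[str], requested: Optional[str]) -> list[str]:
    """Return the stacks to operate on, given a comma-separated --stack value."""
    # Assumption: no filter means every stack in the scenario is used.
    if not requested:
        return list(available)
    names = [name.strip() for name in requested.split(",") if name.strip()]
    unknown = [name for name in names if name not in available]
    if unknown:
        # Unknown names fail loudly instead of being silently ignored.
        raise ValueError(f"unknown stack name(s): {', '.join(unknown)}")
    # Preserve the scenario's own ordering of stacks.
    return [stack for stack in available if stack in names]
```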

Smoketest Options

Run post-deployment validation independently against an already-deployed stack.

llmdbenchmark --spec gpu smoketest -p my-namespace
llmdbenchmark --spec gpu smoketest -p my-namespace -s 2   # config validation only

| Flag | Env Var | Description |
| --- | --- | --- |
| `-s STEPS` | | Step filter (e.g., 0,1,2 or 0-2) |
| `-p NS` | `LLMDBENCH_NAMESPACE` | Namespace(s) |
| `-t METHODS` | `LLMDBENCH_METHODS` | Deployment methods (standalone, modelservice, fma) |
| `-k FILE` | `LLMDBENCH_KUBECONFIG` / `KUBECONFIG` | Kubeconfig path |
| `--parallel N` | `LLMDBENCH_PARALLEL` | Max parallel stacks (default: 4). Smoketest pins this to 1 regardless - parallel probes across stacks are confusing. |
| `--stack NAME[,NAME...]` | `LLMDBENCH_STACK` | Restrict smoketest to the named subset of stacks. |

Smoketests also run automatically after standup unless --skip-smoketest is passed. See llmdbenchmark/smoketests/README.md for details on what each step validates.

Environment Variables

Every CLI flag can be set via a LLMDBENCH_* environment variable (see tables above). The priority chain is:

  1. CLI flag (highest) -- explicitly passed on the command line
  2. Environment variable -- exported in the user's shell
  3. Rendered config (lowest) -- defaults.yaml + scenario YAML

This is useful for CI/CD pipelines, .bashrc configuration, or migrating from the original bash-based workflow.
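
The priority chain above can be sketched as a single lookup function. This is a minimal illustration, not the tool's code: `resolve_option` is a hypothetical name, and deriving the variable name as `LLMDBENCH_<NAME>` is a simplifying assumption (the tables show that some flags map to differently named variables, for example `-r DEST` to `LLMDBENCH_OUTPUT`).

```python
import os
from typing import Any, Mapping, Optional


def resolve_option(
    name: str,
    cli_value: Optional[Any],
    rendered_config: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Resolve one option: CLI flag first, then env var, then rendered config."""
    environ = os.environ if environ is None else environ
    # 1. A value passed explicitly on the command line always wins.
    if cli_value is not None:
        return cli_value
    # 2. Otherwise use the exported environment variable, if any.
    env_key = f"LLMDBENCH_{name.upper()}"  # assumed naming scheme
    if env_key in environ:
        return environ[env_key]
    # 3. Fall back to the rendered config (defaults.yaml + scenario YAML).
    return rendered_config[name]
```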

```bash
# These use the env vars above; --dry-run overrides nothing, just adds a flag
llmdbenchmark standup --dry-run
llmdbenchmark standup                  # live deploy to my-team-ns
llmdbenchmark standup -p override-ns   # CLI wins over env var
```

Boolean env vars accept 1, true, or yes (case-insensitive). Active LLMDBENCH_* overrides are logged at startup for debugging.
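
A boolean parser matching the rule above might look like the sketch below. `env_flag` is a hypothetical name, and treating every other value (including an unset variable) as false is an assumption; the text only lists the accepted truthy values.

```python
from typing import Optional

# Values accepted as "true", compared case-insensitively.
TRUTHY_VALUES = {"1", "true", "yes"}


def env_flag(value: Optional[str]) -> bool:
    """Interpret an LLMDBENCH_* boolean environment variable value."""
    return value is not None and value.lower() in TRUTHY_VALUES
```

For example, `env_flag(os.environ.get("LLMDBENCH_DRY_RUN"))` would report whether dry-run mode was requested through the environment.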

[Config Override Chain](config/README.md#config-override-chain)

Values flow through a merge pipeline during the plan phase:

[Figure: Config Override Chain]

Steps read from the rendered config.yaml and never define their own fallback defaults. If a required key is missing from the rendered config, the step raises a clear error. This ensures defaults.yaml is the single source of truth for all default values. Environment variables (LLMDBENCH_*) sit between scenario overrides and CLI flags in the priority chain.
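
The merge pipeline and the "no fallback defaults" rule can be sketched together. The code below is an illustration under assumptions, not the project's implementation: `deep_merge`, `render_config`, and `require` are hypothetical names, the config keys in the usage note are invented for the example, and merging nested mappings recursively is an assumption about how the layers combine.

```python
from typing import Any, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Return a new dict where override wins, merging nested mappings."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def render_config(defaults, scenario, env_overrides, cli_overrides) -> dict:
    """Apply the layers from lowest to highest priority."""
    config: dict = {}
    for layer in (defaults, scenario, env_overrides, cli_overrides):
        config = deep_merge(config, layer)
    return config


def require(config: Mapping[str, Any], key: str) -> Any:
    """Read a key a step depends on; never substitute a fallback default."""
    if key not in config:
        raise KeyError(f"required key {key!r} is missing from the rendered config")
    return config[key]
```

Rendering `{"namespace": "default"}` with an environment layer of `{"namespace": "my-team-ns"}` and a CLI layer of `{"namespace": "override-ns"}` yields `override-ns`, consistent with the priority chain.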

See config/README.md for the full configuration reference, including how to override values.

# Run - iterates every stack, each harness pod targets its own pool's endpoint.
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace

# Benchmark just one pool (no --endpoint-url needed - auto-resolves):
llmdbenchmark --spec guides/multi-model-wva run -p my-namespace \
  --stack qwen3-06b \
  -l inference-perf -w sanity_random.yaml

Benchmark an existing endpoint (run-only mode)

Already have a model-serving endpoint running? Skip deployment entirely:

llmdbenchmark --spec gpu run \
  --endpoint-url http://10.131.0.42:80 \
  --model meta-llama/Llama-3.1-8B \
  --namespace my-namespace \
  --harness inference-perf \
  --workload sanity_random.yaml

This uses the same harness, profile rendering, and result collection pipeline -- just without the standup and teardown phases.
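
When run-only benchmarks are launched from a script or CI job, the command above can be assembled programmatically. The sketch below only builds the argument list, using the flags shown in the example; `run_only_command` is a hypothetical helper, and actually executing the result (for instance with `subprocess.run`) is left to the caller.

```python
import shlex


def run_only_command(
    spec: str,
    endpoint_url: str,
    model: str,
    namespace: str,
    harness: str,
    workload: str,
) -> list[str]:
    """Build the argv for a run-only benchmark against an existing endpoint."""
    return [
        "llmdbenchmark", "--spec", spec, "run",
        "--endpoint-url", endpoint_url,
        "--model", model,
        "--namespace", namespace,
        "--harness", harness,
        "--workload", workload,
    ]


# Example: reproduce the command shown above as a single shell-safe string.
command = run_only_command(
    spec="gpu",
    endpoint_url="http://10.131.0.42:80",
    model="meta-llama/Llama-3.1-8B",
    namespace="my-namespace",
    harness="inference-perf",
    workload="sanity_random.yaml",
)
command_line = shlex.join(command)
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems when a model name or URL contains special characters.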

> [!TIP]
> `run` can also be used in debug mode (`-d` / `--debug`), which starts the harness pod with `sleep infinity` so you can exec into it and run commands interactively. See this example.

See workload/README.md for the full experiment file format and all pre-built experiments, as well as advanced functionality.

CLI Reference

🎯 aiskill88 AI review · Grade A · 2026-05-26

A high-quality AI benchmarking tool

👥 Who it is for
AI enthusiasts, researchers and students, developers and engineers, technical founders
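⚖️ Pros and cons
✅ Pros
  • Apache-2.0 license, free for commercial use
  • Fully open source, with no license fees
  • Self-hosted, with your data fully under your control
  • Community support for finding answers and asking questions
⚠️ Cons
  • Installation and initial configuration require some technical background
  • Features may be less complete than in mature commercial products
  • Support depends mainly on the open-source community, so response times vary
⚠️ Notice

AI Skill Hub is a third-party content aggregator. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.

Validate the tool thoroughly in a sandbox or test environment, and complete any necessary security review, before deploying it to production.

📄 License

✅ Apache 2.0: a permissive open-source license that allows commercial use, requires preserving copyright notices and the NOTICE file, and includes a patent grant.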

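💡 AI Skill Hub review

LLM-D Benchmark's core functionality is complete and of good quality. For AI enthusiasts it is worth adding to your toolkit. Try it in a non-production environment first, then roll it out gradually.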

🌐 Original information
Original name: llm-d-benchmark
Topics: AI, benchmark, Python
GitHub: https://github.com/llm-d/llm-d-benchmark
License: Apache-2.0
Language: Python
🔗 Original source
🐙 GitHub repository: https://github.com/llm-d/llm-d-benchmark
🌐 Official website: https://www.llm-d.ai

Listed: 2026-05-26 · Updated: 2026-05-26 · License: Apache-2.0 · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.