bench matrix¶

Run one benchmark across multiple endpoints from a single YAML config. Each target × concurrency point produces a signed envelope under --output, and the trailing Rich table summarises throughput, TTFT, and ok_rate per pair.

bench matrix automates the "run-it-N-times" shape that bench run covers for a single endpoint — useful for vLLM-vs-vLLM-vs-hosted comparisons captured in one command.

Synopsis¶

bench matrix <config.yaml> --output DIR [--signing-mode dev|keyless] [--dev-key PATH]
                           [--continue-on-error/--no-continue-on-error]

Example: Llama vs Qwen on two vLLM endpoints¶

matrix.yaml:

schema: inferencebench.matrix.v1
suite_id: llm.inference.chatbot-short
duration_s: 60
sweep: [1, 16]
targets:
  - name: llama-vllm
    model: meta-llama/Llama-3.1-8B-Instruct
    engine: vllm
    base_url: http://localhost:8000/v1
    quant: fp16
  - name: qwen-vllm
    model: Qwen/Qwen2.5-7B-Instruct
    engine: vllm
    base_url: http://localhost:8001/v1
    quant: fp16

bench matrix matrix.yaml --output ./matrix-results

Expected output (real conc=16 numbers from validation-runs/2026-05-16-cross-model-corpus/):

                                 Matrix results
 target        point  throughput_tok_per_s  ttft_p50_ms  ok_rate  envelope                       status
 llama-vllm    1      122.2                 13.98        1.000    llama-vllm-c1-814953250c16.json   ✓
 llama-vllm    16     1384                  41.69        1.000    llama-vllm-c16-60be8efd6d21.json  ✓
 qwen-vllm     1      120.0                 13.40        1.000    qwen-vllm-c1-07b69e640395.json    ✓
 qwen-vllm     16     1362                  40.98        1.000    qwen-vllm-c16-8d7ef1b17fb7.json   ✓

Envelope filenames are prefixed with <target-name>-c<point>-<content_hash[:12]>.json.

Flags¶

Flag	Default	Description
`--output`	required	Output directory for produced envelopes.
`--signing-mode`	`dev`	`dev` (local cosign key) or `keyless` (Sigstore OIDC).
`--dev-key`	`./cosign.key`	Path to local cosign signing key (used when `--signing-mode=dev`).
`--continue-on-error` / `--no-continue-on-error`	on	Keep going past failed targets. With `--no-continue-on-error`, stop the matrix on the first failure.

Config schema¶

Field	Required	Description
`schema`	yes (`inferencebench.matrix.v1`)	Schema identifier.
`suite_id`	yes	Fully-qualified benchmark id (e.g. `llm.inference.chatbot-short`).
`duration_s`	optional (default `60`)	Per-point measurement duration in seconds.
`sweep`	yes	Non-empty list of positive integer concurrency points.
`targets[].name`	yes	Unique short label used as the envelope filename prefix.
`targets[].model`	yes	Model id passed to the plugin.
`targets[].engine`	yes	Engine kind (e.g. `vllm`).
`targets[].base_url`	optional	Endpoint URL.
`targets[].quant`	optional	Quantization format string recorded on the envelope.
`targets[].api_key_env`	optional	Env var to read for the API key. Target is skipped (yellow warning) if the var is unset.
`targets[].extra`	optional	Extra `RunContext.extra` keys forwarded to the plugin.

Adding a hosted-OpenAI target¶

  # - name: openai-gpt4o
  #   model: gpt-4o-mini
  #   engine: openai
  #   base_url: https://api.openai.com/v1
  #   api_key_env: OPENAI_API_KEY

Phase 1 ships vLLM, SGLang (skeleton), llama.cpp, and provider-hosted engines via the OpenAI-compatible kind. Set the env var before invoking bench matrix; targets whose env var is missing are skipped with a warning rather than failing the whole matrix.

Exit codes¶

0 — every pair produced an envelope (or was skipped because of a missing API key).
1 — at least one pair errored, or no envelopes were produced.
2 — invalid YAML or missing --output.