bench ci
Two subcommands that wire `bench run` and `bench diff` into a GitHub Actions regression workflow:
- `bench ci init` writes a stock `.github/workflows/bench-regression.yml`.
- `bench ci validate` parses an existing workflow and confirms the required shape (right triggers, calls `bench run` + `bench diff --strict`).
Synopsis
bench ci init [--out PATH] [--suite ID] [--model ID] [--engine NAME]
[--baseline PATH] [--runner LABELS] [--tolerance FLOAT] [--force]
bench ci validate [PATH]
Example: generate a workflow
Expected output:
bench ci init
workflow .github/workflows/bench-regression.yml
suite llm.inference.sharegpt-v3
model meta-llama/Llama-3.1-8B-Instruct
engine vllm
baseline .bench/baseline.json
runner self-hosted,gpu
tolerance 0.05
Wrote .github/workflows/bench-regression.yml. Commit it alongside a baseline envelope at .bench/baseline.json to start gating PRs on regressions.
The generated job:
- Installs `inferencebench` + `inferencebench-llm` via `uv pip --system`.
- Generates a dev signing key at `.bench/cosign.key` on first run.
- Runs `bench run <suite> --model <id> --engine <kind> --base-url http://localhost:8000/v1 --duration 60 --output .bench/results`.
- Runs `bench diff <baseline> <new> --strict --tolerance <value>`, which fails the build on any regression.
- Uploads the envelope as an Actions artifact.
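To make the flag-to-command mapping concrete, here is a minimal Python sketch of how the `init` options could be expanded into the two invocations listed above. `CiConfig`, `run_argv`, and `diff_argv` are hypothetical names used only for illustration, and the path of the new envelope is a caller-supplied assumption; the real generator may assemble these commands differently.

```python
from dataclasses import dataclass


@dataclass
class CiConfig:
    # Defaults mirror the documented `bench ci init` flag defaults.
    suite: str = "llm.inference.sharegpt-v3"
    model: str = "meta-llama/Llama-3.1-8B-Instruct"
    engine: str = "vllm"
    baseline: str = ".bench/baseline.json"
    tolerance: float = 0.05


def run_argv(cfg: CiConfig) -> list[str]:
    """Build the `bench run` invocation listed in the job steps."""
    return [
        "bench", "run", cfg.suite,
        "--model", cfg.model,
        "--engine", cfg.engine,
        "--base-url", "http://localhost:8000/v1",
        "--duration", "60",
        "--output", ".bench/results",
    ]


def diff_argv(cfg: CiConfig, new_envelope: str) -> list[str]:
    """Build the `bench diff` invocation for a given new envelope path.

    `new_envelope` is supplied by the caller; where the real workflow
    finds the freshly written envelope is not specified here.
    """
    return [
        "bench", "diff", cfg.baseline, new_envelope,
        "--strict", "--tolerance", str(cfg.tolerance),
    ]
```

Building argument lists rather than shell strings keeps model ids and paths from being re-split by a shell if the commands are later passed to `subprocess.run`.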
If `.bench/baseline.json` is missing on the first run, the workflow promotes the current envelope to baseline and emits a `::warning::`.
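The first-run behaviour can be sketched in Python as follows. This is an illustration under assumptions, not the generated workflow's actual step: `promote_if_missing` is a hypothetical helper, the warning text is invented, and "promotes" is assumed to mean a plain file copy.

```python
import shutil
from pathlib import Path


def promote_if_missing(baseline: Path, new_envelope: Path) -> bool:
    """Copy the new envelope to the baseline path when no baseline exists.

    Returns True if a promotion happened, False if a baseline was already there.
    """
    if baseline.exists():
        return False
    baseline.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(new_envelope, baseline)
    # GitHub Actions turns stdout lines of this form into warning annotations.
    print(f"::warning::No baseline at {baseline}; promoted current envelope.")
    return True
```

A run that promotes has nothing to compare against, so the warning is the only signal that no regression check took place.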
Example: validate an existing workflow¶
bench ci validate — .github/workflows/bench-regression.yml
check status fix
triggers include pull_request or workflow_dispatch PASS
a step runs `bench run` PASS
a step runs `bench diff` PASS
`bench diff` uses `--strict` PASS
All checks passed.
Exit code is 0 on all-pass, 1 if any check fails, 2 if the file can't be parsed as YAML.
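The four checks and the exit-code convention could look roughly like the Python sketch below. It is not the tool's implementation: `check_workflow`, `exit_code`, and `validate` are hypothetical names, matching on substrings of `run:` scripts is an assumption, and the YAML parser is passed in (for example `yaml.safe_load` from PyYAML) so the block has no third-party imports.

```python
def check_workflow(doc: dict) -> list[tuple[str, bool]]:
    """Run the four documented checks against a parsed workflow mapping."""
    # YAML 1.1 parsers such as PyYAML load the bare key `on` as boolean True.
    triggers = doc.get("on", doc.get(True, {}))
    if isinstance(triggers, str):
        names = {triggers}
    else:
        names = set(triggers or ())  # a list of events, or the keys of a mapping
    # Collect every `run:` script across all jobs.
    scripts = [
        step.get("run") or ""
        for job in (doc.get("jobs") or {}).values()
        for step in job.get("steps", [])
    ]
    diff_scripts = [s for s in scripts if "bench diff" in s]
    return [
        ("triggers include pull_request or workflow_dispatch",
         bool(names & {"pull_request", "workflow_dispatch"})),
        ("a step runs `bench run`", any("bench run" in s for s in scripts)),
        ("a step runs `bench diff`", bool(diff_scripts)),
        ("`bench diff` uses `--strict`", any("--strict" in s for s in diff_scripts)),
    ]


def exit_code(results: list[tuple[str, bool]]) -> int:
    """0 when every check passes, 1 otherwise."""
    return 0 if all(ok for _, ok in results) else 1


def validate(text: str, parse) -> int:
    """Return 2 when `parse` rejects the text, else the check-based exit code."""
    try:
        doc = parse(text)
    except Exception:
        return 2
    return exit_code(check_workflow(doc if isinstance(doc, dict) else {}))
```

Keeping code 2 distinct from 1 lets a calling script tell a malformed file apart from a well-formed workflow that is missing a required step.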
Flags (init)
| Flag | Default | Description |
|---|---|---|
| `--out` | `.github/workflows/bench-regression.yml` | Destination workflow file. |
| `--suite` | `llm.inference.sharegpt-v3` | Benchmark id passed to `bench run`. |
| `--model` | `meta-llama/Llama-3.1-8B-Instruct` | Model id passed to `bench run --model`. |
| `--engine` | `vllm` | Inference engine kind. |
| `--baseline` | `.bench/baseline.json` | In-repo path to the baseline envelope. |
| `--runner` | `self-hosted,gpu` | Comma-separated `runs-on` labels. |
| `--tolerance` | `0.05` | Regression tolerance forwarded to `bench diff --tolerance`. |
| `--force` | off | Overwrite an existing workflow file. |
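As an illustration of the `--runner` format, the sketch below splits a comma-separated value into the list of labels that a `runs-on` key accepts. `runs_on_labels` is a hypothetical helper; trimming whitespace and dropping empty entries are assumptions, and the CLI's own parsing may be stricter or looser.

```python
def runs_on_labels(value: str) -> list[str]:
    """Split a comma-separated --runner value into a list of runner labels."""
    return [label.strip() for label in value.split(",") if label.strip()]


labels = runs_on_labels("self-hosted,gpu")  # ["self-hosted", "gpu"]
```

GitHub Actions schedules a job only on a runner that carries every label in the list, so the default targets self-hosted runners that are also tagged `gpu`.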
See also
- bench diff — the regression detector under the hood
- Recipes: CI integration
- Recipes: regression check