bench ci¶

Two subcommands that wire bench run and bench diff into a GitHub Actions regression workflow:

bench ci init writes a stock .github/workflows/bench-regression.yml.
bench ci validate parses an existing workflow and confirms the required shape (right triggers, calls bench run + bench diff --strict).

Synopsis¶

bench ci init     [--out PATH] [--suite ID] [--model ID] [--engine NAME]
                  [--baseline PATH] [--runner LABELS] [--tolerance FLOAT] [--force]
bench ci validate [PATH]

Example: generate a workflow¶

bench ci init

Expected output:

                bench ci init
 workflow   .github/workflows/bench-regression.yml
 suite      llm.inference.sharegpt-v3
 model      meta-llama/Llama-3.1-8B-Instruct
 engine     vllm
 baseline   .bench/baseline.json
 runner     self-hosted,gpu
 tolerance  0.05
Wrote .github/workflows/bench-regression.yml. Commit it alongside a baseline envelope at .bench/baseline.json to start gating PRs on regressions.

The generated job:

Installs inferencebench + inferencebench-llm via uv pip --system.
Generates a dev signing key at .bench/cosign.key on first run.
Runs bench run <suite> --model <id> --engine <kind> --base-url http://localhost:8000/v1 --duration 60 --output .bench/results.
bench diff <baseline> <new> --strict --tolerance <value> — fails the build on any regression.
Uploads the envelope as an Actions artifact.

If .bench/baseline.json is missing on the first run the workflow promotes the current envelope to baseline and emits a ::warning::.

Example: validate an existing workflow¶

bench ci validate .github/workflows/bench-regression.yml

                       bench ci validate — .github/workflows/bench-regression.yml
 check                                                       status  fix
 triggers include pull_request or workflow_dispatch          PASS
 a step runs `bench run`                                     PASS
 a step runs `bench diff`                                    PASS
 `bench diff` uses `--strict`                                PASS
All checks passed.

Exit code is 0 on all-pass, 1 if any check fails, 2 if the file can't be parsed as YAML.

Flags (`init`)¶

Flag	Default	Description
`--out`	`.github/workflows/bench-regression.yml`	Destination workflow file.
`--suite`	`llm.inference.sharegpt-v3`	Benchmark id passed to `bench run`.
`--model`	`meta-llama/Llama-3.1-8B-Instruct`	Model id passed to `bench run --model`.
`--engine`	`vllm`	Inference engine kind.
`--baseline`	`.bench/baseline.json`	In-repo path to the baseline envelope.
`--runner`	`self-hosted,gpu`	Comma-separated `runs-on` labels.
`--tolerance`	`0.05`	Regression tolerance forwarded to `bench diff --tolerance`.
`--force`	off	Overwrite an existing workflow file.

bench ci¶

Synopsis¶

Example: generate a workflow¶

Example: validate an existing workflow¶

Flags (init)¶

See also¶

Flags (`init`)¶