CLI overview
The bench CLI is a Typer app. Every subcommand maps to a verb you can run on a benchmark or an envelope.
Expected output of bench --help:
```text
Usage: bench [OPTIONS] COMMAND [ARGS]...

  InferenceBench Suite — vendor-neutral, signed-envelope AI benchmarks.

Options:
  --version      Show version and exit.
  -v, --verbose  Verbose logging (DEBUG level).
  --help         Show this message and exit.

Commands:
  run          Run a benchmark and produce a signed envelope.
  audit        Verify every envelope in a directory and report failures.
  bundle       Pack/unpack a one-file shareable envelope bundle (.bundle.zip).
  cache        Manage the local envelope fetch cache.
  ci           Generate or validate a GitHub Actions regression-check workflow.
  compare      Compare benchmark runs (Pareto frontier).
  cost         Compare model cost across providers.
  diff         Per-metric delta between two envelopes.
  doctor       Diagnose hardware health before benchmarking.
  export       Export an envelope as markdown / CSV / Slack snippet.
  fetch        Fetch a signed envelope from a remote URI.
  history      Time-series view of one metric across runs.
  leaderboard  Browse public leaderboards.
  list         List every benchmark across every installed plugin.
  matrix       Run one benchmark across multiple endpoints.
  plugin       Manage benchmark plugins.
  plugins      List installed plugins (shorthand for "bench plugin list").
  profile      Re-run a benchmark with high-frequency telemetry for diagnosis.
  publish      Publish a signed envelope (HF Hub, local).
  replay       Replay a benchmark from an existing envelope.
  schema       Emit JSON Schema for envelopes / benchmark specs / mirror index.
  summary      Summarise envelopes in a directory or file.
  verify       Verify a signed envelope's signature + content.
  watch        Watch an envelopes directory and rebuild the leaderboard on changes.
```
Commands at a glance
| Command | Purpose | Page |
|---|---|---|
| bench run | Execute a benchmark, produce a signed envelope (supports --sweep, --rps-sweep, --all-benchmarks) | bench run |
| bench audit | Verify every envelope in a directory and report failures | bench audit |
| bench bundle | Pack/unpack a one-file shareable envelope bundle (.bundle.zip) | bench bundle |
| bench cache | Inspect, locate, and clear the local fetch cache | bench cache |
| bench ci | Generate or validate a GitHub Actions regression workflow | bench ci |
| bench compare | Pareto-frontier comparison across N envelopes | bench compare |
| bench cost | Provider-cost comparison from the pricing registry | bench cost |
| bench diff | Per-metric delta between two envelopes (regression detection) | bench diff |
| bench doctor | Pre-run hardware diagnostic | bench doctor |
| bench export | Render an envelope as markdown / CSV / Slack | bench export |
| bench fetch | Download an envelope to local cache (hf://, https://, file://) | bench fetch |
| bench history | Time-series of one metric across an envelope corpus (with sparkline) | bench history |
| bench leaderboard | Render a static HTML site from a directory of envelopes | bench leaderboard |
| bench list | Catalogue every benchmark across every installed plugin | bench list |
| bench matrix | Run one benchmark across multiple endpoints in a single command | bench matrix |
| bench plugin | list, init, install, info subcommands | bench plugin |
| bench plugins | Shorthand for bench plugin list | bench plugin |
| bench profile | Re-run a benchmark with high-frequency NVML / RAPL sampling | bench profile |
| bench publish | Publish to HF Hub or local mirror | bench publish |
| bench replay | Re-run a benchmark from a signed envelope | bench replay |
| bench schema | Emit JSON Schema for envelope / benchmark-spec / mirror-index | bench schema |
| bench summary | Tabulate envelopes in a directory (--json for jq) | bench summary |
| bench verify | Verify a signed envelope's signature + content hash | bench verify |
| bench watch | Watch an envelopes directory and rebuild the leaderboard on changes | bench watch |
Global options
| Flag | Effect |
|---|---|
| --version | Print version and exit. |
| -v, --verbose | Enable DEBUG logging. |
| --help | Show help for the command. |
Output formats
bench defaults to Rich tables for terminal output. Commands that emit machine-readable data accept a flag for it: bench compare --report json, bench diff --report json, bench summary --json, bench list --json, bench history --json, bench audit --report json. Envelopes themselves are always canonical JSON on disk.
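
Because the machine-readable flag differs by command, a script that calls several of them may want a small lookup. The following is a minimal sketch, not part of bench itself: `JSON_FLAGS`, `json_args`, and `run_json` are hypothetical helper names, and the sketch assumes that each command writes a single JSON document to stdout and nothing else.

```python
import json
import subprocess

# Flags copied from the paragraph above. Commands not listed there are
# deliberately absent rather than guessed.
JSON_FLAGS = {
    "compare": ["--report", "json"],
    "diff": ["--report", "json"],
    "audit": ["--report", "json"],
    "summary": ["--json"],
    "list": ["--json"],
    "history": ["--json"],
}


def json_args(command: str) -> list[str]:
    """Return the flag(s) that switch `command` to JSON output."""
    try:
        return list(JSON_FLAGS[command])
    except KeyError:
        raise ValueError(
            f"no machine-readable flag documented for {command!r}"
        ) from None


def run_json(command: str, *args: str):
    """Run `bench <command> <args> <json flag>` and parse its stdout.

    Assumption: stdout holds exactly one JSON document. check=True raises
    CalledProcessError on any non-zero exit code.
    """
    proc = subprocess.run(
        ["bench", command, *args, *json_args(command)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

Positional arguments are passed through unchanged, so check each command's `--help` for the form it expects. The shape of the parsed result is not specified here and should be inspected before it is relied on.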
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Operational failure (verification failed, hardware refused, regression with --strict, etc.) |
| 2 | Bad invocation (missing flag, invalid value, envelope not found) |
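
A wrapper script can branch on these codes to tell a failed check apart from a mistyped command. The sketch below is illustrative only: `EXIT_MEANINGS`, `describe_exit`, and `run_bench` are hypothetical names, and any code outside the table is reported as undocumented rather than interpreted.

```python
import subprocess
import sys

# Meanings taken from the exit-code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "operational failure",
    2: "bad invocation",
}


def describe_exit(code: int) -> str:
    """Map a bench exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_bench(*args: str) -> int:
    """Run bench with the given arguments and report the outcome on stderr."""
    proc = subprocess.run(["bench", *args])
    print(
        f"bench {' '.join(args)}: {describe_exit(proc.returncode)}",
        file=sys.stderr,
    )
    return proc.returncode
```

In a CI job, a return value of 1 typically means the run itself should be investigated, while 2 points at the invocation and is usually fixed by correcting the command line.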