
CLI overview

The bench CLI is a Typer app. Most subcommands are verbs you run on a benchmark or a signed envelope; a few manage supporting state such as plugins, the local fetch cache, and leaderboards.

bench --help

Expected output:

Usage: bench [OPTIONS] COMMAND [ARGS]...

  InferenceBench Suite — vendor-neutral, signed-envelope AI benchmarks.

Options:
  --version          Show version and exit.
  -v, --verbose      Verbose logging (DEBUG level).
  --help             Show this message and exit.

Commands:
  run          Run a benchmark and produce a signed envelope.
  audit        Verify every envelope in a directory and report failures.
  bundle       Pack/unpack a one-file shareable envelope bundle (.bundle.zip).
  cache        Manage the local envelope fetch cache.
  ci           Generate or validate a GitHub Actions regression-check workflow.
  compare      Compare benchmark runs (Pareto frontier).
  cost         Compare model cost across providers.
  diff         Per-metric delta between two envelopes.
  doctor       Diagnose hardware health before benchmarking.
  export       Export an envelope as markdown / CSV / Slack snippet.
  fetch        Fetch a signed envelope from a remote URI.
  history      Time-series view of one metric across runs.
  leaderboard  Browse public leaderboards.
  list         List every benchmark across every installed plugin.
  matrix       Run one benchmark across multiple endpoints.
  plugin       Manage benchmark plugins.
  plugins      List installed plugins (shorthand for "bench plugin list").
  profile      Re-run a benchmark with high-frequency telemetry for diagnosis.
  publish      Publish a signed envelope (HF Hub, local).
  replay       Replay a benchmark from an existing envelope.
  schema       Emit JSON Schema for envelopes / benchmark specs / mirror index.
  summary      Summarise envelopes in a directory or file.
  verify       Verify a signed envelope's signature + content.
  watch        Watch an envelopes directory and rebuild the leaderboard on changes.

Commands at a glance

| Command | Purpose | Page |
|---|---|---|
| bench run | Execute a benchmark and produce a signed envelope (supports --sweep, --rps-sweep, --all-benchmarks) | bench run |
| bench audit | Verify every envelope in a directory and report failures | bench audit |
| bench bundle | Pack/unpack a one-file shareable envelope bundle (.bundle.zip) | bench bundle |
| bench cache | Inspect, locate, and clear the local fetch cache | bench cache |
| bench ci | Generate or validate a GitHub Actions regression workflow | bench ci |
| bench compare | Pareto-frontier comparison across N envelopes | bench compare |
| bench cost | Provider-cost comparison from the pricing registry | bench cost |
| bench diff | Per-metric delta between two envelopes (regression detection) | bench diff |
| bench doctor | Pre-run hardware diagnostic | bench doctor |
| bench export | Render an envelope as markdown / CSV / Slack | bench export |
| bench fetch | Download an envelope to the local cache (hf://, https://, file://) | bench fetch |
| bench history | Time series of one metric across an envelope corpus (with sparkline) | bench history |
| bench leaderboard | Render a static HTML site from a directory of envelopes | bench leaderboard |
| bench list | Catalogue every benchmark across every installed plugin | bench list |
| bench matrix | Run one benchmark across multiple endpoints in a single command | bench matrix |
| bench plugin | Manage plugins via the list, init, install, and info subcommands | bench plugin |
| bench plugins | Shorthand for bench plugin list | bench plugin |
| bench profile | Re-run a benchmark with high-frequency NVML / RAPL sampling | bench profile |
| bench publish | Publish to HF Hub or a local mirror | bench publish |
| bench replay | Re-run a benchmark from a signed envelope | bench replay |
| bench schema | Emit JSON Schema for envelope / benchmark-spec / mirror-index | bench schema |
| bench summary | Tabulate envelopes in a directory (--json for jq) | bench summary |
| bench verify | Verify a signed envelope's signature + content hash | bench verify |
| bench watch | Watch an envelopes directory and rebuild the leaderboard on changes | bench watch |
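The table names two comparison ideas: the Pareto frontier behind bench compare and the per-metric delta behind bench diff. The Python sketch below illustrates both on plain dictionaries of metrics. It is not bench's implementation; the metric names, the hypothetical MINIMIZE set, and the choice of relative change for deltas are assumptions made for illustration.

```python
from typing import Dict, List

Metrics = Dict[str, float]

# Hypothetical metric names where a smaller value is better; every other
# metric is treated as higher-is-better. Not taken from bench itself.
MINIMIZE = {"latency_p50_ms", "cost_usd_per_1m_tokens"}


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if run a is no worse than run b on every metric and better on at least one.

    Assumes both runs report the same metric keys.
    """
    strictly_better = False
    for key in a:
        if key in MINIMIZE:
            better, worse = a[key] < b[key], a[key] > b[key]
        else:
            better, worse = a[key] > b[key], a[key] < b[key]
        if worse:
            return False
        strictly_better = strictly_better or better
    return strictly_better


def pareto_frontier(runs: Dict[str, Metrics]) -> List[str]:
    """Names of runs that no other run dominates, in input order."""
    return [
        name
        for name, metrics in runs.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in runs.items()
            if other_name != name
        )
    ]


def metric_deltas(baseline: Metrics, candidate: Metrics) -> Dict[str, float]:
    """Relative change per shared metric, skipping metrics whose baseline is 0."""
    return {
        key: (candidate[key] - baseline[key]) / baseline[key]
        for key in baseline.keys() & candidate.keys()
        if baseline[key] != 0
    }
```

With runs a (100 tok/s, 50 ms), b (80 tok/s, 60 ms) and c (120 tok/s, 70 ms), pareto_frontier returns ["a", "c"]: b is dominated by a, while a and c each win on one metric, so neither dominates the other.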

Global options

| Flag | Effect |
|---|---|
| --version | Print version and exit. |
| -v, --verbose | Enable DEBUG logging. |
| --help | Show help for the command. |

Output formats

bench defaults to Rich tables for terminal output. Commands that can emit machine-readable output take a flag that switches to JSON: bench compare --report json, bench diff --report json, bench summary --json, bench list --json, bench history --json, and bench audit --report json. Envelopes themselves are always stored on disk as canonical JSON.
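For scripting, the JSON variants can be consumed from Python with the standard library alone. This is a minimal sketch that assumes the command prints a single JSON document to stdout; the schema of that document is not described on this page, so the hypothetical helper run_json returns it untouched.

```python
import json
import subprocess
from typing import Any, Sequence


def run_json(cmd: Sequence[str]) -> Any:
    """Run cmd and parse its stdout as JSON (hypothetical helper, not part of bench).

    Assumes the command writes exactly one JSON document to stdout.
    check=True turns any non-zero exit code (1 or 2) into CalledProcessError.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, with bench on PATH, run_json(["bench", "list", "--json"]) returns the parsed output of bench list --json, ready to filter in Python instead of jq.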

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Operational failure (verification failed, hardware refused, regression with --strict, etc.) |
| 2 | Bad invocation (missing flag, invalid value, envelope not found) |
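A CI script usually needs to tell a genuine failure (code 1) apart from a broken invocation (code 2). A minimal Python sketch, assuming only the three codes listed in the exit-code table; EXIT_MEANINGS, classify_exit and run_and_classify are hypothetical helper names, not part of bench.

```python
import subprocess
from typing import Sequence, Tuple

# Meanings copied from the exit-code table; no other codes are documented.
EXIT_MEANINGS = {
    0: "success",
    1: "operational failure",
    2: "bad invocation",
}


def classify_exit(code: int) -> str:
    """Map a bench exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_and_classify(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run cmd without raising on failure and return (exit code, meaning)."""
    result = subprocess.run(list(cmd))
    return result.returncode, classify_exit(result.returncode)
```

subprocess.run is called without check=True so that both failure codes come back as values rather than exceptions; a pipeline can then fail the build on 1 and flag its own configuration on 2.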