Recipe: Run bench in a container¶
The Dockerfile at the repo root builds a self-contained image with the bench CLI and all bundled plugins installed. Use it when you want a clean, reproducible environment that doesn't touch the host Python install — typical in CI runners, on multi-tenant build hosts, or when sharing a quick benchmark with a teammate.
1. Build the image¶
From the repo root:
The build is multi-stage. Stage 1 uses uv build --all-packages to produce wheels for every workspace package (cli, harness, envelope, the plugins, the integrations). Stage 2 is a slim python:3.12-slim runtime that pip installs those wheels under a non-root bench user. The final image is ~120 MB compressed.
2. Sanity-check it¶
docker run --rm yobitelcomm/bench:0.0.2 --help
docker run --rm yobitelcomm/bench:0.0.2 list
docker run --rm yobitelcomm/bench:0.0.2 doctor
--help, list, and doctor need no external state, so they confirm the image is wired up correctly.
3. Use the helper script¶
scripts/run_in_container.sh wraps a standard docker run invocation:
./scripts/run_in_container.sh run llm.inference \
--model meta-llama/Llama-3.1-8B-Instruct \
--engine vllm \
--endpoint http://localhost:8000/v1
The helper:
- Mounts the current working directory at /work and cd's into it (so envelopes the CLI writes are visible on the host).
- Uses --network host so the container can reach a vLLM/SGLang/TRT-LLM/llama.cpp/MLX endpoint running on localhost of the host.
- Picks the image from $BENCH_IMAGE, defaulting to yobitelcomm/bench:latest. Pin a specific tag when reproducibility matters:
Caveats¶
- Networking.
--network hostis the simplest way to reach a local inference server. On Docker Desktop (macOS / Windows)--network hostdoesn't behave the same way — use--add-host=host.docker.internal:host-gatewayand point--endpointathttp://host.docker.internal:8000/v1instead. - GPU access. This image is CPU-only by design; the CLI itself doesn't need a GPU. The inference engine (vLLM, SGLang, TRT-LLM, llama.cpp, mlx_lm.server) runs outside the container and the CLI just talks to it over HTTP. If you want to run the engine in a container too, see the engine's own image.
- Signing keys.
bench publishand signed-envelope flows need access to a Sigstore dev key or OIDC credentials. Mount the key explicitly:-v ~/.config/bench/cosign.key:/home/bench/.config/bench/cosign.key:ro. The.dockerignoredeliberately excludescosign.key/cosign.pubso they never end up baked into the image. - Output directories. The CLI writes envelopes and per-request samples to
--output(defaultvalidation-runs/). Mount that path in if you want results to persist on the host — the helper script handles this by mounting$(pwd)at/work.