InferenceBench

code.generation.mbpp-mini

2 entries. Pareto frontier computed on throughput_tok_per_s (higher is better) vs. ttft_p50_ms (lower is better). Rows marked P are on the frontier.

Model Engine Hardware Quant TTFT P50 (ms) TTFT P99 (ms) Throughput (tok/s) $/M tokens J/token Power avg (W) Power peak (W) WER mean J / audio s Envelope
Qwen/Qwen2.5-Coder-7B-Instruct vllm unknown 8x NVIDIA H100 80GB HBM3 fp16 18.07 JSON
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct vllm unknown 8x NVIDIA H100 80GB HBM3 fp16 55.04 JSON
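
The frontier rule stated above the table (throughput_tok_per_s higher is better, ttft_p50_ms lower is better) can be sketched as a simple dominance check. This is a minimal illustration, not InferenceBench's actual implementation: the Entry type, the function names, and the sample values are all hypothetical and are not taken from the table.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    # Hypothetical record type; field names mirror the two metrics named above.
    name: str
    throughput_tok_per_s: float  # higher is better
    ttft_p50_ms: float           # lower is better


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as good as b on both metrics and strictly better on one."""
    no_worse = (a.throughput_tok_per_s >= b.throughput_tok_per_s
                and a.ttft_p50_ms <= b.ttft_p50_ms)
    strictly_better = (a.throughput_tok_per_s > b.throughput_tok_per_s
                       or a.ttft_p50_ms < b.ttft_p50_ms)
    return no_worse and strictly_better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Keep every entry that no other entry dominates (O(n^2), fine for small tables)."""
    return [e for e in entries if not any(dominates(o, e) for o in entries)]


# Made-up sample values for illustration only.
sample = [
    Entry("a", throughput_tok_per_s=1000.0, ttft_p50_ms=20.0),
    Entry("b", throughput_tok_per_s=800.0, ttft_p50_ms=15.0),
    Entry("c", throughput_tok_per_s=700.0, ttft_p50_ms=25.0),  # dominated by "a"
    Entry("d", throughput_tok_per_s=1200.0, ttft_p50_ms=40.0),
]

frontier_names = [e.name for e in pareto_frontier(sample)]
print(frontier_names)
```

Under this definition an entry is marked P only when no other entry is at least as good on both metrics and strictly better on one, so entries that tie on both metrics would both remain on the frontier.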