code.generation.humaneval-mini

9 entries. The Pareto frontier is computed on throughput_tok_per_s (higher is better) vs. ttft_p50_ms (lower is better). Rows marked P are on the frontier. No row below reports a throughput value, and none is marked P.

Model	Engine	Hardware	Quant	TTFT P50 (ms)	TTFT P99 (ms)	Throughput (tok/s)	$/M tokens	J/token	Power avg (W)	Power peak (W)	WER mean	J/audio s	Envelope
microsoft/Phi-3.5-mini-instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	12.40	—	—	—	—	—	—	—	—	JSON
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	57.57	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-8B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	15.77	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-Coder-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	15.18	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.67	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-70B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	28.08	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	bf16	13.47	—	—	—	6.33	881	935	—	—	JSON
mistralai/Mistral-7B-Instruct-v0.3	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.95	—	—	—	—	—	—	—	—	JSON
google/gemma-2-9b-it	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	17.90	—	—	—	—	—	—	—	—	JSON
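
The frontier rule stated above the table (maximize throughput_tok_per_s, minimize ttft_p50_ms) can be sketched as a simple dominance check. This is a minimal illustration, not the leaderboard's actual implementation: the `Entry` type, the `pareto_frontier` function, and the `demo` values are hypothetical, and skipping rows that lack either metric is an assumption. The two `table_rows` entries reuse TTFT values from the table, with throughput left as missing.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    ttft_p50_ms: Optional[float]           # lower is better
    throughput_tok_per_s: Optional[float]  # higher is better


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as good as b on both metrics and strictly better on one."""
    no_worse = (a.ttft_p50_ms <= b.ttft_p50_ms
                and a.throughput_tok_per_s >= b.throughput_tok_per_s)
    strictly_better = (a.ttft_p50_ms < b.ttft_p50_ms
                       or a.throughput_tok_per_s > b.throughput_tok_per_s)
    return no_worse and strictly_better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Return the entries that no other entry dominates.

    Assumption: rows missing either metric cannot be compared and are skipped.
    """
    complete = [e for e in entries
                if e.ttft_p50_ms is not None and e.throughput_tok_per_s is not None]
    return [e for e in complete
            if not any(dominates(other, e) for other in complete)]


# Hypothetical values, for illustration only (not measured results).
demo = [
    Entry("a", 12.0, 900.0),
    Entry("b", 15.0, 1200.0),
    Entry("c", 16.0, 1000.0),  # dominated by "b": slower TTFT and lower throughput
    Entry("d", 14.0, None),    # skipped: no throughput value
]
print([e.name for e in pareto_frontier(demo)])  # ['a', 'b']

# Two rows from the table: TTFT is reported, throughput is not.
table_rows = [
    Entry("microsoft/Phi-3.5-mini-instruct", 12.40, None),
    Entry("Qwen/Qwen2.5-7B-Instruct (bf16)", 13.47, None),
]
print(pareto_frontier(table_rows))  # []
```

Under the skip-incomplete-rows assumption, a table with no throughput values yields an empty frontier, which is consistent with no row being marked P here.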