code.generation.humaneval-mini
8 entries.
Pareto frontier computed on throughput_tok_per_s (higher is better) vs. ttft_p50_ms (lower is better).
Rows marked P are on the frontier.
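As a rough illustration of the frontier rule described above, the sketch below keeps a row only if no other row is at least as good on both metrics and strictly better on one. The field names mirror `throughput_tok_per_s` and `ttft_p50_ms`; the `Row` type, the `dominates` and `pareto_frontier` helpers, and the toy values are hypothetical and not taken from the leaderboard. How the leaderboard treats rows with a missing metric is not stated; this sketch assumes they are skipped, which would be consistent with no row being marked P when throughput is absent for every entry.

```python
from typing import List, Optional, TypedDict


class Row(TypedDict):
    name: str
    throughput_tok_per_s: Optional[float]  # higher is better
    ttft_p50_ms: Optional[float]           # lower is better


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = (a["throughput_tok_per_s"] >= b["throughput_tok_per_s"]
                and a["ttft_p50_ms"] <= b["ttft_p50_ms"])
    strictly_better = (a["throughput_tok_per_s"] > b["throughput_tok_per_s"]
                       or a["ttft_p50_ms"] < b["ttft_p50_ms"])
    return no_worse and strictly_better


def pareto_frontier(rows: List[Row]) -> List[str]:
    # Assumption: rows missing either metric cannot be ranked and are skipped.
    complete = [r for r in rows
                if r["throughput_tok_per_s"] is not None
                and r["ttft_p50_ms"] is not None]
    # A row is on the frontier if no other complete row dominates it.
    return [r["name"] for r in complete
            if not any(dominates(other, r) for other in complete)]


# Hypothetical values for illustration only; not taken from the table.
toy_rows: List[Row] = [
    {"name": "a", "throughput_tok_per_s": 1000.0, "ttft_p50_ms": 20.0},
    {"name": "b", "throughput_tok_per_s": 800.0, "ttft_p50_ms": 12.0},
    {"name": "c", "throughput_tok_per_s": 700.0, "ttft_p50_ms": 25.0},
    {"name": "d", "throughput_tok_per_s": None, "ttft_p50_ms": 10.0},
]
print(pareto_frontier(toy_rows))  # ['a', 'b']
```

In the toy data, `c` is dominated by `a` (lower throughput and higher TTFT), and `d` is skipped because its throughput is missing, so only `a` and `b` remain. The pairwise check is quadratic in the number of rows, which is fine for a table of this size.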
| Model | Engine | Hardware | Quant | TTFT P50 (ms) | TTFT P99 (ms) | Throughput (tok/s) | $/M tokens | J/token | Power avg (W) | Power peak (W) | WER mean | J / audio s | Envelope | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| microsoft/Phi-3.5-mini-instruct | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 12.40 | — | — | — | — | — | — | — | — | JSON | |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 57.57 | — | — | — | — | — | — | — | — | JSON | |
| meta-llama/Llama-3.1-8B-Instruct | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 15.77 | — | — | — | — | — | — | — | — | JSON | |
| Qwen/Qwen2.5-Coder-7B-Instruct | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 15.18 | — | — | — | — | — | — | — | — | JSON | |
| Qwen/Qwen2.5-7B-Instruct | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 14.67 | — | — | — | — | — | — | — | — | JSON | |
| meta-llama/Llama-3.1-70B-Instruct | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 28.08 | — | — | — | — | — | — | — | — | JSON | |
| mistralai/Mistral-7B-Instruct-v0.3 | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 14.95 | — | — | — | — | — | — | — | — | JSON | |
| google/gemma-2-9b-it | vllm unknown | 8x NVIDIA H100 80GB HBM3 | fp16 | 17.90 | — | — | — | — | — | — | — | — | JSON | |