llm.quality.arithmetic-mini

6 entries. The Pareto frontier is computed on throughput_tok_per_s (higher is better) vs. ttft_p50_ms (lower is better). Rows marked P are on the frontier. No row below reports throughput, and none is marked P.

Model	Engine	Hardware	Quant	TTFT P50 (ms)	TTFT P99 (ms)	Throughput (tok/s)	$/M tokens	J/token	Power avg (W)	Power peak (W)	WER mean	J / audio s	Envelope
Qwen/Qwen2.5-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	13.40	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-8B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.12	—	—	—	—	—	—	—	—	JSON
microsoft/Phi-3.5-mini-instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	11.24	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-70B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	32.53	—	—	—	—	—	—	—	—	JSON
mistralai/Mistral-7B-Instruct-v0.3	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	12.41	—	—	—	—	—	—	—	—	JSON
google/gemma-2-9b-it	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	17.08	—	—	—	—	—	—	—	—	JSON
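
The frontier rule stated above the table can be sketched as follows. This is a minimal illustration, not the leaderboard's actual implementation: the `Entry` type, the `pareto_frontier` function, and the choice to skip rows with a missing metric are all assumptions. A row is treated as dominated when another row is at least as good on both metrics and strictly better on one.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    # Hypothetical record type; field names mirror the metrics named above.
    model: str
    ttft_p50_ms: Optional[float]           # lower is better
    throughput_tok_per_s: Optional[float]  # higher is better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    # Assumption: a row missing either metric cannot be compared, so it is skipped.
    complete = [
        e for e in entries
        if e.ttft_p50_ms is not None and e.throughput_tok_per_s is not None
    ]
    frontier = []
    for e in complete:
        dominated = any(
            o.throughput_tok_per_s >= e.throughput_tok_per_s
            and o.ttft_p50_ms <= e.ttft_p50_ms
            and (
                o.throughput_tok_per_s > e.throughput_tok_per_s
                or o.ttft_p50_ms < e.ttft_p50_ms
            )
            for o in complete
        )
        if not dominated:
            frontier.append(e)
    return frontier


# TTFT values copied from the table; throughput is missing (None) in every row.
rows = [
    Entry("Qwen/Qwen2.5-7B-Instruct", 13.40, None),
    Entry("meta-llama/Llama-3.1-8B-Instruct", 14.12, None),
    Entry("microsoft/Phi-3.5-mini-instruct", 11.24, None),
    Entry("meta-llama/Llama-3.1-70B-Instruct", 32.53, None),
    Entry("mistralai/Mistral-7B-Instruct-v0.3", 12.41, None),
    Entry("google/gemma-2-9b-it", 17.08, None),
]

frontier = pareto_frontier(rows)
print(len(frontier))
```

Under the skip-incomplete-rows assumption, the six rows above yield an empty frontier, because every row in the table has a missing throughput value.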