llm.quality.factual-mini

9 entries. Pareto frontier computed on throughput_tok_per_s (higher is better) vs. ttft_p50_ms (lower is better). Rows marked P are on the frontier.

Model	Engine	Hardware	Quant	TTFT P50 (ms)	TTFT P99 (ms)	Throughput (tok/s)	$/M tokens	J/token	Power avg (W)	Power peak (W)	WER mean	J / audio s	Envelope
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	45.01	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2-VL-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	29.44	—	—	—	—	—	—	—	—	JSON
mistralai/Mistral-7B-Instruct-v0.3	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	11.70	—	—	—	—	—	—	—	—	JSON
google/gemma-2-9b-it	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	16.89	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-8B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.08	—	—	—	—	—	—	—	—	JSON
microsoft/Phi-3.5-mini-instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	11.21	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-70B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	32.61	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-Coder-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	13.96	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.12	—	—	—	—	—	—	—	—	JSON
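
The frontier rule stated above the table (throughput_tok_per_s higher is better, ttft_p50_ms lower is better) can be sketched as follows. This is a minimal illustration, not the leaderboard's actual implementation: the names `Entry`, `dominates`, and `pareto_frontier` are hypothetical, and the sketch assumes that a row with a missing value in either metric is left out of the comparison. Under that assumption, none of the nine rows above would qualify, since no throughput is reported for them. The demo values are made up for illustration and are not measurements.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One leaderboard row, reduced to the two frontier metrics (hypothetical type)."""
    model: str
    ttft_p50_ms: Optional[float]           # lower is better
    throughput_tok_per_s: Optional[float]  # higher is better


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as good as b on both metrics and strictly better on one."""
    no_worse = (a.throughput_tok_per_s >= b.throughput_tok_per_s
                and a.ttft_p50_ms <= b.ttft_p50_ms)
    strictly_better = (a.throughput_tok_per_s > b.throughput_tok_per_s
                       or a.ttft_p50_ms < b.ttft_p50_ms)
    return no_worse and strictly_better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Return the entries that no other complete entry dominates.

    Assumption: rows missing either metric are excluded before comparison.
    """
    complete = [e for e in entries
                if e.ttft_p50_ms is not None and e.throughput_tok_per_s is not None]
    return [e for e in complete
            if not any(dominates(other, e) for other in complete)]


# Made-up values for illustration only (not taken from the table).
demo = [
    Entry("a", 10.0, 100.0),
    Entry("b", 20.0, 200.0),
    Entry("c", 30.0, 150.0),  # dominated by "b": slower first token and lower throughput
    Entry("d", 15.0, None),   # incomplete row, excluded under the stated assumption
]
frontier_models = [e.model for e in pareto_frontier(demo)]

# Two rows from the table above: TTFT is reported, throughput is not.
table_rows = [
    Entry("microsoft/Phi-3.5-mini-instruct", 11.21, None),
    Entry("mistralai/Mistral-7B-Instruct-v0.3", 11.70, None),
]
table_frontier = pareto_frontier(table_rows)
```

The pairwise check is quadratic in the number of rows, which is adequate for a table of this size; a sort-based sweep would be the usual choice for much larger result sets.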