llm.quality.persona-consistency-mini

8 entries. Pareto frontier computed on throughput_tok_per_s (higher is better) vs. ttft_p50_ms (lower is better). Rows marked P are on the frontier.

Model	Engine	Hardware	Quant	TTFT P50 (ms)	TTFT P99 (ms)	Throughput (tok/s)	$/M tokens	J/token	Power avg (W)	Power peak (W)	WER mean	J / audio s	Envelope
meta-llama/Llama-3.1-70B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	33.45	—	—	—	—	—	—	—	—	JSON
microsoft/Phi-3.5-mini-instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	11.61	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-Coder-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.48	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-8B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.77	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.26	—	—	—	—	—	—	—	—	JSON
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	57.53	—	—	—	—	—	—	—	—	JSON
mistralai/Mistral-7B-Instruct-v0.3	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	12.37	—	—	—	—	—	—	—	—	JSON
google/gemma-2-9b-it	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	—	—	—	—	—	—	—	—	—	JSON
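
The frontier rule described above the table (maximize throughput_tok_per_s, minimize ttft_p50_ms) can be sketched as follows. This is a minimal illustration, not the leaderboard's actual implementation. The `Entry` type, the `pareto_frontier` function, and the choice to skip rows with a missing metric are all assumptions, and the sample values are hypothetical rather than taken from the table.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    throughput_tok_per_s: Optional[float]  # higher is better
    ttft_p50_ms: Optional[float]           # lower is better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Return the entries that no other complete entry dominates.

    Assumption: rows missing either metric cannot be compared, so they
    are excluded before the frontier is computed.
    """
    complete = [
        e for e in entries
        if e.throughput_tok_per_s is not None and e.ttft_p50_ms is not None
    ]
    frontier = []
    for a in complete:
        # b dominates a if it is at least as good on both metrics
        # and strictly better on at least one of them.
        dominated = any(
            b.throughput_tok_per_s >= a.throughput_tok_per_s
            and b.ttft_p50_ms <= a.ttft_p50_ms
            and (
                b.throughput_tok_per_s > a.throughput_tok_per_s
                or b.ttft_p50_ms < a.ttft_p50_ms
            )
            for b in complete
        )
        if not dominated:
            frontier.append(a)
    return frontier


# Hypothetical values for illustration only; not taken from the table above.
sample = [
    Entry("model-a", 1000.0, 20.0),
    Entry("model-b", 800.0, 12.0),
    Entry("model-c", 700.0, 25.0),  # dominated by model-a on both metrics
    Entry("model-d", None, 11.0),   # missing throughput, so it is skipped
]
frontier_names = [e.name for e in pareto_frontier(sample)]
```

Under the skip-missing assumption, a row needs both metrics to be eligible. Every Throughput (tok/s) cell in the table above is empty, so no row would qualify, which would be consistent with no row carrying a P marker.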