llm.mt.flores-200-mini-en-fr

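6 entries. The Pareto frontier is computed on throughput_tok_per_s (higher is better) vs. ttft_p50_ms (lower is better): a row is on the frontier when no other row is at least as good on both metrics and strictly better on one. Rows marked P are on the frontier.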

Model	Engine	Hardware	Quant	TTFT P50 (ms)	TTFT P99 (ms)	Throughput (tok/s)	$/M tokens	J/token	Power avg (W)	Power peak (W)	WER mean	J / audio s	Envelope
meta-llama/Llama-3.1-70B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	32.03	—	—	—	—	—	—	—	—	JSON
mistralai/Mistral-7B-Instruct-v0.3	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	12.00	—	—	—	—	—	—	—	—	JSON
Qwen/Qwen2.5-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	bf16	12.99	—	—	—	21.40	643	848	—	—	JSON
Qwen/Qwen2.5-7B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.11	—	—	—	—	—	—	—	—	JSON
google/gemma-2-9b-it	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	16.77	—	—	—	—	—	—	—	—	JSON
meta-llama/Llama-3.1-8B-Instruct	vllm unknown	8x NVIDIA H100 80GB HBM3	fp16	14.16	—	—	—	—	—	—	—	—	JSON
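
The frontier rule described above the table can be sketched as follows. This is a minimal illustration, not the leaderboard's actual implementation: the `Entry` type, the `pareto_frontier` function, and the demo values are hypothetical, and skipping rows with a missing metric is an assumption.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One leaderboard row, reduced to the two frontier metrics (hypothetical type)."""
    name: str
    ttft_p50_ms: Optional[float]           # lower is better
    throughput_tok_per_s: Optional[float]  # higher is better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Return the entries that no other entry dominates."""
    # Assumption: a row missing either metric cannot be compared, so it is skipped.
    complete = [
        e for e in entries
        if e.ttft_p50_ms is not None and e.throughput_tok_per_s is not None
    ]
    frontier = []
    for e in complete:
        # "o dominates e": o is at least as good on both metrics
        # and strictly better on at least one of them.
        dominated = any(
            o.throughput_tok_per_s >= e.throughput_tok_per_s
            and o.ttft_p50_ms <= e.ttft_p50_ms
            and (
                o.throughput_tok_per_s > e.throughput_tok_per_s
                or o.ttft_p50_ms < e.ttft_p50_ms
            )
            for o in complete
        )
        if not dominated:
            frontier.append(e)
    return frontier


if __name__ == "__main__":
    # Illustrative values only; these are NOT measurements from the table.
    demo = [
        Entry("a", 10.0, 100.0),
        Entry("b", 12.0, 150.0),
        Entry("c", 15.0, 120.0),  # dominated by "b": slower TTFT and lower throughput
        Entry("d", 11.0, None),   # no throughput value, so it is skipped
    ]
    print([e.name for e in pareto_frontier(demo)])  # ['a', 'b']
```

Under this assumption, a row with no throughput value can never be placed on the frontier. Every row in the table above lacks one, which would be consistent with no row carrying a P marker here.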