Llama-3 70B Tokens per second per GPU without any TTFT constraint
Similarly, we plotted per-GPU throughput for each tensor parallelism degree at a full 8k-token context length (4k input and 4k output tokens), with no TTFT constraint applied.
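The per-GPU metric here is simply aggregate throughput normalized by the number of GPUs in the tensor-parallel group. A minimal sketch of that normalization, using hypothetical aggregate throughput values (the figures below are illustrative placeholders, not measurements from this paper):

```python
def per_gpu_throughput(total_tps: float, tp_degree: int) -> float:
    """Normalize aggregate throughput (tokens/s) by the number of
    GPUs the tensor-parallel group occupies."""
    return total_tps / tp_degree

# Hypothetical aggregate throughput per TP degree (tokens/s),
# for illustration only.
aggregate_tps = {2: 900.0, 4: 1500.0, 8: 2200.0}

for tp, tps in sorted(aggregate_tps.items()):
    print(f"TP={tp}: {per_gpu_throughput(tps, tp):.1f} tokens/s/GPU")
```

This normalization is what makes TP degrees comparable: a higher TP degree may raise aggregate throughput while still lowering the per-GPU figure, since the extra GPUs spend part of their time on inter-GPU communication.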