Llama-3 70B Throughput analysis without TTFT constraint
In Figure 5, we plot total throughput across batch sizes for different tensor parallelism degrees for Llama-3 70B with 8k tokens of context length (4k input and 4k output).
In Figure 6, we show the total throughput at the best batch size (the maximum batch size for each tensor parallelism degree). The highest throughput observed, 2580.61 tokens per second, is achieved with TP 8.
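As a point of reference, total throughput in benchmarks like this is typically computed as the total number of generated tokens across the batch divided by end-to-end run time. The sketch below illustrates that arithmetic; the batch size and latency used in the example are hypothetical and are not values reported in this paper (only the 4k-output context split comes from the text).

```python
def total_throughput(batch_size: int, output_tokens: int, e2e_latency_s: float) -> float:
    """Total throughput (tokens/s) = generated tokens across the batch
    divided by end-to-end latency for the run."""
    return batch_size * output_tokens / e2e_latency_s

# Hypothetical example: a TP-8 run generating 4k output tokens per request.
# A batch of 64 finishing in ~101.5 s would land in the ~2580 tokens/s
# range, the order of magnitude reported for TP 8 above.
print(round(total_throughput(64, 4096, 101.5), 1))
```

This also shows why larger batch sizes raise total throughput: the numerator grows with the batch while end-to-end latency typically grows sublinearly, up to the memory limit that caps the maximum batch size for each TP degree.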