Total throughput analysis with 2 second TTFT constraint
In Figure 9, we plot total throughput across tensor parallelism degrees for Llama-2 13B with a 4k-token context length, that is, 2k input tokens and 2k output tokens. The results are filtered to configurations with a Time To First Token (TTFT) of less than 2 seconds.
In Figure 10, we show the total throughput at the best batch size, that is, the maximum batch size that satisfies the TTFT constraint for each tensor parallelism degree. The maximum throughput observed, with TP 8, is 5018 tokens per second.
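The selection described above can be sketched as a simple post-processing step over benchmark results: filter runs by the TTFT constraint, then keep the largest feasible batch size per tensor parallelism degree. This is an illustrative sketch only; the field names and all values except the 5018 tokens-per-second figure are hypothetical, not the paper's raw data.

```python
# Hypothetical benchmark records: (tp_degree, batch_size, ttft_s, tokens_per_s).
# Only the TP 8 throughput of 5018 tokens/s comes from the paper; the rest
# are placeholder values for illustration.
results = [
    (1, 8, 1.4, 1100.0),
    (2, 16, 1.6, 2050.0),
    (4, 32, 1.8, 3600.0),
    (8, 64, 1.9, 5018.0),
    (8, 128, 2.6, 5400.0),  # excluded: TTFT exceeds the 2 s constraint
]

# Keep only runs that satisfy the TTFT < 2 s constraint (Figure 9's filter).
feasible = [r for r in results if r[2] < 2.0]

# For each tensor parallelism degree, take the run with the largest
# feasible batch size (the "best batch size" shown in Figure 10).
best = {}
for tp, batch, ttft, tput in feasible:
    if tp not in best or batch > best[tp][0]:
        best[tp] = (batch, tput)

for tp in sorted(best):
    print(f"TP {tp}: batch {best[tp][0]}, {best[tp][1]:.0f} tokens/s")
```

Under these assumed numbers, the TP 8 run at batch size 128 is discarded despite its higher raw throughput, because its TTFT violates the 2-second constraint.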