Total throughput analysis with 2 second TTFT constraint
In Figure 9, we plot total throughput across tensor parallelism degrees for Llama-2 13B with a 4k-token context length, that is, 2k input tokens and 2k output tokens. The results are filtered to configurations with a Time To First Token (TTFT) of less than 2 seconds.
In Figure 10, we show the total throughput at the best batch size, that is, the maximum batch size that satisfies the TTFT constraint for each tensor parallelism degree. The maximum throughput observed, with TP 8, is 5018 tokens per second.
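The selection described above can be sketched as a simple post-processing step over benchmark results: filter runs by the TTFT constraint, then keep the largest feasible batch size per tensor parallelism degree. This is an illustrative sketch only; the field names and all values except the 5018 tokens-per-second figure are hypothetical, not the paper's raw data.

```python
# Hypothetical benchmark records: (tp_degree, batch_size, ttft_s, tokens_per_s).
# Only the TP 8 throughput of 5018 tokens/s comes from the paper; the rest
# are placeholder values for illustration.
results = [
    (1, 8, 1.4, 1100.0),
    (2, 16, 1.6, 2050.0),
    (4, 32, 1.8, 3600.0),
    (8, 64, 1.9, 5018.0),
    (8, 128, 2.6, 5400.0),  # excluded: TTFT exceeds the 2 s constraint
]

# Keep only runs that satisfy the TTFT < 2 s constraint (Figure 9's filter).
feasible = [r for r in results if r[2] < 2.0]

# For each tensor parallelism degree, take the run with the largest
# feasible batch size (the "best batch size" shown in Figure 10).
best = {}
for tp, batch, ttft, tput in feasible:
    if tp not in best or batch > best[tp][0]:
        best[tp] = (batch, tput)

for tp in sorted(best):
    print(f"TP {tp}: batch {best[tp][0]}, {best[tp][1]:.0f} tokens/s")
```

Under these assumed numbers, the TP 8 run at batch size 128 is discarded despite its higher raw throughput, because its TTFT violates the 2-second constraint.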