Chapters
Executive summary
Scenarios
Conclusion
Appendix
This document describes the inference token-generation performance improvement that tensor parallelism brings to Meta's open-source Llama 2 and Llama 3 models when running on a Dell PowerEdge XE9680 server with eight NVIDIA H100 GPU accelerators.
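To make the setup concrete, the sketch below shows one way tensor parallelism across eight GPUs can be enabled for Llama inference. It assumes the vLLM serving framework and the meta-llama/Llama-2-7b-hf checkpoint purely for illustration; the document does not name the inference stack used in its benchmarks, so treat this as a minimal sketch under those assumptions rather than the measured configuration.

    # Minimal sketch: tensor-parallel Llama inference on 8 GPUs.
    # Assumptions: vLLM as the serving framework and the
    # meta-llama/Llama-2-7b-hf checkpoint (both illustrative; the
    # document does not specify either).
    from vllm import LLM, SamplingParams

    # tensor_parallel_size=8 shards each weight matrix across the eight
    # H100 GPUs, so every token-generation step runs as partial matmuls
    # on each GPU that are combined with collective communication.
    llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=8)

    params = SamplingParams(max_tokens=128, temperature=0.8)
    outputs = llm.generate(
        ["Explain tensor parallelism in one paragraph."], params
    )
    print(outputs[0].outputs[0].text)

In this scheme the model's weights, not just the requests, are split across GPUs, which is why it can speed up token generation for a single stream rather than only increasing aggregate throughput.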