Maximizing Llama Open Source Model Inference Performance with Tensor Parallelism on a Dell XE9680 with H100s
This document describes the improvement in inference token-generation performance that tensor parallelism delivers for Meta's open-source Llama-2 and Llama-3 models when running on a Dell XE9680 server with 8x NVIDIA H100 GPU accelerators.