Maximizing Llama Open Source Model Inference Performance with Tensor Parallelism on a Dell XE9680 with H100s
This document describes the improvement in inference token-generation performance that tensor parallelism delivers for Meta's open-source Llama-2 and Llama-3 models when running on a Dell XE9680 server with 8x NVIDIA H100 GPU accelerators.