
Maximizing Llama Open Source Model Inference Performance with Tensor Parallelism on a Dell XE9680 with H100s

This document describes the inference token-generation performance improvements that tensor parallelism brings to Meta's open-source Llama 2 and Llama 3 models when running on a Dell PowerEdge XE9680 server with eight NVIDIA H100 GPU accelerators.
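To make the core idea concrete, the sketch below illustrates column-parallel weight sharding, the mechanism behind tensor parallelism: each GPU holds one column slice of a layer's weight matrix, computes its partial output independently, and the slices concatenate back into the full result. The dimensions and NumPy implementation here are illustrative assumptions, not the actual Llama layer sizes or the serving stack used in the measurements.

```python
import numpy as np

def shard_columns(weight, num_gpus):
    """Split a weight matrix column-wise into one shard per GPU."""
    assert weight.shape[1] % num_gpus == 0
    return np.split(weight, num_gpus, axis=1)

# Illustrative (hypothetical) dimensions, not real Llama sizes.
hidden, ffn, num_gpus = 64, 256, 8
w = np.random.rand(hidden, ffn).astype(np.float32)
shards = shard_columns(w, num_gpus)

x = np.random.rand(1, hidden).astype(np.float32)
# Each "GPU" computes x @ its shard; concatenating the partial
# outputs recovers the result of the full matrix multiply.
partials = [x @ s for s in shards]
full = np.concatenate(partials, axis=1)
assert np.allclose(full, x @ w, atol=1e-5)
```

Because each shard is 1/8 the size of the full matrix, an 8-GPU setup both fits larger models in aggregate memory and performs each layer's matrix multiply in parallel, which is the source of the throughput gains measured in this paper.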