The second use case involves running multiple instances of the Llama 3 8B model on the PowerEdge XE9680 and R760xa servers. The goal is to measure latency and throughput as the number of instances increases, while collecting system metrics such as CPU, GPU, memory, and network utilization. This helps characterize the performance impact of scaling up the number of model instances on each server.
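The latency and throughput comparison described above can be sketched as a small aggregation step: given the per-request latencies and total wall time observed for each instance count, compute summary statistics and requests per second. This is a minimal illustrative sketch, not the benchmark harness used in this paper; the function name, percentile choice, and sample numbers are assumptions for illustration.

```python
import statistics

def summarize(latencies_s, wall_time_s):
    """Aggregate per-request latencies (seconds) into the kind of
    metrics this use case reports: median/p95 latency and throughput."""
    lat_sorted = sorted(latencies_s)
    p50 = statistics.median(lat_sorted)
    # Nearest-rank p95 over the sorted sample (illustrative choice)
    p95 = lat_sorted[int(0.95 * (len(lat_sorted) - 1))]
    throughput = len(latencies_s) / wall_time_s  # requests per second
    return {"p50_s": p50, "p95_s": p95, "rps": throughput}

# Synthetic numbers for illustration only: one instance vs. two instances
one_instance = summarize([0.8, 0.9, 1.0, 1.1, 1.2], wall_time_s=5.0)
two_instances = summarize(
    [0.9, 1.0, 1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 1.3, 1.4], wall_time_s=5.5
)
```

In a real run, the latency lists would come from timing each request against the deployed model instances, and the same summary would be produced at each instance count to show how per-request latency and aggregate throughput trade off as instances are added.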
Goals