Use Case 3 - Cluster Testing with Multiple Instances of Llama 3 Models
This cluster test demonstrates the scalability of the Llama 3 8B model across multiple nodes in a Kubernetes (K8s) cluster, focusing on the performance of PowerEdge R760xa servers within the Dell AI Factory with NVIDIA. The objective was to deploy the Llama 3 8B model on two R760xa servers, fully loading both servers and distributing requests through a frontend load balancer, while gathering key metrics such as throughput, latency, and host system utilization (GPU, CPU, and memory).
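The topology described above can be sketched as a Kubernetes manifest: a Deployment with two replicas, pod anti-affinity to place one replica on each R760xa node, and a LoadBalancer Service acting as the frontend. This is a minimal illustrative sketch, not the exact manifest used in the test; the image name, labels, GPU count, and serving port are assumptions.

```yaml
# Illustrative sketch only - names, image, GPU count, and port are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama3-8b
spec:
  replicas: 2                       # one inference pod per R760xa server
  selector:
    matchLabels:
      app: llama3-8b
  template:
    metadata:
      labels:
        app: llama3-8b
    spec:
      affinity:
        podAntiAffinity:            # force the two replicas onto different nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: llama3-8b
            topologyKey: kubernetes.io/hostname
      containers:
      - name: inference
        image: inference-server:latest   # placeholder for the serving image
        ports:
        - containerPort: 8000            # assumed serving port
        resources:
          limits:
            nvidia.com/gpu: 4            # GPUs per server (assumed)
---
apiVersion: v1
kind: Service
metadata:
  name: llama3-8b-frontend
spec:
  type: LoadBalancer                # frontend load balancer distributing requests
  selector:
    app: llama3-8b
  ports:
  - port: 80
    targetPort: 8000
```

With this layout, the Service spreads client requests across both fully loaded servers, which is the condition under which the throughput, latency, and host-utilization metrics are collected.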
Goals: