Use Case 3 - Cluster Testing with Multiple Instances of Llama 3 Models
This cluster test demonstrates the scalability of the Llama 3 8B model across multiple nodes in a Kubernetes (K8s) cluster, focusing on the performance of PowerEdge R760xa servers within the Dell AI Factory with NVIDIA. The objective was to deploy the Llama 3 8B model on two R760xa servers, fully loading both servers and distributing requests through a frontend load balancer, while gathering key metrics such as throughput, latency, and host system utilization (GPU, CPU, and memory).
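The topology described above can be sketched as a Kubernetes manifest: a Deployment with two replicas, pod anti-affinity to place one replica on each R760xa node, and a LoadBalancer Service acting as the frontend. This is a minimal illustrative sketch, not the exact manifest used in the test; the image name, labels, GPU count, and serving port are assumptions.

```yaml
# Illustrative sketch only - names, image, GPU count, and port are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama3-8b
spec:
  replicas: 2                       # one inference pod per R760xa server
  selector:
    matchLabels:
      app: llama3-8b
  template:
    metadata:
      labels:
        app: llama3-8b
    spec:
      affinity:
        podAntiAffinity:            # force the two replicas onto different nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: llama3-8b
            topologyKey: kubernetes.io/hostname
      containers:
      - name: inference
        image: inference-server:latest   # placeholder for the serving image
        ports:
        - containerPort: 8000            # assumed serving port
        resources:
          limits:
            nvidia.com/gpu: 4            # GPUs per server (assumed)
---
apiVersion: v1
kind: Service
metadata:
  name: llama3-8b-frontend
spec:
  type: LoadBalancer                # frontend load balancer distributing requests
  selector:
    app: llama3-8b
  ports:
  - port: 80
    targetPort: 8000
```

With this layout, the Service spreads client requests across both fully loaded servers, which is the condition under which the throughput, latency, and host-utilization metrics are collected.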
Goals: