For each use case test, we deployed a Kubernetes front-end load balancer. This keeps every metric as comparable as possible whether requests are served by a single instance of Llama 3 or by multiple instances.
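As a rough illustration of this setup, a Kubernetes Service of type `LoadBalancer` can front the inference pods so that clients hit one stable endpoint regardless of replica count. The names, labels, and ports below are assumptions for illustration, not details taken from the test environment.

```yaml
# Hypothetical front-end Service for the Llama 3 inference pods.
# The selector label, service name, and port numbers are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: llama3-inference-lb
spec:
  type: LoadBalancer        # provisions an external load balancer
  selector:
    app: llama3-inference   # must match the label on the inference pods
  ports:
    - name: http
      port: 80              # port clients connect to
      targetPort: 8000      # port the inference server listens on
```

Because the Service distributes requests across whatever pods match the selector, the same client-side measurement path is used for one replica or many, which is what keeps the metrics comparable.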