Hardware design
The PowerEdge R760xa is a high-performance, scalable server purpose-built to accelerate GPU-intensive AI workloads such as inferencing and RAG. This design uses two R760xa servers, each containing two NVIDIA H100 GPUs, for a total of four GPUs. The R760xa servers connect to PowerScale storage over high-speed 100 Gb Ethernet (100 GbE). A third server, such as the PowerEdge R660, should be added to create a three-node Kubernetes deployment.
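The compute layout described above can be sketched as a small inventory, which makes the GPU arithmetic explicit. This is an illustrative sketch only: the hostnames are hypothetical, and the assumption that the third node (the PowerEdge R660) carries no GPUs is ours, since the text does not specify its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str   # hypothetical hostname, not taken from the white paper
    model: str  # server model named in the text
    gpus: int   # number of NVIDIA H100 GPUs installed


# Three-node Kubernetes layout as described in the text.
NODES = [
    Node("gpu-node-1", "PowerEdge R760xa", 2),
    Node("gpu-node-2", "PowerEdge R760xa", 2),
    # Assumption: the third node is used without GPUs.
    Node("cpu-node-1", "PowerEdge R660", 0),
]


def total_gpus(nodes):
    """Return the total GPU count across all nodes."""
    return sum(node.gpus for node in nodes)


def gpu_nodes(nodes):
    """Return the names of nodes that can host GPU workloads."""
    return [node.name for node in nodes if node.gpus > 0]


if __name__ == "__main__":
    print(f"Nodes: {len(NODES)}, GPUs: {total_gpus(NODES)}")
    print(f"GPU-capable nodes: {gpu_nodes(NODES)}")
```

An inventory like this could feed node labeling or capacity checks, so that GPU workloads such as inferencing are scheduled only on the R760xa nodes.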
PowerScale F710 running OneFS 9.7 was selected for its ability to handle demanding workloads. PowerScale with OneFS 9.7 provides a flexible, secure, and efficient scale-out NAS solution that delivers high-performance data access for applications such as AI/ML. The OneFS operating system provides the intelligence behind this highly scalable, high-performance, modular storage solution, which can grow with your business.
Dell PowerSwitch switches are used throughout the build, with a separate switch for each type of traffic. The PowerSwitch N3248TE-ON carries the out-of-band network, connecting to the iDRACs and management ports. The PowerSwitch S5248F-ON carries in-band north-south server traffic, handling front-end client connections to the Ubuntu hosts and the Kubernetes front-end service IPs. The PowerScale NAS uses 100 Gb Z9664F-ON switches for NAS front-end traffic and a separate set of Z9664F-ON switches for back-end traffic.
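The traffic separation described above can be summarized as a simple lookup from traffic type to switch model. This is a minimal sketch for illustration: the traffic-type keys and the `switch_for` helper are hypothetical names, and only the switch models come from the text.

```python
# Traffic type -> switch model, as described in the text.
# The NAS front-end and back-end networks use the same switch model
# but run on separate physical switches.
SWITCH_BY_TRAFFIC = {
    "out-of-band": "PowerSwitch N3248TE-ON",    # iDRACs and management ports
    "in-band": "PowerSwitch S5248F-ON",         # north-south client traffic
    "nas-front-end": "PowerSwitch Z9664F-ON",   # PowerScale front-end traffic
    "nas-back-end": "PowerSwitch Z9664F-ON",    # PowerScale back-end traffic
}


def switch_for(traffic_type):
    """Return the switch model that carries the given traffic type."""
    try:
        return SWITCH_BY_TRAFFIC[traffic_type]
    except KeyError:
        known = ", ".join(sorted(SWITCH_BY_TRAFFIC))
        raise ValueError(
            f"Unknown traffic type {traffic_type!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for traffic_type in SWITCH_BY_TRAFFIC:
        print(f"{traffic_type}: {switch_for(traffic_type)}")
```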
Figure 2: Network diagram
Figure 3: Rack configuration