The following figure shows an overview of the physical architecture, along with the main software components:
Figure 3. Digital assistant solution architecture with physical infrastructure
Dell Technologies provides a diverse selection of acceleration-optimized servers with an extensive portfolio of NVIDIA GPU accelerators. As shown in Figure 3, this design uses Dell PowerEdge R760xa servers, which are purpose-built for generative AI, for the primary workloads, and Dell PowerEdge R760 servers for the control plane nodes.
PowerEdge R760xa servers support multiple GPU accelerator configurations. Selection and sizing of the appropriate server and GPU configuration is performed as part of the service engagement for each customer.
Organizations can choose between 25 Gb and 100 Gb networking infrastructure based on their specific requirements. For LLM inferencing on text data, we recommend using existing 25 Gb Ethernet infrastructure, which adequately meets the bandwidth demands of text traffic. To future-proof the infrastructure, a 100 Gb Ethernet fabric can be used instead.
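To make the 25 Gb Ethernet recommendation concrete, the following back-of-envelope sketch estimates the bandwidth that text-only inference traffic generates. The session counts, token rates, and overhead factor are illustrative assumptions, not measured values:

```python
# Back-of-envelope estimate (illustrative assumptions, not measured values):
# how much network bandwidth does text-only LLM inference traffic need,
# compared with a 25 GbE link?

concurrent_users = 1000    # assumed peak concurrent sessions
tokens_per_second = 50     # assumed generation rate per session
bytes_per_token = 4        # rough average for UTF-8 text tokens

# Payload bandwidth in bits per second across all sessions
payload_bps = concurrent_users * tokens_per_second * bytes_per_token * 8

# Generous allowance for HTTP/JSON framing, retries, and telemetry
overhead_factor = 10
required_gbps = payload_bps * overhead_factor / 1e9
link_gbps = 25

print(f"Estimated traffic: {required_gbps:.3f} Gb/s of a {link_gbps} Gb/s link")
# Roughly 0.016 Gb/s, a tiny fraction of 25 GbE, which is why text-only
# inference rarely justifies 100 GbE on its own.
```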
Either Dell PowerSwitch S5232F-ON or PowerSwitch S5248F-ON switches can serve as the network switch: the S5232F-ON supports both 25 Gb and 100 Gb Ethernet, while the S5248F-ON is a 25 Gb Ethernet switch. NVIDIA ConnectX-6 network adapters, available in both 25 Gb and 100 Gb options, provide server connectivity.
Local storage in the PowerEdge servers is used for the operating system and container storage. Kubernetes deploys the Rancher Local Path Provisioner, whose storage class makes this local storage available for dynamically provisioning pod volumes.
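As an illustration of how a workload consumes this local storage, the following sketch uses the official Kubernetes Python client to create a PersistentVolumeClaim against the local-path storage class. The claim name, namespace, and capacity are placeholder values:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (assumes kubectl access to the cluster)
config.load_kube_config()

# PersistentVolumeClaim bound to the Rancher local-path storage class;
# "assistant-cache" and the 50Gi size are illustrative placeholders.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="assistant-cache"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],    # local-path volumes are node-local
        storage_class_name="local-path",   # default class name for the provisioner
        resources=client.V1ResourceRequirements(
            requests={"storage": "50Gi"}
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```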
Whether the digital assistant solution requires external storage depends on the specific requirements and characteristics of the AI model and the deployment environment. External file storage is often required to store the graphic elements used to render the digital assistant itself. LLMs, on the other hand, reside in GPU memory after being loaded from local storage, external file storage, or block storage.
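The following minimal sketch shows this load-once pattern with the Hugging Face Transformers library. The model path is a placeholder and could point at local NVMe or an externally mounted file system:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: could be local NVMe or an NFS mount from external file
# storage; after loading, the weights reside in GPU memory for the life of
# the process.
MODEL_PATH = "/models/llama-3-8b-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,   # half precision halves GPU memory use
    device_map="cuda",           # place the weights on the GPU
)

# Storage is no longer on the inference hot path: generation reads the
# weights only from GPU memory.
inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```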
Dell PowerScale offers massive AI performance with ultimate density. It accelerates all phases of the AI pipeline, from model training to inferencing and fine-tuning. Boasting up to 24 NVMe SSD drives per node and 300 PB of storage per cluster, it sustains high GPU utilization for large-scale model training and drives faster time to AI insights with up to 127 percent improved throughput.[1]
PowerScale is no stranger to AI-optimized infrastructure: it was one of the first storage products to offer NVIDIA GPUDirect support, low-latency storage access with NFS over Remote Direct Memory Access (NFSoRDMA), multitenant capabilities, simultaneous multiprotocol support, and six-nines (99.9999 percent) availability and resiliency to ensure uninterrupted uptime.
Building on that foundation through continuous software and hardware innovation, our next-generation all-flash systems form a key component of Dell's AI-ready data platform.
They offer multicloud agility with the Dell APEX File Storage for public cloud portfolio, federal-grade security features that safeguard the AI process from attacks such as data poisoning and model inversion, and exceptional efficiency with the world's most efficient scale-out NAS[2] to manage AI data growth while controlling storage costs. PowerScale is the first Ethernet-based NVIDIA DGX SuperPOD-certified storage platform and enables seamless AI adoption for enterprise customers everywhere.
[1] Disclosure: Based on internal testing comparing streaming write of the F910 on OneFS 9.8 to streaming write of the F900 on OneFS 9.5, April 2024. Results might vary.
[2] Disclosure: Based on Dell analysis comparing efficiency-related features (data reduction, storage capacity, data protection, hardware, space, life cycle management efficiency, and ENERGY STAR certified configurations), June 2023.