The Dell Validated Design for Generative AI Model Customization addresses the challenges of customizing LLMs for enterprise use cases. LLMs have shown tremendous potential in natural language processing tasks but require specialized infrastructure for efficient customization and deployment.
This reference architecture serves as a blueprint, offering organizations guidelines and best practices to design and implement scalable, efficient, and reliable infrastructure specifically tailored for generative AI model training and customization. While its primary focus is LLM customization, the architecture can be adapted for discriminative or predictive AI model training.
The following figure (Figure 1) shows the key components of the reference architecture:
The following sections describe the key components of the reference architecture. Additional information about the multinode configuration is provided in Networking design.
The compute infrastructure is a critical component of the design, responsible for the efficient training and customization of AI models. The PowerEdge XE9680 server supports the AMD Instinct MI300X accelerator, broadening the choice of accelerators available for AI workloads.
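As a quick sanity check on a worker node, a ROCm build of PyTorch exposes the MI300X accelerators through its standard device API. The following is a minimal sketch, assuming a ROCm-enabled PyTorch installation; it is illustrative and not part of the validated deployment steps:

```python
# Minimal sketch: verify that the MI300X accelerators on a worker node
# are visible to a ROCm build of PyTorch. On ROCm, PyTorch reuses the
# torch.cuda namespace for AMD GPUs.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No accelerators detected; check the ROCm driver and PyTorch build.")
```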
In this design, PowerEdge XE9680 servers are configured as worker nodes in a Kubernetes cluster. A Kubernetes cluster is a group of interconnected servers that run containerized applications under Kubernetes, an open-source container orchestration system. Kubernetes clusters are highly scalable and offer features such as automated load balancing and self-healing, making them well suited to managing containerized applications and complex distributed systems. Omnia, open-source software for deploying and managing clusters, is used to deploy the Kubernetes cluster.
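To illustrate how Kubernetes schedules GPU work onto the worker nodes, the sketch below uses the official Kubernetes Python client to create a pod that requests an AMD GPU through the `amd.com/gpu` resource advertised by the AMD GPU device plugin. The container image, namespace, and pod name are placeholders, not part of the validated design:

```python
# Hypothetical sketch: submit a pod that requests one AMD GPU on a
# worker node. Assumes the AMD GPU device plugin is deployed, which
# advertises GPUs under the "amd.com/gpu" resource name.
from kubernetes import client, config

config.load_kube_config()  # use the cluster credentials from ~/.kube/config

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="rocm-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="rocm-smoke-test",
                image="rocm/pytorch:latest",  # placeholder image
                command=["python3", "-c",
                         "import torch; print(torch.cuda.device_count())"],
                resources=client.V1ResourceRequirements(
                    limits={"amd.com/gpu": "1"}  # GPUs are requested as limits
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```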
This design incorporates three physical networks: a frontend network for management, storage, and client/server traffic (sometimes referred to as north/south traffic); a backend network for internode GPU communication (sometimes referred to as east/west traffic), which is used for distributed training; and an out-of-band network for server management. Dell PowerSwitch switches running the Enterprise SONiC Distribution by Dell Technologies network operating system power these physical networks.
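The backend network carries the collective-communication traffic that distributed training generates. As an illustration of how a training job is pointed at that network, the hedged sketch below initializes a PyTorch process group; on ROCm systems the `nccl` backend maps to RCCL, and the interface name is a placeholder that depends on the actual backend fabric:

```python
# Hedged sketch: initialize distributed training so that collective
# operations (all-reduce and so on) travel over the backend network.
# The interface name below is a placeholder; the real value depends on
# how the backend fabric is configured.
import os
import torch.distributed as dist

# Pin RCCL/NCCL traffic to the backend (east/west) network interface.
os.environ.setdefault("NCCL_SOCKET_IFNAME", "ens1f0")  # placeholder NIC name

# Rank, world size, and rendezvous address are normally injected by the
# launcher (for example, torchrun or a Kubernetes training operator).
dist.init_process_group(
    backend="nccl",  # maps to RCCL on ROCm
    init_method="env://",
)
print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")
```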
This design uses Dell PowerScale storage as a repository for model customization datasets, models, model versioning and management, and model ensembles. We also recommend it for storing and archiving inference data, including capturing and retaining prompts and outputs after the customized model is put into inference operations. These recommendations are useful for marketing, sales, or customer service applications in which further analysis of customer interactions might be desirable.
The flexible and robust storage capabilities of PowerScale offer the scale and speed necessary for training and operationalizing AI models, providing a foundational component for AI workflows. Its capacity to handle the vast data requirements of AI, combined with its reliability and high performance, cements the crucial role that external storage plays in successfully bringing AI models from conception to application.
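As one way to put the capture-and-retention recommendation into practice, the sketch below appends each prompt/output pair to a JSON Lines file on a PowerScale share. The mount path is hypothetical; in practice it would be an NFS or SMB export from the PowerScale cluster:

```python
# Hypothetical sketch: capture and retain inference prompts and outputs
# on a PowerScale share for later analysis. The mount point below is a
# placeholder for an export from the PowerScale cluster.
import json
import time
from pathlib import Path

CAPTURE_DIR = Path("/mnt/powerscale/inference-logs")  # placeholder mount

def log_interaction(prompt: str, output: str) -> None:
    """Append one prompt/output record as a JSON line."""
    CAPTURE_DIR.mkdir(parents=True, exist_ok=True)
    record = {"timestamp": time.time(), "prompt": prompt, "output": output}
    with open(CAPTURE_DIR / "interactions.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

log_interaction("What is our return policy?", "Our return policy allows ...")
```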
This design uses two sizes of the Llama 3 model, 8B and 70B, as the recommended foundation models for our inference, RAG, and fine-tuning validation scenarios. Other popular foundation models can also be used with this design.
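As an illustration of how one of the recommended foundation models might be loaded for an inference scenario, the sketch below uses the Hugging Face Transformers library with the Meta-Llama-3-8B checkpoint. It assumes access to the gated model repository and sufficient accelerator memory on the node; it is one possible approach, not the only validated path:

```python
# Hedged sketch: load the Llama 3 8B foundation model for inference with
# Hugging Face Transformers. Assumes access to the gated model repository
# and enough GPU memory on the node.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on the available accelerators
)

inputs = tokenizer("Generative AI in the enterprise", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```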
Open-source tools are crucial for generative AI workflows because they foster collaboration, innovation, and ease of integration. This validated design uses open-source tools and frameworks to support the long-term sustainability of generative AI, as this approach better upholds the core values of generative AI governance.
You can use various open-source components for inference, RAG, and fine-tuning with this validated design. We validated the design with the components shown in Figure 1.