In the dynamic field of artificial intelligence (AI) and natural language processing, large language models (LLMs) have become essential for tasks such as generating text, answering questions, and analyzing sentiment. The Meta Llama 3 model, known for its exceptional ability to comprehend context and produce coherent responses, exemplifies a powerful conversational agent. However, maximizing the performance of these models necessitates a strategic focus on a well-designed architecture that includes high-performance servers, high-speed networking, and scalable storage.
The convergence of AI and data-driven innovation has revolutionized industries worldwide. As organizations strive to harness the full potential of AI, hardware acceleration emerges as a pivotal element in achieving superior performance. Recognizing this imperative, Dell Technologies, AMD, and Broadcom have joined forces to enhance the Dell portfolio for Generative AI in the Enterprise, offering businesses expanded options to support their unique AI initiatives.
The Dell PowerEdge XE9680 server is the foundation of this collaboration. Equipped with eight AMD Instinct™ MI300X accelerators, it offers enterprises unparalleled capabilities. Dell's first eight-way GPU platform, this server is engineered to enhance application performance by handling the most complex generative AI, machine learning, deep learning, and high-performance computing workloads.
The AMD Instinct MI300X accelerator is based on the next-generation AMD CDNA™ 3 architecture, delivering efficiency and performance for the most demanding AI and HPC applications. It is designed with 304 high-throughput compute units, AI-specific functions including new data-type support and photo and video decoding, plus 192 GB of HBM3 memory on a single GPU accelerator. State-of-the-art die stacking and chiplet technology in a multi-chip package propels generative AI, machine learning, and inferencing workloads, while extending AMD's leadership in AI and HPC.
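As a rough sizing illustration (the 70B-parameter figure and the helper below are assumptions for this sketch, not specifications from this guide): at 16-bit precision, model weights occupy about two bytes per parameter, so the 192 GB of HBM3 on a single MI300X accelerator can hold the weights of even a 70B-parameter model with room to spare.

```python
def model_memory_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory footprint of model weights alone, in GiB.

    Assumes fp16/bf16 (2 bytes per parameter); excludes the KV cache,
    activations, and optimizer state, which add substantial overhead.
    """
    return params_billion * 1e9 * bytes_per_param / 2**30

# A 70B-parameter model in bf16: ~130 GiB of weights, which fits within
# the 192 GB HBM3 of a single MI300X accelerator.
print(round(model_memory_gib(70), 1))  # 130.4
```

In practice, inference serving also needs headroom for the KV cache, whose size grows with batch size and context length, so the weights-only number above is a lower bound.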
The AMD ROCm™ open-source software platform is optimized to extract the best AI workload performance from AMD Instinct MI300X accelerators while maintaining compatibility with industry software frameworks. ROCm consists of a collection of drivers, development tools, and APIs that enable accelerator programming from low-level kernels to end-user applications, and it can be customized to meet specific needs.
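ROCm follows the familiar visible-devices convention: the `HIP_VISIBLE_DEVICES` environment variable restricts which accelerators a process can see. A minimal sketch of pinning one worker process per accelerator on an eight-way node (the helper name and the rank-to-GPU policy are illustrative assumptions, not part of any vendor toolkit):

```python
import os

def assign_gpu(local_rank: int, gpus_per_node: int = 8) -> str:
    """Hypothetical helper: map a worker's local rank to one of the
    node's eight MI300X accelerators (one GPU per process)."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError(f"local_rank {local_rank} out of range")
    return str(local_rank)

# The worker with local rank 3 sees only accelerator 3 under the ROCm runtime.
os.environ["HIP_VISIBLE_DEVICES"] = assign_gpu(3)
```

One-process-per-GPU is the usual layout for distributed training and inference launchers; frameworks built on ROCm inherit the restriction from the environment.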
Broadcom 57608 dual 200G-port PCIe Network Interface Cards (NICs) in Dell PowerEdge XE9680 servers provide connectivity to an Ethernet fabric built with Broadcom's Tomahawk family of switches. Each NIC can provide 400 Gbps of aggregate throughput in each direction, transmit or receive. Broadcom PCIe switches enable peer-to-peer data transfer between AMD MI300X accelerators and Broadcom Ethernet NICs. Broadcom's open-source NIC drivers enable high-performance collective communications among GPUs on different Dell PowerEdge XE9680 servers, using Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) as the transport.
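To see why per-NIC bandwidth matters for cross-server collectives, consider the standard cost model for a ring all-reduce (a generic textbook formula, not a vendor benchmark): each GPU transmits 2(N−1)/N times the message size, so per-GPU traffic approaches twice the message size as the GPU count grows.

```python
def ring_allreduce_bytes_sent(message_bytes: int, n_gpus: int) -> float:
    """Per-GPU bytes transmitted by a ring all-reduce: 2*(N-1)/N * size."""
    return 2 * (n_gpus - 1) / n_gpus * message_bytes

# Two eight-way XE9680 nodes (N = 16) reducing a 1 GiB gradient bucket:
sent = ring_allreduce_bytes_sent(2**30, 16)
print(sent / 2**30)  # 1.875 GiB sent per GPU
```

This is why high aggregate NIC bandwidth and RoCE-based RDMA (which bypasses host CPU copies) are central to scaling training and inference across servers.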
In summary, this high-performance design for generative AI provides a validated, modular, and scalable architecture based on the PowerEdge XE9680 server with AMD MI300X accelerators and ROCm software, Broadcom Ethernet NICs and PCIe switches, and Dell PowerSwitch networking and PowerScale storage. It also includes guidelines for implementing LLM inferencing and model customization solutions, including fine-tuning and retrieval-augmented generation (RAG) methodologies.
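The retrieval step at the heart of RAG can be sketched in a few lines: embed the documents and the query, then return the top-k documents by cosine similarity and pass them to the LLM as context. The toy two-dimensional vectors below are illustrative only; a real deployment would use model-generated embeddings and a vector store.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, doc_vecs, k=2):
    """Indices of the k documents most similar to the query embedding."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy "embeddings" for three documents.
docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
print(retrieve([1.0, 0.1], docs))  # [0, 1]
```

The retrieved passages are then concatenated into the prompt, grounding the model's answer in enterprise data without retraining, while fine-tuning adapts the model weights themselves.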