This section provides an overview of each software component, laying the groundwork for the step-by-step process of deploying and fine-tuning the Llama2 70B Chat model.
Ubuntu Server
Ubuntu Server is a popular choice for AI servers because of its compatibility with leading AI and machine learning libraries, its stability and security features, and its extensive software repository. The PowerEdge XE9680 supports Ubuntu 22.04, which is used here. To learn more, see https://releases.ubuntu.com/jammy/.
AMD ROCm
The AMD ROCm 6 open-source software platform is optimized to extract the best HPC and AI workload performance from AMD Instinct MI300 accelerators while maintaining compatibility with industry software frameworks. ROCm consists of drivers, development tools, and APIs that enable GPU programming from the low-level kernel to end-user applications, and it can be customized to meet your specific needs. To learn more, see https://www.amd.com/en/products/software/rocm.html.
Docker
Docker is a containerization platform that streamlines the deployment and management of AI workloads by encapsulating them and their dependencies into portable, isolated containers. To learn more, see https://www.docker.com/.
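As a hedged sketch of what containerized GPU deployment involves, the following uses the docker-py SDK to build the run arguments a ROCm container typically needs (GPU device nodes, the video group, host IPC). The device paths, group name, and flags are assumptions drawn from common ROCm container usage, not from this paper; verify them against your ROCm release documentation.

```python
def rocm_run_kwargs(image: str) -> dict:
    """Build docker-py run() arguments for a ROCm container.

    The device paths and flags below are assumptions based on common
    ROCm container usage; check them against your ROCm release notes.
    """
    return {
        "image": image,
        # Expose the AMD GPU device nodes to the container.
        "devices": ["/dev/kfd:/dev/kfd", "/dev/dri:/dev/dri"],
        # GPU access on the host is typically gated by the video group.
        "group_add": ["video"],
        # Multi-GPU workloads often need host IPC for shared memory.
        "ipc_mode": "host",
        "detach": True,
    }


def start_rocm_container(image: str):
    # Deferred import so the sketch can be read without the docker SDK.
    import docker

    client = docker.from_env()
    return client.containers.run(**rocm_run_kwargs(image))
```

The same arguments map directly onto the equivalent `docker run` flags (`--device`, `--group-add`, `--ipc=host`) if you prefer the CLI.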
Hugging Face
Dell Technologies and Hugging Face collaborate to make it easy for enterprises to create, fine-tune, and deploy their own open-source generative AI (GenAI) models with the Hugging Face community on industry-leading Dell infrastructure, products, and services. To learn more, see https://huggingface.co/docs/transformers/en/index.
vLLM
vLLM is a high-throughput library for LLM inference and serving. It offers state-of-the-art serving throughput, efficient memory management, continuous batching, and fast model execution. vLLM also integrates seamlessly with popular Hugging Face models, supports various decoding algorithms for high-throughput serving, enables tensor parallelism for distributed inference, and offers streaming outputs. It is compatible with NVIDIA and AMD GPUs and includes experimental features such as prefix caching and multi-LoRA support. To learn more, see https://github.com/vllm-project/vllm.
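As a minimal sketch of vLLM's offline inference API, the following shards the 70B model across accelerators with tensor parallelism. The sampling values are illustrative assumptions, and `tensor_parallel_size=8` assumes the eight MI300X accelerators in the XE9680; adjust both for your deployment.

```python
def chat_sampling_config() -> dict:
    # Illustrative generation settings (assumed values, not from the paper).
    return {"temperature": 0.7, "top_p": 0.9, "max_tokens": 256}


def generate(prompts: list[str]):
    # Deferred import so the sketch can be read on machines without vLLM.
    from vllm import LLM, SamplingParams

    # tensor_parallel_size=8 shards the 70B weights across the eight
    # MI300X accelerators in the XE9680 (assumption for this system).
    llm = LLM(model="meta-llama/Llama-2-70b-chat-hf", tensor_parallel_size=8)
    return llm.generate(prompts, SamplingParams(**chat_sampling_config()))
```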
Llama 2
Llama 2 represents a significant advancement in large-scale language modeling. Developed by Meta AI and released in 2023, Llama 2 is a family of pre-trained and fine-tuned large language models (LLMs) available for research and commercial use across a range of natural language processing (NLP) tasks. The models come in multiple sizes, from 7 billion to 70 billion parameters, allowing flexibility based on computing capabilities and specific applications. The family includes chat models fine-tuned for dialog use cases, known as Llama-2-Chat, with improved scale, efficiency, and performance. To learn more, see https://huggingface.co/meta-llama/Llama-2-70b-chat-hf.
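As a minimal sketch (assuming the Hugging Face transformers library and access to the gated model weights), the chat variant expects prompts in the Llama-2-Chat instruction format, and the model can be loaded with the Auto classes:

```python
def format_chat_prompt(system: str, user: str) -> str:
    # Llama-2-Chat instruction format: a system block wrapped in
    # <<SYS>> tags inside an [INST] instruction block.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def load_llama2_chat(model_id: str = "meta-llama/Llama-2-70b-chat-hf"):
    # Deferred imports; loading requires accepting the model license on
    # Hugging Face and substantial GPU memory for the 70B weights.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    return tokenizer, model
```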
Gradio
Gradio is an open-source Python package for quickly creating easy-to-use, customizable UI components for machine learning models, APIs, or arbitrary Python functions. It simplifies showcasing models and data science workflows through interactive web interfaces. To learn more, see https://www.gradio.app/docs/interface.
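As a hedged sketch of how a Gradio front end might sit in front of the chat model, the following wraps a placeholder function in a `gradio.Interface`. The `answer` function is a stand-in assumption; in the deployed system it would call the Llama2 70B inference endpoint.

```python
def answer(message: str) -> str:
    # Placeholder; in the deployed system this would call the
    # Llama2 70B inference endpoint instead of echoing.
    return f"Echo: {message}"


def launch_demo():
    # Deferred import so the sketch can be read without Gradio installed.
    import gradio as gr

    demo = gr.Interface(
        fn=answer,
        inputs="text",
        outputs="text",
        title="Llama2 70B Chat",
    )
    # Port 7860 is Gradio's default; server_name exposes it beyond localhost.
    demo.launch(server_name="0.0.0.0", server_port=7860)
```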
Guanaco-llama2-1k dataset
The Guanaco-llama2-1k dataset was chosen for fine-tuning the Llama2 70B Chat model. It contains 1,000 rows of training data and is available through the Hugging Face Hub. By fine-tuning on this dataset, researchers and practitioners can customize the Llama2 model to excel in question-answering scenarios, making AI more accessible and adaptable. The dataset is accessible here: https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k/tree/main.
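As a minimal sketch (assuming the Hugging Face `datasets` package), the dataset can be pulled from the Hub, and each row stores a single `text` field already wrapped in the Llama 2 instruction format, illustrated by the helper below:

```python
def to_guanaco_text(instruction: str, response: str) -> str:
    # Each row of guanaco-llama2-1k is a single "text" field already
    # wrapped in the Llama 2 instruction format.
    return f"<s>[INST] {instruction} [/INST] {response} </s>"


def load_guanaco():
    # Deferred import; requires the Hugging Face `datasets` package
    # and network access to the Hub.
    from datasets import load_dataset

    # The dataset ships a single train split of 1,000 rows.
    return load_dataset("mlabonne/guanaco-llama2-1k", split="train")
```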