Empowering Enterprises with Generative AI: How Does MLPerf™ Help Support Requirements?
Fri, 14 Apr 2023 17:05:26 -0000
Generative AI has developed into a critical workload in the deep learning ecosystem. 2023 has been a year of explosive growth, with steady improvements in the quality of generative models and in the ease of access to them. With the advent of widely adopted tools such as ChatGPT and Stable Diffusion, we can consider generative AI to be one of the pivotal use cases that brings AI into the mainstream. We expect to see generative AI push new frontiers and enable an explosion of productivity. This blog provides an overview of generative AI and its relevance to the MLCommons™ AI system benchmark, to which we submit on a frequent basis.
Introduction to Generative AI
Generative AI refers to AI systems (consisting of hardware and software) that can produce plausible images, audio, video, text, code, 3D renders, and so on when given an instruction prompt. The prompt can be text, voice, or other forms. Some popular examples include ChatGPT, the Stable Diffusion image generator, and text-to-speech engines.
These AI systems can enable a significant productivity boost by generating new content and modifying existing content in ways that improve the user’s workflow.
What can these AI Systems do?
Generative AI is capable of generating and optimizing:
- Chat and text: This modality is useful for customer support; for generating blogs, ad copy, design guides, and technical reports; and for reading and taking action, answering questions, summarizing large documents, producing code that can run directly, inspiring developers to write improved code, and so on.
- Video generation:
  - Talking head videos: These videos can be useful for content producers, tutorial guides, and so on, in which personas communicate with voice, lip syncing, and emotions. They are also helpful for customer support and other interactive services.
  - NeRF (neural radiance fields): Given pictures of a scene from a few angles, NeRF can produce smooth, realistic-looking footage of the entire scene. NeRF can be useful for providing more perspectives on a scene and enabling more interesting viewpoints.
- High-resolution images: These creative images can be used for multiple purposes including B-rolls, explanation of ideas and simulated concepts, special effects, graphic vectors, infographics, backdrops, scenes, and so on.
- High-fidelity audio: These audio samples can be voices, music, and so on. Voices can deliver emotions, be of high quality like voiceovers, and deliver speech for advertisements. Audio samples can also be songs for karaoke, songs with beats, customer support audio, and so on.
- 3D generation: These renders are useful for producing a new world with just imagination. They are powerful for VFX, VR, and other immersive experiences. These 3D generations can be used for creating digital clones of the real world, games, commercials, movies, and so on.
Many other use cases exist that this blog does not cover. With more innovation and research, we expect a Cambrian explosion of use cases fueled by generative AI. These models can also produce personalized content for the end user as opposed to serving generic material.
What kind of compute is needed to train these AI systems?
Training generative AI systems is a compute-intensive task. Typically, models for text generation, chatbots, and instruction following have billions of parameters and consume thousands of GPU hours to train. Training at this scale requires different forms of parallelization, optimizations to training updates, and full-stack (hardware and software) optimization.
For instance, the GPT-3 model has 175 B parameters and the Megatron model has approximately 530 B parameters. Training and inference procedures for these systems differ significantly from those for traditional deep learning models, which have far fewer parameters. For instance, large language models (LLMs) require large inference setups, including multinode inference, and scaling training to trillion-parameter models requires additional mechanisms such as dynamic sparsity, optimized communication costs, and self-tuning.
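To make the scale concrete, the following minimal sketch estimates the memory needed only to hold the model weights. The 2 bytes per parameter (16-bit weights) and the 80 GB per-GPU memory figure are assumptions chosen for illustration, not values from this blog. The estimate ignores activations, optimizer state, and inference caches, so real requirements are higher.

```python
import math

BYTES_PER_PARAM_16BIT = 2  # assumption: weights stored in 16-bit precision
GPU_MEMORY_GB = 80         # assumption: one accelerator with 80 GB of memory


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Return the memory (GB, with 1 GB = 1e9 bytes) needed for the weights alone."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params, gpu_memory_gb=GPU_MEMORY_GB):
    """Return the smallest GPU count whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(num_params) / gpu_memory_gb)


# Parameter counts quoted in the text above.
for name, params in [("GPT-3", 175e9), ("Megatron (approx.)", 530e9)]:
    print(f"{name}: {weight_memory_gb(params):.0f} GB of weights, "
          f"at least {min_gpus_for_weights(params)} GPUs")
```

Under these assumptions, the weights of the 530 B parameter model alone do not fit in a single eight-GPU server, which is one reason multinode inference becomes necessary.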
In essence, the compute needs for generative AI keep growing, and in distinctive ways. While training generative AI models remains a major driver of compute demand, subsequent demand is likely to come from fine-tuning and inference.
Why adopt now?
Generative AI has been in development for many years now; Transformers, WaveNet, GANs, autoencoders with decoders, and so on have been around for quite some time. There has been much innovation in these areas, and these techniques continue to be mixed and matched to produce useful outcomes. For instance, the growth of multimodal models (models that take different kinds of inputs) facilitates a more collaborative workflow. Multimodal models form the cornerstone for enabling near-human intelligence for a specific task. Although there is a small chance of reaching human-level performance overall, these multimodal models can produce plausible results. Consumers of these systems can take the outputs, modify them, and use them in their workflow. These systems render output quickly compared to manual effort and provide more layers of creativity.
These plausible renders, ease of access, and open-source development have been an incredible fuel for popularizing generative AI systems. The next step is pushing these systems to perform better, whether by improving quality of service or improving throughput. Improving quality of service and throughput is a well-established problem, and many benchmarks exist to drive optimization of AI systems for convergence and throughput.
Relationship to MLCommons
The MLCommons Training benchmark has been instrumental in enabling significant improvements in the time to convergence of training systems by taking a holistic view of the hardware and software. The MLCommons Inference benchmark has likewise driven optimization of inference on AI systems.
Furthermore, MLCommons has generative AI benchmarks on its road map. For instance, an LLM benchmark is part of MLPerf Training v3.0; Stable Diffusion is scheduled to be included as part of MLPerf Training v3.1.
Continuous improvement of systems is essential, more so now for generative AI use cases. We can see that the MLCommons community has made significant improvements in performance every year. These optimizations from vendors, benchmarks, and the deep learning community continue to serve this generative AI effort. All these efforts make adoption of generative AI more attractive now.
Paradigms
Some fundamental models that are used for generative AI workloads in MLPerf benchmarks include:
Figure 1: Transformer architecture
- Transformer─This model uses an attention mechanism to model areas of interest in a specific context. This method allows building relationships that signify how one element relates to others.
Figure 2: U-Net architecture
- 3D-UNet: This model uses convolution and pooling blocks to set up a contracting path and an expanding path with a bottleneck between them. The image is reconstructed from this bottleneck. The bottleneck captures a compressed representation of the data; only important information is used to reconstruct the image.
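As an illustration of the attention mechanism described for the Transformer, the following is a minimal pure-Python sketch of scaled dot-product attention as formulated in the Transformer paper listed in the references. It is a teaching sketch, not the MLPerf reference implementation: it omits the learned query, key, and value projections and multiple heads, and the toy token vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    For each query: score every key by dot product, scale by sqrt(d_k),
    normalize the scores with softmax, and return the weighted sum of values.
    """
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # how strongly this element relates to each other element
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(dim)])
    return outputs


# Toy self-attention example (hypothetical numbers): three tokens, 2-dimensional vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)
for row in result:
    print([round(x, 3) for x in row])
```

Each output row is a mixture of all token vectors, weighted by how similar the corresponding query is to each key. This is the sense in which attention builds relationships between elements.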
How is generative AI relevant to MLPerf Training and Inference?
MLPerf Training uses the BERT language model. Many text-based generative AI workloads are LLMs. While BERT is much smaller than GPT-3 (about 1/500th the size by parameter count: 340 M compared to 175 B), it shares fundamental building blocks with GPT-3.
For instance, BERT uses multiple attention heads, layer normalization, softmax, and so on, which GPT-3 also uses. While the parameter count, layer count, and model size are larger for GPT-3, BERT uses fundamentally similar procedures that are essential for training.
Similarly, Stable Diffusion uses UNet layers. This method is useful for constructing images of high quality. It takes encoded text and uses the UNet bottleneck to effectively enable a denoising procedure. 3D-UNet is already part of the MLPerf benchmark suite and has been optimized there.
The preceding examples show that optimizations used in MLPerf are transferable, and we can use current MLPerf models as a relative proxy for generative AI workloads.
Furthermore, MLPerf includes LLMs and Stable Diffusion on the road map for the upcoming training submission versions. We can expect optimized versions of these implementations to be made available to the public.
The links in the references show optimizations made by NVIDIA for the benchmarks. Customers can take the already optimized references and use them for their generative AI use cases.
We recognize the importance of AI workloads including generative AI. Therefore, we submit to MLCommons benchmarks that provide like-to-like comparisons with different OEMs and vendors. Scale is an important aspect of generative AI workloads. We have introduced the new PowerEdge XE9680 server that produces stellar performance at scale. The following figure shows the performance improvement from MLPerf Inference v2.1 to MLPerf Inference v3.0.
* MLPerf ID 2.1-0014 and MLPerf ID 3.0-0013
Figure 3: MLPerf Inference 3.0 vs Inference 2.1 performance improvement from XE9680 server having 8xH100 GPUs compared to XE8545 having 4xA100 GPUs
PowerEdge XE9680 and XE8545 systems are an excellent choice for generative AI workloads. Customers can expect faster time to value, and these systems scale very well, as demonstrated by the MLPerf Training results.
Conclusion
While generative AI has produced enormous excitement, there are many challenges such as biased outputs, incorrect answers, hallucinations, instability, and so on that require monitoring and policing. Generative AI systems still cannot make autonomous decisions tied to other algorithms for mission-critical applications.
The latest MLPerf Inference 3.0 results show improvements of up to three to eight times, depending on the category. These improvements show Dell Technologies’ commitment to continuous performance improvement. We understand generative AI is an important class of AI workload; Dell hardware supports these workloads. By upgrading to the latest servers, such as the PowerEdge XE9680 servers, customers can derive a faster time to value. Dell Technologies can help customers adapt and deploy generative AI workloads.
To summarize, compute, quality of service (plausible outputs), open-source development, and ease of access are major drivers for mass adoption of generative AI. Organizations can leverage these drivers to produce outputs for their workflow. Enabling these systems with humans in the loop is a good first step to boosting productivity. Dell Technologies has been making MLPerf submissions to show how our servers can deliver excellent performance. The optimizations made for MLPerf are transferable to generative AI workloads.
References
https://arxiv.org/abs/1706.03762
https://arxiv.org/abs/1505.04597
https://developer.nvidia.com/blog/leading-mlperf-training-2-1-with-full-stack-optimizations-for-ai/
https://developer.nvidia.com/blog/boosting-mlperf-training-performance-with-full-stack-optimization/
Related Blog Posts
Dell Technologies Shines in MLPerf™ Stable Diffusion Results
Tue, 12 Dec 2023 14:51:21 -0000
Abstract
The recent release of MLPerf Training v3.1 results includes the newly launched Stable Diffusion benchmark. At the time of publication, Dell Technologies leads the OEM market in this benchmark for training a generative AI foundation model. With the Dell PowerEdge XE9680 server submission, Dell Technologies is the only vendor with a Stable Diffusion score for an eight-way system. The time to converge using eight NVIDIA H100 Tensor Core GPUs is 46.7 minutes.
Overview
Generative AI workload deployment is growing at an unprecedented rate. Key reasons include increased productivity and the increasing convergence of multimodal input. Creating content has become easier, and the results are becoming more plausible across various industries. Generative AI has enabled many enterprise use cases, and it continues to expand by exploring more frontiers. This growth can be attributed to higher-resolution text-to-image generation, text-to-video generation, and generation in other modalities. For these impressive AI tasks, the need for compute is even more expansive. Some of the more popular generative AI workloads include chatbots, video generation, music generation, 3D asset generation, and so on.
Stable Diffusion is a deep learning text-to-image model that accepts input text and generates a corresponding image. The output is credible and appears to be realistic. Occasionally, it can be hard to tell if the image is computer generated. Consideration of this workload is important because of the rapid expansion of use cases such as eCommerce, marketing, graphics design, simulation, video generation, applied fashion, web design, and so on.
Because these workloads demand intensive compute to train, measuring system performance on them is essential. As an AI systems benchmark, MLPerf has emerged as a standard way to compare different submitters, including OEMs, accelerator vendors, and others, in a like-to-like way.
MLPerf recently introduced the Stable Diffusion benchmark in MLPerf Training v3.1. It measures the time for a Stable Diffusion training run to converge to the expected quality targets. The benchmark uses the Stable Diffusion v2 model trained on the LAION-400M-filtered dataset. The original LAION-400M dataset has 400 million image and text pairs. A subset of those images (approximately 6.5 million) is used for training in the benchmark. The validation dataset is a subset of 30 K images from COCO 2014. The expected quality targets are FID <= 90 and CLIP >= 0.15.
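The convergence criterion can be sketched as a simple check that both quality targets are met. This is a minimal illustration, not the MLPerf reference code: the function names and the evaluation history are hypothetical, and only the two thresholds come from the benchmark description above.

```python
FID_TARGET = 90.0   # from the benchmark description: FID <= 90 (lower is better)
CLIP_TARGET = 0.15  # from the benchmark description: CLIP >= 0.15 (higher is better)


def meets_quality_targets(fid, clip):
    """Return True when both quality targets are satisfied."""
    return fid <= FID_TARGET and clip >= CLIP_TARGET


def first_converged_checkpoint(evaluations):
    """Return the index of the first (fid, clip) pair meeting both targets, or None."""
    for index, (fid, clip) in enumerate(evaluations):
        if meets_quality_targets(fid, clip):
            return index
    return None


# Hypothetical evaluation history as (FID, CLIP) pairs; these are not measured results.
history = [(210.0, 0.05), (120.0, 0.12), (88.5, 0.16)]
print(first_converged_checkpoint(history))
```

In this made-up history, the third evaluation (index 2) is the first to satisfy both targets. The time to converge is the time taken to reach such a point.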
The following figure shows a latent diffusion model[1]:
Figure 1: Latent diffusion model
[1] Source: https://arxiv.org/pdf/2112.10752.pdf
Stable Diffusion v2 is a latent diffusion model that combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. MLPerf Stable Diffusion focuses on the U-Net denoising network, which has approximately 865 M parameters. There are some deviations from the v2 model. However, these adjustments are minor and allow more submitters with compute constraints to make submissions.
The submission uses the NVIDIA NeMo framework, included with NVIDIA AI Enterprise, for secure, supported, and stable production AI. It is a framework to build, customize, and deploy generative AI models. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.
Performance of the Dell PowerEdge XE9680 server and other NVIDIA GPU-based systems on Stable Diffusion
The following figure shows the performance of NVIDIA H100 Tensor Core GPU-based systems on the Stable Diffusion benchmark. It includes submissions from Dell Technologies and NVIDIA that use different numbers of NVIDIA H100 GPUs, ranging from eight GPUs (Dell submission) to 1024 GPUs (NVIDIA submission). The results set performance expectations for this workload and demonstrate that strong scaling is achievable with little scaling loss.
Figure 2: MLPerf Training Stable Diffusion scaling results on NVIDIA H100 GPUs from Dell Technologies and NVIDIA
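One common way to quantify scaling loss is strong-scaling efficiency: the ratio of GPU-minutes in a baseline run to GPU-minutes in a larger run. The sketch below uses the eight-GPU time to converge reported in this blog as the baseline. The larger configuration and its time are placeholders for illustration, not published results. Submissions at different scales may also use different hyperparameters, so this is only an approximate measure.

```python
def strong_scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    """Return strong-scaling efficiency relative to a baseline run.

    Ideal scaling keeps (GPU count x time) constant, so the efficiency is
    (base_gpus * base_minutes) / (scaled_gpus * scaled_minutes); 1.0 is ideal.
    """
    return (base_gpus * base_minutes) / (scaled_gpus * scaled_minutes)


def ideal_minutes(base_gpus, base_minutes, scaled_gpus):
    """Return the time a perfectly scaling run would take on scaled_gpus."""
    return base_minutes * base_gpus / scaled_gpus


BASE_GPUS = 8
BASE_MINUTES = 46.7         # eight-GPU time to converge reported in this blog

HYPOTHETICAL_GPUS = 64      # placeholder configuration, not a published result
HYPOTHETICAL_MINUTES = 7.0  # placeholder time, not a published result

ideal = ideal_minutes(BASE_GPUS, BASE_MINUTES, HYPOTHETICAL_GPUS)
efficiency = strong_scaling_efficiency(
    BASE_GPUS, BASE_MINUTES, HYPOTHETICAL_GPUS, HYPOTHETICAL_MINUTES)
print(f"ideal time on {HYPOTHETICAL_GPUS} GPUs: {ideal:.2f} min")
print(f"efficiency with the placeholder time: {efficiency:.2f}")
```

To apply the sketch to real data, replace the placeholder values with the GPU counts and times from the published MLCommons results.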
End users can use state-of-the-art compute to derive faster time to value.
Conclusion
The key takeaways include:
- The recently released MLPerf Training v3.1 measures generative AI workloads such as Stable Diffusion.
- Dell Technologies is the only OEM vendor to have made an MLPerf-compliant Stable Diffusion submission.
- The Dell PowerEdge XE9680 server is an excellent choice to derive value from image generation AI workloads for marketing, art, gaming, and so on. The benchmark results are outstanding for Stable Diffusion v2.
MLCommons Results
https://mlcommons.org/benchmarks/training/
The preceding graphs are MLCommons results for MLPerf IDs 3.1-2019, 3.1-2050, 3.1-2055, and 3.1-2060.
The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
Dell PowerEdge Servers Achieve Stellar Scores with MLPerf™ Training v3.1
Wed, 08 Nov 2023 17:43:48 -0000
Abstract
MLPerf is an industry-standard AI performance benchmark. For more information about the MLPerf benchmarks, see the MLCommons Benchmarks page.
Today marks the release of a new set of results for MLPerf Training v3.1. The Dell PowerEdge XE9680, XE8640, and XE9640 servers in the submission demonstrated excellent performance. The tasks included image classification, medical image segmentation, lightweight and heavy-weight object detection, speech recognition, language modeling, recommendation, and text to image. MLPerf Training v3.1 results provide a baseline for end users to set performance expectations.
What is new with MLPerf Training 3.1 and the Dell Technologies submissions?
The following are new for this submission:
- For the benchmarking suite, a new benchmark was added: Stable Diffusion with the LAION-400M dataset.
- Dell Technologies submitted the newly introduced Liquid Assisted Air Cooled (LAAC) PowerEdge XE9640 system, which is a part of the latest generation Dell PowerEdge servers.
Overview of results
Dell Technologies submitted 30 results. These results were submitted using five different systems. We submitted results for the PowerEdge XE9680, XE8640, and XE9640 servers. We also submitted multinode results for the PowerEdge XE9680 and XE8640 servers. The PowerEdge XE9680 server was powered by eight NVIDIA H100 Tensor Core GPUs, while the PowerEdge XE8640 and XE9640 servers were powered by four NVIDIA H100 Tensor Core GPUs each.
Datapoints of interest
Interesting datapoints include:
- Our new Stable Diffusion results with the PowerEdge XE9680 server have been submitted for the first time and are exclusive. Dell Technologies, NVIDIA, and Habana Labs are the only submitters to have made an official submission. This submission is important because of the explosion of generative AI workloads. The submission uses the NVIDIA NeMo framework, included in NVIDIA AI Enterprise for secure, supported, and stable production AI.
- Dell PowerEdge XE8640 and XE9640 servers secured several top performer titles (#1 titles) among other systems equipped with four NVIDIA H100 GPUs. The tasks included language modeling, recommendation, heavy-weight object detection, speech to text, and medical image segmentation.
- PowerEdge XE9680 multinode results were submitted again and can be compared with the multinode results from the previous round. Additionally, this round was the first time multinode results with the newer generation PowerEdge XE8640 servers were submitted. The results show near-linear scaling. Furthermore, apart from NVIDIA, Habana Labs, and Intel, Dell Technologies is the only submitter with multinode, on-premises results.
- The results for the PowerEdge XE9640 server with liquid assisted air cooling (LAAC) are similar to those for the air-cooled PowerEdge XE8640 server.
The following figure shows all the convergence times for Dell systems and corresponding workloads in the benchmark. Because different benchmarks are included in the same graph, the y axis uses a logarithmic scale. Overall, these numbers show an excellent time to converge for each workload.
Figure 1. Logarithmic y axis: Overview of Dell MLPerf Training v3.1 results
Conclusion
We submitted compliant results for the MLCommons Training v3.1 benchmark. These results are based on the latest generation of Dell PowerEdge XE9680, XE8640, and XE9640 servers, powered by NVIDIA H100 Tensor Core GPUs. All results are stellar. They demonstrate that multinode scaling is near linear and that more servers can help to solve the same problem faster. The range of results allows end users to make decisions about expected performance before deploying their compute-intensive training workloads. The workloads in the submission include image classification, medical image segmentation, lightweight and heavy-weight object detection, speech recognition, language modeling, recommendation, and text to image. Enterprises can enable and maximize their AI transformation efficiently with Dell Technologies solutions.
MLCommons Results
https://mlcommons.org/benchmarks/training/
The preceding graphs are MLCommons results for MLPerf IDs from 3.1-2005 to 3.1-2009.