Do We Always Need GPUs for AI Workloads?
Sun, 10 Sep 2023 15:52:30 -0000
Graphics Processing Units (GPUs) have long been the preferred choice for accelerating AI workloads, especially deep learning tasks. However, the assumption that GPUs are indispensable for all AI applications merits a closer examination.
In this blog, we shift the focus to Central Processing Units (CPUs), delve into the role of CPU performance in AI workloads, and investigate scenarios where CPUs might offer competitive or even superior performance compared to GPUs.
To measure the performance of AI inference workloads on CPUs, we used the TensorFlow benchmark suite. It provides implementations of popular convolutional neural networks for large-scale image recognition (VGG-16, AlexNet, GoogLeNet, and ResNet-50) across a range of batch sizes (16, 32, 64, 256, and 512), and it supports workloads running on a single machine as well as distributed workloads running across multiple hosts. The study looks at all subtests in the benchmark.
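For readers who want to reproduce a similar measurement, the harness below is a minimal sketch of how per-batch-size throughput (images/second) can be timed in plain Python. The `run_inference` callable is a hypothetical stand-in for one forward pass of a real model such as ResNet-50; it is not the benchmark suite's own code.

```python
import time

# Batch sizes exercised by the study (see above).
BATCH_SIZES = [16, 32, 64, 256, 512]

def measure_throughput(run_inference, batch_size, num_batches=10):
    """Time `run_inference(batch_size)` and return images/second.

    `run_inference` is a hypothetical stand-in for one forward pass
    of a model over a batch of images.
    """
    # Warm-up run so one-time setup cost is not measured.
    run_inference(batch_size)
    start = time.perf_counter()
    for _ in range(num_batches):
        run_inference(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * num_batches) / elapsed

# Dummy workload standing in for a real model's inference step.
def dummy_inference(batch_size):
    # Simulate per-image work; a real harness would call the model here.
    _ = [i * i for i in range(batch_size * 1000)]

results = {bs: measure_throughput(dummy_inference, bs) for bs in BATCH_SIZES}
for bs, ips in results.items():
    print(f"batch={bs:4d}  {ips:12.1f} images/sec")
```

The same loop structure (warm-up, timed repetitions, images divided by elapsed seconds) is what CNN benchmark harnesses report as throughput.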
We examined the performance trend that each model shows across the different batch sizes to determine whether the 1-socket PowerEdge R7615 or the 2-socket PowerEdge R7625 is better suited to a CPU-based AI inference workload.
The following figure shows the performance of the convolutional models at different batch sizes in the balanced, 12-DIMMs-per-socket configuration with a memory capacity of 64 GB per DIMM on the PowerEdge R7625 and R7615 with 4th Gen AMD EPYC 9654 and 9654P processors:
Figure 1. Performance of convolutional models on different batch sizes in a balanced, 12-DIMMs-per-socket configuration with memory capacity of 64 GB per DIMM with default BIOS settings
The batch size can vary depending on several factors, including the specific application, available computational resources, and hardware constraints. Generally, larger batch sizes are preferred because they offer better parallelization and computational efficiency, but they also require more memory. As the line graphs show, the 2-socket server (PowerEdge R7625) outperforms the 1-socket server (PowerEdge R7615) by up to 150 percent at smaller batch sizes.
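The "up to 150 percent" comparison can be reproduced from raw throughput numbers. The helper below shows the arithmetic; the sample figures are purely illustrative, not measured results from the study.

```python
def improvement_pct(baseline_ips, candidate_ips):
    """Percent by which `candidate_ips` outperforms `baseline_ips`.

    A result of 150.0 means the candidate delivers 2.5x the
    baseline throughput (images/second).
    """
    return (candidate_ips / baseline_ips - 1.0) * 100.0

# Illustrative numbers only (not measured results from the study):
r7615_ips = 200.0   # hypothetical 1-socket throughput at a small batch size
r7625_ips = 500.0   # hypothetical 2-socket throughput at the same batch size
print(f"{improvement_pct(r7615_ips, r7625_ips):.0f}% improvement")  # prints "150% improvement"
```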
We found that CPUs deliver strong performance at smaller batch sizes, and we suggest that customers choose a configuration based on performance, business requirements, and future scalability.
In practice, the choice between CPU-based and GPU-based AI inference depends on the specific requirements of the application. Some AI workloads benefit more from the parallel processing capabilities of GPUs, while others may prioritize low latency and versatile processing, which CPUs can provide.
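The latency-versus-throughput tension mentioned above can be made concrete with a simplified cost model: assume each image takes a fixed amount of compute time plus a fixed per-batch overhead (both values below are hypothetical). Larger batches amortize the overhead and raise throughput, but every image then waits for the whole batch to finish, raising latency.

```python
def batch_latency_ms(batch_size, per_image_ms, per_batch_overhead_ms):
    """Time to complete one batch under a simple additive cost model."""
    return per_batch_overhead_ms + batch_size * per_image_ms

def throughput_ips(batch_size, per_image_ms, per_batch_overhead_ms):
    """Images/second: batch size divided by batch latency in seconds."""
    latency_s = batch_latency_ms(batch_size, per_image_ms, per_batch_overhead_ms) / 1000.0
    return batch_size / latency_s

# Hypothetical costs: 2 ms per image plus 20 ms fixed overhead per batch.
for bs in (1, 16, 256):
    lat = batch_latency_ms(bs, 2.0, 20.0)
    ips = throughput_ips(bs, 2.0, 20.0)
    print(f"batch={bs:3d}  latency={lat:6.0f} ms  throughput={ips:6.1f} images/sec")
```

Under this toy model, throughput climbs steadily with batch size while per-batch latency grows linearly, which is why latency-sensitive applications often favor small batches on CPUs while throughput-oriented applications favor large batches.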
Ultimately, the choice between using GPUs or CPUs for AI workloads should be based on a thorough understanding of the workload's characteristics, performance requirements, available hardware, and budget considerations. In some cases, a combination of different hardware components might also be a viable solution to optimize performance and cost.
You can find more detail in CPU-based AI inference | Workload-Based DDR5 Memory Guidance for Next-Generation PowerEdge Servers on the Dell Technologies Info Hub.
Author: Swaraj Mohapatra