Can CPUs Effectively Run AI Applications?
Fri, 03 Mar 2023 20:06:28 -0000
Due to the inherent advantages of GPUs in high-speed, large-scale matrix operations, developers have gravitated toward GPUs for AI training (developing the model) and inference (running the model in production).
With GPU scarcity driven by the massive growth of AI applications, including recent advances in stable diffusion and in large language models that have taken the world by storm, such as ChatGPT by OpenAI, the question for many developers is:
Are CPUs up to the task of AI?
To answer the question, Dell Technologies and Scalers AI set up a Dell PowerEdge R760 server with 4th Gen Intel® Xeon® processors and integrated Intel® Deep Learning acceleration. Notably, we did not install a GPU on this server.
In this blog, Part One of a two-part series, we'll put this latest Intel® Xeon® CPU, released by Intel just this month, to the test on AI inference. We'll also run AI on video streams, one of the most common mediums for AI, and pair it with industry-specific application logic to showcase a real-world AI workload.
In Part Two, we'll train a model using a technique called transfer learning. Most training is done on GPUs today, and transfer learning presents a great opportunity to leverage existing models while customizing them for targeted use cases.
The industry-specific use case
Scalers AI developed a smart city solution that uses artificial intelligence and computer vision to monitor traffic safety in real time. The solution identifies potential safety hazards, such as illegal lane changes on freeway on-ramps, reckless driving, and vehicle collisions, by analyzing video footage from cameras positioned at key locations.
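To make the shape of such a solution concrete, here is a minimal sketch of a per-frame pipeline: decode a frame, run object detection, then apply a safety rule. The detector here is a hypothetical stub (the blog does not publish the model or code), and the lane-crossing rule is an illustrative simplification of the kind of application logic described above.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    label: str          # e.g. "car", "truck"
    box: tuple          # (x1, y1, x2, y2) in pixels
    confidence: float

def flag_hazards(detections: List[Detection], onramp_lane_x: int) -> List[str]:
    """Toy safety rule: flag any vehicle whose box crosses the on-ramp lane line."""
    hazards = []
    for d in detections:
        x1, _, x2, _ = d.box
        if d.confidence > 0.5 and x1 < onramp_lane_x < x2:
            hazards.append(f"possible illegal lane change by {d.label}")
    return hazards

def process_stream(frames, detect: Callable[[object], List[Detection]],
                   onramp_lane_x: int = 640) -> List[str]:
    """Run detection on each decoded frame, then apply the safety rule."""
    alerts = []
    for frame in frames:
        alerts.extend(flag_hazards(detect(frame), onramp_lane_x))
    return alerts

# Stub standing in for a real object-detection model (hypothetical).
def fake_detect(frame):
    return [Detection("car", (600, 100, 700, 160), 0.9)]

print(process_stream(range(3), fake_detect))
```

In a production system, `detect` would wrap an accelerated inference runtime and `frames` would come from a hardware video decoder; the structure of the loop stays the same.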
For comparison, we also set up the previous generation Dell PowerEdge R750 server and ran the AI inferencing object detection workload on both servers. What did we learn?
Dell PowerEdge R760 with 4th Gen Intel® Xeon® Processors and Intel® Deep Learning Boost delivered!
Let's look at the generational server comparison.
The following charts show the performance gain from the previous-generation to the current-generation server. The graph on the left shows inference-only performance, while the middle graph adds video decode. Finally, the graph on the right shows full application performance with the smart city solution's application logic.
The performance claims are great. But what does this mean for my business?
Results from the Dell PowerEdge R760 running the Scalers AI smart city solution show that, for a similar application, users can expect the R760 to perform real-time inferencing on up to 90 1080p video streams when deployed. The Dell PowerEdge R750 can handle up to 50 1080p video streams, and all of this is without a GPU. Although GPUs add additional AI computing capability, this study shows that they are not always necessary; it depends on your unique requirements, such as how many streams must be displayed concurrently.
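As a back-of-the-envelope check on what those stream counts imply, the sketch below converts streams into inferences per second and computes the gen-over-gen gain. The 30 fps figure is our assumption for a 1080p stream; the blog does not state the frame rate.

```python
# Throughput implied by the stream counts above, assuming 30 frames
# per second per 1080p stream (our assumption, not stated in the blog).
FPS = 30
r760_streams = 90
r750_streams = 50

r760_inferences_per_sec = r760_streams * FPS
r750_inferences_per_sec = r750_streams * FPS
gen_over_gen_gain = r760_streams / r750_streams

print(r760_inferences_per_sec)        # 2700 inferences/sec
print(f"{gen_over_gen_gain:.1f}x")    # 1.8x
```

Under that assumption, the R760 sustains roughly 2,700 real-time inferences per second, a 1.8x gain over the R750 for this workload.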
Given these results, Scalers AI confidently recommends using Dell PowerEdge R760 with 4th Gen Intel® Xeon® Processors and Intel® Deep Learning Boost for AI computer vision workloads, such as the Scalers AI Traffic Safety Solution using object detection, because they fulfill all application requirements.
Now that we have shown highly effective object detection on a CPU, what about a more compute-intensive complex model such as segmentation?
Here we run the more complex segmentation model on 10 streams while displaying four of them.
As you can see, CPUs are up to the task of running AI inference on models such as object detection and segmentation. Perhaps more important for developers, they offer the flexibility to run the full workload on the same processor, thereby lowering total cost of ownership (TCO).
With the rapid growth of AI, the ability to deploy on CPUs is a key differentiator for real-world use cases such as traffic safety. This frees up GPU resources for training and graphics use cases.
Check in for Part Two of this blog series as we discuss a technique to train a transfer learning model and put a CPU to the test there.
Resources
Interested in trying for yourself? Get access to the solution code!
To save developers hundreds of potential hours of development, Dell Technologies and Scalers AI are offering access to the solution code to fast-track development of AI workloads on next-generation Dell PowerEdge servers with 4th Gen Intel® Xeon® scalable processors.
For access to the code, reach out to your Dell representative or contact Scalers AI!
To learn more about the study discussed here, visit the following webpages:
• Myth-Busting: Can Intel® Xeon® Processors Effectively Run AI Applications?
• Accelerate Industry Transformation: Build Custom Models with Transfer Learning on Intel® Xeon®
• Scalers AI Performance Insights: Dell PowerEdge R760 with 4th Gen Intel® Xeon® Scalable Processors in AI
Authors:
Steen Graham, CEO at Scalers AI
Delmar Hernandez, Server Technologist at Dell Technologies
Related Blog Posts
AI-Powered Smart Cities: PowerEdge and Intel Team Up to Deliver the Future
Mon, 07 Aug 2023 19:49:44 -0000
Have you ever found yourself stuck at a red light with no other cars in sight, wondering why it takes so long to change? Or witnessed another never-ending traffic study in your city? What if we harness artificial intelligence to help cities make smart decisions fast?
The power of AI in smart cities
Artificial intelligence has emerged as a critical technology that is driving advancements in smart cities. It can analyze vast amounts of data to identify patterns and help make informed decisions, allowing city leaders to respond swiftly. These real-time insights will revolutionize how cities manage their infrastructures and services.
Improving traffic flow with AI
Imagine a world where AI optimizes traffic flow, minimizes wasted commute time, and reduces traffic congestion and thus pollution. Dell Technologies, Intel, and Scalers AI developed a concept solution combining the power of the latest PowerEdge servers offering 4th Gen Intel® Xeon® CPUs and Intel Data Center Flex Series GPUs. This innovative solution harnesses the full computing power of the latest generation of PowerEdge servers.
Leading the way with smart city solutions
We developed this cutting-edge concept solution to offer a glimpse of what's possible. Our approach involves monitoring automotive behavior and traffic using real-time video footage from many strategically positioned cameras. By analyzing this data, the application identifies safety hazards such as reckless driving and vehicle collisions, empowering cities to respond swiftly.
The impact of the Intel–Dell partnership
The partnership between Intel and Dell, supported by the expertise of Scalers AI, is driving smart cities into reality. The combined power of CPUs and GPUs for AI workloads enhances urban safety, sustainability, and efficiency. This collaboration allows cities to explore the potential of AI for real-world applications.
To learn more about this groundbreaking solution and how the latest technology from Dell Technologies and Intel will revolutionize urban living, visit https://infohub.delltechnologies.com/section-assets/07-09-intel-data-center-flex-series-gpu-with-poweredge-r660-driving-innovation.
Author: Delmar Hernandez
Do AI Models and Systems Have to Come in All Shapes and Sizes? If so, Why?
Wed, 24 Apr 2024 13:21:25 -0000
I was recently in a meeting with some corporate strategists. They noted that the AI market had become too fragmented post-ChatGPT and that they needed help defining AI. The strategists said there was too much confusion in the market, and that we needed to help our customers understand and simplify this new field of technology. This led to an excellent discussion about general vs. generative AI, their different use cases and infrastructure needs, and why they need to be looked at separately. Then, reinforcing that this is top of mind for many, not two hours later a colleague asked me almost the same question: why do different types of AI workloads need different approaches? Why are there no "silver bullets" for AI?
“Traditional” vs. LLM AI
There is a division in AI models. The market has settled on the terms "general" vs. "generative" for these models, and the two types can be distinguished by their size as measured in parameters. Parameters are the learned weights a model applies when computing the probability of a given output. The models we used in past years ranged in parameter count from tens of millions (ResNet) to, at most, hundreds of millions (BERT). These models remain effective and make up the majority of models deployed in production.
The new wave of models, publicly highlighted by OpenAI's GPT-3 and ChatGPT, shows a huge shift. ChatGPT clocks in at five billion to 20 billion parameters; GPT-3 is 175 billion parameters. GPT-4 is even more colossal, somewhere in the range of 1.5 to 170 trillion parameters, depending on the version. This is at the core of why we must treat various AI systems differently: in what we want to do with them, in their infrastructure requirements, and in how we deploy them. To determine the final size and performance requirements for an AI model, you should factor in the token count as well. Tokens, in the context of LLMs, are the units of text that models use for input and output. Token count can vary from a few hundred for an LLM inference job to hundreds of billions for LLM training.
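A quick way to feel the infrastructure impact of those parameter counts is to translate them into memory needed just to hold the weights. The sketch below assumes fp16 weights (2 bytes per parameter), a common choice; actual deployments vary with precision and overhead.

```python
def model_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Rough memory to hold the weights alone (fp16 = 2 bytes/param)."""
    return params * bytes_per_param / 1e9

# A ~25M-parameter ResNet-50 vs. 175B-parameter GPT-3, weights only.
print(f"ResNet-50: {model_memory_gb(25e6):.2f} GB")   # 0.05 GB
print(f"GPT-3:     {model_memory_gb(175e9):.0f} GB")  # 350 GB
```

A general model fits comfortably on a single commodity accelerator or CPU, while a 175B-parameter model needs hundreds of gigabytes for its weights alone, before activations, KV caches, or batching. This is the arithmetic behind treating the two classes of systems differently.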
Why the jump?
So, what happened? Why did model size suddenly jump by two or more orders of magnitude? Determinism, or rather the move away from it. Previously, AI scientists were trying to solve very specific questions.
Let's look at the example of an ADAS or self-driving car system. There is an image recognition model deployed in a car, and it is looking for specific things, such as stop signs. The deployed model determines when it sees a stop sign and follows a defined and limited set of rules for how to react. While smart in its ability to recognize stop signs in a variety of conditions (faded, snow covered, bent, and so on), it has a set pattern of behavior. The input and output of the model always match (stop sign = stop).
With LLMs or generative systems, the model must deal both with understanding the question (prompt) and with generating the most appropriate response. This is why ChatGPT can give you different answers to the same input: it reruns the entire process, and even the smallest changes to the input or to the model itself can produce different outcomes. The outcomes of ChatGPT are not predetermined. This necessitates a much higher level of complexity, which has led to the explosive growth in model size. The size explosion has also led to another oddity: nothing is a set size. Most generative models are sized in a range, as different versions are optimized for specific focus areas.
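The non-predetermined behavior described above comes largely from how generative models sample their next token. A minimal sketch, using made-up logits for illustration: scores are turned into probabilities with a softmax, then one token is drawn at random, so the same input can produce different outputs on different runs.

```python
import math
import random

def sample_token(logits: dict, temperature: float = 1.0, rng=random) -> str:
    """Softmax over logits, then sample one token at random."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())                      # subtract max for stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    r = rng.random()
    cum = 0.0
    for tok, e in exps.items():
        cum += e / total
        if r < cum:
            return tok
    return tok  # fallback for floating-point edge cases

# Illustrative (made-up) logits for the next token after "The light turned".
logits = {"red": 2.0, "green": 1.8, "purple": -1.0}
print([sample_token(logits) for _ in range(5)])
```

Low temperatures make the model nearly deterministic (the highest-scoring token almost always wins); higher temperatures spread probability across tokens and produce the run-to-run variation users see in ChatGPT.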
So, what do we do?
As AI practitioners, we must recognize when to use different forms of AI models and systems, and we must continue to monitor the AI model landscape for further change. We must also find ways to optimize models and use only the parts we need; doing so will significantly reduce model size and improve the ease, speed, and cost-effectiveness of AI system deployments. If your AI project team or company would like to discuss this, reach out to your Dell Technologies contact to start a conversation about how Dell Technologies can help grow your AI at any scale.
Author: Justin Potuznik
Engineering Technologist – High Performance Computing & Artificial Intelligence
Dell Technologies | ISG Chief Technology & Innovation Office