Optimizing Computer Vision Workloads: A Guide to Selecting NVIDIA GPUs
Fri, 27 Oct 2023 15:31:21 -0000|
Read Time: 0 minutes
Long gone are the days when facilities managers and security personnel were required to be in a control room with their attention locked onto walls of video monitors. The development of lower-cost and more capable video cameras, more powerful data science computing platforms, and the need to reduce operations overhead have caused the deployment of video management systems (VMS) and computer vision analytics applications to skyrocket in the last ten years in all sectors of the economy. Modern computer vision applications can detect a wide range of events without constant human supervision, including overcrowding, unauthorized access, smoke detection, vehicle operation infractions, and more. Better situational awareness of their environments can help organizations achieve better outcomes for everyone involved.
Table 1 – Outcomes achievable with better situational awareness
Increased operational efficiencies
Leverage all the data that you capture to deliver high-quality services and improve resource allocation.
Optimized safety and security
Provide a safer, more real-time aware environment.
Provide a more positive, personalized, and engaging experience for both customers and employees.
Measure and lower your environmental impact.
New revenue opportunities
Unlock more monetization opportunities from your data with more actionable insights.
The technical challenge
Computer vision analytics uses various techniques and algorithms, including object detection, classification, feature extraction, and more. The computation resources that are required for these tasks depend on the resolution of the source video, frame rates, and the complexity of both the scene and the types of analytics being processed. The diagram below shows a simplified set of steps (pipeline) that is frequently implemented in a computer vision application.
Figure 1: Logical processing pipeline for computer vision
Inference is the step that most people are familiar with. A trained algorithm can distinguish between a passenger automobile and a delivery van, similar to the classic dogs versus cats example often used to explain computer vision. While the other steps are less familiar to the typical user of computer vision applications, they are critical to achieving good results and require dedicated graphics processing units (GPUs). For example, the Decode/Encode steps are tuned to leverage hardware that resides on the GPU to provide optimal performance.
Given the extensive portfolio of NVIDIA GPUs available today, organizations that are getting started with computer vision applications often need help understanding their options. We have tested the performance of computer vision analytics applications with various models of NVIDIA GPUs and collected the results. The remainder of this article provides background on the test results and our choice of model.
Choosing a GPU
The market for GPUs is broadly divided into data center, desktop, and mobility products. The workload that is placed on a GPU when training large image classification and detection models is almost exclusively performed on data center GPUs. Once these models are trained and delivered in a computer vision application, multiple CPU and GPU resource options can be available at run time. Small facilities, such as a small retailer with only a few cameras, can afford to deploy only a desktop computer with a low-power GPU for near real-time video analytics. In contrast, large organizations with hundreds to thousands of cameras need the power of data center-class GPUs.
However, all data center GPUs are not created equal. The table below compares selected characteristics for a sample of NVIDIA data center GPUs. The FP32 floating point calculations per second metric indicates the relative performance that a developer can expect on either model training or the inference stage of the typical pipeline used in a computer vision application, as discussed above.
The capability of the GPU for performing other pipeline elements required for high-performance computer vision tasks, including encoding/decoding, is best reflected by the Media Engines details.
First, consider the Media Engines row entry for the A30 GPU column. There is 1 JPEG decoder and 4 video decoders, but no video encoders. This configuration makes the A30 incompatible with the needs of many market-leading computer vision application vendors' products, even though it is a data center GPU.
Table 2: NVIDA Ampere architecture GPU characteristics
FP32 (Tera Flops)
4x 16 GDDR6
24 GB HBM2
1 video encoder
2 video decoders (includes AV1 decode)
4 video encoder
8 video decoders (includes AV1 decode)
1 JPEG decoder
4 video decoders
1 optical flow accelerator
1 video encoder
2 video decoders (includes AV1 decode)
Comparing the FP32 TFLOPS between the A30 and A40 shows that the A40 is a more capable GPU for training and pure inference tasks. During our testing, the computer vision applications quickly exhausted the available Media Engines on the A40. Selecting a GPU for computer vision requires matching the available resources needed for computer vision including media engines, available memory, and other computing capabilities that can be different across use cases.
Next, examining the Media Engines description for the A2 GPU column confirms that the product houses 1 video encoder and 2 video decoders. This card will meet the needs of most computer vision applications and is supported for data center use; however, the low number of encoders and decoders, memory, and floating point processing will limit the number of concurrent streams that can be processed. The low power consumption of the A2 increases the flexibility of choice of server for deployment, which is important for edge and near-edge scenarios.
Still focusing on the table above, compare all the characteristics of the A2 GPU column with the A16 GPU. Notice that there are four times the resources on the A16 versus the A2. This can be explained by looking at the diagram below. The A16 was constructed by putting four A2 “engines” on a single PCI card. Each of the boxes labeled GPU0-GPU3 contains all the memory, media engines and other processing capabilities that you would have available to a server that had a standard A2 GPU card installed. Also notice that the A16 requires approximately 4 times the power of an A2.
The table below shows the same metric comparison used in the discussion above for the newest NVIDIA GPU products based on the Ada Lovelace architecture. The L4 GPU offers 2 encoders and 4 decoders for a card that consumes just 72 W. Compared with the 1 encoder and 2 decoder configuration on the A2 at 40 to 60 W, the L4 should be capable of processing many more video streams for less power than two A2 cards. The L40 with 3 encoders and 3 decoders is expected to be the new computer vision application workhorse for organizations with hundreds to thousands of video streams. While the L40S has the same number of Media Engines and memory as the L40, it was designed to be an upgrade/replacement for the A100 Ampere architecture training and/or inference computing leader.
FP32 (Tera Flops)
24 GDDR6 w/ ECC
48 GDDR6 w/ ECC
48 GDDR6 w/ ECC
2 video encoder
4 video decoders
4 JPEG decoder
(includes AV1 decode)
3 video encoder
3 video decoders
3 video encoder
3 video decoders
In total seven different NVIDIA GPU cards were discussed that are useful for CV workloads. From the Ampere family of cards we found that the A16 performed well for a wide variety of CV inference workloads. The A16 provides a good balance of video Decoders/Encoders, CUDA cores and memory for computer vision workloads.
For the newer Ada Lovlace family of cards, the L40 looks like a well-balanced card with great throughput potential. We are currently testing out this card in our lab and will provide a future blog on its performance for CV workloads.
Related Blog Posts
The Future of AI Using LiDAR
Tue, 30 Jan 2024 14:48:31 -0000|
Read Time: 0 minutes
Light Detection and Ranging (LiDAR) is a method for determining the distance from a sensor to an object or a surface by sending out a laser beam and measuring the time for the reflected light to return to the receiver. We recently designed a solution to understand how using data from multiple LiDAR sensors monitoring a single space can be combined into a three-dimensional (3D) perceptual understanding of how people and objects flow and function within public and private spaces. Our key partner in this research is Seoul Robotics, a leader in LiDAR 3D perception and analytics tools.
Most people are familiar with the use of LiDAR on moving vehicles to detect nearby objects that has become popular in transportation applications. Stationary LiDAR is now becoming more widely adopted for 3D imaging in applications where cameras have been used traditionally.
Multiple sensor LiDAR applications can produce a complete 3D grid map with precise depth and location information for objects in the jointly monitored environment. This technology overcomes several limitations of 2D cameras. Using AI, LiDAR systems can improve the quality of analysis results for data collected during harsh weather conditions like rain, snow, and fog. Furthermore, LiDAR is more robust than optical cameras for conditions where the ambient lighting is low or produces reflections and glare.
Another advantage of LiDAR for computer vision is related to privacy protection. The widespread deployment of high-resolution optical cameras has raised concerns regarding the potential violation of individual privacy and misuse of the data.
LiDAR 3D perception is a promising alternative to traditional camera systems. LiDAR data does not contain biometric data that could be cross-referenced with other sources to identify individuals uniquely. This approach allows operators to track anonymous objects that maintain individuals' privacy. Therefore, it is essential to consider replacing or augmenting such cameras to reduce the overhead of ensuring that data is secure and used appropriately.
Worldwide, organizations use AI-enabled computer vision solutions to create safer, more efficient public and private spaces using only optical thermal and infrared cameras. Data scientists have developed many machine learning and deep neural network tools to detect and label objects using data from these different camera types.
As LiDAR becomes vital for the reasons discussed above, organizations are investigating their options for whether LiDAR is best deployed alongside traditional cameras or if there are opportunities to design new systems using LiDAR sensors exclusively. It is rare when existing cameras can be replaced with LiDAR sensors mounted in the exact locations used today.
An example deployment of 2 LiDAR sensors for a medium-sized room is below:
Detecting the position of the stationary objects and people moving through this space (flow and function) with LiDAR requires careful placement of the sensors, calibration of the room's geometry, and data processing algorithms that can extract information from both sensors without distortion or duplications. Collecting and processing LiDAR data for 3D perception requires a different toolset and expertise, but companies like Seoul Robotics can help.
Another aspect of LiDAR systems design that needs to be evaluated is data transfer requirements. In most large environments using camera deployments today (e.g., airport/transportation hubs, etc.), camera data is fed back to a centralized hub for real-time processing.
A typical optical camera in an AI computer vision system would have a resolution and refresh rate of 1080@30FPS. This specification would translate to ~4Mb/s of network traffic per camera. Even with older network technology, thousands of cameras can be deployed and processed.
There is a significant increase in the density of the data produced and processed for LiDAR systems compared to video systems. A currently available 32-channel LiDAR sensor will produce between 25Mb/s and 50Mb/s of data on the network segment between the device and the AI processing node. Newer high-density 128-channel LiDAR sensors consume up to 256Mb/s of network bandwidth, so something will need to change from the current strategy of centralized data processing.
It is not feasible to design a system that will consume the entire network capacity of a site with LiDAR traffic. In addition, it can also be challenging and expensive to upgrade the site's private network to handle higher speeds. The most efficient solution, therefore, is to design a federated solution for processing LiDAR data closer to the location of the sensors.
With a switch to the architecture in the right-side panel above, it is possible to process multiple LiDAR sensors closer to where they are mounted at the site and only send any resulting alerts and events back to a central location (primary node) for further processing and triggering corrective actions. This approach avoids the costly transfer of dense LiDAR data across long network segments.
It is important to note that processing LiDAR data with millions of points per second requires significant computational capability. We also validated that leveraging the massive parallel computing power of GPUs like the NVIDIA A2 greatly enhanced the object detection accuracy in the distributed processing nodes. The Dell XR4000 series of rugged Dell servers should be a good option for remote processing in many environments.
LiDAR is becoming increasingly important in designing AI for computer vision solutions due to its ability to handle challenging lighting situations and enhance user privacy. LiDAR differs from video cameras, so planning the deployment carefully is essential.
LiDAR systems can be designed in either a central or federated manner or even a mix of both. The rapidly growing network bandwidth requirements of LiDAR may cause a rethink on how systems for AI-enabled data processes are deployed sooner rather than later.
For more details on CV 3D Flow and Function with LiDAR see Computer Vision 3D Flow and Function AI with LiDAR.
Who’s watching your IP cameras?
Thu, 20 Jul 2023 18:05:50 -0000|
Read Time: 0 minutes
In today’s world, the deployment of security cameras is a common practice. In some public facilities like airports, travelers can be in view of a security camera 100% of the time. The days of security guards watching banks of video panels being fed from hundreds of security cameras are quickly being replaced by computer vision systems powered by artificial intelligence (AI). Today’s advanced analytics can be performed on many camera streams in real-time without a human in the loop. These systems enhance not only personal safety but also provide other benefits, including better passenger experience and enhanced shopping experiences.
Modern IP cameras are complex devices. In addition to recording video streams at increasingly higher resolutions (4k is now common), they can also encode and send those streams over traditional internet protocol IP to downstream systems for additional analytic processing and eventually archiving. Some cameras on the market today have enough onboard computing power and storage to evaluate AI models and perform analytics right on the camera.
The development of IP-connected cameras provided great flexibility in deployment by eliminating the need for specialized cables. IP cameras are so easy to plug into existing IT infrastructure that almost anyone can do it. However, since most camera vendors use a modified version of an open-source Linux operating system, IT and security professionals realize there are hundreds or thousands of customized Linux servers mounted on walls and ceilings all over their facilities. Whether you are responsible for <10 cameras at a small retail outlet or >5000 at an airport facility, the question remains “How much exposure do all those cameras pose from cyber-attacks?”
To understand the potential risk posed by IP cameras, we assembled a lab environment with multiple camera models from different vendors. Some cameras were thought to be up to date with the latest firmware, and some were not.
Working in collaboration with the Secureworks team and their suite of vulnerability and threat management tools, we assessed a strategy for detecting IP camera vulnerabilities Our first choice was to implement their Secureworks Taegis™ VDR vulnerability scanning software to scan our lab IP network to discover any camera vulnerabilities. VDR provides a risk-based approach to managing vulnerabilities driven by automated & intelligent machine learning.
We planned to discover the cameras with older firmware and document their vulnerabilities. Then we would have the engineers upgrade all firmware and software to the latest patches available and rescan to see if all the vulnerabilities were resolved.
Once the SecureWorks Edge agent was set up in the lab, we could easily add all the IP ranges that might be connected to our cameras. All the cameras on those networks were identified by SecureWorks VDR and automatically added to the VDR AWS cloud-based reporting console.
Discovering Camera Vulnerabilities
The results of the scans were surprising. Almost all discovered cameras had some Critical issues identified by the VDR scanning. In one case, even after a camera was upgraded to the latest firmware available from the vendor, VDR found Critical software and configuration vulnerabilities shown below:
One of the remaining critical issues was the result of an insecure FTP username/password that was not changed from the vendor’s default settings before the camera was put into service. These types of procedural lapses should not happen, but inadvertently they are bound to. The password hardening mistake was easily caught by a VDR scan so that another common cybersecurity risk could be dealt with. This is an example of an issue not related to firmware but a combination of the need for vendors not to ship with a well-known FTP login and the responsibility of users to not forget to harden the login.
Another example of the types of Critical issues you can expect when dealing with IP cameras relates to discovering an outdated library dependency found on the camera. The library is required by the vendor software but was not updated when the latest camera firmware patches were applied.
Camera Administration Consoles
The VDR tool will also detect if a camera is exposing any HTTP sites/services and look for vulnerabilities there. Most IP cameras ship with an embedded HTTP server so administrators can access the cameras' functionality and perform maintenance. Again, considering the number of deployed cameras, this represents a huge number of websites that may be susceptible to hacking. Our testing found some examples of the type of issues that a camera’s web applications can expose:
The scan of this device found an older version of Apache webserver software and outdated SSL libraries in use for this cameras website and should be considered a critical vulnerability.
In this article, we have tried to raise awareness of the significant Cyber Security risk that IP cameras pose to organizations, both large and small. Providing effective video recording and analysis capabilities is much more than simply mounting cameras on the wall and walking away. IT and security professionals must ask, “Who’s watching our IP cameras? Each camera should be continuously patched to the latest version of firmware and software - and scanned with a tool like SecureWorks VDR. If vulnerabilities still exist after scanning and patching, it is critical to engage with your camera vendor to remediate the issues that may adversely impact your organization if neglected. Someone will be watching your IP cameras; let’s ensure they don’t conflict with your best interests.
Dell Technologies is at the forefront of delivering enterprise-class computer vision solutions. Our extensive partner network and key industry stakeholders have allowed us to develop an award-winning process that takes customers from ideation to full-scale implementation faster and with less risk. Our outcomes-based process for computer vision delivers:
- Increased operational efficiencies: Leverage all the data you’re capturing to deliver high-quality services and improve resource allocation.
- Optimized safety and security: Provide a safer, more real-time aware environment
- Enhanced experience: Provide a more positive, personalized, and engaging experience for customers and employees.
- Improved sustainability: Measure and lower your environmental impact.
- New revenue opportunities: Unlock more monetization opportunities from your data with more actionable insights