Dell Integrated System for Microsoft Azure Stack HCI with Storage Spaces Direct
Fri, 01 Mar 2024 22:18:07 -0000
Many industry analysts covering the computer vision market are predicting double-digit compound annual growth over the next five years, on top of yearly expenditures of approximately $60B (US). Organizations investing significantly in greenfield or upgrade projects must evaluate their IT infrastructure designs. Technology options have improved considerably since video management and AI-enabled computer vision systems were first introduced. Virtualization platforms like Microsoft Azure Stack offer significant advantages in efficiency and manageability over the traditional approach of dedicating a bespoke infrastructure stack to every application, from video ingest to AI analytics and real-time alerting. This article describes our recent validation of Microsoft Azure Stack Hyperconverged Infrastructure (HCI) for hosting multiple computer vision applications, including video management and two AI-enabled computer vision analytics applications. The full white paper based on this work, Computer Vision on Microsoft Azure Stack HCI, is available for online reading or download.
Microsoft Azure Stack hyperconverged infrastructure (HCI) is an on-premises IT platform integrated with an Azure public cloud management service. Azure Stack represents a comprehensive solution for organizations looking to leverage the benefits of cloud computing while maintaining control over on-premises infrastructure. The platform is a core component of Microsoft's hybrid cloud strategy, bringing the agility and fast-paced innovation of cloud computing to on-premises environments. Azure Stack HCI offerings from Dell Technologies provide flexible, scalable, and secure solutions to customers looking to consolidate virtualized workloads.
The Azure Stack HCI platform seamlessly integrates with core Windows Server technologies like Hyper-V for virtualization and Storage Spaces Direct (S2D) for storage. The convergence of management tools for both on-premises and cloud resources, with additional options for integration with other Azure services, reduces deployment and operation overhead for enterprises pursuing a hybrid cloud strategy.
The system architecture we implemented is the Dell Integrated System for Microsoft Azure Stack HCI with Storage Spaces Direct, plus NVIDIA A16 server-class GPUs. The Azure Stack HCI system leverages Microsoft Hyper-V virtual machine virtualization, which will be familiar to many IT and OT (operational technology) professionals.
We performed real-time analytics with BriefCam and Ipsotek by integrating with the Milestone directory server and video recording services. All three applications were hosted on a 5-node Microsoft Azure Stack HCI cluster.
The three applications chosen for this validation were:
- BriefCam provides an industry-leading video analytics application for rapid video review and search, real-time alerting, and quantitative video insights.
- Ipsotek specializes in AI-enhanced video analytics software that manages automatically generated alerts in real time for crowd management, smoke detection, intrusion detection, perimeter protection, number plate recognition, and traffic management.
- The Milestone Systems XProtect video management software platform enables organizations and institutions to create the perfect combination of cameras, sensors, and analytics.
In summary, Azure Stack HCI solutions from Dell Technologies offer a versatile and balanced hybrid cloud approach, allowing organizations to capitalize on the strengths of both on-premises and cloud environments. This flexibility is essential for AI computer vision environments where efficiency, security, compliance, and innovation are keys to sustaining competitive advantage. Our experience working with Microsoft Azure Stack HCI to host enterprise applications for video management and computer vision AI revealed the depth of the platform's innovation and a focus on ease of deployment and management.
For more information:
- Computer Vision on Microsoft Azure Stack HCI White Paper
- Microsoft Azure Stack HCI
  - Dell Integrated System for Microsoft Azure Stack HCI
  - Delivering a Hybrid Cloud Architecture Through Microsoft-Dell Integrated HCI
  - Microsoft Azure Stack HCI documentation
- BriefCam Software Website
- Ipsotek Ltd Website
- Milestone Systems Website
- NVIDIA GPU Hardware
Related Blog Posts
The Future of AI Using LiDAR
Tue, 30 Jan 2024 14:48:31 -0000
Introduction
Light Detection and Ranging (LiDAR) is a method for determining the distance from a sensor to an object or surface by sending out a laser beam and measuring the time for the reflected light to return to the receiver. We recently designed a solution to understand how data from multiple LiDAR sensors monitoring a single space can be combined into a three-dimensional (3D) perceptual understanding of how people and objects flow and function within public and private spaces. Our key partner in this research is Seoul Robotics, a leader in LiDAR 3D perception and analytics tools.
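The time-of-flight relationship behind that measurement can be expressed in a few lines. The sketch below is purely illustrative; the pulse return time in the example is a made-up value, not a real sensor reading.

```python
# Minimal sketch of the LiDAR time-of-flight principle described above.
# The pulse return time used in the example is a made-up illustration.
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def distance_from_round_trip(return_time_ns: float) -> float:
    """Distance to the reflecting surface given the laser pulse's round-trip time."""
    return_time_s = return_time_ns * 1e-9
    return SPEED_OF_LIGHT_M_PER_S * return_time_s / 2  # halved: the light travels out and back

# A pulse returning after ~66.7 nanoseconds reflects off a surface roughly 10 m away.
print(f"{distance_from_round_trip(66.7):.2f} m")  # ~10.00 m
```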
Most people are familiar with the use of LiDAR on moving vehicles to detect nearby objects, which has become popular in transportation applications. Stationary LiDAR is now being adopted more widely for 3D imaging in applications where cameras have traditionally been used.
Multi-sensor LiDAR applications can produce a complete 3D grid map with precise depth and location information for objects in the jointly monitored environment. This technology overcomes several limitations of 2D cameras. Using AI, LiDAR systems can improve the quality of analysis results for data collected during harsh weather conditions like rain, snow, and fog. Furthermore, LiDAR is more robust than optical cameras in conditions where the ambient lighting is low or produces reflections and glare.
Another advantage of LiDAR for computer vision is related to privacy protection. The widespread deployment of high-resolution optical cameras has raised concerns regarding the potential violation of individual privacy and misuse of the data.
LiDAR 3D perception is a promising alternative to traditional camera systems. LiDAR data does not contain biometric information that could be cross-referenced with other sources to uniquely identify individuals. This approach allows operators to track anonymous objects while maintaining individuals' privacy. It is therefore worth considering replacing or augmenting such cameras to reduce the overhead of ensuring that data is secure and used appropriately.
Challenges
Worldwide, organizations use AI-enabled computer vision solutions to create safer, more efficient public and private spaces using optical, thermal, and infrared cameras. Data scientists have developed many machine learning and deep neural network tools to detect and label objects using data from these different camera types.
As LiDAR becomes vital for the reasons discussed above, organizations are investigating whether LiDAR is best deployed alongside traditional cameras or whether there are opportunities to design new systems using LiDAR sensors exclusively. It is rare that existing cameras can simply be replaced with LiDAR sensors mounted in the exact locations used today.
An example deployment of two LiDAR sensors for a medium-sized room is shown below:
Detecting the position of stationary objects and people moving through this space (flow and function) with LiDAR requires careful placement of the sensors, calibration of the room's geometry, and data processing algorithms that can extract information from both sensors without distortion or duplication. Collecting and processing LiDAR data for 3D perception requires a different toolset and expertise, but companies like Seoul Robotics can help.
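As one illustration of the calibration and merging step described above, the following sketch transforms a second sensor's point cloud into a shared room coordinate frame before combining the clouds. The sensor pose (rotation and translation) and the random point data are hypothetical stand-ins, not values from an actual deployment or from Seoul Robotics' tools.

```python
import numpy as np

# Illustrative sketch only: combining point clouds from two LiDAR sensors into one
# room-level frame. The pose below is a hypothetical calibration result.

def to_room_frame(points_xyz: np.ndarray, rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Apply a rigid transform (sensor frame -> room frame) to an (N, 3) point cloud."""
    return points_xyz @ rotation.T + translation

# Hypothetical calibration: sensor B sits 8 m along the room's x-axis, rotated 180 degrees about z.
rot_b = np.array([[-1.0,  0.0, 0.0],
                  [ 0.0, -1.0, 0.0],
                  [ 0.0,  0.0, 1.0]])
trans_b = np.array([8.0, 0.0, 0.0])

cloud_a = np.random.rand(1000, 3) * 4.0   # sensor A is already in the room frame
cloud_b = np.random.rand(1000, 3) * 4.0   # sensor B reports points in its own frame

merged = np.vstack([cloud_a, to_room_frame(cloud_b, rot_b, trans_b)])
print(merged.shape)  # (2000, 3): one combined cloud for downstream 3D perception
```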
Another aspect of LiDAR systems design that needs to be evaluated is data transfer requirements. In most large environments using camera deployments today (e.g., airport/transportation hubs, etc.), camera data is fed back to a centralized hub for real-time processing.
A typical optical camera in an AI computer vision system has a resolution and refresh rate of 1080p@30FPS. This specification translates to roughly 4 Mb/s of network traffic per camera. Even with older network technology, thousands of cameras can be deployed and processed.
LiDAR systems produce and process significantly denser data than video systems. A currently available 32-channel LiDAR sensor will produce between 25 Mb/s and 50 Mb/s of data on the network segment between the device and the AI processing node. Newer high-density 128-channel LiDAR sensors consume up to 256 Mb/s of network bandwidth, so something will need to change from the current strategy of centralized data processing.
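A rough aggregate-bandwidth calculation, using the per-device rates quoted above and a hypothetical site with 500 devices, makes the scale of the difference clear:

```python
# Back-of-the-envelope bandwidth comparison using the figures quoted above.
# The per-device rates come from the text; the device count of 500 is hypothetical.
CAMERA_MBPS = 4            # ~1080p@30FPS optical camera
LIDAR_32CH_MBPS = 50       # upper estimate for a 32-channel LiDAR sensor
LIDAR_128CH_MBPS = 256     # high-density 128-channel LiDAR sensor

def aggregate_gbps(device_count: int, per_device_mbps: float) -> float:
    """Total traffic flowing back to a central hub, in Gb/s."""
    return device_count * per_device_mbps / 1000

for label, rate in [("cameras", CAMERA_MBPS),
                    ("32-ch LiDAR", LIDAR_32CH_MBPS),
                    ("128-ch LiDAR", LIDAR_128CH_MBPS)]:
    print(f"500 {label}: {aggregate_gbps(500, rate):.1f} Gb/s back to the central hub")
# 500 cameras:      2.0 Gb/s
# 500 32-ch LiDAR:  25.0 Gb/s
# 500 128-ch LiDAR: 128.0 Gb/s
```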
Technical Solution
It is not feasible to design a system that consumes the entire network capacity of a site with LiDAR traffic. It can also be challenging and expensive to upgrade the site's private network to handle higher speeds. The most efficient solution, therefore, is a federated design that processes LiDAR data closer to the location of the sensors.
With a switch to this federated architecture, it is possible to process multiple LiDAR sensors close to where they are mounted at the site and send only the resulting alerts and events back to a central location (the primary node) for further processing and for triggering corrective actions. This approach avoids the costly transfer of dense LiDAR data across long network segments.
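A minimal sketch of that edge-forwarding pattern is shown below. The endpoint URL, event schema, and detection stub are hypothetical placeholders (and posting the sample event requires a reachable primary node); a production deployment would use the LiDAR perception vendor's own event interfaces.

```python
import json
import urllib.request

# Sketch of the federated pattern described above: an edge node processes LiDAR
# frames locally and forwards only small event records to the primary node.
# The URL and event schema are hypothetical, not part of any vendor API.
PRIMARY_NODE_URL = "http://primary-node.example.local/events"

def detect_events(frame_points) -> list[dict]:
    """Placeholder for local 3D perception; returns zero or more event records."""
    # A real deployment would run GPU-accelerated object detection here.
    return [{"type": "intrusion", "zone": "dock-3", "confidence": 0.91}]

def forward_events(events: list[dict]) -> None:
    """Send only compact event metadata upstream, never the raw point cloud."""
    if not events:
        return
    payload = json.dumps(events).encode("utf-8")
    request = urllib.request.Request(
        PRIMARY_NODE_URL, data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(request, timeout=2)

# Per-frame loop on the edge node: dense data stays local, alerts travel upstream.
forward_events(detect_events(frame_points=[]))
```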
It is important to note that processing LiDAR data with millions of points per second requires significant computational capability. We also validated that leveraging the massively parallel computing power of GPUs like the NVIDIA A2 greatly enhanced the object detection accuracy in the distributed processing nodes. The Dell XR4000 series of rugged servers should be a good option for remote processing in many environments.
Conclusion
LiDAR is becoming increasingly important in designing AI for computer vision solutions due to its ability to handle challenging lighting situations and enhance user privacy. LiDAR differs from video cameras, so planning the deployment carefully is essential.
LiDAR systems can be designed in either a centralized or a federated manner, or even a mix of both. The rapidly growing network bandwidth requirements of LiDAR may force a rethink of how systems for AI-enabled data processing are deployed, sooner rather than later.
For more details on CV 3D flow and function with LiDAR, see Computer Vision 3D Flow and Function AI with LiDAR.
Optimizing Computer Vision Workloads: A Guide to Selecting NVIDIA GPUs
Fri, 27 Oct 2023 15:31:21 -0000
Introduction
Long gone are the days when facilities managers and security personnel were required to be in a control room with their attention locked onto walls of video monitors. The development of lower-cost and more capable video cameras, more powerful data science computing platforms, and the need to reduce operations overhead have caused the deployment of video management systems (VMS) and computer vision analytics applications to skyrocket in the last ten years in all sectors of the economy. Modern computer vision applications can detect a wide range of events without constant human supervision, including overcrowding, unauthorized access, smoke detection, vehicle operation infractions, and more. Better situational awareness of their environments can help organizations achieve better outcomes for everyone involved.
Table 1 – Outcomes achievable with better situational awareness

| Outcome | Description |
|---|---|
| Increased operational efficiencies | Leverage all the data that you capture to deliver high-quality services and improve resource allocation. |
| Optimized safety and security | Provide a safer, more real-time aware environment. |
| Enhanced experience | Provide a more positive, personalized, and engaging experience for both customers and employees. |
| Improved sustainability | Measure and lower your environmental impact. |
| New revenue opportunities | Unlock more monetization opportunities from your data with more actionable insights. |
The technical challenge
Computer vision analytics uses various techniques and algorithms, including object detection, classification, feature extraction, and more. The computation resources that are required for these tasks depend on the resolution of the source video, frame rates, and the complexity of both the scene and the types of analytics being processed. The diagram below shows a simplified set of steps (pipeline) that is frequently implemented in a computer vision application.
Figure 1: Logical processing pipeline for computer vision
Inference is the step that most people are familiar with. A trained algorithm can distinguish between a passenger automobile and a delivery van, similar to the classic dogs versus cats example often used to explain computer vision. While the other steps are less familiar to the typical user of computer vision applications, they are critical to achieving good results and require dedicated graphics processing units (GPUs). For example, the Decode/Encode steps are tuned to leverage hardware that resides on the GPU to provide optimal performance.
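As a rough illustration of how those pipeline stages fit together in software, here is a minimal sketch using OpenCV. The file names and the inference stub are placeholders, and this sketch decodes and encodes on the CPU for simplicity; as noted above, production systems typically offload the decode/encode stages to the GPU's media engines.

```python
import cv2

# Simplified sketch of the decode -> preprocess -> inference -> encode pipeline (Figure 1).
# "input.mp4", "annotated.mp4", and analyze() are placeholders for illustration only.
def analyze(frame):
    """Placeholder inference step: return a list of detections for this frame."""
    return []

capture = cv2.VideoCapture("input.mp4")                      # decode stage
fps = capture.get(cv2.CAP_PROP_FPS) or 30
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("annotated.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))  # encode stage

while True:
    ok, frame = capture.read()
    if not ok:
        break
    resized = cv2.resize(frame, (640, 360))                  # preprocess stage
    detections = analyze(resized)                            # inference stage
    for _ in detections:
        pass                                                 # draw boxes / raise alerts here
    writer.write(frame)                                      # encode annotated output

capture.release()
writer.release()
```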
Given the extensive portfolio of NVIDIA GPUs available today, organizations that are getting started with computer vision applications often need help understanding their options. We have tested the performance of computer vision analytics applications with various models of NVIDIA GPUs and collected the results. The remainder of this article provides background on the test results and our choice of GPU model.
Choosing a GPU
The market for GPUs is broadly divided into data center, desktop, and mobility products. The workload placed on a GPU when training large image classification and detection models is almost exclusively run on data center GPUs. Once these models are trained and delivered in a computer vision application, multiple CPU and GPU resource options are available at run time. A small facility, such as a retailer with only a few cameras, may need only a desktop computer with a low-power GPU for near real-time video analytics. In contrast, large organizations with hundreds to thousands of cameras need the power of data center-class GPUs.
However, not all data center GPUs are created equal. The table below compares selected characteristics for a sample of NVIDIA data center GPUs. The FP32 floating point metric indicates the relative performance that a developer can expect for either model training or the inference stage of the typical computer vision pipeline discussed above.
The capability of the GPU for performing other pipeline elements required for high-performance computer vision tasks, including encoding/decoding, is best reflected by the Media Engines details.
First, consider the Media Engines row entry in the A30 GPU column. The A30 has 1 JPEG decoder and 4 video decoders, but no video encoder. This configuration makes the A30 incompatible with the needs of many market-leading computer vision application vendors' products, even though it is a data center GPU.
Table 2: NVIDIA Ampere architecture GPU characteristics
| | A2 | A16 | A30 | A40 |
|---|---|---|---|---|
| FP32 (TFLOPS) | 4.5 | 4x 4.5 | 10.3 | 37.4 |
| Memory (GB) | 16 GDDR6 | 4x 16 GDDR6 | 24 HBM2 | 48 GDDR6 with ECC |
| Media Engines | 1 video encoder, 2 video decoders (includes AV1 decode) | 4 video encoders, 8 video decoders (includes AV1 decode) | 1 JPEG decoder, 4 video decoders, 1 optical flow accelerator | 1 video encoder, 2 video decoders (includes AV1 decode) |
| Power (Watts) | 40-60 (configurable) | 250 | 165 | 300 |
Comparing the FP32 TFLOPS of the A30 and A40 shows that the A40 is the more capable GPU for training and pure inference tasks. During our testing, however, the computer vision applications quickly exhausted the available Media Engines on the A40. Selecting a GPU for computer vision requires matching the media engines, available memory, and other compute capabilities to the needs of the use case, which can differ widely.
Next, examining the Media Engines description in the A2 GPU column confirms that the product houses 1 video encoder and 2 video decoders. This card will meet the needs of most computer vision applications and is supported for data center use; however, its small number of encoders and decoders, limited memory, and lower floating-point throughput will limit the number of concurrent streams that can be processed. The low power consumption of the A2 increases the flexibility of server choice for deployment, which is important for edge and near-edge scenarios.
Still focusing on the table above, compare all the characteristics of the A2 GPU column with the A16 GPU. Notice that the A16 offers four times the resources of the A2. The A16 was constructed by putting four A2 "engines" on a single PCIe card; each of the units labeled GPU0-GPU3 contains all the memory, media engines, and other processing capabilities that would be available in a server with a standard A2 GPU card installed. Also notice that the A16 requires approximately four times the power of an A2.
The table below shows the same metric comparison for the newest NVIDIA GPU products based on the Ada Lovelace architecture. The L4 GPU offers 2 encoders and 4 decoders in a card that consumes just 72 W. Compared with the 1-encoder, 2-decoder configuration of the A2 at 40 to 60 W, the L4 should be capable of processing many more video streams for less power than two A2 cards. The L40, with 3 encoders and 3 decoders, is expected to be the new computer vision application workhorse for organizations with hundreds to thousands of video streams. While the L40S has the same number of Media Engines and memory as the L40, it was designed as an upgrade/replacement for the A100 Ampere architecture training and inference computing leader.
Table 3: NVIDIA Ada Lovelace architecture GPU characteristics

| | L4 | L40 | L40S |
|---|---|---|---|
| FP32 (TFLOPS) | 30.3 | 90.5 | 91.6 |
| Memory (GB) | 24 GDDR6 with ECC | 48 GDDR6 with ECC | 48 GDDR6 with ECC |
| Media Engines | 2 video encoders, 4 video decoders, 4 JPEG decoders (includes AV1 decode) | 3 video encoders, 3 video decoders | 3 video encoders, 3 video decoders |
| Power (Watts) | 72 | 300 | 350 |
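To make the A2-versus-L4 comparison concrete, the sketch below normalizes the media-engine counts from the two tables above by listed power draw. The "engines per 100 W" figure is a crude illustrative proxy of our own, not a published benchmark, and the A2 is assumed to run at its 60 W upper limit.

```python
# Quick comparison using only the table figures above: media engines and power.
# "Engines per 100 W" is not a real spec; it simply normalizes encoder/decoder
# counts by power draw to make the A2-vs-L4 point from the text concrete.
gpus = {
    "A2":  {"encoders": 1, "decoders": 2, "watts": 60},   # assumes the 60 W upper limit
    "A16": {"encoders": 4, "decoders": 8, "watts": 250},
    "L4":  {"encoders": 2, "decoders": 4, "watts": 72},
}

for name, spec in gpus.items():
    engines = spec["encoders"] + spec["decoders"]
    print(f"{name}: {engines} media engines, "
          f"{engines / spec['watts'] * 100:.1f} engines per 100 W")
# A2:  3 media engines, 5.0 engines per 100 W
# A16: 12 media engines, 4.8 engines per 100 W
# L4:  6 media engines, 8.3 engines per 100 W
```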
Conclusion
In total, seven different NVIDIA GPUs useful for CV workloads were discussed. From the Ampere family of cards, we found that the A16 performed well for a wide variety of CV inference workloads; it provides a good balance of video decoders/encoders, CUDA cores, and memory for computer vision.
For the newer Ada Lovelace family of cards, the L40 looks like a well-balanced card with great throughput potential. We are currently testing this card in our lab and will provide a future blog on its performance for CV workloads.
References
A2 - https://www.nvidia.com/content/dam/en-zz/solutions/data-center/a2/pdf/a2-datasheet.pdf
A16 - https://images.nvidia.com/content/Solutions/data-center/vgpu-a16-datasheet.pdf
A30 - https://www.nvidia.com/en-us/data-center/products/a30-gpu/
A40 - https://images.nvidia.com/content/Solutions/data-center/a40/nvidia-a40-datasheet.pdf
L4 - https://www.nvidia.com/en-us/data-center/l4/