Support Predictive Maintenance Analytics Workloads with Dell AI Starter Kit

Wed, 02 Aug 2023 20:47:04 -0000

Darren Miller

Summary

Deep learning (DL) techniques have enabled great successes in many fields, such as computer vision, natural language processing (NLP), autonomous driving, and predictive maintenance (PdM), by enabling a model to learn from existing data and then make corresponding predictions. This learn-then-predict approach is the heart of PdM, a technique that assesses the condition of in-service equipment to estimate when maintenance should be performed, with the aim of avoiding unplanned outages, increasing worker safety, and reducing downtime.
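To make the "learn from existing data, then estimate when maintenance should be performed" idea concrete, the sketch below fits a straight-line trend to past sensor readings and extrapolates it to a failure threshold. This is a deliberately simple least-squares stand-in, not a deep learning model and not the method used in this solution. The vibration readings, the threshold, and the function names are all hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, readings, threshold):
    """Extrapolate the fitted trend to the maintenance threshold.

    Returns the estimated operating hours remaining after the last
    observation, or None if the readings are not trending upward.
    """
    slope, intercept = fit_line(hours, readings)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - hours[-1])


# Hypothetical vibration readings (mm/s), one per 100 operating hours.
hours = [0, 100, 200, 300, 400]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]

# Hypothetical threshold at which maintenance would be scheduled.
remaining = hours_until_threshold(hours, vibration, threshold=7.0)
print(f"Estimated hours until maintenance threshold: {remaining:.0f}")
```

A DL model would replace the straight line with a learned function of many sensor channels, but the workflow is the same: train on historical data, then predict how much time remains before intervention is needed.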

The success of DL in PdM is due to a combination of improved algorithms, access to larger datasets, and increased computational power. The choice and design of the system components, carefully selected and tuned for DL use cases, can strongly affect the speed, accuracy, and business value of implementing AI techniques for PdM.

In such a complex environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell and NVIDIA have established a strong partnership to help organizations fast-track their AI initiatives. This document demonstrates how the Dell PowerScale All-Flash Scale-out NAS, Dell PowerEdge servers with NVIDIA GPUs, and Dell PowerSwitches can be used to provide an excellent environment for small teams performing data science, AI, and deep learning for PdM. The kit described here is intended to be a quick starter architecture for experimenting and tuning models for small production environments or singular solutions within larger production data centers.

Dell PowerScale

An efficient data science team often requires the ability to share massive amounts of data while providing high performance, reliability, and seamless access from multiple operating systems. Dell PowerScale scale-out NAS (network attached storage) provides this critical capability. With its ability to scale capacity and performance easily, PowerScale allows data science teams to collaborate effectively and to share data across different applications and systems. Its flexible deployment options, including on-premises, hybrid cloud, and multi-cloud, provide the necessary agility to adapt to changing business needs and to future-proof the deployment.

PowerScale also offers advanced security features, such as file system and volume-level encryption and secure access zones, to ensure the confidentiality and integrity of sensitive data. These features make PowerScale an essential component in any data science team's toolkit, enabling them to efficiently manage, analyze, and derive insights from large amounts of data.

PowerScale all-flash storage platforms, powered by the OneFS operating system, provide a powerful yet simple scale-out storage architecture to speed up access to massive amounts of unstructured data while dramatically reducing cost and complexity. They deliver extreme performance and efficiency for your most demanding unstructured data applications and workloads.

Support any workload

Choose from all-flash, hybrid, and archive nodes for the best fit for your data. Run multiple data protocols with simultaneous access to avoid storage silos. Deploy as an on-prem NAS appliance, in APEX, or in the cloud.

Scalable data management

Scale up, down, or out non-disruptively to tens of petabytes. Manage your storage infrastructure from a single UI with CloudIQ. Manage your datasets across your enterprise.

Protect your data

PowerScale provides built-in availability, redundancy, security, data protection, and replication with OneFS. It offers protection from cyberattacks with integrated ransomware defense and smart AirGap, and is designed for six-nines (99.9999%) availability.
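As a rough illustration of what a 6x9s (six nines, 99.9999%) availability design target means in practice, the short calculation below converts a number of nines into the downtime it would permit per year. It is plain arithmetic over a 365-day year, not a statement about measured PowerScale uptime.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def max_downtime_seconds(nines):
    """Downtime per year permitted by an availability of `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 6 nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    print(f"{n} nines: {max_downtime_seconds(n):,.1f} seconds/year")
```

At six nines the budget works out to about 31.5 seconds of downtime per year, compared with roughly 8.8 hours at three nines.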

Dell PowerEdge

The latest generation of PowerEdge servers enhances both business agility and time to market, and can support transformational workloads such as databases and analytics, virtualization, software-defined storage, virtual desktop infrastructure (VDI), containerization, HPC, AI, and ML. Dell PowerEdge systems can draw from NVIDIA’s full AI stack — including GPUs, DPUs, and the NVIDIA AI Enterprise software suite — providing enterprises the foundation required for a wide range of AI applications, including speech recognition, cybersecurity, recommendation systems, and a growing number of groundbreaking language-based services.

Performance and scale

Next-generation PowerEdge servers deliver improved performance, including greater AI inferencing performance. You can order PowerEdge systems with NVIDIA BlueField data processing units (DPUs), which provide additional offload, acceleration, and workload isolation capabilities and improve power efficiency in private, hybrid, and multicloud deployments.

Designed for sustainability

Dell Smart Flow design is a new feature within the Dell Smart Cooling suite that increases airflow and reduces fan power by up to 52% as compared to previous generation servers. The Smart Flow design supports greater server performance and requires less power to cool systems for more efficient data centers. An array of air movers is available from turnkey to high-end to best meet server cooling needs.

Reliability and security

Dell’s cyber-resilient architecture is a layered security approach, consisting of a web of security solution elements designed to protect, detect, and recover from threats. We enhance supply chain security with the Secured Component Verification (SCV) offering. SCV allows customers to verify cryptographically that the components set in the factory match what was delivered to them.

PowerEdge servers help accelerate Zero Trust adoption within organizations' IT environments. The devices constantly verify access by assuming that every user and device is a potential threat. At the hardware level, a silicon-based hardware root of trust, with elements including the Dell Secured Component Verification (SCV), helps verify supply chain security from design to delivery.

Dell PowerSwitch

Dell Technologies is keenly aware of the challenges in the networking space and of what must be done to address the limitations that slow-moving, proprietary legacy networks impose on AI initiatives. With Dell Technologies Open Networking, we offer a complete strategy that combines networking scalability and agility with standards-based hardware, innovative best-in-class software, and automation tools that reduce the need for manual intervention. You’ll be in a better position to meet workflow and application demands with greater network flexibility and control.

Software-Defined Networking

Dell Networking Operating Systems deliver full-featured Software-Defined Networking functionality with Layer 2 and Layer 3 connectivity that meets your needs with software from Dell and Open Networking ecosystem partners.

Software for Open Networking in the Cloud (SONiC)

Dell Technologies offers a finely tuned, enterprise-ready, and globally supported distribution of SONiC, called Enterprise SONiC Distribution by Dell Technologies, to help bring the benefits of hyperscale-focused SONiC contributions to the rest of the enterprise and telco markets and use cases.

Architecture

Figure 1. Predictive analytics stack components

Architecture components

Predictive analytics architectures incorporate a variety of hardware and software components. Dell Technologies offers a large selection of hardware for building such an architecture: the PowerEdge family for compute servers, PowerSwitch for networking, and PowerScale for distributed storage. In this solution, we used Dell PowerEdge servers populated with NVIDIA GPUs, running the Ubuntu 20.04 LTS release.

To leverage NVIDIA GPUs, we used the NVIDIA Container Toolkit, which allows users to build and run GPU-accelerated containers. For more details about this toolkit, see the NVIDIA website. Finally, we used a customized Docker container based on NVIDIA’s TensorFlow Docker image. This image provides a large ecosystem of tools that allows engineers and data scientists to develop ML applications using JupyterLab, TensorFlow, Keras, RAPIDS cuDF libraries, and many more. The main advantage of this approach is the flexibility that Docker offers: users can build and customize their own images and deploy specific Docker containers based on their needs.
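As a hedged sketch of how such a container might be started, the helper below assembles a `docker run` command that exposes the GPUs through the NVIDIA Container Toolkit (`--gpus all`), mounts a shared dataset directory, and launches JupyterLab. The image tag `my-pdm-tensorflow:latest`, the mount point `/mnt/powerscale/pdm`, and the function name are assumptions made for illustration; the source does not give the actual image name, paths, or launch options. The code only builds and prints the command and does not run Docker.

```python
import shlex


def build_docker_run(image, host_data_dir, container_data_dir="/data", port=8888):
    """Return the argument list for a GPU-enabled JupyterLab container."""
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",                                 # expose all GPUs to the container
        "-p", f"{port}:{port}",                          # publish the JupyterLab port
        "-v", f"{host_data_dir}:{container_data_dir}",   # mount the shared dataset
        image,
        "jupyter", "lab", "--ip=0.0.0.0", f"--port={port}", "--no-browser",
    ]


# Hypothetical image name and hypothetical NFS mount of a PowerScale export.
cmd = build_docker_run("my-pdm-tensorflow:latest", "/mnt/powerscale/pdm")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` unchanged if you later decide to launch the container from a script.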

Table 1. Core hardware and software components

Component	Description
PowerScale F200	PowerScale F200 delivers the performance of flash storage in a cost-effective form factor to address the needs of a wide variety of workloads. Each node allows you to scale raw storage capacity from 3.84 TB to 30.72 TB and up to 7.7 PB of raw capacity per cluster. The F200 includes in-line compression and deduplication. The minimum number of PowerScale nodes per cluster is three while the maximum cluster size is 252 nodes.
PowerSwitch S5224-ON	The S5224-ON belongs to the S5200-ON series, a complete family of switches: 12-port, 24-port, and 48-port 25GbE/100GbE ToR switches, a 96-port 25GbE/100GbE Middle of Row (MoR)/End of Row (EoR) switch, and a 32-port 100GbE Multi-Rate Spine/Leaf switch.
PowerEdge R7525	The Dell PowerEdge R7525 is a two-socket, 2U rack-based server that is designed to run complex workloads using highly scalable memory, I/O capacity, and network options. The system is based on the 2nd Gen AMD EPYC processor (up to 64 cores), has up to 32 DIMMs and PCI Express (PCIe) 4.0-enabled expansion slots, and supports up to three double-wide 300W or six single-wide 75W accelerators.
NVIDIA Container Toolkit	The NVIDIA Container Toolkit allows users to build and run GPU accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to leverage NVIDIA GPUs.
JupyterLab	JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning.
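The F200 capacity figures in Table 1 can be cross-checked with simple arithmetic. The sketch below multiplies node count by raw capacity per node, using only the limits quoted in the table (3 to 252 nodes, 3.84 TB to 30.72 TB per node). The function name and the decimal conversion of 1 PB = 1000 TB are assumptions, and the result is raw capacity before any protection overhead, compression, or deduplication.

```python
MIN_NODES, MAX_NODES = 3, 252            # cluster size limits from Table 1
MIN_NODE_TB, MAX_NODE_TB = 3.84, 30.72   # raw capacity per F200 node from Table 1


def raw_capacity_tb(nodes, tb_per_node):
    """Raw cluster capacity in TB for a given node count and node size."""
    if not MIN_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between {MIN_NODES} and {MAX_NODES}")
    if not MIN_NODE_TB <= tb_per_node <= MAX_NODE_TB:
        raise ValueError(f"per-node capacity must be {MIN_NODE_TB} to {MAX_NODE_TB} TB")
    return nodes * tb_per_node


smallest = raw_capacity_tb(MIN_NODES, MIN_NODE_TB)
largest = raw_capacity_tb(MAX_NODES, MAX_NODE_TB)
print(f"Smallest cluster: {smallest:.2f} TB raw")
print(f"Largest cluster:  {largest / 1000:.2f} PB raw")
```

The largest configuration comes to about 7.74 PB, which agrees with the "up to 7.7 PB" figure in the table, while a minimum three-node cluster starts at 11.52 TB raw.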
