Tue, 12 Sep 2023 13:22:37 -0000
Proper server rack integration is crucial for a data center's efficient and reliable operation. Optimizing space, power, and cooling can reduce downtime, simplify fleet management, improve serviceability, and lower overall costs. However, successful server rack integration requires careful planning, attention to detail, and expertise in server hardware, networking, and system administration.
This paper focuses on the critical aspects of deploying the PowerEdge XE9680 server in your data center. It describes key factors such as selecting the appropriate rack type, sizing the rack to meet current and future needs, installing and configuring the server hardware and related components, and ensuring proper power and cooling.
At Dell Technologies, we understand the importance of meeting our customers where they are. Whether you require full-service rack integration and deployment services or expert advice, we are committed to providing the support you need to achieve your goals. By leveraging our expertise and resources, you can be confident in your ability to implement the server rack integration that meets your unique needs and requirements.
The Dell PowerEdge XE9680 is a high-performance server designed to deliver exceptional performance for machine learning workloads, AI inferencing, and high-performance computing. Table 1 lists key specifications to consider when installing it in a rack.
Table 1. Server specifications
| Feature | Technical Specifications |
|---|---|
| Form Factor | 6U rack server |
| Dimensions and Weight | Height: 263.2 mm (10.36 in); Width: 482.0 mm (18.98 in); Depth: 1008.77 mm (39.72 in) with bezel, 995 mm (39.17 in) without bezel, 1075 mm (42.32 in) with Cable Management Arm (CMA); Weight: 107 kg (236 lb) |
| Cooling Options | Air cooling |
The American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) data center specifications focus on temperature and humidity control, optimized air distribution, airflow management, air quality, and energy efficiency. Key recommendations include maintaining appropriate temperature and humidity ranges, implementing hot aisle/cold aisle configurations and containment systems, managing airflow effectively, ensuring high indoor air quality, and adopting energy-efficient technologies.
The Dell PowerEdge XE9680 complies with the A2 Class ASHRAE specifications in Table 2.
Table 2. Operating environment specifications
| Dry-Bulb Temp, °C (Operation) | Humidity Range, Noncondensing (Operation) | Max Dew Point, °C | Max Elevation, m | Max Rate of Change, °C/hour | Dry-Bulb Temp, °C (Power Off) | Relative Humidity, % (Power Off) |
|---|---|---|---|---|---|---|
| 10 to 35 | –12°C DP and 8% RH to 21°C DP and 80% RH | 21 | 3050 | 20 | 5 to 45 | 8 to 80 |
Note: The maximum operating temperature is derated by 1°C per 300m above 900m in altitude.
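This derating rule is easy to apply programmatically. The following sketch is illustrative only (the function is ours, not a Dell tool); it encodes the 35°C ceiling and the 1°C-per-300m rule from the note above:

```python
def max_operating_temp_c(altitude_m: float, base_temp_c: float = 35.0) -> float:
    """Derate the maximum operating temperature by 1 degree C
    for every 300 m of altitude above 900 m."""
    if altitude_m <= 900:
        return base_temp_c
    return base_temp_c - (altitude_m - 900) / 300.0

print(max_operating_temp_c(1200))              # 34.0
print(round(max_operating_temp_c(3050), 1))    # 27.8 (at the 3050 m elevation ceiling)
```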
For optimal performance and reliability, it is recommended to operate within the defined specification ranges. While it is possible to operate at the edge of these ranges, Dell does not recommend continuous operation under such conditions due to potential impacts on performance and reliability.
When choosing a cabinet, it is important to consider factors such as size, ventilation, cable management, and security. The right cabinet should provide ample space for equipment, efficient airflow to prevent overheating, organized cable routing, and robust physical protection for valuable server hardware. Careful consideration of these factors ensures optimal performance, reliability, and ease of maintenance for your server infrastructure. We recommend the following cabinet specifications for optimal XE9680 installation:
Installing servers in a rack is a crucial aspect of server management. Proper placement within the rack ensures efficient use of space, ease of access, and optimal airflow. Each server should be securely mounted in the rack, taking into account factors such as weight distribution and cable management. Strategic placement allows for better cooling, reducing the risk of overheating, and prolonging the lifespan of the equipment. Additionally, thoughtful placement enables easy maintenance, troubleshooting, and scalability as the server environment evolves. By giving careful consideration to the placement of servers in a rack, you can create a well-organized and functional setup that maximizes performance and minimizes downtime. We recommend the following:
Figure 1. 4x PowerEdge XE9680 servers in a rack
The PowerEdge XE9680, equipped with H100 GPUs, has an approximate maximum power draw of 11.5kW. It comes with six 2800W Mixed Mode power supply units (PSUs) that feature a C22 input socket.
The XE9680 currently supports 5+1 fault-tolerant redundancy (FTR). (An additional 3+3 FTR configuration will be introduced in the Fall of 2023.) It is important to note that in 3+3 mode, system performance may throttle upon power supply failure to prevent overloading the remaining power supplies.
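The arithmetic behind these redundancy modes can be sanity-checked from the numbers above (six 2800W PSUs, roughly 11.5kW maximum draw). The function below is a hypothetical sketch, not a Dell sizing tool:

```python
PSU_WATTS = 2800
MAX_DRAW_W = 11_500  # approximate XE9680 max power draw with H100 GPUs

def surviving_capacity_w(total_psus: int, failed: int, psu_watts: int = PSU_WATTS) -> int:
    """Power capacity remaining after `failed` supplies are lost."""
    return (total_psus - failed) * psu_watts

# 5+1 FTR: one PSU can fail and the remaining five still cover the load.
print(surviving_capacity_w(6, 1) >= MAX_DRAW_W)  # True (14000 W available)

# 3+3: losing a full feed leaves three PSUs (8400 W), below max draw --
# which is why the system may throttle on power supply failure.
print(surviving_capacity_w(6, 3) >= MAX_DRAW_W)  # False
```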
Figure 2. PowerEdge XE9680 with PDU
For the XE9680, we recommend the following PDU specifications:
Table 3. PDU specifications
| PDU Input Voltage | XE9680s Per Cabinet | PDUs Per Cabinet | Circuit Breakers Per PDU (Min) | Single PDU Requirement (Min) |
|---|---|---|---|---|
| 208V | 2 | 2 | 6 | 60A (48A rated), 17.3 kW |
| 208V | 2 | 4 | 3 | 30A (24A rated), 8.6 kW |
| 208V | 4 | 2 | 12 | 100A (80A rated), 28.8 kW |
| 208V | 4 | 4 | 6 | 60A (48A rated), 17.3 kW |
| 400/415V | 2 | 2 | 6 | |
| 400/415V | 2 | 4 | 3 | 20A (16A rated), 11.1 kW @ 400V / 11.5 kW @ 415V |
| 400/415V | 4 | 2 | 12 | |
| 400/415V | 4 | 4 | 6 | |
Note: Single PDU Power Requirement = Input Voltage * Current Rating * 1.73.
The factor of 1.73 (the square root of 3) is used to account for three-phase power systems commonly used in data centers and industrial settings. By multiplying the input voltage, current rating, and 1.73, you can determine the power capacity needed for a single PDU to adequately support the connected equipment. This calculation helps ensure that the PDU can handle the power load and prevent overloading or electrical issues.
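The formula in the note can be checked against Table 3 in a few lines of Python (the function name is ours; the rated current is the 80% continuous rating of each breaker):

```python
import math

def single_pdu_power_w(input_voltage_v: float, rated_current_a: float) -> float:
    """Three-phase power: line voltage x current x sqrt(3) (~1.73)."""
    return input_voltage_v * rated_current_a * math.sqrt(3)

# Reproducing the Table 3 values:
print(round(single_pdu_power_w(208, 48) / 1000, 1))  # 17.3 kW
print(round(single_pdu_power_w(208, 80) / 1000, 1))  # 28.8 kW
print(round(single_pdu_power_w(415, 16) / 1000, 1))  # 11.5 kW
```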
Thermal management is important in data centers to ensure equipment reliability, optimize performance, improve energy efficiency, prolong equipment lifespan, and reduce environmental impact. By maintaining appropriate temperature levels, data centers can achieve a balance between operational reliability, energy efficiency, and cost-effectiveness.
Dell Technologies recommends the following best practices for thermal management:
The XE9680 is engineered to operate at ambient temperatures of up to 35°C. However, maintaining lower temperatures is highly recommended: operating the XE9680 in a cooler environment mitigates the risk of overheating and performance degradation, resulting in more stable and reliable operation overall.
Proper cable management in a server rack improves organization, airflow, accessibility, safety, and scalability. It enhances the reliability, performance, and maintainability of the entire IT infrastructure.
The PowerEdge XE9680 supports Ethernet and InfiniBand network adaptors, which are installed at the front of the server for easy access in cold aisles. To ensure proper cable management, the chosen cabinet solution should provide a minimum clearance of 93.12 mm from the face of the network adaptor to the cabinet door. This clearance accommodates the bend radius of a typical direct attach copper (DAC) cable (see Figure 3).
Figure 3. DAC clearance recommendations
The maximum cable length shown in Figure 3 is 2.07 meters (81.49 inches).
With adjacent racks, it is possible to improve cable management by removing the inner side panels. This alteration provides an open space along the sides of the racks, allowing cables to be conveniently routed between adjacent racks. By eliminating the inner side panels, technicians or IT professionals gain unobstructed access to the interconnecting cables, making it simpler to establish and maintain organized cabling infrastructure.
The following two figures show power cables routed through the optional cable management arm (CMA). The CMA can be mounted to either side of the sliding server rails.
Figure 4. Power cables in cable arm
AI server network switches play a crucial role in supporting high-performance and data-intensive artificial intelligence workloads. These switches handle the demanding requirements of AI applications, providing high bandwidth, low latency, and efficient data transfer. They facilitate seamless communication and data exchange between AI servers, to ensure optimal performance and to minimize bottlenecks.
Installing a switch in a rack for servers is vital for establishing a robust and efficient network infrastructure, enabling seamless communication, centralized management, scalability, and optimal performance for the server environment.
The network switch may require offsetting within the rack to accommodate the bend radius of specific networking cables. To achieve this, a bracket can be utilized to push the network switch towards the rear of the rack, creating space for the necessary cable bend radius while ensuring proper installation of the front door. The accompanying images demonstrate the process of using the bracket to adjust the network switch position within the rack. This allows for optimal cable management and ensures the smooth operation of the network infrastructure.
Figure 6. Switch offset brackets
The Dell Enterprise Infrastructure Planning Tool (EIPT) helps IT professionals plan and tune their compute and infrastructure equipment for maximum efficiency. Offering a wide range of configuration flexibility and environmental inputs, it can help right-size your IT environment. EIPT is a model-driven tool that supports many products and configurations for infrastructure sizing purposes. EIPT models are based on hardware measurements under operating conditions representative of typical use cases. Workloads can greatly affect power consumption; for example, the same percent CPU utilization under different workloads can lead to widely different power draw. It is not possible to cover all workload, environmental, and customer data center factors in a model and provide an accuracy figure with any degree of confidence. With that said, Dell Technologies anticipates (but does not guarantee or claim) some potential for variation. Customers are always advised to confirm EIPT estimates with actual measurements under their own workloads.
Figure 7. Dell EIPT tool
Leading edge technologies bring implementation challenges that can be reduced or eliminated with Dell Rack Integration Services. We have the experience and expertise to engineer, integrate, and install your Dell storage, server, or networking solution. Our proven integration methodology will take you step by step from a plan to a ready-to-use solution:
Contact your account manager and go to Custom deployment services to learn more.
Thu, 31 Aug 2023 17:42:58 -0000
Enterprises want to build and operate applications that have low latency requirements to process and analyze real-time data, and they want to provide intelligence for smarter decision-making at the edge. However, they face many challenges: aging infrastructure, limited edge-computing resources, environmental factors, and lack of IT staff to deploy and support applications across many edge sites.
This document provides an overview of a combined edge platform built on Dell PowerEdge XR servers and VMware Edge Compute Stack to solve these challenges. It describes key use cases in retail, manufacturing, and other industries.
The PowerEdge XR server series is built to capture and process more data at the edge, with enterprise-grade compute abilities providing high performance with low latency for the edge. The XR servers can withstand unpredictable and challenging deployment environments. XR4000 is the new high-performance multi-node XR server, purpose-built for ultra-short depth and low power, and with flexible configurations. These configurations are also available on our Dell vSAN Ready Nodes.
Edge Compute Stack (ECS) is a fully integrated edge platform for customers with many edge sites. ECS empowers IT and OT to deliver intelligent real-time solutions, offering flexibility, consistency, security, and extensibility:
This document includes a combined XR4000 and ECS reference architecture validated and supported by Dell Technologies and VMware. It also provides sample configurations for customers and partners to use as a starting point to design and implement the combined edge platform.
Key use cases for the solution are in the retail, manufacturing, and government sectors.
Retailers adapted to the pandemic with increased use of self-service checkout and new delivery mechanisms. They are deploying edge applications to improve customer experience and profitability:
The XR4000 and ECS platform provides high flexibility and performance to deploy and run these retail solutions while optimizing expensive retail space and meeting store environmental requirements.
The Industry 4.0 movement is digitizing manufacturing for greater efficiency and flexibility. Manufacturers are deploying edge applications for the following use cases:
The XR4000 and ECS platform provides a foundation for these solutions for machine aggregation and virtualization, OT/IT translation, industrial automation, and AI inferencing.
Defense, law enforcement, and emergency response organizations have specific requirements for tactical and mobile edge deployments:
XR4000 is highly portable and hardened for dusty, hot/cold operations. It is tested with NEBS Level 3 and MIL certifications. With ruggedized ATA-compliant compact and mobile systems from Dell OEM partners, the XR4000 and ECS platform is ideal for tactical and mobile edge workloads.
Figure 1 illustrates the combined XR4000 and ECS reference architecture. It consolidates VMs and the Kubernetes management cluster in the central data center. It also includes self-contained 2-node vSAN and TKG Multi-Cloud (TKGm) clusters at every edge site. A purpose-built vSAN witness node XR4000w (Nano Processing Unit, shown in Figure 2) is integrated within several XR4000 chassis options, enabling a highly efficient and reliable edge stack. An optional SD-WAN virtual edge can provide optimal connectivity and additional security. The centralized VMware vCenter and TKG management cluster simplify vSAN and TKGm deployment at the edge sites.
Figure 1. XR4000 and ECS reference architecture
Figure 2. Nano Processing Unit
Dell PowerEdge XR4000 is a rugged multi-node edge server available in two unique and flexible form factors. The “rackable” chassis supports up to four 1U sleds; the “stackable” chassis supports up to two 2U sleds. The 1U sled is designed for dense compute requirements. The 2U sled shares the same “1st U” and common motherboard with the 1U sled but includes an additional riser that provides two more PCIe Gen4 FHFL I/O slots. Customers who need additional storage or PCIe expansion can choose a 2U sled option. All XR4000 chassis support both front-to-back and back-to-front airflow.
The following table provides details for two sample configurations—one rackable and the other stackable.
Table 1. Sample configurations
| | Rackable configuration (2 x 2U) | Stackable configuration (2 x 1U) |
|---|---|---|
| Edge Compute Stack (ECS) | VMware ECS Advanced (vSphere Edge, vSAN Standard for Edge, Tanzu Mission Control Advanced), 1/3/5-year term license, up to 128 cores per edge instance | Same |
| Chassis | Dell PowerEdge XR4000r: 2U, 14 inches deep, 19 inches wide | Dell PowerEdge XR4000z: 2U, 14 inches deep, 10.5 inches wide |
| Mounting options | Mounting ears to support a standard 19-inch-wide rack | Deployed on desktops, VESA plates, DIN rails, or in stacked environments |
| Power supply | Front port access, dual, hot-plug (1+1), 1400 W, RAF | Same |
| Operating range | –5°C to 55°C (23°F to 131°F) | Same |
| Witness node | 1 x Dell PowerEdge XR4000w, VMware Certified | Same |
| Server | 2 x Dell PowerEdge XR4520c sleds, VMware Certified (total capacity of 2 x 2U sleds) | 2 x Dell PowerEdge XR4510c sleds, VMware Certified (total capacity of 2 x 1U sleds) |
| Security | Trusted Platform Module 2.0 V3 | Same |
| CPU cores* | 32 cores (2 x single-socket 16-core Intel Ice Lake Xeon-D CPUs) | Same |
| Memory* | 256 GB (8 x 32 GB RDIMM) | 128 GB (8 x 16 GB RDIMM) |
| Boot drive | 2 x BOSS-N1 controller card with 2 x 960 GB M.2 (RAID 1) | 2 x BOSS-N1 controller card with 2 x 480 GB M.2 (RAID 1) |
| Storage* | 15.2 TB (8 x 1.9 TB, SSDR, 2E, M.2) | Same |
| Network | 4 x 10 GbE Base-T or SFP for 4/8-core CPU | Same |
| GPU (optional) | 2 x NVIDIA Ampere A2, PCIe, 60 W, 16 GB passive, full-height GPU, VMware Certified | Not applicable |
| System management | iDRAC9, Dell OpenManage Enterprise Advanced Plus, integration for VMware vCenter | Same |
*In a High Availability (HA) 2-node vSAN cluster, for failover to work properly, total consumable CPU, Memory, and Storage for application workloads should not exceed the available resources of a single node.
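That HA sizing rule amounts to a simple per-resource comparison: the cluster-wide workload must fit inside one node. The sketch below is purely illustrative (the function, dictionary keys, and values are ours, not actual XR4000 figures):

```python
def fits_single_node(workload: dict, node: dict) -> bool:
    """True if the total workload demands fit within ONE node's resources,
    so the surviving node can host everything after a failover."""
    return all(workload[resource] <= node[resource] for resource in node)

# Illustrative single-node resources for a 2-node vSAN cluster:
node = {"cpu_cores": 16, "memory_gb": 128, "storage_tb": 7.6}

print(fits_single_node({"cpu_cores": 12, "memory_gb": 96, "storage_tb": 5.0}, node))  # True
print(fits_single_node({"cpu_cores": 20, "memory_gb": 96, "storage_tb": 5.0}, node))  # False
```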
The edge platform built on Dell PowerEdge XR4000 server and VMware Edge Compute Stack aims to help retail, manufacturing, and government customer organizations build and operate applications that provide intelligence for smarter decision-making and deliver immersive digital experiences at the edge. The combined reference architecture and configuration examples described in this document are designed to help our joint customers in designing and implementing a consistent, flexible, secure, and extensible edge solution.
To learn more about the flexible configurations of the Dell XR4000 chassis and compute sleds, see PowerEdge XR Rugged Servers.
For more information about VMware Edge Compute Stack, see VMware Edge Compute Stack and contact the VMware team at edgecomputestack@vmware.com.
Thu, 27 Jul 2023 20:40:00 -0000
This study is intended to help customers understand the behavior of the XR8000 PowerEdge server in harsh environmental conditions at the edge, and its resulting performance.
The need to improve power efficiency and provide sustainable solutions has been pressing for some time. According to a Bloomberg report, data centers in some countries will account for an estimated 5-10% of energy consumption by 2030, including the demand for edge and cloud computing[1]. Dell Technologies continues to innovate in this area and has launched its latest portfolio of XR servers for the edge and telecom this year.
The latest offering from the Dell XR portfolio is a series of rugged servers purpose-built for the edge and telecom, especially targeting workloads for retail, manufacturing, and defense. This document highlights the testing results for power consumption and fan speed across the temperature range of -5°C to 55°C (23°F to 131°F), gathered by running iPerf3 on the XR8000 server.
The short-depth XR8000 server, which comes in a sledded server architecture (with 1U and 2U single-socket form factors), is optimized for total cost of ownership (TCO) and performance in O-RAN applications. It is RAN-optimized, with integrated networking and I/O and PTP/SyncE support. Its front-accessible design radically simplifies sled serviceability in the field.
The PowerEdge XR8000 server is built rugged to operate in temperatures from -5°C to 55°C for select configurations. (For additional details, see the PowerEdge XR8000 Specification Sheet.)
Figure 1. Dell PowerEdge XR8000
To conduct this test, we placed a 2U XR8000 inside the thermal chamber in our test lab. While it was in the chamber, we ran the iPerf3 workload on the system for more than eight hours, stressing the system at 5-20% load. We measured power consumption and fan speed using iDRAC at 10°C intervals from 0°C to 55°C.
The iPerf3 throughput measured at 1Gb, 10Gb, and 25Gb link speeds was consistent across the entire temperature range, with no impact on performance as temperature increased. Fan speed and power consumption increased with temperature, which is the expected behavior.
Figure 2. Thermal chamber in the Dell performance testing lab
Table 1. System configuration
| Node hardware configuration | Chassis configuration | SW component | Version |
|---|---|---|---|
| 1 x 6421N (4th Generation Intel® Xeon® Scalable processor) | 2 x 8610t sleds | BIOS | 1.1.0 |
| 8 x 64 GB PC5 4800 MT/s | 2 x 1400 W PSU | CPLD | 1.1.1 |
| 1 x Dell NVMe 7400 M.2 960 GB | | iDRAC | 6.10.89.00 Build X15 |
| 1 x DP 25Gb BCM 57414 | | CM | 1.10 |
| | | PCIe SSD | 1.0.0 |
| | | BCM 57414 | 21.80.16.92 |
iPerf3 is an open-source tool for actively measuring the maximum achievable bandwidth on IP networks. It supports the tuning of various parameters related to timing, buffers, and protocols (TCP, UDP, and SCTP, with IPv4 and IPv6). For each test, it reports bandwidth, loss, and other parameters. An added advantage of iPerf3 is its reliability when measuring network performance between two servers in geographically different locations. (For additional details about iPerf3, see iPerf - The ultimate speed test tool for TCP, UDP and SCTP.)
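For scripted runs like the one in this test, iperf3 can emit machine-readable results with its `--json` flag. The sketch below pulls the average receive throughput out of that output; the `end.sum_received.bits_per_second` field path reflects iperf3's TCP result structure, and the sample string is trimmed down to only the field used:

```python
import json

def received_gbps(iperf3_json: str) -> float:
    """Average receive throughput (Gbps) from `iperf3 -c <host> --json` output."""
    result = json.loads(iperf3_json)
    return result["end"]["sum_received"]["bits_per_second"] / 1e9

# Heavily trimmed sample of the JSON output structure:
sample = '{"end": {"sum_received": {"bits_per_second": 24.7e9}}}'
print(received_gbps(sample))  # 24.7
```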
Figure 3. Constant networking performance with varying temperature and fan speed
Figure 3 shows that as the temperature and fan speed increase, the iPerf3 throughput stays the same. Fan speed is only 14% at temperatures near 20°C.
Figure 4. Power consumption and fan speed
Figure 4 shows that chassis power consumption increases with temperature; it is 254 W at 20°C.
The consistent performance with increasing temperature and power can be attributed to several design considerations when designing and building these edge/telecom servers:
For more details about the design considerations used for edge servers, see the blog Computing on the Edge–Other Design Considerations for the Edge.
To best supplement the improved cooling hardware, the PowerEdge engineering team carried over key features from the previous generation of PowerEdge servers to deliver autonomous thermal solutions capable of cooling next-generation PowerEdge servers.
An iDRAC feature in XR8000 detects Dell PCIe cards and automatically delivers the correct airflow to the slot to cool that card. When non-Dell PCIe cards are detected, the customer is given the option to enter the airflow requirement (LFM – Linear Feet per Minute) as specified by the card manufacturer. iDRAC and the fan algorithm ‘learn’ this information and the card is automatically cooled with the proper airflow. This feature saves power by not having to run the fans to cool the worst-case card in the system. Noise is also reduced.
For more information about thermal management, see “Thermal Manage” Features and Benefits.
Figure 5. iDRAC settings to view fan status during our XR8000 testing in the thermal chamber
Dell Technologies is continuing its efforts to test other XR devices and to determine power consumption for various workloads and its variation with changes in temperature. This study is intended to help customers understand the behavior of XR servers in harsh environmental conditions at the edge and their resulting performance.
[1] https://stlpartners.com/articles/sustainability/edge-computing-sustainability
[2] https://www.intel.com/content/www/us/en/support/articles/000038309/processors/intel-xeon-processors.html
Wed, 28 Jun 2023 00:02:48 -0000
Executive Summary
The PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. With it, Dell has launched an innovative 8-way GPU platform with advanced features and capabilities.
We are thrilled to share this insightful report that provides performance insights into the exceptional capabilities of the PowerEdge XE9680. Through rigorous testing and evaluation using MLPerf 3.0 benchmarks from MLCommons, this document offers a detailed analysis of the PowerEdge XE9680's outstanding performance in AI model training.
MLPerf is a suite of benchmarks that assess the performance of machine learning (ML) workloads, covering two crucial aspects of the ML life cycle: training and inference. This tech note focuses specifically on the training aspect of MLPerf 3.0.
The Dell performance labs conducted MLPerf 3.0 Training benchmarks using the latest PowerEdge XE9680 with 8x NVIDIA H100 80GB SXM GPUs. For comparison, we also ran these tests on the previous generation PowerEdge XE8545, equipped with 4x NVIDIA A100 80GB SXM GPUs.
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based neural network model introduced by Google in 2018. It is designed to understand and generate human-like text by capturing the context and meaning of words in each sequence. We are thrilled that the PowerEdge XE9680 with H100 GPUs delivered a 6x time-to-train performance improvement in the MLPerf NLP benchmark results using the BERT-large model with the Wikipedia dataset. This translates to accelerated time-to-value as we help our customers unlock the potential of remarkably faster model training.
Please note that throughout this report, a lower time-to-train value indicates improved efficiency and faster model convergence. As you analyze the graphs and performance metrics, remember that achieving lower time-to-train values demonstrates the PowerEdge XE9680's ability to expedite AI model training, delivering enhanced speed and efficiency results.
In MLPerf 3.0, the RetinaNet model leverages the Open Images dataset of millions of diverse images. In this benchmark, we observed an impressive, nearly 6x enhancement in training time for the model.
By utilizing the RetinaNet model with the Open Images dataset, MLPerf enables comprehensive evaluations and comparisons of system capabilities. The scale and diversity of the dataset ensure a robust assessment of object detection performance across various domains and object categories.
The PowerEdge XE9680 consistently delivers remarkable results across the entire MLPerf 3.0 Training benchmark suite, as depicted in the following figure. This robust performance underscores the server's exceptional capabilities and reliability in tackling a wide range of demanding machine learning tasks.
The PowerEdge XE9680 server surpasses our previous generation offering by delivering up to a 6x performance boost. This remarkable advancement translates into significantly accelerated AI model training, enabling your team to complete training tasks faster. To learn more about this server, we encourage you to contact your dedicated account executive or visit www.dell.com.
Table 1. Server configuration
| | PowerEdge XE8545 | PowerEdge XE9680 |
|---|---|---|
| CPU | 2x AMD EPYC 7763 64-core processor | 2x Intel® Xeon® 8470 52-core processor |
| GPU | 4x NVIDIA A100-SXM-80GB (500 W) | 8x NVIDIA H100-SXM-80GB (700 W) |
Fri, 19 May 2023 19:49:42 -0000
Tue, 11 Apr 2023 22:40:39 -0000
The Dell PowerEdge R750xa, powered by 3rd Generation Intel® Xeon® Scalable processors, is a dual-socket, 2U rack server that delivers outstanding performance for the most demanding emerging and intensive GPU workloads. It supports eight memory channels per CPU and up to 32 DDR4 DIMMs at 3200 MT/s. In addition, the PowerEdge R750xa supports PCIe Gen 4 and up to eight SAS/SATA SSD or NVMe drives.
Up to 29% higher inference performance with the PowerEdge R750xa and NVIDIA H100 PCIe GPU(1)
One platform that supports all of the PCIe GPUs in the PowerEdge portfolio makes the PowerEdge R750xa the ideal server for workloads including AI-ML/DL training and inferencing, high-performance computing, and virtualization environments. The PowerEdge R750xa includes all of the benefits of core PowerEdge: serviceability, consistent systems management with iDRAC, and the latest in extreme acceleration.
The new NVIDIA® H100 PCIe GPU is optimal for delivering the fastest business outcomes with the latest accelerated servers in the Dell PowerEdge portfolio, starting with the R750xa. The PowerEdge R750xa boosts workloads to new performance heights with GPU and accelerator support for demanding workloads, including enterprise AI. With its enhanced, air-cooled design and support for up to four NVIDIA double-width GPUs, the PowerEdge R750xa server is purpose-built for optimal performance for the entire spectrum of HPC, AI-ML/DL training, and inferencing workloads. Learn more here.
The Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA® H100 PCIe 310 W GPU to the previous-generation A100 PCIe GPU in the Dell PowerEdge R750xa. They ran the popular TensorRT inference benchmark across various batch sizes to evaluate inferencing performance.
The results are in Figure 1.
Figure 1. TensorRT
According to the industry standard TensorRT Inference Resnet50-v1.5 benchmark, the PowerEdge R750xa with NVIDIA's H100 PCIe 310W GPU processes approximately 29% more images per second than the NVIDIA A100 PCIe 300W GPU on the same server across various batch sizes. This significant improvement in image processing speed translates to higher overall throughput for inferencing workloads, making the PowerEdge R750xa with the H100 GPU an excellent choice for demanding applications.
| | R750xa with 4x NVIDIA H100 | R750xa with 4x NVIDIA A100 |
|---|---|---|
| Server | PowerEdge R750xa | PowerEdge R750xa |
| CPU | 2x Intel® Xeon® Gold 6338 CPU | 2x Intel® Xeon® Gold 6338 CPU |
| Memory | 512 GB system memory | 512 GB system memory |
| Storage | 1x 3.5 TB SSD | 1x 3.5 TB SSD |
| BIOS/iDRAC | 1.9.0/6.0.0.0 | 1.9.0/6.0.0.0 |
| Benchmark version | TensorRT Inference Resnet50-v1.5 | TensorRT Inference Resnet50-v1.5 |
| Operating system | Ubuntu 20.04 LTS | Ubuntu 20.04 LTS |
| GPU | NVIDIA H100-PCIe-80GB (310 W) | NVIDIA A100-PCIe-80GB (300 W) |
| Driver | CUDA 11.8 | CUDA 11.8 |
The PowerEdge R750xa supports up to four NVIDIA H100 PCIe adaptor GPUs and is available with new orders or as a customer upgrade kit for existing deployments.
Tue, 28 Mar 2023 23:05:15 -0000
The Dell PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. With it, Dell is launching an innovative 8-way GPU platform with advanced features and capabilities.
This tech note, Direct from Development (DfD), offers valuable insights into the performance of the PowerEdge XE9680 using MLPerf 2.1 benchmarks from MLCommons.
MLPerf is a suite of benchmarks that assess the performance of machine learning (ML) workloads, with a focus on two crucial aspects of the ML life cycle: training and inference. This tech note specifically delves into the training aspect of MLPerf.
The Dell CET AI Performance Lab and the Dell HPC & AI Innovation Lab conducted MLPerf 2.1 Training benchmarks using the latest PowerEdge XE9680 equipped with 8x NVIDIA A100 80GB SXM GPUs. For comparison, we also ran these tests on the previous-generation PowerEdge XE8545, equipped with 4x NVIDIA A100 80GB SXM GPUs. The following section presents the results of our tests. Please note that in the figure below, a lower number indicates better performance, and the results have not been verified by MLCommons.
Figure 1. MLPERF 2.1 Training
Our latest server, the PowerEdge XE9680 with 8x NVIDIA A100 80GB SXM GPUs, delivers on average twice the performance of our previous-generation server. This translates to faster AI model training, enabling models to be trained in half the time! With the PowerEdge XE9680, you can accelerate your AI workloads and achieve better results, faster than ever before. Contact your account executive or visit www.dell.com to learn more.
Table 1. Server configuration
(1) Testing conducted by Dell in March of 2023. Performed on PowerEdge XE9680 with 8x NVIDIA A100 SXM4-80GB and PowerEdge XE8545 with 4x NVIDIA A100-SXM-80GB. Unverified MLPerf v2.1 BERT NLP v2.1, Mask R-CNN object detection, heavy-weight v2.1 COCO 2017, 3D U-Net image segmentation v2.1 KiTS19, RNN-T speech recognition v2.1 rnnt Training. Result not verified by MLCommons Association. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information. Actual results will vary.
Tue, 28 Mar 2023 23:05:16 -0000
The Dell PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. Dell PowerEdge is launching our innovative 8-way GPU platform with advanced features and capabilities.
This Direct from Development (DfD) tech note provides valuable insights on AI inferencing performance for the recently launched PowerEdge XE9680 server by Dell Technologies.
To evaluate the inferencing performance of each GPU option available on the new PowerEdge XE9680, the Dell CET AI Performance Lab and the Dell HPC & AI Innovation Lab selected several popular AI models for benchmarking. To provide a basis for comparison, they also ran benchmarks on the last-generation PowerEdge XE8545. The following workloads were chosen for the evaluation:
The results are remarkable! The PowerEdge XE9680 demonstrates exceptional inferencing performance!
Comparing the NVIDIA A100 SXM configuration with the NVIDIA H100 SXM configuration on the same PowerEdge XE9680 reveals up to a 300% improvement in inferencing performance! (1)
Even more impressive is the comparison between the PowerEdge XE9680 NVIDIA H100 SXM server and the XE8545 NVIDIA A100 SXM server, which shows up to a 700% improvement in inferencing performance! (2)
Here are the results of each benchmark. In all cases, higher is better.
With exceptional AI inferencing performance, the PowerEdge XE9680 sets a high benchmark for today’s and tomorrow's AI demands. Its advanced features and capabilities provide a solid foundation for businesses and organizations to take advantage of AI and unlock new opportunities.
Contact your account executive or visit www.dell.com to learn more.
Table 1. Server configuration
(1) Testing conducted by Dell in March of 2023. Performed on PowerEdge XE9680 with 8x NVIDIA H100 SXM5-80GB and PowerEdge XE9680 with 8x NVIDIA A100 SXM4-80GB. Actual results will vary.
(2) Testing conducted by Dell in March of 2023. Performed on PowerEdge XE9680 with 8x NVIDIA H100 SXM5-80GB and PowerEdge XE8545 with 4x NVIDIA A100-SXM-80GB. Actual results will vary.
Tue, 28 Mar 2023 23:05:16 -0000
The Dell PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. Dell PowerEdge is launching our innovative 8-way GPU platform with advanced features and capabilities.
This Direct from Development (DfD) tech note offers valuable performance insights for High-Performance Linpack (HPL), a widely accepted benchmark for measuring HPC system performance.
The TOP500 list frequently relies on HPL to assess and rank supercomputer performance. Utilizing the Linpack library, HPL measures FLOPS (floating-point operations per second) by creating and solving linear equations, making it a reliable benchmark for evaluating HPC system efficiency.
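The FLOPS arithmetic behind HPL can be sketched in a few lines. This is a toy single-node NumPy version, not the optimized multi-GPU HPL binary used in the tests below; it only illustrates the standard operation count (2/3·n³ + 2·n²) divided by solve time.

```python
import time
import numpy as np

def hpl_style_gflops(n: int, seed: int = 0) -> float:
    """Solve a random dense n x n linear system and report GFLOP/s
    using the standard HPL operation count (2/3*n^3 + 2*n^2)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(a, b)          # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0
    assert np.allclose(a @ x, b)       # sanity-check the solution
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / elapsed / 1e9

print(f"{hpl_style_gflops(2000):.1f} GFLOP/s")
```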
The Dell HPC & AI Innovation Lab used HPL to compare the performance of the PowerEdge XE9680 to our last generation PowerEdge XE8545. There are two key differentiators between the servers that affect HPL performance here: the quantity and model of GPUs supported by each platform.
Regarding GPU configuration, the PowerEdge XE9680 was equipped with 8x H100 80GB SXM GPUs, while the PowerEdge XE8545 was outfitted with 4x A100 80GB SXM GPUs.
In the HPL benchmark, the PowerEdge XE9680 equipped with NVIDIA's latest H100 80GB SXM GPU outperforms the PowerEdge XE8545 by an impressive 543% more TeraFLOPS! (1)
The PowerEdge XE9680, with the latest NVIDIA H100 SXM GPU, advances HPC performance. With exceptional HPL performance, the PowerEdge XE9680 sets a high benchmark for today’s and tomorrow's HPC demands. Contact your account executive or visit www.dell.com to learn more.
Table 1. Server configuration
Fri, 03 Mar 2023 20:01:50 -0000
Dell Technologies has recently announced the launch of next-generation Dell PowerEdge servers that deliver advanced performance and energy-efficient design.
This Direct from Development Tech Note describes the new capabilities you can expect from the next generation of PowerEdge servers. It discusses the test and results for machine learning (ML) performance of the PowerEdge XR5610 using the industry-standard MLPerf Inference v2.1 benchmarking suite. The XR5610 has target workloads in networking and communication, enterprise edge, military, and defense—all key workloads requiring AI/ML inferencing capabilities at the edge.
The single-socket XR5610 is an edge-optimized, short-depth, rugged 1U server powered by 4th Generation Intel® Xeon® Scalable processors from the MCC SKU stack. It includes the latest generation of technologies, with up to 8x DDR5 DIMM slots and two PCIe Gen5 x16 card slots, and delivers 46 percent faster image classification (reduced latency) compared to the previous-generation PowerEdge XR12.
Edge computing, in essence, brings compute power close to the source of the data. As Internet of Things (IoT) endpoints and other devices generate more and more time-sensitive data, edge computing becomes increasingly important. Machine learning (ML) and artificial intelligence (AI) applications are particularly suitable for edge computing deployments. The environmental conditions for edge computing are typically vastly different than those at centralized data centers. Edge computing sites, at best, might consist of little more than a telecommunications closet with minimal or no HVAC.
Dell PowerEdge XR5610 is a rugged, short-depth (400 mm class) 1U server for the edge, designed for deployment in locations constrained by space or environmental challenges. It is well suited to operate at high temperatures ranging from –5°C to 55°C (23°F to 131°F) and designed to excel with telecom vRAN workloads, military and defense deployments, and retail AI including video monitoring, IoT device aggregation, and PoS analytics.
Figure 1. Dell PowerEdge XR5610 – 1U
According to a recent Forrester report, “Edge intelligence, a top 10 emerging technology in 2022, helps capture data, embed inferencing, and connect insight in a real-time network of application, device, and communication ecosystems.”
Figure 2. Forrester report excerpt, reprinted with permission
MLPerf Inference is a multifaceted benchmark framework, measuring five different workload types and three processing scenarios. The workloads are image classification, object detection, medical imaging, speech-to-text, and natural language processing (BERT). The processing scenarios, as outlined in the following table, are single stream, multistream, and offline.
Table 1. MLPerf Inference benchmark scenarios
Scenario | Performance metric | Use case |
Single stream | 90th latency percentile | Search results. Waits until the query is made and returns the search results. Example: Google voice search |
Multistream | 99th latency percentile | Multicamera monitoring and quick decisions. Acts more like a CCTV backend system that processes multiple real-time streams and identifies suspicious behaviors. Example: Self-driving car that merges all multiple camera inputs and makes drive decisions in real time |
Offline | Measured throughput | Batch processing, also known as offline processing. Example: Google Photos service that identifies pictures, tags people, and generates an album with specific people and locations or events offline |
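The three scenario metrics above reduce to simple statistics over per-query latencies. The sketch below computes them on synthetic lognormal latencies (invented data, purely to show how the p90, p99, and offline-throughput numbers are derived):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical per-query latencies in milliseconds (synthetic data)
latencies_ms = rng.lognormal(mean=1.0, sigma=0.3, size=10_000)

p90 = np.percentile(latencies_ms, 90)  # single-stream metric
p99 = np.percentile(latencies_ms, 99)  # multistream metric
# Offline metric: queries per second over the whole batch
throughput_qps = len(latencies_ms) / (latencies_ms.sum() / 1000.0)

print(f"p90={p90:.2f} ms  p99={p99:.2f} ms  offline={throughput_qps:.0f} qps")
```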
The MLPerf suite for inferencing includes the following benchmarks:
Table 2. MLPerf suite for inferencing benchmarks
Area | Task | Model | Dataset | QSL size | Quality |
Vision | Image classification | Resnet50-v1.5 | ImageNet (224x224) | 1024 | 99% of FP32 (76.46%) |
Vision | Object detection | Retinanet | OpenImages (800x800) | 64 | 99% of FP32 (0.3755 mAP) |
Vision | Medical image segmentation | 3D UNET | KiTS 2019 | 42 | 99% of FP32 and 99.9% of FP32 (0.86330 mean DICE score) |
Speech | Speech-to-text | RNNT | Librispeech dev-clean (samples < 15 seconds) | 2513 | 99% of FP32 (1 – WER, where WER=7.452253714852645%) |
Language | Language processing | BERT | SQuAD v1.1 (max_seq_len=384) | 10833 | 99% of FP32 (f1_score=90.874%) |
The following table outlines the key specifications of the PowerEdge XR5610 that was used for the MLPerf Inference test suite.
Table 3. Dell PowerEdge XR5610 key specifications for MLPerf Inference test suite
Component | Specifications |
CPU | 4th Gen Intel Xeon Scalable processors MCC SKU |
Operating system | CentOS 8.2.2004 |
Memory | 256 GB |
GPU | NVIDIA A2 |
GPU count | 1 |
Networking | 1x ConnectX-5 IB EDR 100 Gbps |
Software stack | |
Storage | NVMe SSD 1.8 TB |
Table 4 shows the specifications of the NVIDIA GPUs that were used in the benchmark tests.
Table 4. NVIDIA GPUs tested
GPU model | GPU memory | Maximum power consumption | Form factor | 2-way bridge | Recommended workloads |
PCIe adapter form factor | |||||
A2 | 16 GB GDDR6 | 60 W | SW, HHHL, or FHHL | Not applicable | AI inferencing, edge, VDI |
The edge server offloads the image processing to the GPU, and, just as servers have different price/performance levels to suit different requirements, so do GPUs. XR5610 supports up to 2x SW GPUs, as did the previous-generation XR11.
XR5610 was tested with the NVIDIA A2 GPU for the entire range of MLPerf workloads on the offline scenario. The following figure shows the results of the testing.
Figure 3. NVIDIA A2 GPU test results for MLPerf offline scenario
XR5610 also was tested with the NVIDIA A2 GPU for the entire range of MLPerf workloads on the single stream scenario. The following figure shows the results of that testing.
Figure 4. NVIDIA A2 GPU test results for MLPerf single stream scenario
In some tasks/workloads, the XR5610 showed improvement over previous generations, resulting from the integration of new technologies such as PCIe Gen 5.
The PowerEdge XR5610 delivered 46 percent better image classification latency compared to the prior-generation PowerEdge server, as shown in the following figure.
Figure 5. Image classification latencies: XR5610 and prior-generation PowerEdge server
The Dell XR5610 delivered 15 percent better speech-to-text throughput compared to the prior-generation PowerEdge server, as shown in the following figure.
Figure 6. Speech to text latencies: XR5610 and prior-generation PowerEdge server
The PowerEdge XR portfolio continues to provide a streamlined approach for various edge and telecom deployment options based on different use cases. It provides a solution to the challenge of a small form factor at the edge with industry-standard rugged certifications (NEBS), providing a compact solution for scalability and for flexibility in a temperature range of –5°C to +55°C.
Notes:
Fri, 03 Mar 2023 19:57:24 -0000
Dell Technologies has recently introduced the next generation of Dell PowerEdge XR servers. Powered by 4th Gen Intel® Xeon® Scalable processors with the MCC SKU stack, these servers deliver advanced performance in an energy-efficient design. Dell continues to provide scalability and flexibility with its latest portfolio of short-depth XR servers. These servers integrate technologies such as 4th Gen Intel CPUs, PCIe Gen5, DDR5, NVMe drives, and GPU slots, and they are compliance-tested for NEBS and MIL-STD.
This tech note discusses our CPU performance benchmark testing of the next-generation PowerEdge XR server portfolio and the test results that show improvements over previous PowerEdge XR servers powered by 3rd Gen Intel Xeon Scalable processors and Xeon D processors.
4th Gen Intel Xeon Scalable processors with the MCC SKU stack were tested using the STREAM and HPL benchmarks and compared with the CPU of the previous generation of XR servers.
The STREAM benchmark is a simple, synthetic benchmark designed to measure sustainable memory bandwidth (in MB/s) and a corresponding computation rate for four simple vector kernels: Copy, Scale, Add, and Triad. The STREAM benchmark is designed to work with datasets much larger than the available cache on any system so that the results are (presumably) more indicative of the performance of very large, vector-style applications. Ultimately, we get a reference for compute performance.
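For reference, the Triad kernel at the heart of STREAM is just `a[i] = b[i] + scalar * c[i]`. The NumPy sketch below approximates it and reports bandwidth the way STREAM counts it (three 8-byte arrays touched per element); note that the two-step NumPy version moves more memory than the fused C kernel, so this is only an illustration, not the real benchmark.

```python
import time
import numpy as np

def stream_triad(n: int = 20_000_000, repeats: int = 5) -> float:
    """Approximate the STREAM Triad kernel a[i] = b[i] + scalar*c[i]
    and report sustained bandwidth in MB/s. STREAM counts 3 arrays
    of 8-byte doubles (2 reads + 1 write) per iteration."""
    b = np.ones(n)
    c = np.full(n, 2.0)
    a = np.empty(n)
    scalar = 3.0
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.multiply(c, scalar, out=a)  # a = scalar * c
        np.add(a, b, out=a)            # a = b + scalar * c
        best = min(best, time.perf_counter() - t0)
    return 3 * n * 8 / best / 1e6      # bytes moved / best time, MB/s

print(f"Triad ~ {stream_triad():.0f} MB/s")
```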
HPL is a high-performance LINPACK benchmark implementation. The code solves a uniformly random system of linear equations and reports time and floating-point operations per second using a standard formula for operation count. It also helps to provide a reference for a system’s compute speed performance.
Benchmark testing showed significant performance increases with the 4th Gen Intel Xeon Scalable MCC SKU stack when it was compared with both the Intel Xeon D SKU and the 3rd Gen Intel Xeon Scalable MCC SKU.
In our tests, the single-socket PowerEdge XR servers with the 4th Gen Intel Xeon Scalable CPU (32 core) MCC SKU stack delivered a 253 percent increase in HPL performance and a 182 percent increase in STREAM performance. Thus, these servers are faster at the network edge or enterprise edge than the previous-generation PowerEdge XR servers powered by the Intel Xeon D (16 core) SKU.
Figure 1 and Figure 2 show the results of the benchmark tests that compared the performance of the 4th Gen Intel Xeon Scalable processor MCC SKU stack with the Intel Xeon D SKU.
Figure 1. HPL performance comparison: Intel Xeon D SKU and 4th Gen Intel Xeon Scalable MCC SKU
Figure 2. STREAM performance comparison: Intel Xeon D SKU and 4th Gen Intel Xeon Scalable MCC SKU
In our tests, the single-socket PowerEdge XR servers with the 4th Gen Intel Scalable CPU (32 core) MCC stack delivered a 52 percent increase in STREAM performance and a 72 percent increase in CPU FP rate base performance (floating point performance for the CPU). Thus, these servers are faster for compute at the network edge or enterprise edge than the previous generation of PowerEdge XR servers powered by the 3rd Gen Intel Xeon Scalable MCC SKU.
Figure 3 and Figure 4 show the results of the benchmark tests that compared the performance of the 4th Gen and 3rd Gen Intel Xeon Scalable processor MCC SKU stacks.
Figure 3. STREAM performance for 4th and 3rd Gen Intel Xeon Scalable processors
Figure 4. CPU FP rate base performance for 4th and 3rd Gen Intel Xeon Scalable processors
The Dell PowerEdge XR portfolio continues to provide CPU-based improvements and a streamlined approach for various edge and telecom deployment options. The XR portfolio provides a solution to the challenge of needing a small form factor at the edge with industry-standard rugged certifications (NEBS). It provides a compact solution for scalability along with flexibility for operating in temperatures ranging from –5°C to +55°C.
Fri, 03 Mar 2023 19:57:24 -0000
The telecom industry is on a journey of transformation, making pitstops to disaggregate hardware and software, virtualize networks, and introduce cloudification across RAN and core domains. The introduction of 5G and ORAN has accelerated the transformation, and we now see telecom becoming a universal phenomenon and touching all aspects of life.
This telecom evolution has opened a number of opportunities for CSPs to diversify their revenue streams, but it has also introduced stringent technical requirements. To support higher bandwidth and mMIMO technologies in new-generation systems, solution development teams face strict latency and synchronization requirements.
In this tech note, we discuss synchronization systems in 5G and ORAN fronthaul interfaces, and next-generation Dell PowerEdge support for synchronization standards.
Note: This document may contain language from third-party content that is not under Dell Technologies’ control and is not consistent with current guidelines for Dell Technologies’ own content. When such third-party content is updated by the relevant third parties, this document will be revised accordingly.
Telecom networks have always required proper and accurate synchronization for handover between different cell sites, reducing interference and increasing performance at cell edge. 2G, 3G, and 4G networks all required a certain level of synchronization, but 5G requires timing in the range of nanoseconds. This enables features such as beamforming and time-division duplexing to function accurately.
Telecom systems generally work based on the following synchronization methods:
The following table lists the telecom requirements for the frequency and phase specifications:
Table 1. Telecom technologies with standard units
Telecom technology | Frequency air interface/network | Phase/time air interface/network |
GSM | 16 ppb/50 ppb | - |
LTE-FDD | 16 ppb/50 ppb | - |
LTE-TDD | 16 ppb/50 ppb | 1500 ns |
LTE-A | 16 ppb/50 ppb | 500 ns |
5G | 16 ppb/50 ppb | 65 ns |
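To see why the parts-per-billion limits in the table translate into such tight budgets, a short back-of-the-envelope calculation helps (the holdover durations below are illustrative, not taken from a standard):

```python
def drift_ns(ppb: float, holdover_s: float) -> float:
    """Worst-case time error (in ns) accumulated by an oscillator that
    is off by `ppb` parts per billion over `holdover_s` seconds."""
    return ppb * 1e-9 * holdover_s * 1e9  # simplifies to ppb * holdover_s

# At the 16 ppb air-interface limit, one second of unaided holdover
# can already cost 16 ns of a 65 ns 5G phase budget.
print(drift_ns(16, 1.0))  # 16 ns after 1 s
print(drift_ns(16, 4.0))  # 64 ns after 4 s, nearly the whole 65 ns budget
```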
Different standards apply to the transmission of the synchronization signals, as outlined in the following table:
Table 2. Synchronization methods for distribution standards
Synchronization distribution standard | Time synchronization | Frequency synchronization | Phase synchronization |
PTP (IEEE 1588) | Yes | Yes | Yes |
SyncE | No | Yes | No |
GNSS | Yes | Yes | Yes |
NTP | Yes | Yes | Yes |
O-RAN ALLIANCE has defined four synchronization topologies (LLS-C1, LLS-C2, LLS-C3, and LLS-C4) to address different deployment topologies in telecom networks. The following figure shows a typical synchronization flow diagram with synchronization from PRTC flowing to end cell sites:
Figure 1. ORAN synchronization overview
PRTC: Primary Reference Time Clock
GNSS: Global Navigation Satellite System
T-GM: Telecom Grand Master Clock
Stay tuned for another tech note from Dell with more details about synchronization technologies.
In telecom systems, synchronization is delivered by various mechanisms:
The transport network for carrying synchronization can be either the backhaul network used to carry traffic or a dedicated network for transporting synchronization signals.
In 5G and ORAN, gNBs need frequency, phase, and time synchronization. The following two protocols are used for transporting synchronization signals over a packet-based network:
The same packet-based transport network can be used to carry user and control plane traffic.
PTP, defined by the IEEE 1588 standard, was developed to provide accurate distribution of time and frequency over a packet-based network. A PTP synchronization system is composed of PTP-aware devices and non-PTP-aware devices.
The following table describes the clock types in PTP:
Table 3. Types of clocks in PTP
Clock type | Definition | Usage |
Telecom grandmaster (T-GM) | The master clock at the start of a PTP domain. It is typically located at the core network. | At the beginning of the network, to provide timing signals to the network. |
Telecom boundary clock (T-BC) | Clock that can act both as a slave and master clock. It takes the sync signal from the master, adjusts for the delay, and generates a new master synchronization signal to pass downstream to the next device. | When the synchronization signal needs to travel through multiple nodes across a long distance. |
Telecom transparent clock (T-TC) | Clock that timestamps a synchronization packet message and sends it forward to the secondary device. It enables the secondary device to calculate the delay of the network. | For scenarios where timing signals are passing through switches. |
Telecom time slave clock (T-TSC) | The end device that receives the synchronization signal. | In telecom, the end node that receives the synchronization signals. |
The following figure illustrates how various types of clocks in PTP interact with each other:
Figure 2. Types of clocks in PTP
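Underlying the master/slave exchange is the standard IEEE 1588 delay request-response calculation: from the four timestamps of a Sync/Delay_Req exchange, the slave recovers its clock offset and the path delay, assuming a symmetric path. The timestamps below are invented for illustration.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Standard IEEE 1588 offset/delay calculation (symmetric path assumed).

    t1: Sync sent by master        t2: Sync received by slave
    t3: Delay_Req sent by slave    t4: Delay_Req received by master
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

# Toy timestamps in ns: true slave offset +50 ns, true one-way delay 200 ns
t1 = 1_000
t2 = t1 + 200 + 50   # master-to-slave: delay plus offset
t3 = 2_000
t4 = t3 + 200 - 50   # slave-to-master: delay minus offset
print(ptp_offset_and_delay(t1, t2, t3, t4))  # recovers (50.0, 200.0)
```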
In 5G and ORAN, PTP generally works with two types of timing profiles, G8275.1 and G8275.2. As shown in the following figure, G8275.1 is the profile in which all devices are PTP-aware, and G8275.2 is the profile in which devices can be, but might not be, PTP-aware.
Figure 3. G8275.1 and G8275.2 timing profiles
Why do ORAN and 5G need two PTP profiles? One reason is the use case and implementation perspective of the CSP, as outlined in the following table:
Table 4. PTP telecom profiles
PTP Telecom Profile | Description | Usage |
G8275.1 (full timing support) |
|
|
G8275.2 (partial timing support) |
|
|
SyncE is a synchronization technology that enables the transfer of synchronization signals at the physical layer. It is used to provide accurate and stable frequency synchronization between the different components of a network architecture. Over the fronthaul interface in ORAN and 5G, both SyncE and PTP are used together to provide nanosecond-level synchronization accuracy. SyncE can deliver frequency synchronization, but it cannot deliver phase and time synchronization. It functions independently of the network load and supports the transfer of sync signals where all the interfaces on the intermediate path must support SyncE.
In ORAN, for topology architectures LLS-C1, LLS-C2, and LLS-C3, both SyncE and PTP are used on the fronthaul interface between DU and RU or in the mid-haul interface between CU and DU. When PTP itself can cater to frequency, time, and phase synchronization, why do we need SyncE along with PTP?
The answer is that using both PTP and SyncE delivers these advantages:
Next-generation Dell PowerEdge servers come with Intel NIC cards such as Westport Channel and Logan Beach. All these NIC cards are timing aware and can be used to provide synchronization to downstream nodes. Because these servers can be positioned both as CU and DU, and support LLS-C1, LLS-C2, and LLS-C3 deployment, support for SyncE and PTP makes these servers an apt choice for RAN and edge deployments.
Dell Technologies continues to provide best-in-class features and specifications for its constantly evolving PowerEdge server portfolio for telecom. The PowerEdge XR8000 (430 mm depth) and XR5610 (463 mm depth) provide scalability and flexibility, with the latest technologies for PCIe, storage, memory, I/O, and even node-chassis infrastructure in a dense (SA1) form factor. With support for PTP and SyncE technologies, these next-generation PowerEdge servers provide essential infrastructure support at the edge.
Fri, 03 Mar 2023 19:57:25 -0000
Dell has recently announced the launch of next-generation Dell PowerEdge servers that deliver advanced performance and an energy-efficient design.
This Direct from Development (DfD) tech note describes the new capabilities you can expect from the next-generation Dell PowerEdge servers powered by the 4th Gen Intel® Xeon® Scalable processor MCC SKU stack. This document covers the test and results for ML performance benchmarking in the offline scenario on Dell's next-generation PowerEdge XR7620 using Multi-Instance GPU technology. The XR7620 targets workloads in manufacturing, retail, defense, and telecom, all key workloads requiring AI/ML inferencing capabilities at the edge. Dell continues to provide scalability and flexibility with its latest short-depth XR server portfolio, which integrates the latest technologies such as 4th Gen Intel CPUs, PCIe Gen5, DDR5, NVMe drives, and GPU slots, along with compliance testing for NEBS and MIL-STD.
Multi-Instance GPU (MIG) expands the performance and value of NVIDIA H100, A100, and A30 Tensor Core GPUs. MIG can partition the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. This gives administrators the ability to support every workload, from the smallest to the largest, with guaranteed quality of service (QoS), and extends the reach of accelerated computing resources to every user.
Without MIG, different jobs running on the same GPU, such as different AI inference requests, compete for the same resources. A job consuming more memory bandwidth starves others, causing several jobs to miss their latency targets. With MIG, jobs run simultaneously on different instances, each with dedicated compute, memory, and memory-bandwidth resources, resulting in predictable performance with QoS and maximum GPU utilization.
Figure 1. Seven different instances with MIG
Dell defines edge computing as technology that brings compute, storage, and networking closer to the source where data is created. This enables faster processing of data and, consequently, quicker decision making and faster insights. Edge use cases, such as running an edge server on a factory floor or in a retail store, require multiple applications to run simultaneously. One solution is to add a piece of hardware for each application, but that approach is not scalable or sustainable in the long run. Deploying multiple applications on the same piece of hardware is an option, but it can cause much higher latency for the different applications.
With multiple applications running on the same device, the device time-slices the applications in a queue so that applications are run sequentially as opposed to concurrently. There is always a delay in results while the device switches from processing data for one application to another.
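The latency penalty of time-slicing can be sketched with a simple model. This toy comparison uses invented per-job run times and deliberately ignores that a MIG slice has fewer resources than the whole GPU, so it only illustrates the queueing effect, not real MIG performance:

```python
def worst_case_latency(job_ms: list[float], concurrent: bool) -> float:
    """Worst-case completion time for a batch of inference jobs.

    concurrent=False models a single time-sliced GPU that runs jobs
    back to back; concurrent=True models MIG-style instances where
    each job runs on its own dedicated slice. (Simplification: assumes
    a slice runs a job as fast as the full GPU, which overstates MIG.)
    """
    if concurrent:
        return max(job_ms)   # jobs run side by side
    return sum(job_ms)       # the last job waits for all the others

jobs = [12.0, 8.0, 20.0, 5.0]  # hypothetical per-job run times (ms)
print(worst_case_latency(jobs, concurrent=False))  # 45.0 ms time-sliced
print(worst_case_latency(jobs, concurrent=True))   # 20.0 ms side by side
```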
MIG is an innovative technology for such edge use cases, where power, cost, and space are important constraints. AI inferencing applications such as computer vision and image detection need to run instantaneously and continuously to avoid serious safety consequences.
Jobs running simultaneously with different resources result in predictable performance with quality of service and maximum GPU utilization. This makes MIG an essential addition to every edge deployment.
MIG can be used in a multitenant environment. It is different from virtual GPU technology because MIG is hardware based, which makes edge computing even more secure.
A GPU can be partitioned into different-sized MIG instances. For example, in an NVIDIA A100 40GB, an administrator could create two instances with 20 gigabytes (GB) of memory each, three instances with 10GB each, or seven instances with 5GB each, or a combination of these.
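The memory arithmetic behind these combinations can be enumerated directly. The sketch below checks which mixes of the 5/10/20 GB instance sizes from the example fit in an A100 40GB; note that real MIG placement also constrains compute-slice positions, which this memory-only sketch ignores:

```python
from itertools import combinations_with_replacement

def valid_partitions(total_gb: int = 40,
                     sizes: tuple = (5, 10, 20),
                     max_instances: int = 7):
    """Enumerate instance-size mixes whose memory fits in total_gb.
    Memory-only check; real MIG profiles add placement constraints."""
    found = set()
    for n in range(1, max_instances + 1):
        for combo in combinations_with_replacement(sizes, n):
            if sum(combo) <= total_gb:
                found.add(combo)
    return sorted(found)

parts = valid_partitions()
# The three examples from the text all appear:
assert (20, 20) in parts          # two 20 GB instances
assert (10, 10, 10) in parts      # three 10 GB instances
assert (5,) * 7 in parts          # seven 5 GB instances
print(len(parts), "memory-feasible layouts")
```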
MIG instances can also be dynamically reconfigured, enabling administrators to shift GPU resources in response to changing user and business demands. For example, seven MIG instances can be used during the day for low-throughput inference and reconfigured to one large MIG instance at night for deep learning training.
Table 1. System architecture
MLPerf system | Edge |
Operating System | CentOS 8.2.2004 |
CPU | 4th Gen Intel Xeon Scalable processors MCC SKU |
Memory | 512GB |
GPU | NVIDIA A100 |
GPU Count | 1 |
Networking | 1x ConnectX-5 IB EDR 100Gb/Sec |
Software Stack | TensorRT 8.4.2 CUDA 11.6 cuDNN 8.4.1 Driver 510.73.08 DALI 0.31.0 |
Table 2. MLPerf scenario used in this test and MIG specs
Scenario | Performance metric | Example use cases | |
Offline | Measured throughput | Batch processing, also known as offline processing. Google Photos identifies pictures, tags people, and generates an album with specific people and locations or events offline. | 
MIG Specifications | A100 | ||
Instance types | 7x 10GB 3x 20GB 2x 40GB 1x 80GB | ||
GPU profiling and monitoring | Only one instance at a time | ||
Secure Tenants | 1x | ||
Media decoders | Limited options | ||
Table 3. High accuracy benchmarks and their degree of precision
| BERT | BERT H_A | DLRM | DLRM H_A | 3D-Unet | 3D-Unet H_A |
Precision | int8 | fp16 | int8 | int8 | int8 | int8 |
DLRM H_A and 3D-Unet H_A are the same as DLRM and 3D-Unet, respectively; both were able to reach the target accuracy with int8 precision.
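To make the int8-versus-fp16 distinction in the table concrete, here is a minimal symmetric post-training quantization sketch: fp32 values are mapped to int8 with a single per-tensor scale. This is a bare illustration of the idea, not the calibration pipeline the benchmark stack actually uses.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Minimal symmetric per-tensor int8 quantization sketch."""
    scale = np.abs(x).max() / 127.0        # map the largest value to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = q.astype(np.float32) * scale       # dequantize back to fp32
print(q)                                    # int8 codes
print(float(np.max(np.abs(x - x_hat))))     # worst-case rounding error
```

The rounding error is bounded by half the scale, which is why models whose accuracy tolerates that noise can run entirely in int8 while BERT H_A falls back to fp16.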
This section provides MIG performance results for various scenarios, showing that when divided into seven instances, each instance can provide equal performance without any loss in throughput.
The Dell XR portfolio continues to provide a streamlined approach for various edge and telecom deployment options based on different use cases. It provides a solution to the challenge of small form factors at the edge with industry-standard rugged certifications (NEBS), providing a compact solution for scalability and flexibility in a temperature range of –5°C to +55°C. The MIG capability for MLPerf workloads provides real-life scenarios for showcasing AI/ML inferencing on multiple instances for edge use cases. Based on the results in this document, Dell servers continue to provide a complete solution.
Notes:
Fri, 03 Mar 2023 19:57:26 -0000
Dell Technologies has recently announced the launch of next-generation Dell PowerEdge servers that deliver advanced performance and an energy-efficient design.
This Direct from Development (DfD) tech note describes the new capabilities you can expect from the next-generation Dell PowerEdge servers powered by the 4th Gen Intel® Xeon® Scalable processor MCC SKU stack. This document covers the test and results for ML performance of Dell's next-generation PowerEdge XR7620 using the industry-standard MLPerf Inference v2.1 benchmarking suite. The XR7620 targets workloads in manufacturing, retail, defense, and telecom, all key workloads requiring AI/ML inferencing capabilities at the edge.
With support for up to two 300 W GPU accelerator cards to handle your most demanding edge workloads, the XR7620 delivers 45 percent faster image classification than the previous-generation Dell XR12 server with a single 300 W GPU accelerator for ML/AI scenarios at the enterprise edge. The combination of low latency and high processing power allows faster and more efficient analysis of data, enabling organizations to make real-time decisions and capture more opportunities.
Edge computing, in a nutshell, brings computing power close to the source of the data. As Internet of Things (IoT) endpoints and other devices generate more and more time-sensitive data, edge computing becomes increasingly important. Machine learning (ML) and artificial intelligence (AI) applications are particularly suitable for edge computing deployments. The environmental conditions for edge computing are typically vastly different than those at centralized data centers. Edge computing sites might, at best, consist of little more than a telecommunications closet with minimal or no HVAC. Rugged, purpose-built, compact, and accelerated edge servers are therefore ideal for such deployments. The Dell PowerEdge XR7620 server checks all of those boxes. It is a high-performance, high-capacity server for the most demanding workloads, certified to operate in rugged, dusty environments ranging from –5°C to 55°C (23°F to 131°F), all within a short-depth 450 mm (ear-to-rack) form factor.
MLPerf is a multi-faceted benchmark suite that benchmarks different workload types and different processing scenarios. There are five workloads and three processing scenarios. The workloads are:
The scenarios are single-stream (SS), multi-stream (MS), and Offline.
The tasks are self-explanatory and are listed in the following table, along with the dataset used, the ML model used, and descriptions. The single-stream tests reported results at the 90th percentile; multi-stream tests reported results at the 99th percentile.
Table 1. MLPerf Inference benchmark scenarios
Scenario | Performance metric | Example use cases |
Single-stream | 90th percentile latency | Google voice search: Waits until the query is asked and returns the search results. |
Offline | Measured throughput | Batch processing, also known as offline processing. Google Photos identifies pictures, tags people, and generates an album with specific people and locations or events offline. |
Multi-stream | 99th percentile latency | Example 1: Multicamera monitoring and quick decisions. Multi-stream acts more like a CCTV backend system that processes multiple real-time streams to identify suspicious behaviors. Example 2: A self-driving car merges multiple camera inputs and makes drive decisions in real time. |
Table 2. MLPerf EdgeSuite for inferencing benchmarks
According to Forrester’s report (“Five technology elements make workload affinity possible across the four business scenarios”), most systems today are designed to run software in a single place. This creates performance limitations as conditions change, such as when more sensors are installed in a factory, as more people gather for an event, or as cameras receive more video feed. Workload affinity is the concept of using distributed applications to deploy software automatically where it runs best: in a data center, in the cloud, or across a growing set of connected assets. Innovative AI/ML, analytics, IoT, and container solutions enable new applications, deployment options, and software design strategies. In the future, systems will choose where to run software across a spectrum of possible locations, depending on the needs of the moment.
Table 3. Dell PowerEdge XR7620 key specifications
MLPerf system suite type | Edge |
Operating System | CentOS 8.2.2004 |
CPU | 4th Gen Intel® Xeon® Scalable processors MCC SKU |
Memory | 512GB |
GPU | NVIDIA A2 |
GPU Count | 1 |
Networking | 1x ConnectX-5 IB EDR 100Gb/Sec |
Software Stack | TensorRT 8.4.2 CUDA 11.6 cuDNN 8.4.1 Driver 510.73.08 DALI 0.31.0 |
Figure 1. Dell PowerEdge XR7620: 2U 2S
Table 4. NVIDIA GPUs tested
Brand | GPU | GPU memory | Max power consumption | Form factor | 2-way bridge | Recommended workloads |
PCIe Adapter Form Factor | ||||||
NVIDIA | A2 | 16 GB GDDR6 | 60W | SW, HHHL or FHHL | n/a | AI Inferencing, Edge, VDI |
NVIDIA | A30 | 24 GB HBM2 | 165W | DW, FHFL | Y | AI Inferencing, AI Training |
NVIDIA | A100 | 80 GB HBM2e | 300W | DW, FHFL | Y, Y | AI Training, HPC, AI Inferencing |
The edge server offloads image processing to the GPU. Just as servers come at different price/performance levels to suit different requirements, so do GPUs. The XR7620 supports up to two double-wide (DW) 300 W GPUs or four single-wide (SW) 150 W GPUs, reflecting the scalability and flexibility of the Dell PowerEdge server portfolio. By comparison, the previous-generation XR11 supported up to two SW GPUs.
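As a rough illustration of those accelerator limits, the following hypothetical sketch checks a proposed GPU population against the counts and per-card power figures stated above. The limits are taken from the text; this is not a Dell configuration tool.

```python
# Hypothetical sketch: validating a GPU population against the XR7620
# accelerator limits stated above (up to 2x double-wide 300 W GPUs or
# 4x single-wide 150 W GPUs). Illustrative only.
LIMITS = {"DW": {"max_count": 2, "max_watts": 300},
          "SW": {"max_count": 4, "max_watts": 150}}

def fits(form_factor, count, watts_each):
    """Return True if the proposed population respects both limits."""
    rule = LIMITS[form_factor]
    return count <= rule["max_count"] and watts_each <= rule["max_watts"]

print(fits("DW", 2, 300))  # two 300 W double-wide cards -> True
print(fits("SW", 4, 60))   # four 60 W single-wide cards -> True
print(fits("DW", 3, 300))  # exceeds double-wide slot count -> False
```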
Edge server vs data center server comparison[1]
When tested with the NVIDIA A100 GPU in the Offline scenario, the Dell XR7620 delivered performance within 1% of the prior-generation Dell PowerEdge rack server. In other words, the 430 mm-depth XR7620 edge server can match a rack server's performance in an AI inferencing scenario. See Figure 2.
Figure 2. Rack vs edge server MLPerf Offline performance
XR7620 performance with NVIDIA A2 GPU
The XR7620 was tested with the NVIDIA A2 GPU across the entire range of MLPerf workloads in the Offline scenario. For the results, see Figure 3.
Figure 3. XR7620 Offline performance results
XR7620 was also tested with NVIDIA A2 GPU for the entire range of MLPerf workloads in the Single Stream scenario. See Figure 4.
Figure 4. XR7620 Single Stream Performance results
XR7620 was also tested with NVIDIA A30 GPU for the entire range of MLPerf workloads in the Offline Scenario. See Figure 5.
Figure 5. XR7620 Offline Performance results on A30 GPU
XR7620 was also tested with NVIDIA A30 GPU for the entire range of MLPerf workloads in the Single Stream scenario. See Figure 6.
Figure 6. XR7620 SS Performance results on A30 GPU
In some scenarios, next-generation Dell PowerEdge servers showed improvement over previous generations due to the integration of the latest technologies, such as PCIe Gen 5.
Speech to text
The Dell XR7620 delivered 16% higher throughput than the prior-generation Dell server. See Figure 7.
Figure 7. Offline Speech to Text performance improvement on XR7620
Image Classification
The Dell XR7620 delivered 45% better latency than the prior-generation Dell server. See Figure 8.
Figure 8. SS Image Classification performance improvement on XR7620
The Dell XR portfolio continues to provide a streamlined approach to edge and telecom deployments across a range of use cases. It addresses the challenge of small form factors at the edge with industry-standard rugged certifications (NEBS), offering a compact, scalable, and flexible solution across a temperature range of -5°C to +55°C. The MLPerf results provide a real-life view of AI inferencing performance on edge servers. Based on the results in this document, Dell servers continue to provide a complete solution.
Notes:
[1] Based on testing conducted in Dell Cloud and Emerging Technology lab, January 2023.
Fri, 03 Mar 2023 19:57:26 -0000
The Dell PowerEdge XR8000 is a compact multi-node server designed for the edge and telecom. This DfD describes the unique form factor with chassis and sleds for the deployment of the XR8000.
The Dell PowerEdge XR8000 is a rugged multi-node edge server built on the 4th Gen Intel® Xeon® Scalable processor MCC SKU stack. This short-depth, sled-based server, purpose-built for telco at the edge, is configurable, environmentally agile, and RAN optimized. It is optimized to operate in Class 1 (-5°C to 55°C) environments, and -20°C to 65°C for select configurations; provides a short depth of 430 mm from the front I/O wall to the rear wall; and is front-accessible.
Available in a unique sled-based chassis form factor, the 2U chassis supports 1U and 2U half-width sleds. It is an open and reusable chassis, as opposed to a fixed monolithic chassis. An entire sled can be replaced without removing the chassis or its power, which simplifies serviceability and maintenance. Customers who need additional storage or PCIe expansion can choose the 2U sled, with options for compute, accelerators, or GPUs.
Each sled includes iDRAC for management, a CPU, memory, storage, networking, PCIe expansion (2U sled), and cooling.
The XR8000 offers a reverse-airflow design for use in front-accessed chassis configurations. It provides a front-accessible, multi-node, sled-based rackable chassis (430 mm depth) and offers dual 60 mm PSUs for reverse airflow, with the following options:
Assuming redundant PSUs for each server, there would be between four and eight PSUs for equivalent compute capacity, and between four and eight additional power cables. This consolidation of PSUs and cables not only reduces the cost of the installation (due to fewer PSUs), it also reduces the cabling, clutter, and Power Distribution Unit (PDU) ports used in the installation.
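The arithmetic behind that consolidation can be sketched as follows, assuming two redundant PSUs per standalone server and one power cable plus one PDU port per PSU. These counts are illustrative assumptions, not a sizing tool.

```python
# Illustrative arithmetic for the PSU/cable consolidation described above:
# four standalone 1U servers with redundant PSUs versus one multi-node
# chassis with shared redundant PSUs. Counts are assumptions.
def psus_and_cables(units, psus_per_unit):
    """Return (psu_count, cable_count); one cable/PDU port per PSU."""
    psus = units * psus_per_unit
    return psus, psus

standalone = psus_and_cables(units=4, psus_per_unit=2)  # (8, 8)
chassis = psus_and_cables(units=1, psus_per_unit=2)     # (2, 2)
print(f"standalone: {standalone[0]} PSUs, {standalone[1]} cables/PDU ports")
print(f"chassis:    {chassis[0]} PSUs, {chassis[1]} cables/PDU ports")
```

Under these assumptions, the shared chassis eliminates six PSUs, six power cables, and six PDU ports for equivalent compute capacity.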
The compute sleds offer common features such as:
The XR8000 provides two sled options:
Figure 1. XR8610t
Figure 2. XR8620t
These slots can support GPUs*, SFP, DPUs, SoC Accelerators, and other NIC Options.
*More details will be available at RTS, planned for May 2023.
Various configurations are available:
1. 1X4U - This option includes 4x1U compute sleds and PSUs:
2. 2x1U + 1x2U - This option includes 2x1U and 1x2U compute sleds and PSUs:
3. 2x2U - This option includes 2x2U compute sleds and PSUs:
The PowerEdge XR8000 offers various form factor options based on different workloads:
You can create any of these compute node configurations to support a broad range of workloads in one chassis.
The XR8000 offers a front-servicing (cold aisle) chassis, which allows it to be deployed with all cables connected to the front. This simplifies cable management and allows the server to be installed in areas where space is limited and access to both the front and back of the chassis is not possible. Also, the sleds are designed to be easily field replaceable by non-IT personnel. Whether it is located on a rooftop or in another difficult environment, the XR8000 has a dense form factor with a Class 1 temperature range (-5°C to +55°C), with some configurations reaching -20°C to 65°C, and is tested for NEBS Level 3 compliance.
The XR8000 multi-node server enables IT administrators to deploy compact, redundant server solutions. For example, two sleds can be configured identically and installed in the same chassis. One acts as the primary, and the other is the secondary, or backup. If the primary server goes down, the secondary server steps in to minimize or eliminate downtime. This redundant server configuration is also a great way for administrators to manage software updates seamlessly. For example, administrators can deploy the secondary server while performing maintenance, updates, or development work on the primary server.
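A minimal sketch of that primary/secondary pattern follows, assuming a hypothetical health endpoint on the primary sled. The host names and the takeover action are placeholders, not part of any Dell tooling.

```python
# Minimal sketch of the primary/secondary failover pattern described above:
# the secondary sled polls the primary's health endpoint and takes over
# when the primary stops responding. Host names are hypothetical.
import urllib.request

def primary_healthy(url="http://primary.sled.local/health", timeout=2):
    """Return True if the primary's health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def monitor_once(activate_secondary):
    """One polling iteration; trigger takeover if the primary is down."""
    if not primary_healthy():
        activate_secondary()  # e.g., claim a virtual IP, start services

monitor_once(lambda: print("secondary taking over"))
```

In practice this loop would run on a timer, with debouncing so a single missed poll does not trigger a takeover.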
The XR8000, with its unique form factor and multiple deployment options, provides the flexibility to start with a single node and scale up to four independent nodes as needed. Depending on the needs of various workloads, deployment options can change.
The same sleds can work in either the flexible or rack mount chassis based on space constraints or user requirements.
XR8000 provides a streamlined approach for various edge and telecom deployment options based on different use cases. It provides a solution to the challenge of deploying small form factors at the edge, with industry-standard rugged tests (NEBS), providing a compact solution for scalability and flexibility in a temperature range of -20 to +65°C for select configurations.
Fri, 03 Mar 2023 19:57:27 -0000
The next generation of PowerEdge servers is engineered to accelerate insights by enabling the latest technologies. These technologies include next-gen CPUs bringing support for DDR5 and PCIe Gen 5 and PowerEdge servers that support a wide range of enterprise-class GPUs. Over 75% of next generation Dell PowerEdge servers offer support for GPU acceleration.
For the digital enterprise, success hinges on leveraging big, fast data. But as data sets grow, traditional data centers are starting to hit performance and scale limitations, especially when ingesting and querying real-time data sources.

While some have long taken advantage of accelerators for speeding visualization, modeling, and simulation, today more mainstream applications than ever can leverage accelerators to boost insight and innovation. Accelerators such as graphics processing units (GPUs) complement and accelerate CPUs, using parallel processing to crunch large volumes of data faster. Accelerated data centers can also deliver better economics, providing breakthrough performance with fewer servers, resulting in faster insights and lower costs.

Organizations in multiple industries are adopting server accelerators to outpace the competition: honing product and service offerings with data-gleaned insights, enhancing productivity with better application performance, optimizing operations with fast and powerful analytics, and shortening time to market by doing it all faster than ever before. Dell Technologies offers a choice of server accelerators in Dell PowerEdge servers so you can turbo-charge your applications.
Our world-class engineering team designs PowerEdge servers with the latest technologies for ultimate performance. Here’s how.
PowerEdge ensures no-compromise system performance through innovative cooling solutions while offering customers options that fit their facility or usage model.
Dell Technologies and AMD have established a solid partnership to help organizations accelerate their AI initiatives. Together, our technologies provide the foundation for successful AI solutions that drive the development of advanced DL software frameworks. These technologies also deliver massively parallel computing in the form of AMD Graphics Processing Units (GPUs) for parallel model training, and scale-out file systems to support the concurrency, performance, and capacity requirements of unstructured image and video data sets. With the AMD ROCm open software platform, built for flexibility and performance, the HPC and AI communities gain access to open compute languages, compilers, libraries, and tools designed to accelerate code development and solve the toughest challenges in the world today.
Dell Technologies and Intel are giving customers new choices in enterprise-class GPUs. The Intel Data Center GPUs are available with our next generation of PowerEdge servers. These GPUs are designed to accelerate AI inferencing, VDI, and model training workloads. And with toolsets like Intel® oneAPI and OpenVINO™, developers have the tools they need to develop new AI applications and migrate existing applications to run optimally on Intel GPUs.
Dell Technologies solutions designed with NVIDIA hardware and software enable customers to deploy high-performance deep learning and AI-capable enterprise-class servers from the edge to the data center. This relationship allows Dell to offer Ready Solutions for AI and built-to-order PowerEdge servers with your choice of NVIDIA GPUs. With Dell Ready Solutions for AI, organizations can rely on a Dell-designed and validated set of best-of-breed technologies for software – including AI frameworks and libraries – with compute, networking, and storage. With NVIDIA CUDA, developers can accelerate computing applications by harnessing the power of the GPUs. Applications and operations (such as matrix multiplication) that are typically run serially in CPUs can run on thousands of GPU cores in parallel.
Turbo-charge your applications with performance accelerators available in select Dell PowerEdge tower and rack servers. The number and type of accelerators that fit in PowerEdge servers are based on the physical dimensions of the PCIe adapter cards and the GPU form factor.
Brand | GPU Model | GPU Memory | Max Power Consumption | Form Factor | 2-way Bridge | Recommended Workloads | |
PCIe Adapter Form Factor | |||||||
NVIDIA | A2 | 16 GB GDDR6 | 60W | SW, HHHL or FHHL | n/a | AI Inferencing, Edge, VDI | |
NVIDIA | A16 | 64 GB GDDR6 | 250W | DW, FHFL | n/a | VDI | |
NVIDIA | A40, L40 | 48 GB GDDR6 | 300W | DW, FHFL | Y, N | Performance graphics, Multi-workload | |
NVIDIA | A30 | 24 GB HBM2 | 165W | DW, FHFL | Y | AI Inferencing, AI Training | |
NVIDIA | A100 | 80 GB HBM2e | 300W | DW, FHFL | Y, Y | AI Training, HPC, AI Inferencing | |
NVIDIA | H100 | 80GB HBM2e | 300 - 350W | DW, FHFL | Y | AI Training, HPC, AI Inferencing | |
AMD | MI210 | 64 GB HBM2e | 300W | DW, FHFL | Y | HPC, AI Training | |
Intel | Max 1100* | 48GB HBM2e | 300W | DW, FHFL | Y | HPC, AI Training | |
Intel | Flex 140* | 12GB GDDR6 | 75W | SW, HHHL or FHHL | n/a | AI Inferencing | |
SXM / OAM Form Factor | |||||||
NVIDIA | HGX A100* | 80GB HBM2 | 500W | SXM w/ NVLink | n/a | AI Training, HPC | |
NVIDIA | HGX H100* | 80GB HBM3 | 700W | SXM w/ NVLink | n/a | AI Training, HPC | |
Intel | Max 1550 * | 128GB HBM2e | 600W | OAM w/ XeLink | n/a | AI Training, HPC | |
* Development or under evaluation |
Mon, 16 Jan 2023 19:49:21 -0000
Using the PowerEdge R750xa, the Dell HPC & AI Innovation Lab compared performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation NVIDIA A100 PCIe GPU, using the supercomputer benchmark HPL. Results showed:
The Dell PowerEdge R750xa, powered by 3rd Gen Intel Xeon Scalable processors, is a dual-socket/2U rack server that delivers outstanding performance for the most demanding emerging and intensive GPU workloads. It supports 8 channels per CPU and up to 32 DDR4 DIMMs with speeds up to 3200 MT/s. In addition, the PowerEdge R750xa supports PCIe Gen 4 and up to 8 SAS/SATA SSDs or NVMe drives. The PowerEdge R750xa, the one PowerEdge portfolio platform that supports all the PCIe GPUs, is the ideal server for virtualization environments and workloads such as high performance computing and AI-ML/DL training and inferencing. The PowerEdge R750xa includes all the core benefits of PowerEdge: serviceability, consistent systems management with iDRAC, and the latest in extreme acceleration.
The new NVIDIA H100 PCIe GPU is optimal for delivering the fastest business outcomes with the latest accelerated servers in the Dell PowerEdge portfolio, starting with the R750xa. The PowerEdge R750xa boosts workloads to new performance heights with GPU and accelerator support for demanding workloads, including enterprise AI. With its enhanced, air-cooled design and support for up to four NVIDIA double-width GPUs, the PowerEdge R750xa server is purpose-built for optimal performance for the entire spectrum of HPC, AI-ML/DL training, and inferencing workloads.
Next-generation GPU performance analysis
The Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation A100 PCIe GPU in the Dell PowerEdge R750xa. The team used HPL, a popular computing benchmark often used to evaluate the performance of supercomputers on the TOP500 list. This comparison included HPL performance and server power consumption throughout the benchmark. Here are the results:
The performance per watt calculation is the HPL benchmark score divided by the average server power over the duration of the HPL benchmark. The PowerEdge R750xa with the NVIDIA H100 PCIe GPUs delivered a 66% increase in performance/watt compared to the PowerEdge R750xa with the NVIDIA A100 PCIe GPUs, as shown in the following figure.
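The performance-per-watt calculation itself is straightforward; the sketch below applies it to placeholder numbers. The measured HPL scores and power figures appear in Figures 1 through 3, not here; these values are illustrative only.

```python
# Reproducing the performance-per-watt arithmetic described above with
# placeholder numbers (not measured results): a ~67% TFLOPS gain at
# roughly equal server power yields a ~66% performance/watt gain.
def perf_per_watt(hpl_tflops, avg_server_watts):
    """HPL benchmark score divided by average server power during the run."""
    return hpl_tflops / avg_server_watts

a100 = perf_per_watt(hpl_tflops=60.0, avg_server_watts=2000.0)
h100 = perf_per_watt(hpl_tflops=100.2, avg_server_watts=2010.0)
print(f"perf/watt improvement: {h100 / a100 - 1:.1%}")
```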
PowerEdge R750xa - HPL Benchmark and Server Power
Figure 1. Performance/watt comparison
Figure 2 shows the raw HPL performance of each configuration. The PowerEdge R750xa with four NVIDIA H100 PCIe GPUs achieved a 67% increase in TFLOPS compared to the configuration with four NVIDIA A100 PCIe GPUs.
Figure 2. Raw performance comparison
Server power
Figure 3 shows the server power over the duration of the HPL benchmark. The NVIDIA H100 PCIe GPU configuration delivered better performance with slightly lower server power and finished the workload faster.
Figure 3. HPL server power
The following table shows the two test configurations.
Table 1. R750xa test configurations
| R750xa with four NVIDIA H100 | R750xa with four NVIDIA A100 |
Server | PowerEdge R750xa | |
CPU | 2 x Intel Xeon Gold 6338 CPU | |
Memory | 512 GB system memory | |
Storage | 1 x 3.5 TB SSD |
BIOS/iDRAC | 1.9.0/6.0.0.0 | |
HPL version | HPL for H100 (Alpha version, results subject to change) | |
Operating system | Ubuntu 20.04 LTS | |
GPU | NVIDIA H100-PCIe-80GB (310 W) | NVIDIA A100-PCIe-80GB (300 W) |
Driver | CUDA 11.8 | CUDA 11.8 |
Using the PowerEdge R750xa, the Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation NVIDIA A100 PCIe GPU. HPL benchmark results showed a 66 percent increase in performance/watt and a 67 percent increase in TFLOPS.
The PowerEdge R750xa supports up to four NVIDIA H100 PCIe GPUs and is available with new orders or as a customer upgrade kit for existing deployments. To learn more, reach out to your account executive or visit www.dell.com.
Mon, 16 Jan 2023 19:39:35 -0000
The Nano Processing Unit is an optional sled supported by the Dell XR4000 multi-node server. While its design aligns perfectly with the technical requirements of a VMware vSAN witness host, it can be used for many interesting edge use cases.
Dell Technologies is committed to delivering best-in-class edge servers. The latest member of the Dell PowerEdge XR edge server series is the PowerEdge XR4000 featuring the next-generation Intel Xeon D processor. This short-depth multi-node server is available in two different chassis form factors: rackable and flexible. The rackable chassis supports up to four Xeon D sleds, and the flexible supports up to two. Additionally, each chassis supports an optional low-power server called the Dell Nano Processing Unit, or NPU, discussed here.
The NPU is an x86 sled built with the Intel Atom Processor C Series. Designed for the edge, the NPU includes industrial-grade components capable of reliable operation in an extended temperature range. It is installed adjacent to the Intel Xeon D sleds in the PowerEdge XR4000 chassis and includes independent memory, networking, and storage. Aside from drawing power from the chassis, the NPU is a self-contained server, bringing the total to five independent sleds in the rackable chassis or three in the flexible chassis.
Figure 1. The XR4000 is multi-node. Each sled includes CPU, memory, storage, networking, and fans.
From the factory, Dell offers SUSE Linux Enterprise Server (SLES) or VMware ESXi on the NPU. We validated both operating systems, giving our customers the flexibility to use this unique server as a VMware vSAN witness host or put it to work in other exciting edge workloads, which we will discuss later. In addition, each NPU includes a unique Dell Service Tag and a customer-programmable field asset tag for customized asset tracking. The NPU does not feature Dell iDRAC.
The following table provides technical specifications of the NPU:
Feature | Technical specifications |
Processor | Intel Atom C3508 |
Memory | 16 GB ECC DDR4 1866 |
Storage | 1 x 960 GB M.2 NVMe |
Embedded management | N/A |
Embedded NIC | Intel i210 (2 ports) |
Ports | 1 x USB 3.0/2.0, 1 x Serial (micro-USB), 2 x 1 GbE RJ45, headless |
Operating systems | ESXi, Linux |
Operating temperature | 0–55°C |
The XR4000's unique NPU can serve a wide range of edge-computing use cases. Here are a few examples.
A two-node vSAN or vSAN stretched cluster configuration requires a witness host to act as a tie-breaker when a fault occurs. In a two-node vSAN, a fault could be a single node's power loss or hardware failure. In a stretched cluster, it might be the loss of an entire site due to a natural disaster. In either case, the witness host determines which node contains the valid data after the fault is resolved and nodes return to the cluster. The XR4000 NPU meets the requirements of a hardware vSAN witness host. It is installed in the same chassis as the compute nodes, enabling a compact vSAN cluster that can be deployed almost anywhere.
Equipment deployed in a telephone network's central office, a manufacturing facility, or a retail backroom might be more exposed to the effects of natural disasters or extreme temperatures. For example, a remote site might experience an extended power outage due to a natural disaster. During this time, a site battery backup can keep some of the infrastructure running for a short period; however, high-power equipment can quickly consume the battery or fuel capacity. When used as a site manager to monitor environmental sensors and security access, view camera feeds, and gracefully shut down high-power equipment, the low-power NPU can help preserve precious battery power until power returns. Once site power returns, the NPU can remotely restore the site to full functionality by gracefully managing the power-on of connected site equipment, negating the need to send out a technician.
Isolating a private network from the Internet increases security and reduces the number of potential attack vectors. Isolated networks improve network security but present challenges for IT administrators who access and manage them remotely. One solution is to use a "jump box" or "bastion host" that acts as a secure bridge between the Internet and a private network. This single, secure bridge can be hardened, monitored, and regularly audited to ensure only authorized users access the private network. IT administrators can configure the NPU as a secure bridge between the Internet and a private network.
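With OpenSSH, this jump-box pattern can be expressed declaratively using the `ProxyJump` directive. The following is a minimal client-side sketch; the host names and addresses are placeholders.

```
# ~/.ssh/config -- hedged example; host names and IPs are placeholders.
Host bastion
    HostName bastion.example.com
    User admin

Host private-node
    HostName 10.0.0.10
    User admin
    ProxyJump bastion    # tunnel through the hardened jump box
```

With this in place, `ssh private-node` transparently routes through the bastion, so only the bastion needs to be exposed, monitored, and audited.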
Telemetry management host
Monitoring the health and performance metrics of servers, systems, and services operating at edge locations is critical. IT administrators use monitoring systems such as Prometheus to monitor, detect, and alert when collected metrics indicate potential issues in their fleet. They also use tools such as Grafana to visualize the data in easy-to-consume charts and graphs. The NPU's hardware specifications meet the hardware requirements of monitoring systems such as Grafana and Prometheus, and the NPU can serve as an out-of-band server running these tools.
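For example, a minimal Prometheus scrape configuration for an NPU acting as the site monitoring host might look like the following. The job name and target addresses are placeholders.

```yaml
# Hypothetical prometheus.yml fragment for the NPU as a site monitoring
# host; targets are node_exporter endpoints on the Xeon D sleds.
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: "edge-sleds"
    static_configs:
      - targets:
          - "sled1.site.local:9100"
          - "sled2.site.local:9100"
```

Grafana would then point at this Prometheus instance as a data source to chart the collected metrics.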
Out-of-band management
Managing a fleet of servers and IT equipment is challenging. Administrators must manage the health and performance of equipment deployed across multiple sites or at remote locations. So, it might not always be cost-effective or feasible to send out a technician to resolve an issue, update or provision equipment, or check the status of a site. In these cases, having an out-of-band server like the NPU gives administrators the ability to remotely troubleshoot, deploy firmware updates, and manage devices such as intelligent PDUs and USB devices. When troubleshooting, administrators can use the out-of-band NPU server to power-cycle faulty devices connected to intelligent PDUs and collect debug logs from other devices; when provisioning, they can use it as a PXE server. Additionally, administrators can automate troubleshooting and provisioning functions, and the NPU can run those scripts.
Conclusion
The Nano Processing Unit is a unique and versatile computing server. Its edge-optimized design has industrial-grade components, a low-power processor, and more-than-capable memory, networking, and storage capacity. These features make it an excellent addition for customers looking to get the most out of their XR4000 server.
References
VMware Virtual Blocks Blog: Shared Witness for 2-Node vSAN Deployments
Mon, 16 Jan 2023 19:31:49 -0000
The Dell XR4000 is a compact multi-node server designed for the edge. This document discusses the XR4000’s unique form factors and sled options.
The Dell PowerEdge XR4000 is a rugged multi-node edge server with Intel’s next-generation Xeon D processor, making it a perfect fit for edge deployments. Available in two unique and flexible form factors, the “rackable” chassis supports up to four 1U sleds, and the “stackable” chassis supports up to two. Customers who need additional storage or PCIe expansion can choose a 2U sled option.
In addition, the XR4000 supports an optional witness node for single-chassis VMware vSAN cluster deployments. Each sled includes iDRAC for management, a CPU, memory, storage, networking, PCIe expansion (2U sled), and cooling.
Compute sleds
The compute sleds offer common features such as power and management connectors to the chassis backplane, pull handles and mechanical locks (for example, spring clips) for attachment to the chassis, side rails to aid insertion and stability in the chassis, and ventilation holes and baffles as appropriate for cooling.
Figure 1. 1U compute sled interior
The XR4000 offers 1U and 2U sleds. The 1U sled is provided for dense compute requirements. The 2U sled shares the same “first U” and common motherboard with the 1U sled but includes an additional riser that provides two more PCIe Gen4 FHFL I/O slots.
Figure 2. 1U compute sled
The 1U sled meets dense compute requirements, with storage up to 4 x M.2 drives (from 480 GB up to 3.84 TB each) and up to 2 x M.2 NVMe BOSS N1 ET. The memory can scale up to 512 GB total with 4 x memory slots. It also includes a LAN on motherboard (LOM) option with 4 x SFP from CPU.
Figure 3. 2U compute sled interior
The 2U compute sled builds upon the common first 1U of the 1U sled, adding 2 x16 FHFL PCIe Gen 4.0 slots with a combined power capacity of 250 W. These slots can support GPUs (such as NVIDIA A2/A30), SFPs, DPUs, SoC accelerators, and other NIC options. The additional storage option supports an optional 8 x M.2 storage drives (4 x per x16 slot), for 12 x M.2 total (not including BOSS).
Each chassis also supports an optional low-power server called the Dell Nano Processing Unit or NPU. The NPU is an x86 sled built with Intel's Atom Processor C Series. Designed for the edge, the NPU includes industrial-grade components capable of reliable operation in an extended temperature range. For more information, see Dell PowerEdge XR4000: Nano Processing Unit.
The two chassis types share common components: 100 to 240 VAC power supplies (PSUs), up to two per chassis, and an optional embedded controller card called the Nano Server.
Both chassis types optionally include a lockable bezel to prevent unwanted access to the sleds and PSUs, with intelligent filter monitoring that creates a system alert when the filter needs to be changed.
The XR4000 is offered in two chassis options:
Figure 4. 2U rackmount chassis
The “rackable” chassis is a 2U, 14-inch (355 mm) deep, 19-inch-wide chassis, with mounting ears to support a standard 19-inch-wide rack. The rackable chassis supports both front-to-back and back-to-front airflow and the following combination of 1U and 2U compute sleds:
The “stackable” chassis is also 2U and 14 inches (355 mm) deep but is only 10.5 inches wide; it is typically deployed on a desktop, VESA plate, or DIN rail, or in stacked environments. The stackable chassis also supports both front-to-back and back-to-front airflow.
Figure 5. 2U stackable chassis
The XR4000 offers a front-servicing (cold aisle) chassis option, which allows it to be deployed with all cables connected to the front. This option simplifies cable management and allows the server to be installed in areas where space is limited and access to both the front and back of the chassis is not possible. Also, the sleds are designed to be easily field replaceable by non-IT personnel.
Redundancy
The XR4000 multi-node server gives IT administrators the ability to deploy compact, redundant server solutions. For example, two sleds can be configured identically and installed in the same chassis. One acts as the primary, and the other is the secondary, or backup. If the primary server goes down, the secondary server steps in to minimize or eliminate downtime. This redundant server configuration is also a great way for administrators to seamlessly manage software updates. For example, administrators can deploy the secondary server while performing maintenance, updates, or development work on the primary server.
Scaling
The XR4000 server, with its unique form factor and multiple deployment options, provides flexibility to start with a single node and scale up to four independent nodes as needed. Depending on the requirements of various workloads, deployment options can change; for example, a user can add a 2U GPU-capable sled. The same sleds can work in either the flexible or rackmount chassis based on space constraints or user requirements.
Conclusion
The PowerEdge XR4000 offers a streamlined approach for various edge deployment options based on different edge use cases. Addressing the need for a small form factor at the edge with industry-standard rugged certifications (NEBS and MIL-STD), the XR4000 ultimately provides a compact solution for improved edge performance, low power consumption, redundancy, and improved TCO.
Mon, 16 Jan 2023 19:17:52 -0000
|Read Time: 0 minutes
The Dell PowerEdge XR4000 is a compact multi-node server designed for the edge. This Tech Note discusses the Intel Xeon D processor that powers the XR4000 server. These CPUs are unique, being designed primarily for edge deployments. New integrated technology enables faster performance than the previous generation and helps address designs with space and power constraints while also lowering TCO.
Dell PowerEdge XR4000 is the latest addition to Dell Technologies’ portfolio of rugged PowerEdge servers. It is Dell’s shortest-depth edge server, with a unique sled and chassis form factor, withstanding an extended temperature range of –5°C to 55°C. The XR4000 provides a sustainable solution for customers to deploy various edge workloads in challenging environments.
The brain of this server is the Intel Xeon D CPU, which features a one-package design with integrated AI, security, advanced I/O, and Ethernet, plus dense compute, to deliver high data throughput and address key edge requirements. To broaden the range of usage models, the Xeon system-on-a-chip (SoC) is available in two distinct packages: the high-core-count Xeon D-2700 processor, optimized for performance, and the Xeon D-1700 processor, optimized for cost and power consumption. With options ranging from 4 to 20 cores, the Xeon D-2700 processor is suited to demanding workloads, such as handling high data-plane throughput, making it a strong fit for edge deployments. Extended operating temperature ranges and industrial-class reliability make the Xeon D-1700 and D-2700 SoCs ideal for high-performance rugged equipment.
Dell PowerEdge XR4000 is based on the Xeon D-2700 SoC. This SoC is an HCC 52.5 x 45 mm package, supporting up to 20 cores and using Intel’s Sunny Cove cores to boost performance for edge use cases. The Xeon D-2700 offers a CPU performance gain of up to 2.97 times and improved AI inferencing that is 7.4 times faster than its previous-generation Xeon D-1577 processor.
Memory speeds have increased by 20 percent, jumping from 2,666 MT/s to 3,200 MT/s. Also, the maximum memory capacity for HCC Xeon D-2700 SKUs is now up to 1,024 GB (with LRDIMM)—two times as much as most Xeon D-2100 SKUs (code named Skylake-D). The increased memory speed and capacity significantly reduce data transfer times for memory-intensive workloads for the edge, such as manufacturing and retail applications, and AI/ML-based applications.
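As a back-of-envelope sketch of what those memory numbers mean (assuming the standard DDR4 geometry of 8 bytes per transfer on a 64-bit channel; the channel count varies by platform, so this is per-channel only and not a Dell-published figure):

```python
# Rough check of the memory claims above.
old_mts, new_mts = 2666, 3200                      # DDR4 transfer rates, MT/s
speedup_pct = (new_mts - old_mts) / old_mts * 100  # ~20% faster transfers
# Peak bandwidth per channel: transfers/s x 8 bytes per 64-bit transfer
bw_per_channel_gbs = new_mts * 8 / 1000            # 25.6 GB/s per channel
```

The ~20 percent transfer-rate gain translates directly into peak per-channel bandwidth, which is why memory-bound edge workloads see shorter data transfer times.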
PCIe throughput has also improved by a factor of 2, with support for up to 32 lanes of PCIe Gen 4.0. Throughput speed is 16 GT/s for PCIe Gen 4.0 compared with 8 GT/s for PCIe Gen 3.0. The increased bandwidth of PCIe Gen 4.0 improves the efficiency of workloads such as AI/ML and of edge computing by providing high transfer speeds, while also reducing latency.
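As an illustrative calculation (using the standard 128b/130b line encoding shared by PCIe Gen 3 and Gen 4; these are generic PCIe numbers, not measurements of this server), the usable one-way bandwidth of a full x16 link roughly doubles between generations:

```python
def pcie_bw_gbs(gt_per_s: float, lanes: int) -> float:
    """Approximate usable one-way PCIe bandwidth in GB/s.
    Gen 3 and Gen 4 both use 128b/130b encoding, so ~98.5% of the
    raw bit rate carries payload; divide by 8 bits per byte."""
    return gt_per_s * lanes * (128 / 130) / 8

gen3_x16 = pcie_bw_gbs(8, 16)    # ~15.75 GB/s
gen4_x16 = pcie_bw_gbs(16, 16)   # ~31.51 GB/s
```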
Ethernet connectivity
The Xeon D-2700 HCC SoC makes great strides in Ethernet connectivity compared to the previous generation. It increases Ethernet connectivity by 400 percent, providing networking up to 100 GbE with a variety of port options: up to eight ports at 25 Gbps, 10 Gbps, or 1 Gbps with RDMA (iWARP and RoCEv2). Ethernet processing throughput is up by 150 percent, with 50 Gbps and 100 Gbps throughput options.
The Xeon D-2700 and D-1700 processors integrate the following hardware technologies to accelerate workloads:
Intel QAT v1.8 accelerates crypto SSL up to 100 Gbps and compression up to 70 Gbps, offering better integrated cryptographic and AI acceleration than the previous generation. The SoCs also include new instructions to accelerate AI/deep learning workloads. (See Intel QAT: Performance, Scale, and Efficiency.)
Intel hardware-based security
Intel Xeon D processors offer integrated security features including Intel Total Memory Encryption, which provides full memory encryption with segmentation for up to 64 tenant-provided keys. The processors support Intel Software Guard Extensions (Intel SGX), which provides fine-grained data protection through application isolation in memory. This protection can be crucial for data exchange between the cloud and edge. Xeon D processors also support Intel Secure Hash Algorithm Extensions, with integrated accelerators for SHA cryptographic algorithms, and Intel Platform Firmware Resilience (Intel PFR), which uses an Intel field programmable gate array (FPGA) to protect, detect, and correct platform firmware.
Conclusion
The Intel Xeon D processor is a cost-effective offering that is built specifically for edge deployments. It allows users to tailor their solutions to the level of compute and performance they need while allowing for edge implementation-specific space and power constraints. The Xeon D processor and the PowerEdge XR4000 server help customers deploy solutions with lower TCO and a low-power budget in a rugged environment. The solutions are well suited for various edge workloads in retail, manufacturing, and defense.
Mon, 16 Jan 2023 19:12:54 -0000
|Read Time: 0 minutes
The Dell XR4000 is a compact multi-node server designed for the edge and integrated with the iDRAC9 remote management system. This DfD discusses the enhancements made for the XR4000 and why iDRAC9 is essential at the edge.
The concept of edge computing has been constantly growing over the last few years. Edge computing is exactly as the name suggests: bringing the processing and computing of a data center to the edge and reducing latency to a minimum. Dell Technologies offers a variety of options from the hardware side for the edge, including the XR portfolio built with a unique sled-chassis small form factor, reliable for the rugged environment, with high performance and low latency requirements. Dell also offers software to help make the process of edge computing the smoothest experience for a user.
As edge use cases develop, interoperability is one of the main requirements: different software programs must be able to work with different hardware to optimize performance. Remote management is also a must, given the challenging environments rugged servers operate in, so that administrators can remediate problems without physically visiting the server.
iDRAC9 is designed specifically to enable this portability, allowing server administrators to be more productive while optimizing the performance of Dell PowerEdge servers in the network. iDRAC9 is embedded management, built into Dell PowerEdge servers for monitoring, updating, and troubleshooting. It simplifies and automates the server's lifecycle. The PowerEdge XR4000 edge server is integrated with the latest version of iDRAC9.
In the following sections, we describe some special features supported in XR4000, using iDRAC for customer ease at the edge, along with deployment, updates, service, and troubleshooting.
Deployment
According to a study conducted by Principled Technologies in 2020, iDRAC9 can "reduce hands-on deployment times to near zero" through automation. Because iDRAC9 is part of every PowerEdge server, there is no additional software to install; in a few simple steps, iDRAC9 can be configured and ready to use. Even before an operating system is installed, IT admins have a complete set of server management features, including configuration, firmware updates, OS deployment, and more. Operating systems can be deployed remotely through Remote File Share or the Virtual Media console.
Automation
iDRAC9 offers agent-free operation to put IT admins in full control. When a PowerEdge server is connected to power and networking, it can be monitored and fully managed, whether standing in front of the server or remotely over a network. In fact, with no need for software agents, an IT administrator can monitor, manage, update, troubleshoot, and remediate Dell servers.
With features like Zero-Touch deployment and provisioning, Connection View, and System Lockdown, iDRAC9 is purpose-built to make server administration quick and easy by enabling the seamless automation of the entire server management lifecycle.
Powerful APIs
iDRAC9 offers support for DMTF Redfish. Redfish is a next-generation systems management interface standard that enables scalable, secure, and open server management. Redfish is an interface that uses RESTful interface semantics to access data that is defined in model format to perform out-of-band systems management. With iDRAC9, server administrators can easily monitor, customize, and optimize PCIe airflow and temperature, exhaust control, Delta T control, and overall airflow consumption remotely. In addition, iDRAC9 allows server administrators to pre-define power and cooling settings easily, as part of the server configuration profile. For more information, see iDRAC9 Redfish API Guide Firmware.
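As a minimal sketch of what scripting against the iDRAC9 Redfish interface looks like (the `/redfish/v1/...` paths below are standard DMTF Redfish resources, and `System.Embedded.1` is the usual iDRAC system ID; the address is hypothetical, and the exact properties exposed should be verified against your own server):

```python
from urllib.parse import urljoin

# Hypothetical iDRAC9 address; replace with your management IP.
IDRAC = "https://192.0.2.10"

def redfish_url(resource: str) -> str:
    """Build a DMTF Redfish resource URL under the /redfish/v1 root."""
    return urljoin(IDRAC, "/redfish/v1/" + resource.lstrip("/"))

# Typical out-of-band queries an administrator might script:
system_url = redfish_url("Systems/System.Embedded.1")          # power state, health
thermal_url = redfish_url("Chassis/System.Embedded.1/Thermal") # fans, temperatures

# These URLs would then be fetched with an authenticated HTTPS GET,
# e.g. requests.get(system_url, auth=("user", "pass"), verify=...),
# returning JSON described by the Redfish schema.
```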
Security
iDRAC9 offers industry-leading security features that adhere to and are certified against well-known NIST standards, Common Criteria, and FIPS-140-2. For more information about iDRAC's certifications and standards, see the white paper Managing Web Server Certificates on iDRAC.
iDRAC9 uses a modern, secure, HTML5-based GUI as a virtual console. The iDRAC9 web server uses a TLS/SSL certificate to establish and maintain secure communications with remote clients. Web browsers and command-line utilities, such as RACADM and WS-Man, use this TLS/SSL certificate for server authentication and establishing an encrypted connection. iDRAC9 now supports TLS 1.3.
iDRAC9 on the XR4000 enables dust monitoring of the bezel: if the bezel is in a dusty environment, iDRAC9 helps the administrator know when to clean or change it to ensure the smooth functioning of the server. iDRAC9 can also gather information from the witness sled and PSUs and send it to each node in the chassis at deployment time and whenever runtime changes occur. Ultimately, iDRAC can gather the overall health status for Chassis Manager and display it using the bezel LED.
iDRAC9 is crucial to PowerEdge servers and will continue to be an integrated part of the entire XR series. It will also continue to address issues faced by administrators in the edge environment. We offer various licensing methods to provide what is best suited to your requirements.
To learn about iDRAC9, see the article Support for Integrated Dell Remote Access Controller 9 (iDRAC9).
Mon, 16 Jan 2023 19:04:39 -0000
|Read Time: 0 minutes
Dell Technologies has recently announced PowerEdge XR4000: an industry-certified, multi-node, 2U short-depth rugged OEM-ready server with rack or wall mountable options. The XR4000 is optimized for edge use cases, including retail, manufacturing, and telecom. This Direct from Development (DfD) demonstrates VM deployment capability for virtualized environments using VMmark, a benchmark that measures the performance and scalability of virtualization platforms.
The new Dell PowerEdge XR4000 is a 2U server with an innovative sled-based design. Dell Technologies' shortest-depth server to date is purpose-built for the edge, delivering high-performance compute and ultimate deployment flexibility in two new chassis form factors. The chassis comes in two 14"-depth form factors, referred to as "rackable" and "stackable." The XR4000 also offers an optional Nano server sled that can provide an in-chassis witness node for a vSAN cluster. By replacing the need for a virtual witness node, the Nano server allows for a native, self-contained two-node vSAN cluster in even the 14" x 12" stackable chassis. This enables VM deployments where the option was previously unavailable due to latency or bandwidth constraints.
This document describes the VMmark 3.1.1 benchmark that was used to test the outstanding performance delivered by Dell PowerEdge servers, powered by Intel® Xeon® D processors.
Overview
The first version of VMmark was launched in 2007 as a single-host benchmark, when most organizations were at an early stage of virtualization maturity. VMmark 3.1.1, released in 2020, is the current release of the benchmark.
VMmark uses a unique tile-based implementation in which each “tile” consists of a collection of virtual machines running a set of diverse workloads. This tile-based approach is common across all versions of the VMmark benchmark. Since the initial release of VMmark, virtualization has become the norm for applications, and these applications have evolved. The workloads that are run in the VMmark tiles have also evolved to provide the closest to real-world metrics for users to assess their virtual environments.
Figure 1. A Web-Scale Multi-Server Virtualization Platform Benchmark
Power Measurement
Power and cooling expenses are a substantial and increasing part of the cost of running a data center, and environmental considerations are a growing factor in data center design and selection. To address these issues, VMmark enables optional power measurement in addition to performance measurement. VMmark 3.1.1 results can be any of three types: performance only, performance with server power, or performance with server and storage power.
VMmark results with power measurement allow hardware purchasers to see not just absolute performance, but also absolute power consumption and performance per kilowatt. This makes it possible to consider both capital expenses and operating expenses when selecting new data center components.
This solution includes the following components:
Component | Details |
SUTs | 4 x Dell XR4510c servers |
Clients | 2 x Dell PowerEdge R740xd |
Storage | vSAN used for all workload VMs; iSCSI SAN used for infrastructure operations |
Network | Dell Z9432F-ON switch Intel® E823-C 25G 4P LOM |
OS | Dell Customized Image of VMware ESXi 7.0U3 A08, Build# 20328353 |
The metrics of the application workloads within each tile are computed and aggregated into a score for that tile. This aggregation is performed by first normalizing the different performance metrics (such as actions/minute and operations/minute) for a reference platform. Then, a geometric mean of the normalized scores is computed as the final score for the tile. The resulting per-tile scores are then summed to create the application workload portion of the final metric. The metrics for the infrastructure workloads are aggregated separately. The final benchmark score is computed as a weighted average: 80 percent to the application workload component and 20 percent to the infrastructure workload component.
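The aggregation described above can be sketched as follows (the baselines and per-tile metric values are made up for illustration; the real reference-platform values and run rules live in the VMmark 3.1.1 documentation):

```python
from math import prod

def tile_score(metrics: dict, baselines: dict) -> float:
    """Normalize each workload metric to the reference platform,
    then take the geometric mean of the normalized values."""
    normalized = [metrics[k] / baselines[k] for k in metrics]
    return prod(normalized) ** (1 / len(normalized))

# Hypothetical per-tile workload throughputs vs. a reference platform
baselines = {"web_ops_min": 1000.0, "db_tx_min": 500.0}
tiles = [
    {"web_ops_min": 1200.0, "db_tx_min": 550.0},
    {"web_ops_min": 1150.0, "db_tx_min": 540.0},
]

# Per-tile scores are summed to form the application component...
app_score = sum(tile_score(t, baselines) for t in tiles)
# ...and combined with the separately aggregated infrastructure
# component in the 80/20 weighting the text describes.
infra_score = 2.15  # placeholder (vMotion, deploy, etc.)
final = 0.8 * app_score + 0.2 * infra_score
```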
When power is to be measured using the PTDaemon, either for the server only or for both server and storage, the VMmark harness starts the PTDaemon, which initiates a connection between the PTD client (or clients) specified in the VMmark 3.1.1 properties file and the power meter (or meters) they are configured to monitor. Once the required connections are established and the benchmark run is underway, the harness captures each power meter's results into a single unified data stream. This data, like that from other VMmark workloads, is broken up into sections (ramp up, three 40-minute steady-state phases, and ramp down). The reported VMmark 3.1.1 power consumption is the total average watts consumed during the steady-state phase of the benchmark run that resulted in the median score, where the total average watts is the sum of the average watts reported by each power meter used in the run. The final VMmark Performance Per Kilowatt (PPKW) score is the VMmark 3.1.1 score divided by the average power consumption in kilowatts. The results below are based on performance testing conducted in the Dell Solution Performance Analytics (SPA) Lab on 9/30/2022.
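The PPKW arithmetic works out as follows, using the reported scores (the small difference from the published 4.0285 comes from rounding in the reported inputs; the unrounded run data yields the official figure):

```python
vmmark_score = 4.37   # reported VMmark 3.1.1 score @ 4 tiles
avg_watts = 1085.50   # steady-state average power for the median run
ppkw = vmmark_score / (avg_watts / 1000)  # score per kilowatt, ~4.03
```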
The published result met all QoS thresholds and is compliant with VMmark 3.1.1 run and reporting rules. The following table shows the scores of the submitted test results.
Metric | Result |
vMotion (number of operations per hour) | 57.00 |
SVMotion (number of operations per hour) | 44.00 |
XVMotion (number of operations per hour) | 34.00 |
Deploy (number of operations per hour) | 17.00 |
Unreviewed_VMmark3_Applications_Score | 4.93 |
Unreviewed_VMmark3_Infrastructure_Score | 2.15 |
Unreviewed_VMmark3_Avg_Watts | 1085.50 |
Unreviewed_VMmark3_Score | 4.37 @ 4 Tiles |
Unreviewed_VMmark3_PPKW | 4.0285 @ 4 Tiles |
Virtualization is essential for edge applications: without it, it is very difficult to fully utilize the power of a modern server. In a virtualized environment, a software layer lets users create multiple independent VMs on a single physical server, taking full advantage of the hardware resources. A single-socket Dell PowerEdge XR4000 server equipped with the Intel Xeon D-2776NT has a VMmark Power Performance Score of 4.0285 @ 4 Tiles[1] and a VMmark Score of 4.37 @ 4 Tiles. This is representative of the different virtualization workloads that can run optimally within the latency constraints important for the edge, with a strong level of performance, making the XR4000 an excellent choice for edge customers who want to take advantage of the benefits that virtualization has to offer.
Mon, 16 Jan 2023 19:50:52 -0000
|Read Time: 0 minutes
Summary
This document is a brief summary of the performance advantages that customers can gain when using the PowerEdge XE8545 acceleration server. All performance results and characteristics discussed are based on testing conducted in the Americas Data Center (CET) labs. Results accurate as of 3/15/2021. Ad Ref #G21000042
The PowerEdge XE8545 is Dell EMC's response to the needs of high-performance machine learning customers who immediately want all the innovation and horsepower provided by the latest NVIDIA GPU technology, without the need to make major cooling-related changes to their data center. Its specifically air-cooled design delivers four A100 40GB/400W GPUs with a low-latency, switchless SXM4 NVLink interconnect, while letting the data center maintain an energy-efficient 35°C. It also has an 80GB/500W GPU option that has been shown to deliver 13-15% higher performance than the 400W GPUs at only a slightly lower ambient input temperature (28°C).
Unlike competitors, Dell worked with NVIDIA early in the design process to ensure that the XE8545 could run at 500 watts of power when using the high-capacity 80GB A100 GPUs, and still be air-cooled. This 80GB/500W GPU option allows the XE8545 to drive harder and derive more performance from each of the GPUs. Using the common industry benchmark ResNet50 v1.5 model to measure image classification speed with a standard batch size, the 500W GPU took 67.78 minutes to train, compared to 73.32 minutes for the 400W GPU: 7.56% faster. And when batch size is doubled, the result is up to 13-15% better performance! When speed of results is a customer's primary concern, the XE8545 can deliver the power needed to get those results faster.
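A quick check of the arithmetic behind those training numbers (the 7.56% figure is the reduction in training time; the equivalent throughput gain works out slightly higher):

```python
# ResNet50 v1.5 training times from the comparison above, in minutes.
t_400w, t_500w = 73.32, 67.78
time_reduction = (t_400w - t_500w) / t_400w * 100  # ~7.56% less time
throughput_gain = (t_400w / t_500w - 1) * 100      # ~8.17% more images/s
```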
It is clear from the chart above that an XE8545 with 40GB GPUs is more than twice as fast as the previous-generation C4140 when training an image classification model; in fact, faster than two C4140s running in parallel! And the 80GB GPU option is even faster! This is a great illustration of the combined power of the new technologies packed into the XE8545: the latest NVIDIA GPUs, the latest AMD CPUs, and the latest generation of PCIe I/O fabric. Further gains in performance can be achieved by workloads that take advantage of the improvements in how the A100 performs the matrix multiplication involved in machine learning, by better accounting for "sparsity": that is, the occurrence of many zeros in the matrix, which previously resulted in many time-consuming "multiply-by-zero" operations that had no effect on the final result.
And as with all operations for the XE8545, it delivers the very top-level performance using only air-cooling. It does not require liquid cooling.
Inference tends to scale linearly, as there is no peer-to-peer GPU communication involved, and the XE8545 has proven to have exceptional linear scalability. So it is not surprising that the XE8545 produces excellent high-performance inference results. As with training, the 80GB/500W A100 GPU has a performance edge: 10% faster than the 400W GPU (at a proportional power increase).
The innovative Multi-Instance GPU (MIG) technology introduced with the A100 GPU allows the XE8545 to partition each A100 GPU into as many as seven "slices," each fully isolated with its own high-bandwidth memory, cache, and compute cores. So, if fully utilized, an XE8545 server can be running 28 separate high-performance instances of inferencing. Each of those instances has been determined by NVIDIA to provide performance equivalent to the previous-generation V100. So, an A100 GPU can be thought of as 7 times faster than the previous generation, specifically for inferencing, where peer-to-peer communication does not come into play.
The XE8545 has undergone NVIDIA's comprehensive certification program for data center AI: NVIDIA GPU Cloud (NGC). It is now certified to run at the latest Gen4 networking speeds and can take advantage of the NGC catalog that hosts frameworks and containers for the top AI, ML, and HPC software, already tuned, tested, and optimized. With NGC certification, data centers can quickly and easily deploy machine learning environments with confidence and get results faster. For more details, see NVIDIA-Certified Systems.
The PowerEdge XE8545 introduces the latest industry technologies in a combination that delivers the kind of high-performance, accelerated computing that can handle even the most demanding Artificial Intelligence and Machine Learning workloads or scientific high-performance computing analysis. It provides the highest levels of power and performance in an air-cooled environment, simplifying operational continuity in enterprise data centers.
Mon, 16 Jan 2023 19:50:53 -0000
|Read Time: 0 minutes
Dell Technologies is helping to shape the future of Open RAN solutions with our partnerships and our high performance, purpose-built XR11 and XR12 PowerEdge servers designed for Open RAN and edge deployments.
Introduction
The future of telecommunications includes an open, cloud-native architecture within an open ecosystem of vendors working together to build this new architecture. One of the more exciting aspects of this open future is Open Radio Access Networks (Open RAN). Open RAN is an industry-wide movement that promotes the adoption of open and interoperable solutions at the RAN.
Open RAN provides opportunities to replace the proprietary, purpose-built RAN equipment of the past with standardized, virtualized hardware that can be deployed anywhere—at the far edge, regional edge, or centralized data centers. Also, intelligent controllers can provide optimized performance and enhanced automation capabilities to improve operational efficiency.
In the O-RAN frameworks, you can separate the baseband unit (BBU) of the traditional RAN into virtualized distributed unit (vDU) and virtualized centralized unit (vCU) components. You can also scale these components independently as control- and user-plane traffic requirements dictate. When building an open-hardware platform for a vRAN architecture, you must consider six critical factors:
Form factor | Environment | Components |
Security | Automation and management | Supply chain |
With the growing number of edge deployments required to support 5G O-RAN services, edge-optimized cloud infrastructure is essential. These six factors ensure that telco providers build their 5G RAN on a scalable, highly available, and long-term sustainable foundation. Dell Technologies considered each of these factors when designing their PowerEdge XR11 and XR12 servers. These servers are built specifically for O-RAN and edge environments, including multi-access edge computing (MEC) and content delivery network (CDN) applications. The following sections examine how the XR11 and XR12 servers meet, and in many cases exceed, the criteria for O-RAN and edge deployments across these six critical factors.
Unlike data centers, which are carefully controlled environments, RAN components are often subject to extreme temperature changes and less-than-ideal conditions such as humidity, dust, and vibration. For years, the telecommunications industry has used the Network Equipment-Building System (NEBS) as a standard for telco-grade equipment design. The PowerEdge XR11 and XR12 are designed to exceed NEBS Level 3 compliance (meets or exceeds the GR-63-CORE and GR-1089-CORE standards). They also meet military and marine standards for shock, vibration, sand, dust, and other environmental challenges.
Because they are fully operational within extreme temperature ranges from -5°C (23°F) to 55°C (131°F), XR11/12 servers can be deployed in almost any environment, even where exposure to heat, dust, and humidity are factors. The XR11/12 series is designed to withstand earthquakes and is fully tested to NEBS Seismic Zone 4 levels. As a result, you can trust Dell PowerEdge servers to keep working no matter where they are deployed.
The PowerEdge XR11 and XR12 provide significant flexibility over purpose-built, all-in-one appliances by using the industry’s most-advanced, best-of-breed components. Also, by providing multiple CPU, storage, peripheral, and acceleration options, PowerEdge XR11/12 servers enable telecommunications providers to deploy their vRAN systems in many different environments.
Both models feature the following components:
One example test shows the performance possibilities that the PowerEdge XR12 enabled by 3rd Gen Intel® Xeon® Scalable processors offers: The solution delivered 2x the massive MIMO throughput for a 5G vRAN deployment compared to the previous generation.1
PowerEdge XR11/12 servers are designed with a security-first approach to deliver proactive safeguards through integrated hardware and software protection. This security extends from a hardware-based silicon root of trust to asset retirement across the entire supply chain. From the moment a PowerEdge server leaves our factory, we can detect and verify whether a server has been tampered with, providing a foundation of trust that continues for the life of the server. The Integrated Dell Remote Access Controller (iDRAC) is the source of this day-zero trust. iDRAC checks the firmware against the factory configuration down to the smallest detail after the XR11/12 server is plugged in. If you change the memory, iDRAC detects it. If you change the firmware, iDRAC detects it. Also, we build every PowerEdge server with a cyber-resilient architecture2 that includes firmware signatures, drift detection, and BIOS recovery.
Besides providing proactive and comprehensive security, PowerEdge XR11/12 servers combine ease-of-management with automation to reduce operational complexity and cost while accelerating time-to-market for new services. Dell OpenManage provides a single systems-management platform across all Dell components. This platform makes it easier for telecommunications providers to manage their hardware components remotely, from configuration to security patches. Also, Dell delivers powerful analytics capabilities to help manage server data and cloud storage. The iDRAC agent-less server monitoring also allows telecommunications providers to proactively detect and mitigate potential server issues before they impact production traffic. By analyzing telemetry data, iDRAC can detect the root cause for poor server performance and identify cluster events that can predict hardware failure in the future.
In the last year, the importance of a secure and stable supply chain has become apparent while many manufacturers struggle to adapt to widespread supply-chain disruption. As telecommunications providers look to ramp up 5G services, they require partners they can depend on to deliver, innovate, scale, and support their plans for the future. Because we are the world’s largest supplier of data-center servers, telecommunications providers can depend on Dell Technologies. We operate in 180 countries worldwide, including 25 unique manufacturing locations, 50 distribution and configuration centers, and over 900 parts-distribution centers. Our global, secure supply chain means that telecommunications providers can grow their business with confidence.
Dell Technologies does not stop at the server. We work closely with our open partner ecosystem to integrate and validate our technology in multivendor solutions that provide a best-of-breed, end-to-end vRAN system. You will find this partnership at work in our latest technology preview of the Dell Open RAN reference architecture featuring VMware Telco Cloud Platform (TCP) 1.0, Intel FlexRAN technology, and vRAN software from Mavenir. Our O-RAN solution architecture delivers the disaggregated components that compose the RAN network: vRU, vCU, and vDU. You can also deploy it in hybrid (private and public) clouds as well as bare-metal server environments. Having a pre-built, integrated solution allows telecommunications providers to deploy O-RAN solutions quickly and confidently, knowing that they have the power of our global supply chain and expert services behind them.
With many initial 5G core network transformations complete, telecommunications providers are now turning their attention to the RAN. For them, there are several paths to choose. They can continue to work with legacy vendors by growing out their proprietary RAN systems, missing out on the opportunity to build a best-of-breed RAN solution from multiple partners. Or, they can follow the path of Open RAN with Dell Technologies as a trusted partner to assemble and manage the right pieces from the industry’s O-RAN leaders.
Dell PowerEdge XR11/12 servers are the latest examples of our commitment to open 5G solutions. These servers are built by telco experts specifically for telco edge applications, using a security-first approach and featuring high- performance compute, storage, and analytics components. Also, they have been bundled with our broader Open RAN reference architecture to form the foundation of a seamless, complete vRAN solution that includes hardware, software, and services.
O-RAN is more than the edge of the future. It is a competitive edge for telecommunications providers that must quickly deliver and monetize 5G services, from private mobile networks to high-performance computing applications. Make Dell Technologies your competitive edge, and ask your Dell representative about our portfolio of telco-grade edge solutions.
1 Bringing high performance and reliability to the edge with rugged Dell EMC PowerEdge XR servers
2 PowerEdge Cyber Resilient Architecture Infographic
Automation