Tue, 12 Sep 2023 13:22:37 -0000
Proper server rack integration is crucial for a data center's efficient and reliable operation. Optimizing space, power, and cooling can reduce downtime, simplify fleet management, improve serviceability, and lower overall costs. However, successful server rack integration requires careful planning, attention to detail, and expertise in server hardware, networking, and system administration.
This paper focuses on the critical aspects of deploying the PowerEdge XE9680 server in your data center. It describes key factors such as selecting the appropriate rack type, sizing the rack to meet current and future needs, installing and configuring the server hardware and related components, and ensuring proper power and cooling.
At Dell Technologies, we understand the importance of meeting our customers where they are. Whether you require full-service rack integration and deployment services or expert advice, we are committed to providing the support you need to achieve your goals. By leveraging our expertise and resources, you can be confident in your ability to implement the server rack integration that meets your unique needs and requirements.
The Dell PowerEdge XE9680 is a high-performance server designed to deliver exceptional performance for machine learning workloads, AI inferencing, and high-performance computing. Table 1 lists key specifications to consider when installing it in a rack.
Table 1. Server specifications
| Feature | Technical Specifications |
|---|---|
| Form Factor | 6U rack server |
| Dimensions and Weight | Height: 263.2 mm (10.36 in); Width: 482.0 mm (18.98 in); Depth: 1008.77 mm (39.72 in) with bezel, 995 mm (39.17 in) without bezel, 1075 mm (42.32 in) with Cable Management Arm (CMA); Weight: 107 kg (236 lb) |
| Cooling Options | Air cooling |
The American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) data center specifications focus on temperature and humidity control, optimized air distribution, airflow management, air quality, and energy efficiency. Key recommendations include maintaining appropriate temperature and humidity ranges, implementing hot aisle/cold aisle configurations and containment systems, managing airflow effectively, ensuring high indoor air quality, and adopting energy-efficient technologies.
The Dell PowerEdge XE9680 complies with the A2 Class ASHRAE specifications in Table 2.
Table 2. Operating environment specifications
| Dry-Bulb Temp, °C (Operation) | Humidity Range, Noncondensing (Operation) | Max Dew Point, °C | Max Elevation, m | Max Rate of Change, °C/hour | Dry-Bulb Temp, °C (Power Off) | Relative Humidity, % (Power Off) |
|---|---|---|---|---|---|---|
| 10 to 35 | –12°C DP and 8% RH to 21°C DP and 80% RH | 21 | 3050 | 20 | 5 to 45 | 8 to 80 |
Note: The maximum operating temperature is derated by 1°C per 300m above 900m in altitude.
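This derating rule is easy to apply programmatically. The following sketch is illustrative only (the function is ours, not a Dell tool); it encodes the 35°C ceiling and the 1°C-per-300m rule from the note above:

```python
def max_operating_temp_c(altitude_m: float, base_temp_c: float = 35.0) -> float:
    """Derate the maximum operating temperature by 1 degree C
    for every 300 m of altitude above 900 m."""
    if altitude_m <= 900:
        return base_temp_c
    return base_temp_c - (altitude_m - 900) / 300.0

print(max_operating_temp_c(1200))              # 34.0
print(round(max_operating_temp_c(3050), 1))    # 27.8 (at the 3050 m elevation ceiling)
```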
For optimal performance and reliability, it is recommended to operate within the defined specification ranges. While it is possible to operate at the edge of these ranges, Dell does not recommend continuous operation under such conditions due to potential impacts on performance and reliability.
When choosing a cabinet, it is important to consider factors such as size, ventilation, cable management, and security. The right cabinet should provide ample space for equipment, efficient airflow to prevent overheating, organized cable routing, and robust physical protection for valuable server hardware. Careful consideration of these factors ensures optimal performance, reliability, and ease of maintenance for your server infrastructure. We recommend the following cabinet specifications for optimal XE9680 installation:
Installing servers in a rack is a crucial aspect of server management. Proper placement within the rack ensures efficient use of space, ease of access, and optimal airflow. Each server should be securely mounted in the rack, taking into account factors such as weight distribution and cable management. Strategic placement allows for better cooling, reducing the risk of overheating, and prolonging the lifespan of the equipment. Additionally, thoughtful placement enables easy maintenance, troubleshooting, and scalability as the server environment evolves. By giving careful consideration to the placement of servers in a rack, you can create a well-organized and functional setup that maximizes performance and minimizes downtime. We recommend the following:
Figure 1. 4x PowerEdge XE9680 servers in a rack
The PowerEdge XE9680, equipped with H100 GPUs, has an approximate maximum power draw of 11.5kW. It comes with six 2800W Mixed Mode power supply units (PSUs) that feature a C22 input socket.
The XE9680 currently supports 5+1 fault-tolerant redundancy (FTR). (An additional 3+3 FTR configuration will be introduced in the Fall of 2023.) It is important to note that in 3+3 mode, system performance may throttle upon power supply failure to prevent overloading the remaining power supplies.
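The arithmetic behind these redundancy modes can be sanity-checked from the numbers above (six 2800W PSUs, roughly 11.5kW maximum draw). The function below is a hypothetical sketch, not a Dell sizing tool:

```python
PSU_WATTS = 2800
MAX_DRAW_W = 11_500  # approximate XE9680 max power draw with H100 GPUs

def surviving_capacity_w(total_psus: int, failed: int, psu_watts: int = PSU_WATTS) -> int:
    """Power capacity remaining after `failed` supplies are lost."""
    return (total_psus - failed) * psu_watts

# 5+1 FTR: one PSU can fail and the remaining five still cover the load.
print(surviving_capacity_w(6, 1) >= MAX_DRAW_W)  # True (14000 W available)

# 3+3: losing a full feed leaves three PSUs (8400 W), below max draw --
# which is why the system may throttle on power supply failure.
print(surviving_capacity_w(6, 3) >= MAX_DRAW_W)  # False
```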
Figure 2. PowerEdge XE9680 with PDU
For the XE9680, we recommend the following PDU specifications:
Table 3. PDU specifications
| PDU Input Voltage | XE9680s Per Cabinet | PDUs Per Cabinet | Circuit Breakers Per PDU (Min) | Single PDU Requirement (Min) |
|---|---|---|---|---|
| 208V | 2 | 2 | 6 | 60A (48A rated), 17.3 kW |
| 208V | 2 | 4 | 3 | 30A (24A rated), 8.6 kW |
| 208V | 4 | 2 | 12 | 100A (80A rated), 28.8 kW |
| 208V | 4 | 4 | 6 | 60A (48A rated), 17.3 kW |
| 400/415V | 2 | 2 | 6 | |
| 400/415V | 2 | 4 | 3 | 20A (16A rated), 11.1 kW @ 400V / 11.5 kW @ 415V |
| 400/415V | 4 | 2 | 12 | |
| 400/415V | 4 | 4 | 6 | |
Note: Single PDU Power Requirement = Input Voltage * Current Rating * 1.73.
The factor of 1.73 (the square root of 3) is used to account for three-phase power systems commonly used in data centers and industrial settings. By multiplying the input voltage, current rating, and 1.73, you can determine the power capacity needed for a single PDU to adequately support the connected equipment. This calculation helps ensure that the PDU can handle the power load and prevent overloading or electrical issues.
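The formula in the note can be checked against Table 3 in a few lines of Python (the function name is ours; the rated current is the 80% continuous rating of each breaker):

```python
import math

def single_pdu_power_w(input_voltage_v: float, rated_current_a: float) -> float:
    """Three-phase power: line voltage x current x sqrt(3) (~1.73)."""
    return input_voltage_v * rated_current_a * math.sqrt(3)

# Reproducing the Table 3 values:
print(round(single_pdu_power_w(208, 48) / 1000, 1))  # 17.3 kW
print(round(single_pdu_power_w(208, 80) / 1000, 1))  # 28.8 kW
print(round(single_pdu_power_w(415, 16) / 1000, 1))  # 11.5 kW
```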
Thermal management is important in data centers to ensure equipment reliability, optimize performance, improve energy efficiency, prolong equipment lifespan, and reduce environmental impact. By maintaining appropriate temperature levels, data centers can achieve a balance between operational reliability, energy efficiency, and cost-effectiveness.
Dell Technologies recommends the following best practices for thermal management:
The XE9680 is engineered to operate at ambient temperatures of up to 35°C. However, maintaining lower temperatures is highly recommended: operating the XE9680 in a cooler environment mitigates the risk of overheating and performance degradation, resulting in more stable and reliable operation overall.
Proper cable management in a server rack improves organization, airflow, accessibility, safety, and scalability. It enhances the reliability, performance, and maintainability of the entire IT infrastructure.
The PowerEdge XE9680 supports Ethernet and InfiniBand network adaptors, which are installed at the front of the server for easy access in cold aisles. To ensure proper cable management, the chosen cabinet solution should provide a minimum clearance of 93.12 mm from the face of the network adaptor to the cabinet door. This clearance accommodates the bend radius of a typical direct attach copper (DAC) cable (see Figure 3).
Figure 3. DAC clearance recommendations
The maximum cable length shown in Figure 3 is 2.07 meters (81.49 inches).
With adjacent racks, it is possible to improve cable management by removing the inner side panels. This alteration provides an open space along the sides of the racks, allowing cables to be conveniently routed between adjacent racks. By eliminating the inner side panels, technicians or IT professionals gain unobstructed access to the interconnecting cables, making it simpler to establish and maintain organized cabling infrastructure.
The following two figures show power cables routed through the optional cable management arm (CMA). The CMA can be mounted to either side of the sliding server rails.
Figure 4. Power cables in cable arm
AI server network switches play a crucial role in supporting high-performance and data-intensive artificial intelligence workloads. These switches handle the demanding requirements of AI applications, providing high bandwidth, low latency, and efficient data transfer. They facilitate seamless communication and data exchange between AI servers, to ensure optimal performance and to minimize bottlenecks.
Installing a switch in a rack for servers is vital for establishing a robust and efficient network infrastructure, enabling seamless communication, centralized management, scalability, and optimal performance for the server environment.
The network switch may require offsetting within the rack to accommodate the bend radius of specific networking cables. To achieve this, a bracket can be utilized to push the network switch towards the rear of the rack, creating space for the necessary cable bend radius while ensuring proper installation of the front door. The accompanying images demonstrate the process of using the bracket to adjust the network switch position within the rack. This allows for optimal cable management and ensures the smooth operation of the network infrastructure.
Figure 6. Switch offset brackets
The Dell Enterprise Infrastructure Planning Tool (EIPT) helps IT professionals plan and tune their compute and infrastructure equipment for maximum efficiency. Offering a wide range of configuration flexibility and environmental inputs, it can help right-size your IT environment. EIPT is a model-driven tool that supports many products and configurations for infrastructure sizing purposes. EIPT models are based on hardware measurements under operating conditions representative of typical use cases. Workloads can greatly affect power consumption; for example, the same percent CPU utilization under different workloads can lead to widely different power draw. It is not possible to cover all workload, environmental, and customer data center factors in a model and provide an accuracy figure with any degree of confidence. With that said, Dell Technologies anticipates (but does not guarantee or claim) some potential for variation. Customers are always advised to confirm EIPT estimates with actual measurements under their own workloads.
Figure 7. Dell EIPT tool
Leading edge technologies bring implementation challenges that can be reduced or eliminated with Dell Rack Integration Services. We have the experience and expertise to engineer, integrate, and install your Dell storage, server, or networking solution. Our proven integration methodology will take you step by step from a plan to a ready-to-use solution:
Contact your account manager and go to Custom deployment services to learn more.
Thu, 31 Aug 2023 17:42:58 -0000
Enterprises want to build and operate applications that have low latency requirements to process and analyze real-time data, and they want to provide intelligence for smarter decision-making at the edge. However, they face many challenges: aging infrastructure, limited edge-computing resources, environmental factors, and lack of IT staff to deploy and support applications across many edge sites.
This document provides an overview of a combined edge platform built on Dell PowerEdge XR servers and VMware Edge Compute Stack to solve these challenges. It describes key use cases in retail, manufacturing, and other industries.
The PowerEdge XR server series is built to capture and process more data at the edge, with enterprise-grade compute abilities providing high performance with low latency for the edge. The XR servers can withstand unpredictable and challenging deployment environments. XR4000 is the new high-performance multi-node XR server, purpose-built for ultra-short depth and low power, and with flexible configurations. These configurations are also available on our Dell vSAN Ready Nodes.
Edge Compute Stack (ECS) is a fully integrated edge platform for customers with many edge sites. ECS empowers IT and OT to deliver intelligent real-time solutions, offering flexibility, consistency, security, and extensibility:
This document includes a combined XR4000 and ECS reference architecture validated and supported by Dell Technologies and VMware. It also provides sample configurations for customers and partners to use as a starting point to design and implement the combined edge platform.
Key use cases for the solution are in the retail, manufacturing, and government sectors.
Retailers adapted to the pandemic with increased use of self-service checkout and new delivery mechanisms. They are deploying edge applications to improve customer experience and profitability:
The XR4000 and ECS platform provides high flexibility and performance to deploy and run these retail solutions while optimizing expensive retail space and meeting store environmental requirements.
The Industry 4.0 movement is digitizing manufacturing for greater efficiency and flexibility. Manufacturers are deploying edge applications for the following use cases:
The XR4000 and ECS platform provides a foundation for these solutions for machine aggregation and virtualization, OT/IT translation, industrial automation, and AI inferencing.
Defense, law enforcement, and emergency response organizations have specific requirements for tactical and mobile edge deployments:
XR4000 is highly portable and hardened for dusty, hot/cold operations. It is tested with NEBS Level 3 and MIL certifications. With ruggedized ATA-compliant compact and mobile systems from Dell OEM partners, the XR4000 and ECS platform is ideal for tactical and mobile edge workloads.
Figure 1 illustrates the combined XR4000 and ECS reference architecture. It consolidates VMs and the Kubernetes management cluster in the central data center. It also includes self-contained 2-node vSAN and TKG Multi-Cloud (TKGm) clusters at every edge site. A purpose-built vSAN witness node XR4000w (Nano Processing Unit, shown in Figure 2) is integrated within several XR4000 chassis options, enabling a highly efficient and reliable edge stack. An optional SD-WAN virtual edge can provide optimal connectivity and additional security. The centralized VMware vCenter and TKG management cluster simplify vSAN and TKGm deployment at the edge sites.
Figure 1. XR4000 and ECS reference architecture
Figure 2. Nano Processing Unit
Dell PowerEdge XR4000 is a rugged multi-node edge server available in two unique and flexible form factors. The “rackable” chassis supports up to four 1U sleds; the “stackable” chassis supports up to two 2U sleds. The 1U sled is designed for dense compute requirements. The 2U sled shares the same “1st U” and common motherboard with the 1U sled but includes an additional riser that provides two more PCIe Gen4 FHFL I/O slots. Customers who need additional storage or PCIe expansion can choose a 2U sled option. All XR4000 chassis support both front-to-back and back-to-front airflow.
The following table provides details for two sample configurations—one rackable and the other stackable.
Table 1. Sample configurations
| | Rackable configuration (2 x 2U) | Stackable configuration (2 x 1U) |
|---|---|---|
| Edge Compute Stack (ECS) | VMware ECS Advanced (vSphere Edge, vSAN Standard for Edge, Tanzu Mission Control Advanced), 1/3/5-year term license, up to 128 cores per edge instance | Same |
| Chassis | Dell PowerEdge XR4000r: 2U, 14 inches deep, 19 inches wide | Dell PowerEdge XR4000z: 2U, 14 inches deep, 10.5 inches wide |
| Mounting options | Mounting ears to support a standard 19-inch-wide rack | Deployed on desktops, VESA plates, DIN rails, or in stacked environments |
| Power supply | Front port access, dual, hot-plug (1+1), 1400 W, RAF | Same |
| Operating range | –5°C to 55°C (23°F to 131°F) | Same |
| Witness node | 1 x Dell PowerEdge XR4000w, VMware Certified | Same |
| Server | 2 x Dell PowerEdge XR4520c sleds, VMware Certified (total capacity of 2 x 2U sleds) | 2 x Dell PowerEdge XR4510c sleds, VMware Certified (total capacity of 2 x 1U sleds) |
| Security | Trusted Platform Module 2.0 V3 | Same |
| CPU cores* | 32 cores (2 x single-socket 16-core Intel Ice Lake Xeon-D CPUs) | Same |
| Memory* | 256 GB (8 x 32 GB RDIMM) | 128 GB (8 x 16 GB RDIMM) |
| Boot drive | 2 x BOSS-N1 controller card with 2 x 960 GB M.2 (RAID 1) | 2 x BOSS-N1 controller card with 2 x 480 GB M.2 (RAID 1) |
| Storage* | 15.2 TB (8 x 1.9 TB, SSDR, 2E, M.2) | Same |
| Network | 4 x 10 GbE Base-T or SFP for 4/8-core CPU | Same |
| GPU (optional) | 2 x NVIDIA Ampere A2, PCIe, 60 W, 16 GB passive, full-height GPU, VMware Certified | Not applicable |
| System management | iDRAC9, Dell OpenManage Enterprise Advanced Plus, integration for VMware vCenter | Same |
*In a High Availability (HA) 2-node vSAN cluster, for failover to work properly, total consumable CPU, Memory, and Storage for application workloads should not exceed the available resources of a single node.
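That HA sizing rule amounts to a simple per-resource comparison: the cluster-wide workload must fit inside one node. The sketch below is purely illustrative (the function, dictionary keys, and values are ours, not actual XR4000 figures):

```python
def fits_single_node(workload: dict, node: dict) -> bool:
    """True if the total workload demands fit within ONE node's resources,
    so the surviving node can host everything after a failover."""
    return all(workload[resource] <= node[resource] for resource in node)

# Illustrative single-node resources for a 2-node vSAN cluster:
node = {"cpu_cores": 16, "memory_gb": 128, "storage_tb": 7.6}

print(fits_single_node({"cpu_cores": 12, "memory_gb": 96, "storage_tb": 5.0}, node))  # True
print(fits_single_node({"cpu_cores": 20, "memory_gb": 96, "storage_tb": 5.0}, node))  # False
```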
The edge platform built on Dell PowerEdge XR4000 server and VMware Edge Compute Stack aims to help retail, manufacturing, and government customer organizations build and operate applications that provide intelligence for smarter decision-making and deliver immersive digital experiences at the edge. The combined reference architecture and configuration examples described in this document are designed to help our joint customers in designing and implementing a consistent, flexible, secure, and extensible edge solution.
To learn more about the flexible configurations of the Dell XR4000 chassis and compute sleds, see PowerEdge XR Rugged Servers.
For more information about VMware Edge Compute Stack, see VMware Edge Compute Stack and contact the VMware team at edgecomputestack@vmware.com.
Thu, 27 Jul 2023 20:40:00 -0000
This study is intended to help customers understand the behavior of the XR8000 PowerEdge server in harsh environmental conditions at the edge, and its resulting performance.
The need to improve power efficiency and provide sustainable solutions has been pressing for some time. According to a Bloomberg report, data centers in some countries will account for an estimated 5-10% of energy consumption by 2030, including the demand for edge and cloud computing[1]. Dell Technologies continues to innovate in this area and has launched its latest portfolio of XR servers for the edge and telecom this year.
The latest offering from the Dell XR portfolio is a series of rugged servers purpose-built for the edge and telecom, especially targeting workloads for retail, manufacturing, and defense. This document highlights the testing results for power consumption and fan speed across the temperature range of -5°C to 55°C (23°F to 131°F), gathered by running iPerf3 on the XR8000 server.
The short-depth XR8000 server, which comes in a sledded server architecture (with 1U and 2U single-socket form factors), is optimized for total cost of ownership (TCO) and performance in O-RAN applications. It is RAN-optimized, with integrated networking and I/O and PTP/SyncE support. Its front-accessible design radically simplifies sled serviceability in the field.
The PowerEdge XR8000 server is built rugged to operate in temperatures from -5°C to 55°C for select configurations. (For additional details, see the PowerEdge XR8000 Specification Sheet.)
Figure 1. Dell PowerEdge XR8000
To conduct this test, we placed a 2U XR8000 inside the thermal chamber in our test lab. While it was in the chamber, we ran the iPerf3 workload on the system for more than eight hours, stressing the system at 5-20% load. We measured power consumption and fan speed using iDRAC at 10°C intervals from 0°C to 55°C.
The iPerf3 throughput measured at 1Gb, 10Gb, and 25Gb link speeds was consistent across the entire temperature range, with no impact on performance as temperature increased. Fan speed and power consumption increased with temperature, which is the expected behavior.
Figure 2. Thermal chamber in the Dell performance testing lab
Table 1. System configuration
| Node hardware configuration | Chassis configuration | SW component | Version |
|---|---|---|---|
| 1 x 6421N (4th Generation Intel® Xeon® Scalable processor) | 2 x 8610t sleds | BIOS | 1.1.0 |
| 8 x 64 GB PC5 4800 MT/s | 2 x 1400 W PSU | CPLD | 1.1.1 |
| 1 x Dell NVMe 7400 M.2 960 GB | | iDRAC | 6.10.89.00 Build X15 |
| 1 x DP 25Gb BCM 57414 | | CM | 1.10 |
| | | PCIe SSD | 1.0.0 |
| | | BCM 57414 | 21.80.16.92 |
iPerf3 is an open-source tool for actively measuring the maximum achievable bandwidth on IP networks. It supports the tuning of various parameters related to timing, buffers, and protocols (TCP, UDP, and SCTP, with IPv4 and IPv6). For each test, it reports bandwidth, loss, and other parameters. An added advantage of iPerf3 is its reliability when measuring network performance between two servers in geographically different locations. (For additional details about iPerf3, see iPerf - The ultimate speed test tool for TCP, UDP and SCTP.)
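For scripted runs like the one in this test, iperf3 can emit machine-readable results with its `--json` flag. The sketch below pulls the average receive throughput out of that output; the `end.sum_received.bits_per_second` field path reflects iperf3's TCP result structure, and the sample string is trimmed down to only the field used:

```python
import json

def received_gbps(iperf3_json: str) -> float:
    """Average receive throughput (Gbps) from `iperf3 -c <host> --json` output."""
    result = json.loads(iperf3_json)
    return result["end"]["sum_received"]["bits_per_second"] / 1e9

# Heavily trimmed sample of the JSON output structure:
sample = '{"end": {"sum_received": {"bits_per_second": 24.7e9}}}'
print(received_gbps(sample))  # 24.7
```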
Figure 3. Constant networking performance with varying temperature and fan speed
Figure 3 shows that as the temperature and fan speed increase, the iPerf3 throughput stays the same. Fan speed is only 14% at temperatures near 20°C.
Figure 4. Power consumption and fan speed
Figure 4 shows that chassis power consumption increases with temperature; it is 254 W at 20°C.
The consistent performance with increasing temperature and power can be attributed to several design considerations when designing and building these edge/telecom servers:
For more details about the design considerations used for edge servers, see the blog Computing on the Edge–Other Design Considerations for the Edge.
To best supplement the improved cooling hardware, the PowerEdge engineering team carried over key features from the previous generation of PowerEdge servers to deliver autonomous thermal solutions capable of cooling next-generation PowerEdge servers.
An iDRAC feature in XR8000 detects Dell PCIe cards and automatically delivers the correct airflow to the slot to cool that card. When non-Dell PCIe cards are detected, the customer is given the option to enter the airflow requirement (LFM – Linear Feet per Minute) as specified by the card manufacturer. iDRAC and the fan algorithm ‘learn’ this information and the card is automatically cooled with the proper airflow. This feature saves power by not having to run the fans to cool the worst-case card in the system. Noise is also reduced.
For more information about thermal management, see “Thermal Manage” Features and Benefits.
Figure 5. iDRAC settings to view fan status during our XR8000 testing in the thermal chamber
Dell Technologies is continuing its efforts to test other XR devices and to determine power consumption for various workloads and its variation with changes in temperature. This study is intended to help customers understand the behavior of XR servers in harsh environmental conditions at the edge and their resulting performance.
[1] https://stlpartners.com/articles/sustainability/edge-computing-sustainability
[2] https://www.intel.com/content/www/us/en/support/articles/000038309/processors/intel-xeon-processors.html
Wed, 28 Jun 2023 00:02:48 -0000
Executive Summary
The PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. With it, Dell has launched an innovative 8-way GPU platform with advanced features and capabilities.
We are thrilled to share this insightful report that provides performance insights into the exceptional capabilities of the PowerEdge XE9680. Through rigorous testing and evaluation using MLPerf 3.0 benchmarks from MLCommons, this document offers a detailed analysis of the PowerEdge XE9680's outstanding performance in AI model training.
MLPerf is a suite of benchmarks that assess the performance of machine learning (ML) workloads, covering two crucial aspects of the ML life cycle: training and inference. This tech note focuses specifically on the training aspect of MLPerf 3.0.
The Dell performance labs conducted MLPerf 3.0 Training benchmarks using the latest PowerEdge XE9680 with 8x NVIDIA H100 80GB SXM GPUs. For comparison, we also ran these tests on the previous generation PowerEdge XE8545, equipped with 4x NVIDIA A100 80GB SXM GPUs.
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based neural network model introduced by Google in 2018. It is designed to understand and generate human-like text by capturing the context and meaning of words in each sequence. We are thrilled that the PowerEdge XE9680 with H100 GPUs delivered a 6x time-to-train performance improvement in the MLPerf NLP benchmark results using the BERT-large model with the Wikipedia dataset. This translates to accelerated time-to-value as we help our customers unlock the potential of remarkably faster model training.
Please note that throughout this report, a lower time-to-train value indicates improved efficiency and faster model convergence. As you analyze the graphs and performance metrics, remember that achieving lower time-to-train values demonstrates the PowerEdge XE9680's ability to expedite AI model training, delivering enhanced speed and efficiency results.
In MLPerf 3.0, the RetinaNet model leverages the Open Images dataset of millions of diverse images. In this benchmark, we observed an impressive, nearly 6x enhancement in training time for the model.
By utilizing the RetinaNet model with the Open Images dataset, MLPerf enables comprehensive evaluations and comparisons of system capabilities. The scale and diversity of the dataset ensure a robust assessment of object detection performance across various domains and object categories.
The PowerEdge XE9680 consistently delivers remarkable results across the entire MLPerf 3.0 Training benchmark suite, as depicted in the following figure. This robust performance underscores the server's exceptional capabilities and reliability in tackling a wide range of demanding machine learning tasks.
The PowerEdge XE9680 server surpasses our previous generation offering by delivering up to a 6x performance boost. This remarkable advancement translates into significantly accelerated AI model training, enabling your team to complete training tasks faster. To learn more about this server, we encourage you to contact your dedicated account executive or visit www.dell.com.
Table 1. Server configuration
| | PowerEdge XE8545 | PowerEdge XE9680 |
|---|---|---|
| CPU | 2x AMD EPYC 7763 64-core processor | 2x Intel® Xeon® 8470 52-core processor |
| GPU | 4x NVIDIA A100-SXM-80GB (500 W) | 8x NVIDIA H100-SXM-80GB (700 W) |
Fri, 19 May 2023 19:49:42 -0000
Tue, 11 Apr 2023 22:40:39 -0000
The Dell PowerEdge R750xa, powered by 3rd Generation Intel® Xeon® Scalable processors, is a dual-socket, 2U rack server that delivers outstanding performance for the most demanding emerging and intensive GPU workloads. It supports eight memory channels per CPU and up to 32 DDR4 DIMMs at 3200 MT/s. In addition, the PowerEdge R750xa supports PCIe Gen 4 and up to eight SAS/SATA SSD or NVMe drives.
Up to 29% higher inference performance with the PowerEdge R750xa and NVIDIA H100 PCIe GPU(1)
One platform that supports all of the PCIe GPUs in the PowerEdge portfolio makes the PowerEdge R750xa the ideal server for workloads including AI-ML/DL training and inferencing, high-performance computing, and virtualization environments. The PowerEdge R750xa includes all of the benefits of core PowerEdge: serviceability, consistent systems management with iDRAC, and the latest in extreme acceleration.
The new NVIDIA® H100 PCIe GPU is optimal for delivering the fastest business outcomes with the latest accelerated servers in the Dell PowerEdge portfolio, starting with the R750xa. The PowerEdge R750xa boosts workloads to new performance heights with GPU and accelerator support for demanding workloads, including enterprise AI. With its enhanced, air-cooled design and support for up to four NVIDIA double-width GPUs, the PowerEdge R750xa server is purpose-built for optimal performance for the entire spectrum of HPC, AI-ML/DL training, and inferencing workloads. Learn more here.
The Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA® H100 PCIe 310 W GPU to the previous-generation A100 PCIe GPU in the Dell PowerEdge R750xa. They ran the popular TensorRT inference benchmark across various batch sizes to evaluate inferencing performance.
The results are in Figure 1.
Figure 1. TensorRT
According to the industry standard TensorRT Inference Resnet50-v1.5 benchmark, the PowerEdge R750xa with NVIDIA's H100 PCIe 310W GPU processes approximately 29% more images per second than the NVIDIA A100 PCIe 300W GPU on the same server across various batch sizes. This significant improvement in image processing speed translates to higher overall throughput for inferencing workloads, making the PowerEdge R750xa with the H100 GPU an excellent choice for demanding applications.
| | R750xa with 4x NVIDIA H100 | R750xa with 4x NVIDIA A100 |
|---|---|---|
| Server | PowerEdge R750xa | PowerEdge R750xa |
| CPU | 2x Intel® Xeon® Gold 6338 CPU | 2x Intel® Xeon® Gold 6338 CPU |
| Memory | 512 GB system memory | 512 GB system memory |
| Storage | 1x 3.5 TB SSD | 1x 3.5 TB SSD |
| BIOS/iDRAC | 1.9.0/6.0.0.0 | 1.9.0/6.0.0.0 |
| Benchmark version | TensorRT Inference Resnet50-v1.5 | TensorRT Inference Resnet50-v1.5 |
| Operating system | Ubuntu 20.04 LTS | Ubuntu 20.04 LTS |
| GPU | NVIDIA H100-PCIe-80GB (310 W) | NVIDIA A100-PCIe-80GB (300 W) |
| Driver | CUDA 11.8 | CUDA 11.8 |
The PowerEdge R750xa supports up to four NVIDIA H100 PCIe adaptor GPUs and is available with new orders or as a customer upgrade kit for existing deployments.
Tue, 28 Mar 2023 23:05:15 -0000
The Dell PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. With it, Dell is launching an innovative 8-way GPU platform with advanced features and capabilities.
This tech note, Direct from Development (DfD), offers valuable insights into the performance of the PowerEdge XE9680 using MLPerf 2.1 benchmarks from MLCommons.
MLPerf is a suite of benchmarks that assess the performance of machine learning (ML) workloads, with a focus on two crucial aspects of the ML life cycle: training and inference. This tech note specifically delves into the training aspect of MLPerf.
The Dell CET AI Performance Lab and the Dell HPC & AI Innovation Lab conducted MLPerf 2.1 Training benchmarks using the latest PowerEdge XE9680 equipped with 8x NVIDIA A100 80GB SXM GPUs. For comparison, we also ran these tests on the previous-generation PowerEdge XE8545, equipped with 4x NVIDIA A100 80GB SXM GPUs. The following section presents the results of our tests. Please note that in the figure below, a lower number indicates better performance, and the results have not been verified by MLCommons.
Figure 1. MLPERF 2.1 Training
Our latest server, the PowerEdge XE9680 with 8x NVIDIA A100 80GB SXM GPUs, delivers on average twice the performance of our previous-generation server. This translates to faster AI model training, enabling models to be trained in half the time! With the PowerEdge XE9680, you can accelerate your AI workloads and achieve better results, faster than ever before. Contact your account executive or visit www.dell.com to learn more.
Table 1. Server configuration
(1) Testing conducted by Dell in March of 2023. Performed on PowerEdge XE9680 with 8x NVIDIA A100 SXM4-80GB and PowerEdge XE8545 with 4x NVIDIA A100-SXM-80GB. Unverified MLPerf v2.1 BERT NLP v2.1, Mask R-CNN object detection, heavy-weight v2.1 COCO 2017, 3D U-Net image segmentation v2.1 KiTS19, RNN-T speech recognition v2.1 rnnt Training. Result not verified by MLCommons Association. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information. Actual results will vary.
Tue, 28 Mar 2023 23:05:16 -0000
The Dell PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. Dell PowerEdge is launching our innovative 8-way GPU platform with advanced features and capabilities.
This Direct from Development (DfD) tech note provides valuable insights on AI inferencing performance for the recently launched PowerEdge XE9680 server by Dell Technologies.
To evaluate the inferencing performance of each GPU option available on the new PowerEdge XE9680, the Dell CET AI Performance Lab and the Dell HPC & AI Innovation Lab selected several popular AI models for benchmarking. To provide a basis for comparison, they also ran benchmarks on the last-generation PowerEdge XE8545. The following workloads were chosen for the evaluation:
The results are remarkable! The PowerEdge XE9680 demonstrates exceptional inferencing performance!
Comparing the NVIDIA A100 SXM configuration with the NVIDIA H100 SXM configuration on the same PowerEdge XE9680 reveals up to a 300% improvement in inferencing performance! (1)
Even more impressive is the comparison between the PowerEdge XE9680 NVIDIA H100 SXM server and the XE8545 NVIDIA A100 SXM server, which shows up to a 700% improvement in inferencing performance! (2)
Here are the results of each benchmark. In all cases, higher is better.
With exceptional AI inferencing performance, the PowerEdge XE9680 sets a high benchmark for today’s and tomorrow's AI demands. Its advanced features and capabilities provide a solid foundation for businesses and organizations to take advantage of AI and unlock new opportunities.
Contact your account executive or visit www.dell.com to learn more.
Table 1. Server configuration
(1) Testing conducted by Dell in March of 2023. Performed on PowerEdge XE9680 with 8x NVIDIA H100 SXM5-80GB and PowerEdge XE9680 with 8x NVIDIA A100 SXM4-80GB. Actual results will vary.
(2) Testing conducted by Dell in March of 2023. Performed on PowerEdge XE9680 with 8x NVIDIA H100 SXM5-80GB and PowerEdge XE8545 with 4x NVIDIA A100-SXM-80GB. Actual results will vary.
Tue, 28 Mar 2023 23:05:16 -0000
The Dell PowerEdge XE9680 is a high-performance server designed and optimized to enable uncompromising performance for artificial intelligence, machine learning, and high-performance computing workloads. Dell PowerEdge is launching our innovative 8-way GPU platform with advanced features and capabilities.
This Direct from Development (DfD) tech note offers valuable performance insights for High-Performance Linpack (HPL), a widely accepted benchmark for measuring HPC system performance.
The TOP500 list frequently relies on HPL to assess and rank supercomputer performance. Utilizing the Linpack library, HPL measures FLOPS (floating-point operations per second) by creating and solving linear equations, making it a reliable benchmark for evaluating HPC system efficiency.
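The FLOPS arithmetic behind HPL can be sketched in a few lines. This is a toy single-node NumPy version, not the optimized multi-GPU HPL binary used in the tests below; it only illustrates the standard operation count (2/3·n³ + 2·n²) divided by solve time.

```python
import time
import numpy as np

def hpl_style_gflops(n: int, seed: int = 0) -> float:
    """Solve a random dense n x n linear system and report GFLOP/s
    using the standard HPL operation count (2/3*n^3 + 2*n^2)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(a, b)          # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0
    assert np.allclose(a @ x, b)       # sanity-check the solution
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / elapsed / 1e9

print(f"{hpl_style_gflops(2000):.1f} GFLOP/s")
```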
The Dell HPC & AI Innovation Lab used HPL to compare the performance of the PowerEdge XE9680 to our last generation PowerEdge XE8545. There are two key differentiators between the servers that affect HPL performance here: the quantity and model of GPUs supported by each platform.
Regarding GPU configuration, the PowerEdge XE9680 was equipped with 8x H100 80GB SXM GPUs, while the PowerEdge XE8545 was outfitted with 4x A100 80GB SXM GPUs.
In the HPL benchmark, the PowerEdge XE9680 equipped with NVIDIA's latest H100 80GB SXM GPU outperforms the PowerEdge XE8545 by an impressive 543% more TeraFLOPS! (1)
The PowerEdge XE9680, with the latest NVIDIA H100 SXM GPU, advances HPC performance. With exceptional HPL performance, the PowerEdge XE9680 sets a high benchmark for today’s and tomorrow's HPC demands. Contact your account executive or visit www.dell.com to learn more.
Table 1. Server configuration
Fri, 03 Mar 2023 20:01:50 -0000
Dell Technologies has recently announced the launch of next-generation Dell PowerEdge servers that deliver advanced performance and energy-efficient design.
This Direct from Development Tech Note describes the new capabilities you can expect from the next generation of PowerEdge servers. It discusses the test and results for machine learning (ML) performance of the PowerEdge XR5610 using the industry-standard MLPerf Inference v2.1 benchmarking suite. The XR5610 has target workloads in networking and communication, enterprise edge, military, and defense—all key workloads requiring AI/ML inferencing capabilities at the edge.
The single-socket XR5610 is an edge-optimized, short-depth, rugged 1U server powered by 4th Generation Intel® Xeon® Scalable processors from the MCC SKU stack. It includes the latest generation of technologies, with up to 8x DDR5 DIMM slots and two PCIe Gen5 x16 card slots, and delivers 46 percent faster image classification (reduced latency) compared to the previous-generation PowerEdge XR12.
Edge computing, in essence, brings compute power close to the source of the data. As Internet of Things (IoT) endpoints and other devices generate more and more time-sensitive data, edge computing becomes increasingly important. Machine learning (ML) and artificial intelligence (AI) applications are particularly suitable for edge computing deployments. The environmental conditions for edge computing are typically vastly different than those at centralized data centers. Edge computing sites, at best, might consist of little more than a telecommunications closet with minimal or no HVAC.
Dell PowerEdge XR5610 is a rugged, short-depth (400 mm class) 1U server for the edge, designed for deployment in locations constrained by space or environmental challenges. It is well suited to operate at high temperatures ranging from –5°C to 55°C (23°F to 131°F) and designed to excel with telecom vRAN workloads, military and defense deployments, and retail AI including video monitoring, IoT device aggregation, and PoS analytics.
Figure 1. Dell PowerEdge XR5610 – 1U
According to a recent Forrester report, “Edge intelligence, a top 10 emerging technology in 2022, helps capture data, embed inferencing, and connect insight in a real-time network of application, device, and communication ecosystems.”
Figure 2. Forrester report excerpt, reprinted with permission
MLPerf Inference is a multifaceted benchmark framework, measuring five different workload types and three processing scenarios. The workloads are image classification, object detection, medical imaging, speech-to-text, and natural language processing (BERT). The processing scenarios, as outlined in the following table, are single stream, multistream, and offline.
Table 1. MLPerf Inference benchmark scenarios
Scenario | Performance metric | Use case |
Single stream | 90th latency percentile | Search results. Waits until the query is made and returns the search results. Example: Google voice search |
Multistream | 99th latency percentile | Multicamera monitoring and quick decisions. Acts more like a CCTV backend system that processes multiple real-time streams and identifies suspicious behaviors. Example: Self-driving car that merges all multiple camera inputs and makes drive decisions in real time |
Offline | Measured throughput | Batch processing, also known as offline processing. Example: Google Photos service that identifies pictures, tags people, and generates an album with specific people and locations or events offline |
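The three scenario metrics above reduce to simple statistics over per-query latencies. The sketch below computes them on synthetic lognormal latencies (invented data, purely to show how the p90, p99, and offline-throughput numbers are derived):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical per-query latencies in milliseconds (synthetic data)
latencies_ms = rng.lognormal(mean=1.0, sigma=0.3, size=10_000)

p90 = np.percentile(latencies_ms, 90)  # single-stream metric
p99 = np.percentile(latencies_ms, 99)  # multistream metric
# Offline metric: queries per second over the whole batch
throughput_qps = len(latencies_ms) / (latencies_ms.sum() / 1000.0)

print(f"p90={p90:.2f} ms  p99={p99:.2f} ms  offline={throughput_qps:.0f} qps")
```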
The MLPerf suite for inferencing includes the following benchmarks:
Table 2. MLPerf suite for inferencing benchmarks
Area | Task | Model | Dataset | QSL size | Quality |
Vision | Image classification | Resnet50-v1.5 | ImageNet (224x224) | 1024 | 99% of FP32 (76.46%) |
Vision | Object detection | Retinanet | OpenImages (800x800) | 64 | 99% of FP32 (0.3755 mAP) |
Vision | Medical image segmentation | 3D UNET | KiTS 2019 | 42 | 99% of FP32 and 99.9% of FP32 (0.86330 mean DICE score) |
Speech | Speech-to-text | RNNT | Librispeech dev-clean (samples < 15 seconds) | 2513 | 99% of FP32 (1 – WER, where WER=7.452253714852645%) |
Language | Language processing | BERT | SQuAD v1.1 (max_seq_len=384) | 10833 | 99% of FP32 (f1_score=90.874%) |
The following table outlines the key specifications of the PowerEdge XR5610 that was used for the MLPerf Inference test suite.
Table 3. Dell PowerEdge XR5610 key specifications for MLPerf Inference test suite
Component | Specifications |
CPU | 4th Gen Intel Xeon Scalable processors MCC SKU |
Operating system | CentOS 8.2.2004 |
Memory | 256 GB |
GPU | NVIDIA A2 |
GPU count | 1 |
Networking | 1x ConnectX-5 IB EDR 100 Gbps |
Software stack | |
Storage | NVMe SSD 1.8 TB |
Table 4 shows the specifications of the NVIDIA GPUs that were used in the benchmark tests.
Table 4. NVIDIA GPUs tested
GPU model | GPU memory | Maximum power consumption | Form factor | 2-way bridge | Recommended workloads |
PCIe adapter form factor | |||||
A2 | 16 GB GDDR6 | 60 W | SW, HHHL, or FHHL | Not applicable | AI inferencing, edge, VDI |
The edge server offloads the image processing to the GPU, and, just as servers have different price/performance levels to suit different requirements, so do GPUs. XR5610 supports up to 2x SW GPUs, as did the previous-generation XR11.
XR5610 was tested with the NVIDIA A2 GPU for the entire range of MLPerf workloads on the offline scenario. The following figure shows the results of the testing.
Figure 3. NVIDIA A2 GPU test results for MLPerf offline scenario
XR5610 also was tested with the NVIDIA A2 GPU for the entire range of MLPerf workloads on the single stream scenario. The following figure shows the results of that testing.
Figure 4. NVIDIA A2 GPU test results for MLPerf single stream scenario
In some tasks/workloads, the XR5610 showed improvement over previous generations, resulting from the integration of new technologies such as PCIe Gen 5.
The PowerEdge XR5610 delivered 46 percent better image classification latency compared to the prior-generation PowerEdge server, as shown in the following figure.
Figure 5. Image classification latencies: XR5610 and prior-generation PowerEdge server
The Dell XR5610 delivered 15 percent better speech-to-text throughput compared to the prior-generation PowerEdge server, as shown in the following figure.
Figure 6. Speech to text latencies: XR5610 and prior-generation PowerEdge server
The PowerEdge XR portfolio continues to provide a streamlined approach for various edge and telecom deployment options based on different use cases. It provides a solution to the challenge of a small form factor at the edge with industry-standard rugged certifications (NEBS), providing a compact solution for scalability and for flexibility in a temperature range of –5°C to +55°C.
Notes:
Fri, 03 Mar 2023 19:57:24 -0000
Dell Technologies has recently introduced the next generation of Dell PowerEdge XR servers. Powered by 4th Gen Intel® Xeon® Scalable processors with the MCC SKU stack, these servers deliver advanced performance in an energy-efficient design. Dell continues to provide scalability and flexibility with its latest portfolio of short-depth XR servers. These servers integrate technologies such as 4th Gen Intel CPUs, PCIe Gen5, DDR5, NVMe drives, and GPU slots, and they are compliance-tested for NEBS and MIL-STD.
This tech note discusses our CPU performance benchmark testing of the next-generation PowerEdge XR server portfolio and the test results that show improvements over previous PowerEdge XR servers powered by 3rd Gen Intel Xeon Scalable processors and Xeon D processors.
4th Gen Intel Xeon Scalable processors with the MCC SKU stack were tested using the STREAM and HPL benchmarks and compared with the CPU of the previous generation of XR servers.
The STREAM benchmark is a simple, synthetic benchmark designed to measure sustainable memory bandwidth (in MB/s) and a corresponding computation rate for four simple vector kernels: Copy, Scale, Add, and Triad. The STREAM benchmark is designed to work with datasets much larger than the available cache on any system so that the results are (presumably) more indicative of the performance of very large, vector-style applications. Ultimately, we get a reference for compute performance.
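For reference, the Triad kernel at the heart of STREAM is just `a[i] = b[i] + scalar * c[i]`. The NumPy sketch below approximates it and reports bandwidth the way STREAM counts it (three 8-byte arrays touched per element); note that the two-step NumPy version moves more memory than the fused C kernel, so this is only an illustration, not the real benchmark.

```python
import time
import numpy as np

def stream_triad(n: int = 20_000_000, repeats: int = 5) -> float:
    """Approximate the STREAM Triad kernel a[i] = b[i] + scalar*c[i]
    and report sustained bandwidth in MB/s. STREAM counts 3 arrays
    of 8-byte doubles (2 reads + 1 write) per iteration."""
    b = np.ones(n)
    c = np.full(n, 2.0)
    a = np.empty(n)
    scalar = 3.0
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.multiply(c, scalar, out=a)  # a = scalar * c
        np.add(a, b, out=a)            # a = b + scalar * c
        best = min(best, time.perf_counter() - t0)
    return 3 * n * 8 / best / 1e6      # bytes moved / best time, MB/s

print(f"Triad ~ {stream_triad():.0f} MB/s")
```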
HPL is a high-performance LINPACK benchmark implementation. The code solves a uniformly random system of linear equations and reports time and floating-point operations per second using a standard formula for operation count. It also helps to provide a reference for a system’s compute speed performance.
Benchmark testing showed significant performance increases with the 4th Gen Intel Xeon Scalable MCC SKU stack when it was compared with both the Intel Xeon D SKU and the 3rd Gen Intel Xeon Scalable MCC SKU.
In our tests, the single-socket PowerEdge XR servers with the 4th Gen Intel Xeon Scalable CPU (32 core) MCC SKU stack delivered a 253 percent increase in HPL performance and a 182 percent increase in STREAM performance. Thus, these servers are faster at the network edge or enterprise edge than the previous-generation PowerEdge XR servers powered by the Intel Xeon D (16 core) SKU.
Figure 1 and Figure 2 show the results of the benchmark tests that compared the performance of the 4th Gen Intel Xeon Scalable processor MCC SKU stack with the Intel Xeon D SKU.
Figure 1. HPL performance comparison: Intel Xeon D SKU and 4th Gen Intel Xeon Scalable MCC SKU
Figure 2. STREAM performance comparison: Intel Xeon D SKU and 4th Gen Intel Xeon Scalable MCC SKU
In our tests, the single-socket PowerEdge XR servers with the 4th Gen Intel Scalable CPU (32 core) MCC stack delivered a 52 percent increase in STREAM performance and a 72 percent increase in CPU FP rate base performance (floating point performance for the CPU). Thus, these servers are faster for compute at the network edge or enterprise edge than the previous generation of PowerEdge XR servers powered by the 3rd Gen Intel Xeon Scalable MCC SKU.
Figure 3 and Figure 4 show the results of the benchmark tests that compared the performance of the 4th Gen and 3rd Gen Intel Xeon Scalable processor MCC SKU stacks.
Figure 3. STREAM performance for 4th and 3rd Gen Intel Xeon Scalable processors
Figure 4. CPU FP rate base performance for 4th and 3rd Gen Intel Xeon Scalable processors
The Dell PowerEdge XR portfolio continues to provide CPU-based improvements and a streamlined approach for various edge and telecom deployment options. The XR portfolio provides a solution to the challenge of needing a small form factor at the edge with industry-standard rugged certifications (NEBS). It provides a compact solution for scalability along with flexibility for operating in temperatures ranging from –5°C to +55°C.
Fri, 03 Mar 2023 19:57:24 -0000
The telecom industry is on a journey of transformation, making pitstops to disaggregate hardware and software, virtualize networks, and introduce cloudification across RAN and core domains. The introduction of 5G and ORAN has accelerated the transformation, and we now see telecom becoming a universal phenomenon and touching all aspects of life.
This telecom evolution has opened a number of opportunities for CSPs to diversify their revenue streams, but it has also introduced stringent technical requirements. To support higher bandwidth and mMIMO technologies in new-generation systems, solution development teams face strict latency and synchronization requirements.
In this tech note, we discuss synchronization systems in 5G and ORAN fronthaul interfaces, and next-generation Dell PowerEdge support for synchronization standards.
Note: This document may contain language from third-party content that is not under Dell Technologies’ control and is not consistent with current guidelines for Dell Technologies’ own content. When such third-party content is updated by the relevant third parties, this document will be revised accordingly.
Telecom networks have always required proper and accurate synchronization for handover between different cell sites, reducing interference and increasing performance at cell edge. 2G, 3G, and 4G networks all required a certain level of synchronization, but 5G requires timing in the range of nanoseconds. This enables features such as beamforming and time-division duplexing to function accurately.
Telecom systems generally work based on the following synchronization methods:
The following table lists the telecom requirements for the frequency and phase specifications:
Table 1. Telecom technologies with standard units
Telecom technology | Frequency air interface/network | Phase/time air interface/network |
GSM | 16 ppb/50 ppb | - |
LTE-FDD | 16 ppb/50 ppb | - |
LTE-TDD | 16 ppb/50 ppb | 1500 ns |
LTE-A | 16 ppb/50 ppb | 500 ns |
5G | 16 ppb/50 ppb | 65 ns |
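To see why the parts-per-billion limits in the table translate into such tight budgets, a short back-of-the-envelope calculation helps (the holdover durations below are illustrative, not taken from a standard):

```python
def drift_ns(ppb: float, holdover_s: float) -> float:
    """Worst-case time error (in ns) accumulated by an oscillator that
    is off by `ppb` parts per billion over `holdover_s` seconds."""
    return ppb * 1e-9 * holdover_s * 1e9  # simplifies to ppb * holdover_s

# At the 16 ppb air-interface limit, one second of unaided holdover
# can already cost 16 ns of a 65 ns 5G phase budget.
print(drift_ns(16, 1.0))  # 16 ns after 1 s
print(drift_ns(16, 4.0))  # 64 ns after 4 s, nearly the whole 65 ns budget
```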
Different standards apply to the transmission of the synchronization signals, as outlined in the following table:
Table 2. Synchronization methods for distribution standards
Synchronization distribution standard | Time synchronization | Frequency synchronization | Phase synchronization |
PTP (IEEE 1588) | Yes | Yes | Yes |
SyncE | No | Yes | No |
GNSS | Yes | Yes | Yes |
NTP | Yes | Yes | Yes |
O-RAN ALLIANCE has defined four synchronization topologies (LLS-C1, LLS-C2, LLS-C3, and LLS-C4) to address different deployment topologies in telecom networks. The following figure shows a typical synchronization flow diagram with synchronization from PRTC flowing to end cell sites:
Figure 1. ORAN synchronization overview
PRTC: Primary Reference Time Clock
GNSS: Global Navigation Satellite System
T-GM: Telecom Grand Master Clock
Stay tuned for another tech note from Dell with more details about synchronization technologies.
In telecom systems, synchronization is delivered by various mechanisms:
The transport network for carrying synchronization can be either the backhaul network used to carry traffic or a dedicated network for transporting synchronization signals.
In 5G and ORAN, gNBs need frequency, phase, and time synchronization. The following two protocols are used for transporting synchronization signals over a packet-based network:
The same packet-based transport network can be used to carry user and control plane traffic.
PTP, defined by the IEEE 1588 standard, was developed to provide accurate distribution of time and frequency over a packet-based network. A PTP synchronization system is composed of PTP-aware devices and non-PTP-aware devices.
The following table describes the clock types in PTP:
Table 3. Types of clocks in PTP
Clock type | Definition | Usage |
Telecom grandmaster (T-GM) | The master clock at the start of a PTP domain. It is typically located at the core network. | At the beginning of the network, to provide timing signals to the network. |
Telecom boundary clock (T-BC) | Clock that can act both as a slave and master clock. It takes the sync signal from the master, adjusts for the delay, and generates a new master synchronization signal to pass downstream to the next device. | When the synchronization signal needs to travel through multiple nodes across a long distance. |
Telecom transparent clock (T-TC) | Clock that timestamps a synchronization packet message and sends it forward to the secondary device. It enables the secondary device to calculate the delay of the network. | For scenarios where timing signals are passing through switches. |
Telecom time slave clock (T-TSC) | The end device that receives the synchronization signal. | In telecom, the end node that receives the synchronization signals. |
The following figure illustrates how various types of clocks in PTP interact with each other:
Figure 2. Types of clocks in PTP
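Underlying the master/slave exchange is the standard IEEE 1588 delay request-response calculation: from the four timestamps of a Sync/Delay_Req exchange, the slave recovers its clock offset and the path delay, assuming a symmetric path. The timestamps below are invented for illustration.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Standard IEEE 1588 offset/delay calculation (symmetric path assumed).

    t1: Sync sent by master        t2: Sync received by slave
    t3: Delay_Req sent by slave    t4: Delay_Req received by master
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

# Toy timestamps in ns: true slave offset +50 ns, true one-way delay 200 ns
t1 = 1_000
t2 = t1 + 200 + 50   # master-to-slave: delay plus offset
t3 = 2_000
t4 = t3 + 200 - 50   # slave-to-master: delay minus offset
print(ptp_offset_and_delay(t1, t2, t3, t4))  # recovers (50.0, 200.0)
```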
In 5G and ORAN, PTP generally works with two types of timing profiles, G8275.1 and G8275.2. As shown in the following figure, G8275.1 is the profile in which all devices are PTP-aware, and G8275.2 is the profile in which devices can be, but might not be, PTP-aware.
Figure 3. G8275.1 and G8275.2 timing profiles
Why do ORAN and 5G need two PTP profiles? One reason is the use case and implementation perspective of the CSP, as outlined in the following table:
Table 4. PTP telecom profiles
PTP Telecom Profile | Description | Usage |
G8275.1 (full timing support) |
|
|
G8275.2 (partial timing support) |
|
|
SyncE is a synchronization technology that enables the transfer of synchronization signals at the physical layer. It is used to provide accurate and stable frequency synchronization between the different components of a network architecture. Over the fronthaul interface in ORAN and 5G, both SyncE and PTP are used together to provide nanosecond-level synchronization accuracy. SyncE can deliver frequency synchronization, but it cannot deliver phase and time synchronization. It functions independently of the network load and supports the transfer of sync signals where all the interfaces on the intermediate path must support SyncE.
In ORAN, for topology architectures LLS-C1, LLS-C2, and LLS-C3, both SyncE and PTP are used on the fronthaul interface between DU and RU or in the mid-haul interface between CU and DU. When PTP itself can cater to frequency, time, and phase synchronization, why do we need SyncE along with PTP?
The answer is that using both PTP and SyncE delivers these advantages:
Next-generation Dell PowerEdge servers come with Intel NIC cards such as Westport Channel and Logan Beach. All these NIC cards are timing aware and can be used to provide synchronization to downstream nodes. Because these servers can be positioned both as CU and DU, and support LLS-C1, LLS-C2, and LLS-C3 deployment, support for SyncE and PTP makes these servers an apt choice for RAN and edge deployments.
Dell Technologies continues to provide best-in-class features and specifications for its constantly evolving PowerEdge server portfolio for telecom. The PowerEdge XR8000 (430 mm depth) and XR5610 (463 mm depth) provide scalability and flexibility, with the latest technologies for PCIe, storage, memory, I/O, and even node-chassis infrastructure in a dense (SA1) form factor. With support for PTP and SyncE technologies, these next-generation PowerEdge servers provide essential infrastructure support at the edge.
Fri, 03 Mar 2023 19:57:25 -0000
Dell has recently announced the launch of next-generation Dell PowerEdge servers that deliver advanced performance and an energy-efficient design.
This Direct from Development (DfD) tech note describes the new capabilities you can expect from the next-generation Dell PowerEdge servers powered by the 4th Gen Intel® Xeon® Scalable processor MCC SKU stack. This document covers the test and results for ML performance benchmarking in the offline scenario on Dell's next-generation PowerEdge XR7620 using Multi-Instance GPU technology. The XR7620 targets workloads in manufacturing, retail, defense, and telecom, all key workloads requiring AI/ML inferencing capabilities at the edge. Dell continues to provide scalability and flexibility with its latest short-depth XR server portfolio, which integrates the latest technologies such as 4th Gen Intel CPUs, PCIe Gen5, DDR5, NVMe drives, and GPU slots, along with compliance testing for NEBS and MIL-STD.
Multi-Instance GPU (MIG) expands the performance and value of NVIDIA H100, A100, and A30 Tensor Core GPUs. MIG can partition the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. This gives administrators the ability to support every workload, from the smallest to the largest, with guaranteed quality of service (QoS), and extends the reach of accelerated computing resources to every user.
Without MIG, different jobs running on the same GPU, such as different AI inference requests, compete for the same resources. A job consuming more memory bandwidth starves others, causing several jobs to miss their latency targets. With MIG, jobs run simultaneously on different instances, each with dedicated compute, memory, and memory-bandwidth resources, resulting in predictable performance with QoS and maximum GPU utilization.
Figure 1. Seven different instances with MIG
Dell defines edge computing as technology that brings compute, storage, and networking closer to the source where data is created. This enables faster processing of data and, consequently, quicker decision making and faster insights. Edge use cases, such as running an edge server on a factory floor or in a retail store, require multiple applications to run simultaneously. One solution is to add a piece of hardware for each application, but that approach is not scalable or sustainable in the long run. Deploying multiple applications on the same piece of hardware is an option, but it can cause much higher latency for the different applications.
With multiple applications running on the same device, the device time-slices the applications in a queue so that applications are run sequentially as opposed to concurrently. There is always a delay in results while the device switches from processing data for one application to another.
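The latency penalty of time-slicing can be sketched with a simple model. This toy comparison uses invented per-job run times and deliberately ignores that a MIG slice has fewer resources than the whole GPU, so it only illustrates the queueing effect, not real MIG performance:

```python
def worst_case_latency(job_ms: list[float], concurrent: bool) -> float:
    """Worst-case completion time for a batch of inference jobs.

    concurrent=False models a single time-sliced GPU that runs jobs
    back to back; concurrent=True models MIG-style instances where
    each job runs on its own dedicated slice. (Simplification: assumes
    a slice runs a job as fast as the full GPU, which overstates MIG.)
    """
    if concurrent:
        return max(job_ms)   # jobs run side by side
    return sum(job_ms)       # the last job waits for all the others

jobs = [12.0, 8.0, 20.0, 5.0]  # hypothetical per-job run times (ms)
print(worst_case_latency(jobs, concurrent=False))  # 45.0 ms time-sliced
print(worst_case_latency(jobs, concurrent=True))   # 20.0 ms side by side
```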
MIG is an innovative technology for such edge use cases, where power, cost, and space are important constraints. AI inferencing applications such as computer vision and image detection need to run instantaneously and continuously to avoid serious safety consequences.
Jobs running simultaneously with different resources result in predictable performance with quality of service and maximum GPU utilization. This makes MIG an essential addition to every edge deployment.
MIG can be used in a multitenant environment. It is different from virtual GPU technology because MIG is hardware based, which makes edge computing even more secure.
A GPU can be partitioned into different-sized MIG instances. For example, in an NVIDIA A100 40GB, an administrator could create two instances with 20 gigabytes (GB) of memory each, three instances with 10GB each, or seven instances with 5GB each, or a combination of these.
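The memory arithmetic behind these combinations can be enumerated directly. The sketch below checks which mixes of the 5/10/20 GB instance sizes from the example fit in an A100 40GB; note that real MIG placement also constrains compute-slice positions, which this memory-only sketch ignores:

```python
from itertools import combinations_with_replacement

def valid_partitions(total_gb: int = 40,
                     sizes: tuple = (5, 10, 20),
                     max_instances: int = 7):
    """Enumerate instance-size mixes whose memory fits in total_gb.
    Memory-only check; real MIG profiles add placement constraints."""
    found = set()
    for n in range(1, max_instances + 1):
        for combo in combinations_with_replacement(sizes, n):
            if sum(combo) <= total_gb:
                found.add(combo)
    return sorted(found)

parts = valid_partitions()
# The three examples from the text all appear:
assert (20, 20) in parts          # two 20 GB instances
assert (10, 10, 10) in parts      # three 10 GB instances
assert (5,) * 7 in parts          # seven 5 GB instances
print(len(parts), "memory-feasible layouts")
```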
MIG instances can also be dynamically reconfigured, enabling administrators to shift GPU resources in response to changing user and business demands. For example, seven MIG instances can be used during the day for low-throughput inference and reconfigured to one large MIG instance at night for deep learning training.
Table 1. System architecture
MLPerf system | Edge |
Operating System | CentOS 8.2.2004 |
CPU | 4th Gen Intel Xeon Scalable processors MCC SKU |
Memory | 512GB |
GPU | NVIDIA A100 |
GPU Count | 1 |
Networking | 1x ConnectX-5 IB EDR 100Gb/Sec |
Software Stack | TensorRT 8.4.2 CUDA 11.6 cuDNN 8.4.1 Driver 510.73.08 DALI 0.31.0 |
Table 2. MLPerf scenario used in this test and MIG specs
Scenario | Performance metric | Example use cases | |
Offline | Measured throughput | Batch processing, also known as offline processing. Google Photos identifies pictures, tags people, and generates an album with specific people and locations or events offline. | 
MIG Specifications | A100 | ||
Instance types | 7x 10GB 3x 20GB 2x 40GB 1x 80GB | ||
GPU profiling and monitoring | Only one instance at a time | ||
Secure Tenants | 1x | ||
Media decoders | Limited options | ||
Table 3. High accuracy benchmarks and their degree of precision
| BERT | BERT H_A | DLRM | DLRM H_A | 3D-Unet | 3D-Unet H_A |
Precision | int8 | fp16 | int8 | int8 | int8 | int8 |
DLRM H_A and 3D-Unet H_A are the same as DLRM and 3D-Unet, respectively; both were able to reach the target accuracy with int8 precision.
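To make the int8-versus-fp16 distinction in the table concrete, here is a minimal symmetric post-training quantization sketch: fp32 values are mapped to int8 with a single per-tensor scale. This is a bare illustration of the idea, not the calibration pipeline the benchmark stack actually uses.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Minimal symmetric per-tensor int8 quantization sketch."""
    scale = np.abs(x).max() / 127.0        # map the largest value to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = q.astype(np.float32) * scale       # dequantize back to fp32
print(q)                                    # int8 codes
print(float(np.max(np.abs(x - x_hat))))     # worst-case rounding error
```

The rounding error is bounded by half the scale, which is why models whose accuracy tolerates that noise can run entirely in int8 while BERT H_A falls back to fp16.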
This section provides MIG performance results for various scenarios, showing that when divided into seven instances, each instance can provide equal performance without any loss in throughput.
The Dell XR portfolio continues to provide a streamlined approach for various edge and telecom deployment options based on different use cases. It provides a solution to the challenge of small form factors at the edge with industry-standard rugged certifications (NEBS), providing a compact solution for scalability and flexibility in a temperature range of –5°C to +55°C. The MIG capability for MLPerf workloads provides real-life scenarios for showcasing AI/ML inferencing on multiple instances for edge use cases. Based on the results in this document, Dell servers continue to provide a complete solution.
Notes:
Fri, 03 Mar 2023 19:57:26 -0000
Dell Technologies has recently announced the launch of next-generation Dell PowerEdge servers that deliver advanced performance and an energy-efficient design.
This Direct from Development (DfD) tech note describes the new capabilities you can expect from the next-generation Dell PowerEdge servers powered by the 4th Gen Intel® Xeon® Scalable processor MCC SKU stack. This document covers the test and results for ML performance of Dell's next-generation PowerEdge XR7620 using the industry-standard MLPerf Inference v2.1 benchmarking suite. The XR7620 targets workloads in manufacturing, retail, defense, and telecom, all key workloads requiring AI/ML inferencing capabilities at the edge.
With support for up to two 300 W GPU accelerator cards to handle your most demanding edge workloads, the XR7620 delivers 45 percent faster image classification than the previous-generation Dell XR12 server with a single 300 W GPU accelerator for ML/AI scenarios at the enterprise edge. The combination of low latency and high processing power allows faster and more efficient analysis of data, enabling organizations to make real-time decisions and capture more opportunities.
Edge computing, in a nutshell, brings computing power close to the source of the data. As Internet of Things (IoT) endpoints and other devices generate more and more time-sensitive data, edge computing becomes increasingly important. Machine learning (ML) and artificial intelligence (AI) applications are particularly suitable for edge computing deployments. The environmental conditions for edge computing are typically vastly different than those at centralized data centers. Edge computing sites might, at best, consist of little more than a telecommunications closet with minimal or no HVAC. Rugged, purpose-built, compact, and accelerated edge servers are therefore ideal for such deployments. The Dell PowerEdge XR7620 server checks all of those boxes. It is a high-performance, high-capacity server for the most demanding workloads, certified to operate in rugged, dusty environments ranging from –5°C to 55°C (23°F to 131°F), all within a short-depth 450 mm (ear-to-rack) form factor.
MLPerf is a multi-faceted benchmark suite that benchmarks different workload types and different processing scenarios. There are five workloads and three processing scenarios. The workloads are:
The scenarios are single-stream (SS), multi-stream (MS), and Offline.
The tasks are self-explanatory and are listed in the following table, along with the dataset used, the ML model used, and descriptions. The single-stream tests reported results at the 90th percentile; multi-stream tests reported results at the 99th percentile.
Table 1. MLPerf Inference benchmark scenarios
Scenario | Performance metric | Example use cases |
Single-stream | 90th percentile latency | Google voice search: Waits until the query is asked and returns the search results. |
Offline | Measured throughput | Batch processing, also known as offline processing. Google Photos identifies pictures, tags people, and generates an album with specific people and locations or events offline. |
Multi-stream | 99th percentile latency | Example 1: Multicamera monitoring and quick decisions. Multi-stream acts more like a CCTV backend system that processes multiple real-time streams to identify suspicious behaviors. Example 2: A self-driving car merges multiple camera inputs and makes drive decisions in real time. |
Table 2. MLPerf EdgeSuite for inferencing benchmarks
According to Forrester’s report (“Five technology elements make workload affinity possible across the four business scenarios”), most systems today are designed to run software in a single place. This creates performance limitations as conditions change, such as when more sensors are installed in a factory, as more people gather for an event, or as cameras receive more video feed. Workload affinity is the concept of using distributed applications to deploy software automatically where it runs best: in a data center, in the cloud, or across a growing set of connected assets. Innovative AI/ML, analytics, IoT, and container solutions enable new applications, deployment options, and software design strategies. In the future, systems will choose where to run software across a spectrum of possible locations, depending on the needs of the moment.
Table 3. Dell PowerEdge XR7620 key specifications
MLPerf system suite type | Edge |
Operating System | CentOS 8.2.2004 |
CPU | 4th Gen Intel® Xeon® Scalable processors MCC SKU |
Memory | 512GB |
GPU | NVIDIA A2 |
GPU Count | 1 |
Networking | 1x ConnectX-5 IB EDR 100Gb/Sec |
Software Stack | TensorRT 8.4.2 CUDA 11.6 cuDNN 8.4.1 Driver 510.73.08 DALI 0.31.0 |
Figure 1. Dell PowerEdge XR7620: 2U 2S
Table 4. NVIDIA GPUs tested
Brand | GPU | GPU memory | Max power consumption | Form factor | 2-way bridge | Recommended workloads |
PCIe Adapter Form Factor | ||||||
NVIDIA | A2 | 16 GB GDDR6 | 60W | SW, HHHL or FHHL | n/a | AI Inferencing, Edge, VDI |
NVIDIA | A30 | 24 GB HBM2 | 165W | DW, FHFL | Y | AI Inferencing, AI Training |
NVIDIA | A100 | 80 GB HBM2e | 300W | DW, FHFL | Y, Y | AI Training, HPC, AI Inferencing |
The edge server offloads image processing to the GPU. Just as servers come at different price/performance levels to suit different requirements, so do GPUs. The XR7620 supports up to two double-wide (DW) 300 W GPUs or four single-wide (SW) 150 W GPUs, reflecting the scalability and flexibility of the Dell PowerEdge server portfolio. By comparison, the previous-generation XR11 supported up to two SW GPUs.
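As a rough illustration of those accelerator limits, the following hypothetical sketch checks a proposed GPU population against the counts and per-card power figures stated above. The limits are taken from the text; this is not a Dell configuration tool.

```python
# Hypothetical sketch: validating a GPU population against the XR7620
# accelerator limits stated above (up to 2x double-wide 300 W GPUs or
# 4x single-wide 150 W GPUs). Illustrative only.
LIMITS = {"DW": {"max_count": 2, "max_watts": 300},
          "SW": {"max_count": 4, "max_watts": 150}}

def fits(form_factor, count, watts_each):
    """Return True if the proposed population respects both limits."""
    rule = LIMITS[form_factor]
    return count <= rule["max_count"] and watts_each <= rule["max_watts"]

print(fits("DW", 2, 300))  # two 300 W double-wide cards -> True
print(fits("SW", 4, 60))   # four 60 W single-wide cards -> True
print(fits("DW", 3, 300))  # exceeds double-wide slot count -> False
```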
Edge server vs data center server comparison[1]
When tested with the NVIDIA A100 GPU in the Offline scenario, the Dell XR7620 delivered performance within 1% of the prior-generation Dell PowerEdge rack server. In other words, the 430 mm-depth XR7620 edge server can match a rack server's performance in an AI inferencing scenario. See Figure 2.
Figure 2. Rack vs edge server MLPerf Offline performance
XR7620 performance with NVIDIA A2 GPU
The XR7620 was tested with the NVIDIA A2 GPU across the entire range of MLPerf workloads in the Offline scenario. For the results, see Figure 3.
Figure 3. XR7620 Offline performance results
XR7620 was also tested with NVIDIA A2 GPU for the entire range of MLPerf workloads in the Single Stream scenario. See Figure 4.
Figure 4. XR7620 Single Stream Performance results
XR7620 was also tested with NVIDIA A30 GPU for the entire range of MLPerf workloads in the Offline Scenario. See Figure 5.
Figure 5. XR7620 Offline Performance results on A30 GPU
XR7620 was also tested with NVIDIA A30 GPU for the entire range of MLPerf workloads in the Single Stream scenario. See Figure 6.
Figure 6. XR7620 SS Performance results on A30 GPU
In some scenarios, next-generation Dell PowerEdge servers showed improvement over previous generations due to the integration of the latest technologies, such as PCIe Gen 5.
Speech to text
The Dell XR7620 delivered 16% higher throughput than the prior-generation Dell server. See Figure 7.
Figure 7. Offline Speech to Text performance improvement on XR7620
Image Classification
The Dell XR7620 delivered 45% better latency than the prior-generation Dell server. See Figure 8.
Figure 8. SS Image Classification performance improvement on XR7620
The Dell XR portfolio continues to provide a streamlined approach to edge and telecom deployments across a range of use cases. It addresses the challenge of small form factors at the edge with industry-standard rugged certifications (NEBS), offering a compact, scalable, and flexible solution across a temperature range of -5°C to +55°C. The MLPerf results provide a real-life view of AI inferencing performance on edge servers. Based on the results in this document, Dell servers continue to provide a complete solution.
Notes:
[1] Based on testing conducted in Dell Cloud and Emerging Technology lab, January 2023.
Fri, 03 Mar 2023 19:57:26 -0000
The Dell PowerEdge XR8000 is a compact multi-node server designed for the edge and telecom. This DfD describes the unique form factor with chassis and sleds for the deployment of the XR8000.
The Dell PowerEdge XR8000 is a rugged multi-node edge server built on the 4th Gen Intel® Xeon® Scalable processor MCC SKU stack. This short-depth, sled-based server, purpose-built for telco at the edge, is configurable, environmentally agile, and RAN optimized. It is optimized to operate in Class 1 (-5°C to 55°C) environments, and -20°C to 65°C for select configurations; provides a short depth of 430 mm from the front I/O wall to the rear wall; and is front-accessible.
Available in a unique sled-based chassis form factor, the 2U chassis supports 1U and 2U half-width sleds. It is an open and reusable chassis, as opposed to a fixed monolithic chassis. An entire sled can be replaced without removing the chassis or its power, which simplifies serviceability and maintenance. Customers who need additional storage or PCIe expansion can choose the 2U sled, with options for compute, accelerators, or GPUs.
Each sled includes iDRAC for management, a CPU, memory, storage, networking, PCIe expansion (2U sled), and cooling.
The XR8000 offers a reverse-airflow design for use in front-accessed chassis configurations. It provides a front-accessible, multi-node, sled-based rackable chassis (430 mm depth) and offers dual 60 mm PSUs for reverse airflow, with the following options:
Assuming redundant PSUs for each server, there would be between four and eight PSUs for equivalent compute capacity, and between four and eight additional power cables. This consolidation of PSUs and cables not only reduces the cost of the installation (due to fewer PSUs), it also reduces the cabling, clutter, and Power Distribution Unit (PDU) ports used in the installation.
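The arithmetic behind that consolidation can be sketched as follows, assuming two redundant PSUs per standalone server and one power cable plus one PDU port per PSU. These counts are illustrative assumptions, not a sizing tool.

```python
# Illustrative arithmetic for the PSU/cable consolidation described above:
# four standalone 1U servers with redundant PSUs versus one multi-node
# chassis with shared redundant PSUs. Counts are assumptions.
def psus_and_cables(units, psus_per_unit):
    """Return (psu_count, cable_count); one cable/PDU port per PSU."""
    psus = units * psus_per_unit
    return psus, psus

standalone = psus_and_cables(units=4, psus_per_unit=2)  # (8, 8)
chassis = psus_and_cables(units=1, psus_per_unit=2)     # (2, 2)
print(f"standalone: {standalone[0]} PSUs, {standalone[1]} cables/PDU ports")
print(f"chassis:    {chassis[0]} PSUs, {chassis[1]} cables/PDU ports")
```

Under these assumptions, the shared chassis eliminates six PSUs, six power cables, and six PDU ports for equivalent compute capacity.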
The compute sleds offer common features such as:
The XR8000 provides two sled options:
Figure 1. XR8610t
Figure 2. XR8620t
These slots can support GPUs*, SFP, DPUs, SoC Accelerators, and other NIC Options.
*More details will be available at RTS, planned for May 2023.
Various configurations are available:
1. 1X4U - This option includes 4x1U compute sleds and PSUs:
2. 2x1U + 1x2U - This option includes 2x1U and 1x2U compute sleds and PSUs:
3. 2x2U - This option includes 2x2U compute sleds and PSUs:
The PowerEdge XR8000 offers various form factor options based on different workloads:
You can create any of these compute node configurations to support a broad range of workloads in one chassis.
The XR8000 offers a front-servicing (cold aisle) chassis, which allows it to be deployed with all cables connected to the front. This simplifies cable management and allows the server to be installed in areas where space is limited and access to both the front and back of the chassis is not possible. Also, the sleds are designed to be easily field replaceable by non-IT personnel. Whether it is located on a rooftop or in another difficult environment, the XR8000 has a dense form factor with a Class 1 temperature range (-5°C to +55°C), with some configurations reaching -20°C to 65°C, and is tested for NEBS Level 3 compliance.
The XR8000 multi-node server enables IT administrators to deploy compact, redundant server solutions. For example, two sleds can be configured identically and installed in the same chassis. One acts as the primary, and the other is the secondary, or backup. If the primary server goes down, the secondary server steps in to minimize or eliminate downtime. This redundant server configuration is also a great way for administrators to manage software updates seamlessly. For example, administrators can deploy the secondary server while performing maintenance, updates, or development work on the primary server.
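A minimal sketch of that primary/secondary pattern follows, assuming a hypothetical health endpoint on the primary sled. The host names and the takeover action are placeholders, not part of any Dell tooling.

```python
# Minimal sketch of the primary/secondary failover pattern described above:
# the secondary sled polls the primary's health endpoint and takes over
# when the primary stops responding. Host names are hypothetical.
import urllib.request

def primary_healthy(url="http://primary.sled.local/health", timeout=2):
    """Return True if the primary's health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def monitor_once(activate_secondary):
    """One polling iteration; trigger takeover if the primary is down."""
    if not primary_healthy():
        activate_secondary()  # e.g., claim a virtual IP, start services

monitor_once(lambda: print("secondary taking over"))
```

In practice this loop would run on a timer, with debouncing so a single missed poll does not trigger a takeover.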
The XR8000, with its unique form factor and multiple deployment options, provides the flexibility to start with a single node and scale up to four independent nodes as needed. Depending on the needs of various workloads, deployment options can change.
The same sleds can work in either the flexible or rack mount chassis based on space constraints or user requirements.
XR8000 provides a streamlined approach for various edge and telecom deployment options based on different use cases. It provides a solution to the challenge of deploying small form factors at the edge, with industry-standard rugged tests (NEBS), providing a compact solution for scalability and flexibility in a temperature range of -20 to +65°C for select configurations.
Fri, 03 Mar 2023 19:57:27 -0000
The next generation of PowerEdge servers is engineered to accelerate insights by enabling the latest technologies. These technologies include next-gen CPUs bringing support for DDR5 and PCIe Gen 5 and PowerEdge servers that support a wide range of enterprise-class GPUs. Over 75% of next generation Dell PowerEdge servers offer support for GPU acceleration.
For the digital enterprise, success hinges on leveraging big, fast data. But as data sets grow, traditional data centers are starting to hit performance and scale limitations, especially when ingesting and querying real-time data sources.

While some have long taken advantage of accelerators for speeding visualization, modeling, and simulation, today more mainstream applications than ever can leverage accelerators to boost insight and innovation. Accelerators such as graphics processing units (GPUs) complement and accelerate CPUs, using parallel processing to crunch large volumes of data faster. Accelerated data centers can also deliver better economics, providing breakthrough performance with fewer servers, resulting in faster insights and lower costs.

Organizations in multiple industries are adopting server accelerators to outpace the competition: honing product and service offerings with data-gleaned insights, enhancing productivity with better application performance, optimizing operations with fast and powerful analytics, and shortening time to market by doing it all faster than ever before. Dell Technologies offers a choice of server accelerators in Dell PowerEdge servers so you can turbo-charge your applications.
Our world-class engineering team designs PowerEdge servers with the latest technologies for ultimate performance. Here’s how.
PowerEdge ensures no-compromise system performance through innovative cooling solutions while offering customers options that fit their facility or usage model.
Dell Technologies and AMD have established a solid partnership to help organizations accelerate their AI initiatives. Together, our technologies provide the foundation for successful AI solutions that drive the development of advanced DL software frameworks. These technologies also deliver massively parallel computing in the form of AMD Graphics Processing Units (GPUs) for parallel model training, and scale-out file systems to support the concurrency, performance, and capacity requirements of unstructured image and video data sets. With the AMD ROCm open software platform, built for flexibility and performance, the HPC and AI communities gain access to open compute languages, compilers, libraries, and tools designed to accelerate code development and solve the toughest challenges in the world today.
Dell Technologies and Intel are giving customers new choices in enterprise-class GPUs. The Intel Data Center GPUs are available with our next generation of PowerEdge servers. These GPUs are designed to accelerate AI inferencing, VDI, and model training workloads. And with toolsets like Intel® oneAPI and OpenVINO™, developers have the tools they need to develop new AI applications and migrate existing applications to run optimally on Intel GPUs.
Dell Technologies solutions designed with NVIDIA hardware and software enable customers to deploy high-performance deep learning and AI-capable enterprise-class servers from the edge to the data center. This relationship allows Dell to offer Ready Solutions for AI and built-to-order PowerEdge servers with your choice of NVIDIA GPUs. With Dell Ready Solutions for AI, organizations can rely on a Dell-designed and validated set of best-of-breed technologies for software – including AI frameworks and libraries – with compute, networking, and storage. With NVIDIA CUDA, developers can accelerate computing applications by harnessing the power of the GPUs. Applications and operations (such as matrix multiplication) that are typically run serially in CPUs can run on thousands of GPU cores in parallel.
Turbo-charge your applications with performance accelerators available in select Dell PowerEdge tower and rack servers. The number and type of accelerators that fit in PowerEdge servers are based on the physical dimensions of the PCIe adapter cards and the GPU form factor.
Brand | GPU Model | GPU Memory | Max Power Consumption | Form Factor | 2-way Bridge | Recommended Workloads | |
PCIe Adapter Form Factor | |||||||
NVIDIA | A2 | 16 GB GDDR6 | 60W | SW, HHHL or FHHL | n/a | AI Inferencing, Edge, VDI | |
NVIDIA | A16 | 64 GB GDDR6 | 250W | DW, FHFL | n/a | VDI | |
NVIDIA | A40, L40 | 48 GB GDDR6 | 300W | DW, FHFL | Y, N | Performance graphics, Multi-workload | |
NVIDIA | A30 | 24 GB HBM2 | 165W | DW, FHFL | Y | AI Inferencing, AI Training | |
NVIDIA | A100 | 80 GB HBM2e | 300W | DW, FHFL | Y, Y | AI Training, HPC, AI Inferencing | |
NVIDIA | H100 | 80GB HBM2e | 300 - 350W | DW, FHFL | Y | AI Training, HPC, AI Inferencing | |
AMD | MI210 | 64 GB HBM2e | 300W | DW, FHFL | Y | HPC, AI Training | |
Intel | Max 1100* | 48GB HBM2e | 300W | DW, FHFL | Y | HPC, AI Training | |
Intel | Flex 140* | 12GB GDDR6 | 75W | SW, HHHL or FHHL | n/a | AI Inferencing | |
SXM / OAM Form Factor | |||||||
NVIDIA | HGX A100* | 80GB HBM2 | 500W | SXM w/ NVLink | n/a | AI Training, HPC | |
NVIDIA | HGX H100* | 80GB HBM3 | 700W | SXM w/ NVLink | n/a | AI Training, HPC | |
Intel | Max 1550 * | 128GB HBM2e | 600W | OAM w/ XeLink | n/a | AI Training, HPC | |
* Development or under evaluation |
Mon, 16 Jan 2023 19:49:21 -0000
Using the PowerEdge R750xa, the Dell HPC & AI Innovation Lab compared performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation NVIDIA A100 PCIe GPU, using the supercomputer benchmark HPL. Results showed:
The Dell PowerEdge R750xa, powered by 3rd Gen Intel Xeon Scalable processors, is a dual-socket/2U rack server that delivers outstanding performance for the most demanding emerging and intensive GPU workloads. It supports 8 channels per CPU and up to 32 DDR4 DIMMs with speeds up to 3200 MT/s. In addition, the PowerEdge R750xa supports PCIe Gen 4 and up to 8 SAS/SATA SSDs or NVMe drives. The PowerEdge R750xa, the one PowerEdge portfolio platform that supports all the PCIe GPUs, is the ideal server for virtualization environments and workloads such as high performance computing and AI-ML/DL training and inferencing. The PowerEdge R750xa includes all the core benefits of PowerEdge: serviceability, consistent systems management with iDRAC, and the latest in extreme acceleration.
The new NVIDIA H100 PCIe GPU is optimal for delivering the fastest business outcomes with the latest accelerated servers in the Dell PowerEdge portfolio, starting with the R750xa. The PowerEdge R750xa boosts workloads to new performance heights with GPU and accelerator support for demanding workloads, including enterprise AI. With its enhanced, air-cooled design and support for up to four NVIDIA double-width GPUs, the PowerEdge R750xa server is purpose-built for optimal performance for the entire spectrum of HPC, AI-ML/DL training, and inferencing workloads.
Next-generation GPU performance analysis
The Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation A100 PCIe GPU in the Dell PowerEdge R750xa. The team used HPL, a popular computing benchmark often used to evaluate the performance of supercomputers on the TOP500 list. This comparison included HPL performance and server power consumption throughout the benchmark. Here are the results:
The performance per watt calculation is the HPL benchmark score divided by the average server power over the duration of the HPL benchmark. The PowerEdge R750xa with the NVIDIA H100 PCIe GPUs delivered a 66% increase in performance/watt compared to the PowerEdge R750xa with the NVIDIA A100 PCIe GPUs, as shown in the following figure.
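The performance-per-watt calculation itself is straightforward; the sketch below applies it to placeholder numbers. The measured HPL scores and power figures appear in Figures 1 through 3, not here; these values are illustrative only.

```python
# Reproducing the performance-per-watt arithmetic described above with
# placeholder numbers (not measured results): a ~67% TFLOPS gain at
# roughly equal server power yields a ~66% performance/watt gain.
def perf_per_watt(hpl_tflops, avg_server_watts):
    """HPL benchmark score divided by average server power during the run."""
    return hpl_tflops / avg_server_watts

a100 = perf_per_watt(hpl_tflops=60.0, avg_server_watts=2000.0)
h100 = perf_per_watt(hpl_tflops=100.2, avg_server_watts=2010.0)
print(f"perf/watt improvement: {h100 / a100 - 1:.1%}")
```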
PowerEdge R750xa - HPL Benchmark and Server Power
Figure 1. Performance/watt comparison
Figure 2 shows the raw HPL performance of each configuration. The PowerEdge R750xa with four NVIDIA H100 PCIe GPUs achieved a 67% increase in TFLOPS compared to the configuration with four NVIDIA A100 PCIe GPUs.
Figure 2. Raw performance comparison
Server power
Figure 3 shows the server power over the duration of the HPL benchmark. The NVIDIA H100 PCIe GPU configuration delivered better performance with slightly lower server power and finished the workload faster.
Figure 3. HPL server power
The following table shows the two test configurations.
Table 1. R750xa test configurations
| R750xa with four NVIDIA H100 | R750xa with four NVIDIA A100 |
Server | PowerEdge R750xa | |
CPU | 2 x Intel Xeon Gold 6338 CPU | |
Memory | 512 GB system memory | |
Storage | 1 x 3.5 TB SSD |
BIOS/iDRAC | 1.9.0/6.0.0.0 | |
HPL version | HPL for H100 (Alpha version, results subject to change) | |
Operating system | Ubuntu 20.04 LTS | |
GPU | NVIDIA H100-PCIe-80GB (310 W) | NVIDIA A100-PCIe-80GB (300 W) |
Driver | CUDA 11.8 | CUDA 11.8 |
Using the PowerEdge R750xa, the Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation NVIDIA A100 PCIe GPU. HPL benchmark results showed a 66 percent increase in performance/watt and a 67 percent increase in TFLOPS.
The PowerEdge R750xa supports up to four NVIDIA H100 PCIe GPUs and is available with new orders or as a customer upgrade kit for existing deployments. To learn more, reach out to your account executive or visit www.dell.com.
Mon, 16 Jan 2023 19:39:35 -0000
The Nano Processing Unit is an optional sled supported by the Dell XR4000 multi-node server. While its design aligns perfectly with the technical requirements of a VMware vSAN witness host, it can be used for many interesting edge use cases.
Dell Technologies is committed to delivering best-in-class edge servers. The latest member of the Dell PowerEdge XR edge server series is the PowerEdge XR4000 featuring the next-generation Intel Xeon D processor. This short-depth multi-node server is available in two different chassis form factors: rackable and flexible. The rackable chassis supports up to four Xeon D sleds, and the flexible supports up to two. Additionally, each chassis supports an optional low-power server called the Dell Nano Processing Unit, or NPU, discussed here.
The NPU is an x86 sled built with the Intel Atom Processor C Series. Designed for the edge, the NPU includes industrial-grade components capable of reliable operation in an extended temperature range. It is installed adjacent to the Intel Xeon D sleds in the PowerEdge XR4000 chassis and includes independent memory, networking, and storage. Aside from drawing power from the chassis, the NPU is a self-contained server, bringing the total to five independent sleds in the rackable chassis or three in the flexible chassis.
Figure 1. The XR4000 is multi-node. Each sled includes CPU, memory, storage, networking, and fans.
From the factory, Dell offers SUSE Linux Enterprise Server (SLES) or VMware ESXi on the NPU. We validated both operating systems, giving our customers the flexibility to use this unique server as a VMware vSAN witness host or put it to work in other exciting edge workloads, which we will discuss later. In addition, each NPU includes a unique Dell Service Tag and a customer-programmable field asset tag for customized asset tracking. The NPU does not feature Dell iDRAC.
The following table provides technical specifications of the NPU:
Feature | Technical specifications |
Processor | Intel Atom C3508 |
Memory | 16 GB ECC DDR4 1866 |
Storage | 1 x 960 GB M.2 NVMe |
Embedded management | N/A |
Embedded NIC | Intel i210 (2 ports) |
Ports | 1 x USB 3.0/2.0, 1 x Serial (micro-USB), 2 x 1 GbE RJ45, headless |
Operating systems | ESXi, Linux |
Operating temperature | 0–55°C |
The XR4000's unique NPU can serve a wide range of edge-computing use cases. Here are a few examples.
A two-node vSAN or vSAN stretched cluster configuration requires a witness host to act as a tie-breaker when a fault occurs. In a two-node vSAN, a fault could be a single node's power loss or hardware failure. In a stretched cluster, it might be the loss of an entire site due to a natural disaster. In either case, the witness host determines which node contains the valid data after the fault is resolved and nodes return to the cluster. The XR4000 NPU meets the requirements of a hardware vSAN witness host. It is installed in the same chassis as the compute nodes, enabling a compact vSAN cluster that can be deployed almost anywhere.
Equipment deployed in a telephone network's central office, a manufacturing facility, or a retail backroom might be more exposed to the effects of natural disasters or extreme temperatures. For example, a remote site might experience an extended power outage due to a natural disaster. During this time, a site battery backup can keep some of the infrastructure running for a short period; however, high-power equipment can quickly consume the battery or fuel capacity. When used as a site manager to monitor environmental sensors and security access, view camera feeds, and gracefully shut down high-power equipment, the low-power NPU can help preserve precious battery power until power returns. Once site power returns, the NPU can remotely restore the site to full functionality by gracefully managing the power-on of connected site equipment, negating the need to send out a technician.
Isolating a private network from the Internet increases security and reduces the number of potential attack vectors. Isolated networks improve network security but present challenges for IT administrators who access and manage them remotely. One solution is to use a "jump box" or "bastion host" that acts as a secure bridge between the Internet and a private network. This single, secure bridge can be hardened, monitored, and regularly audited to ensure only authorized users access the private network. IT administrators can configure the NPU as a secure bridge between the Internet and a private network.
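With OpenSSH, this jump-box pattern can be expressed declaratively using the `ProxyJump` directive. The following is a minimal client-side sketch; the host names and addresses are placeholders.

```
# ~/.ssh/config -- hedged example; host names and IPs are placeholders.
Host bastion
    HostName bastion.example.com
    User admin

Host private-node
    HostName 10.0.0.10
    User admin
    ProxyJump bastion    # tunnel through the hardened jump box
```

With this in place, `ssh private-node` transparently routes through the bastion, so only the bastion needs to be exposed, monitored, and audited.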
Telemetry management host
Monitoring the health and performance metrics of servers, systems, and services operating at edge locations is critical. IT administrators use monitoring systems such as Prometheus to monitor, detect, and alert when collected metrics indicate potential issues in their fleet. They also use tools such as Grafana to visualize the data in easy-to-consume charts and graphs. The NPU's hardware specifications meet the hardware requirements of monitoring systems such as Grafana and Prometheus, and the NPU can serve as an out-of-band server running these tools.
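For example, a minimal Prometheus scrape configuration for an NPU acting as the site monitoring host might look like the following. The job name and target addresses are placeholders.

```yaml
# Hypothetical prometheus.yml fragment for the NPU as a site monitoring
# host; targets are node_exporter endpoints on the Xeon D sleds.
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: "edge-sleds"
    static_configs:
      - targets:
          - "sled1.site.local:9100"
          - "sled2.site.local:9100"
```

Grafana would then point at this Prometheus instance as a data source to chart the collected metrics.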
Out-of-band management
Managing a fleet of servers and IT equipment is challenging. Administrators must manage the health and performance of equipment deployed across multiple sites or at remote locations. So, it might not always be cost-effective or feasible to send out a technician to resolve an issue, update or provision equipment, or check the status of a site. In these cases, having an out-of-band server like the NPU gives administrators the ability to remotely troubleshoot, deploy firmware updates, and manage devices such as intelligent PDUs and USB devices. When troubleshooting, administrators can use the out-of-band NPU server to power-cycle faulty devices connected to intelligent PDUs and collect debug logs from other devices; when provisioning, they can use it as a PXE server. Additionally, administrators can automate troubleshooting and provisioning functions, and the NPU can run those scripts.
Conclusion
The Nano Processing Unit is a unique and versatile computing server. Its edge-optimized design has industrial-grade components, a low-power processor, and more-than-capable memory, networking, and storage capacity. These features make it an excellent addition for customers looking to get the most out of their XR4000 server.
References
VMware Virtual Blocks Blog: Shared Witness for 2-Node vSAN Deployments
Mon, 16 Jan 2023 19:31:49 -0000
The Dell XR4000 is a compact multi-node server designed for the edge. This document discusses the XR4000’s unique form factors and sled options.
The Dell PowerEdge XR4000 is a rugged multi-node edge server with Intel’s next-generation Xeon D processor, making it a perfect fit for edge deployments. Available in two unique and flexible form factors, the “rackable” chassis supports up to four 1U sleds, and the “stackable” chassis supports up to two. Customers who need additional storage or PCIe expansion can choose a 2U sled option.
In addition, the XR4000 supports an optional witness node for single-chassis VMware vSAN cluster deployments. Each sled includes iDRAC for management, a CPU, memory, storage, networking, PCIe expansion (2U sled), and cooling.
Compute sleds
The compute sleds offer common features such as power and management connectors to the chassis backplane, pull handles and mechanical locks (for example, spring clips) for attachment to the chassis, side rails to aid insertion and stability in the chassis, and ventilation holes and baffles as appropriate for cooling.
Figure 1. 1U compute sled interior
The XR4000 offers 1U and 2U sleds. The 1U sled is provided for dense compute requirements. The 2U sled shares the same “first U” and common motherboard with the 1U sled but includes an additional riser that provides two more PCIe Gen4 FHFL I/O slots.
Figure 2. 1U compute sled
The 1U sled meets dense compute requirements, with storage up to 4 x M.2 drives (from 480 GB up to 3.84 TB each) and up to 2 x M.2 NVMe BOSS N1 ET. The memory can scale up to 512 GB total with 4 x memory slots. It also includes a LAN on motherboard (LOM) option with 4 x SFP from CPU.
Figure 3. 2U compute sled interior
The 2U compute sled builds upon the common first 1U of the 1U sled, adding 2 x16 FHFL PCIe Gen 4.0 slots with a combined power capacity of 250 W. These slots can support GPUs (such as NVIDIA A2/A30), SFPs, DPUs, SoC accelerators, and other NIC options. The additional storage option supports an optional 8 x M.2 storage drives (4 x per x16 slot), for 12 x M.2 total (not including BOSS).
Each chassis also supports an optional low-power server called the Dell Nano Processing Unit or NPU. The NPU is an x86 sled built with Intel's Atom Processor C Series. Designed for the edge, the NPU includes industrial-grade components capable of reliable operation in an extended temperature range. For more information, see Dell PowerEdge XR4000: Nano Processing Unit.
The two chassis types share common components: 100 to 240 VAC power supplies (PSUs), up to two per chassis, and an optional embedded controller card called the Nano Server.
Both chassis types optionally include a lockable bezel to prevent unwanted access to the sleds and PSUs, with intelligent filter monitoring that creates a system alert when the filter needs to be changed.
The XR4000 is offered in two chassis options:
Figure 4. 2U rackmount chassis
The “rackable” chassis is a 2U, 14-inch (355 mm) deep, 19-inch-wide chassis, with mounting ears to support a standard 19-inch-wide rack. The rackable chassis supports both front-to-back and back-to-front airflow and the following combination of 1U and 2U compute sleds:
The “stackable” chassis is also 2U and 14 inches (355 mm) deep but is only 10.5 inches wide; it is typically deployed on a desktop, VESA plate, or DIN rail, or in stacked environments. The stackable chassis also supports both front-to-back and back-to-front airflow.
Figure 5. 2U stackable chassis
The XR4000 offers a front-servicing (cold aisle) chassis option, which allows it to be deployed with all cables connected to the front. This option simplifies cable management and allows the server to be installed in areas where space is limited and access to both the front and back of the chassis is not possible. Also, the sleds are designed to be easily field replaceable by non-IT personnel.
Redundancy
The XR4000 multi-node server gives IT administrators the ability to deploy compact, redundant server solutions. For example, two sleds can be configured identically and installed in the same chassis. One acts as the primary, and the other is the secondary, or backup. If the primary server goes down, the secondary server steps in to minimize or eliminate downtime. This redundant server configuration is also a great way for administrators to seamlessly manage software updates. For example, administrators can deploy the secondary server while performing maintenance, updates, or development work on the primary server.
Scaling
The XR4000 server, with its unique form factor and multiple deployment options, provides flexibility to start with a single node and scale up to four independent nodes as needed. Depending on the requirements of various workloads, deployment options can change; for example, a user can add a 2U GPU-capable sled. The same sleds can work in either the flexible or rackmount chassis based on space constraints or user requirements.
Conclusion
The PowerEdge XR4000 offers a streamlined approach for various edge deployment options based on different edge use cases. Addressing the need for a small form factor at the edge with industry-standard rugged certifications (NEBS and MIL-STD), the XR4000 ultimately provides a compact solution for improved edge performance, low power consumption, redundancy, and improved TCO.
Mon, 16 Jan 2023 19:17:52 -0000
|Read Time: 0 minutes
The Dell PowerEdge XR4000 is a compact multi-node server designed for the edge. This Tech Note discusses the Intel Xeon D processor that powers the XR4000 server. These CPUs are unique, being designed primarily for edge deployments. New integrated technology enables faster performance than the previous generation and helps address designs with space and power constraints while also lowering TCO.
Dell PowerEdge XR4000 is the latest addition to Dell Technologies’ portfolio of rugged PowerEdge servers. It is Dell’s shortest-depth edge server, with a unique sled and chassis form factor, withstanding an extended temperature range of –5°C to 55°C. The XR4000 provides a sustainable solution for customers to deploy various edge workloads in challenging environments.
The brain of this server is the Intel Xeon D CPU, which features a one-package design with integrated AI, security, advanced I/O, and Ethernet, plus dense compute, to deliver high data throughput and address key edge requirements. To broaden the range of usage models, the Xeon system-on-a-chip (SoC) is available in two distinct packages: the high-core-count Xeon D-2700 processor, optimized for performance, and the Xeon D-1700 processor, optimized for cost and power consumption. With options ranging from 4 to 20 cores, the Xeon D-2700 processor is suited to demanding workloads, such as handling high data-plane throughput, making it a strong fit for edge deployments. Extended operating temperature ranges and industrial-class reliability make the Xeon D-1700 and D-2700 SoCs ideal for high-performance rugged equipment.
Dell PowerEdge XR4000 is based on the Xeon D-2700 SoC. This SoC is an HCC 52.5 x 45 mm package, supporting up to 20 cores and using Intel’s Sunny Cove cores to boost performance for edge use cases. The Xeon D-2700 offers a CPU performance gain of up to 2.97 times and improved AI inferencing that is 7.4 times faster than its previous-generation Xeon D-1577 processor.
Memory speeds have increased by 20 percent, jumping from 2,666 MT/s to 3,200 MT/s. Also, the maximum memory capacity for HCC Xeon D-2700 SKUs is now up to 1,024 GB (with LRDIMM)—two times as much as most Xeon D-2100 SKUs (code named Skylake-D). The increased memory speed and capacity significantly reduce data transfer times for memory-intensive workloads for the edge, such as manufacturing and retail applications, and AI/ML-based applications.
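As a back-of-envelope sketch of what those memory numbers mean (assuming the standard DDR4 geometry of 8 bytes per transfer on a 64-bit channel; the channel count varies by platform, so this is per-channel only and not a Dell-published figure):

```python
# Rough check of the memory claims above.
old_mts, new_mts = 2666, 3200                      # DDR4 transfer rates, MT/s
speedup_pct = (new_mts - old_mts) / old_mts * 100  # ~20% faster transfers
# Peak bandwidth per channel: transfers/s x 8 bytes per 64-bit transfer
bw_per_channel_gbs = new_mts * 8 / 1000            # 25.6 GB/s per channel
```

The ~20 percent transfer-rate gain translates directly into peak per-channel bandwidth, which is why memory-bound edge workloads see shorter data transfer times.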
PCIe throughput has also improved by a factor of 2, with support for up to 32 lanes of PCIe Gen 4.0. Throughput speed is 16 GT/s for PCIe Gen 4.0 compared with 8 GT/s for PCIe Gen 3.0. The increased bandwidth of PCIe Gen 4.0 improves the efficiency of workloads such as AI/ML and of edge computing by providing high transfer speeds, while also reducing latency.
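As an illustrative calculation (using the standard 128b/130b line encoding shared by PCIe Gen 3 and Gen 4; these are generic PCIe numbers, not measurements of this server), the usable one-way bandwidth of a full x16 link roughly doubles between generations:

```python
def pcie_bw_gbs(gt_per_s: float, lanes: int) -> float:
    """Approximate usable one-way PCIe bandwidth in GB/s.
    Gen 3 and Gen 4 both use 128b/130b encoding, so ~98.5% of the
    raw bit rate carries payload; divide by 8 bits per byte."""
    return gt_per_s * lanes * (128 / 130) / 8

gen3_x16 = pcie_bw_gbs(8, 16)    # ~15.75 GB/s
gen4_x16 = pcie_bw_gbs(16, 16)   # ~31.51 GB/s
```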
Ethernet connectivity
The Xeon D-2700 HCC SoC makes great strides in Ethernet connectivity compared to the previous generation. It increases Ethernet connectivity by 400 percent, providing networking up to 100 GbE with a variety of port options: up to eight ports at 25 Gbps, 10 Gbps, or 1 Gbps with RDMA (iWARP and RoCEv2). Ethernet processing throughput is up by 150 percent, with 50 Gbps and 100 Gbps throughput options.
The Xeon D-2700 and D-1700 processors integrate the following hardware technologies to accelerate workloads:
Intel QAT v1.8 accelerates crypto SSL up to 100 Gbps and compression up to 70 Gbps, offering better integrated cryptographic and AI acceleration than the previous generation. The SoCs also include new instructions to accelerate AI/deep learning workloads. (See Intel QAT: Performance, Scale, and Efficiency.)
Intel hardware-based security
Intel Xeon D processors offer integrated security features including Intel Total Memory Encryption, which provides full memory encryption with segmentation for up to 64 tenant-provided keys. The processors support Intel Software Guard Extensions (Intel SGX), which provides fine-grained data protection through application isolation in memory. This protection can be crucial for data exchange between the cloud and edge. Xeon D processors also support Intel Secure Hash Algorithm Extensions, with integrated accelerators for SHA cryptographic algorithms, and Intel Platform Firmware Resilience (Intel PFR), which uses an Intel field programmable gate array (FPGA) to protect, detect, and correct platform firmware.
Conclusion
The Intel Xeon D processor is a cost-effective offering that is built specifically for edge deployments. It allows users to tailor their solutions to the level of compute and performance they need while allowing for edge implementation-specific space and power constraints. The Xeon D processor and the PowerEdge XR4000 server help customers deploy solutions with lower TCO and a low-power budget in a rugged environment. The solutions are well suited for various edge workloads in retail, manufacturing, and defense.
Mon, 16 Jan 2023 19:12:54 -0000
|Read Time: 0 minutes
The Dell XR4000 is a compact multi-node server designed for the edge and integrated with the iDRAC9 remote management system. This DfD discusses the enhancements made for the XR4000 and why iDRAC9 is essential at the edge.
The concept of edge computing has been constantly growing over the last few years. Edge computing is exactly as the name suggests: bringing the processing and computing of a data center to the edge and reducing latency to a minimum. Dell Technologies offers a variety of options from the hardware side for the edge, including the XR portfolio built with a unique sled-chassis small form factor, reliable for the rugged environment, with high performance and low latency requirements. Dell also offers software to help make the process of edge computing the smoothest experience for a user.
As edge use cases develop, interoperability is one of the main requirements: different software programs must be able to work with different hardware to optimize performance. Remote management is also a must, given the challenging environments rugged servers operate in, so that administrators can remediate problems without physically visiting the server.
iDRAC9 is designed specifically to enable this portability, allowing server administrators to be more productive while optimizing the performance of Dell PowerEdge servers in the network. iDRAC9 is embedded management, built into Dell PowerEdge servers for monitoring, updating, and troubleshooting. It simplifies and automates the server's lifecycle. The PowerEdge XR4000 edge server is integrated with the latest version of iDRAC9.
In the following sections, we describe some special features supported in XR4000, using iDRAC for customer ease at the edge, along with deployment, updates, service, and troubleshooting.
Deployment
According to a study conducted by Principled Technologies in 2020, iDRAC9 can "reduce hands-on deployment times to near zero" through automation. Because iDRAC9 is part of every PowerEdge server, there is no additional software to install; in a few simple steps, iDRAC9 can be configured and ready to use. Even before an operating system is installed, IT admins have a complete set of server management features, including configuration, firmware updates, OS deployment, and more. Operating systems can be deployed remotely through Remote File Share or the Virtual Media console.
Automation
iDRAC9 offers agent-free operation to put IT admins in full control. When a PowerEdge server is connected to power and networking, it can be monitored and fully managed, whether standing in front of the server or remotely over a network. In fact, with no need for software agents, an IT administrator can monitor, manage, update, troubleshoot, and remediate Dell servers.
With features like Zero-Touch deployment and provisioning, Connection View, and System Lockdown, iDRAC9 is purpose-built to make server administration quick and easy by enabling the seamless automation of the entire server management lifecycle.
Powerful APIs
iDRAC9 offers support for DMTF Redfish. Redfish is a next-generation systems management interface standard that enables scalable, secure, and open server management. Redfish is an interface that uses RESTful interface semantics to access data that is defined in model format to perform out-of-band systems management. With iDRAC9, server administrators can easily monitor, customize, and optimize PCIe airflow and temperature, exhaust control, Delta T control, and overall airflow consumption remotely. In addition, iDRAC9 allows server administrators to pre-define power and cooling settings easily, as part of the server configuration profile. For more information, see iDRAC9 Redfish API Guide Firmware.
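As a minimal sketch of what scripting against the iDRAC9 Redfish interface looks like (the `/redfish/v1/...` paths below are standard DMTF Redfish resources, and `System.Embedded.1` is the usual iDRAC system ID; the address is hypothetical, and the exact properties exposed should be verified against your own server):

```python
from urllib.parse import urljoin

# Hypothetical iDRAC9 address; replace with your management IP.
IDRAC = "https://192.0.2.10"

def redfish_url(resource: str) -> str:
    """Build a DMTF Redfish resource URL under the /redfish/v1 root."""
    return urljoin(IDRAC, "/redfish/v1/" + resource.lstrip("/"))

# Typical out-of-band queries an administrator might script:
system_url = redfish_url("Systems/System.Embedded.1")          # power state, health
thermal_url = redfish_url("Chassis/System.Embedded.1/Thermal") # fans, temperatures

# These URLs would then be fetched with an authenticated HTTPS GET,
# e.g. requests.get(system_url, auth=("user", "pass"), verify=...),
# returning JSON described by the Redfish schema.
```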
Security
iDRAC9 offers industry-leading security features that adhere to and are certified against well-known NIST standards, Common Criteria, and FIPS-140-2. For more information about iDRAC's certifications and standards, see the white paper Managing Web Server Certificates on iDRAC.
iDRAC9 uses a modern, secure, HTML5-based GUI as a virtual console. The iDRAC9 web server uses a TLS/SSL certificate to establish and maintain secure communications with remote clients. Web browsers and command-line utilities, such as RACADM and WS-Man, use this TLS/SSL certificate for server authentication and establishing an encrypted connection. iDRAC9 now supports TLS 1.3.
iDRAC9 on the XR4000 enables dust monitoring of the bezel: if the bezel is in a dusty environment, iDRAC9 helps the administrator know when to clean or change it to ensure the smooth functioning of the server. iDRAC9 can also gather information from the witness sled and PSUs and send it to each node in the chassis at deployment time and whenever runtime changes occur. Ultimately, iDRAC can gather the overall health status for Chassis Manager and display it using the bezel LED.
iDRAC9 is crucial to PowerEdge servers and will continue to be an integrated part of the entire XR series. It will also continue to address issues faced by administrators in the edge environment. We offer various licensing methods to provide what is best suited to your requirements.
To learn about iDRAC9, see the article Support for Integrated Dell Remote Access Controller 9 (iDRAC9).
Mon, 16 Jan 2023 19:04:39 -0000
|Read Time: 0 minutes
Dell Technologies has recently announced PowerEdge XR4000: an industry-certified, multi-node, 2U short-depth rugged OEM-ready server with rack or wall mountable options. The XR4000 is optimized for edge use cases, including retail, manufacturing, and telecom. This Direct from Development (DfD) demonstrates VM deployment capability for virtualized environments using VMmark, a benchmark that measures the performance and scalability of virtualization platforms.
The new Dell PowerEdge XR4000 is a 2U server with an innovative sled-based design. Dell Technologies' shortest-depth server to date is purpose-built for the edge, delivering high-performance compute and ultimate deployment flexibility in two new chassis form factors. The chassis comes in two 14"-depth form factors, referred to as "rackable" and "stackable." The XR4000 also offers an optional Nano server sled that can provide an in-chassis witness node for a vSAN cluster. By replacing the need for a virtual witness node, the Nano server allows for a native, self-contained two-node vSAN cluster in even the 14" x 12" stackable chassis. This enables VM deployments where the option was previously unavailable due to latency or bandwidth constraints.
This document describes the VMmark 3.1.1 benchmark that was used to test the outstanding performance delivered by Dell PowerEdge servers, powered by Intel® Xeon® D processors.
Overview
The first version of VMmark was launched in 2007 as a single-host benchmark, when most organizations were at an early stage of virtualization maturity. VMmark 3.1.1, released in 2020, is the current release of the benchmark.
VMmark uses a unique tile-based implementation in which each “tile” consists of a collection of virtual machines running a set of diverse workloads. This tile-based approach is common across all versions of the VMmark benchmark. Since the initial release of VMmark, virtualization has become the norm for applications, and these applications have evolved. The workloads that are run in the VMmark tiles have also evolved to provide the closest to real-world metrics for users to assess their virtual environments.
Figure 1. A Web-Scale Multi-Server Virtualization Platform Benchmark
Power Measurement
Power and cooling expenses are a substantial and increasing part of the cost of running a data center, and environmental considerations are a growing factor in data center design and selection. To address these issues, VMmark enables optional power measurement in addition to performance measurement. VMmark 3.1.1 results can be any of three types: performance only, performance with server power, or performance with server and storage power.
VMmark results with power measurement allow hardware purchasers to see not just absolute performance, but also absolute power consumption and performance per kilowatt. This makes it possible to consider both capital expenses and operating expenses when selecting new data center components.
This solution includes the following components:
Component | Details |
SUTs | 4 x Dell XR4510c servers |
Clients | 2 x Dell PowerEdge R740xd |
Storage | vSAN used for all workload VMs; iSCSI SAN used for infrastructure operations |
Network | Dell Z9432F-ON switch Intel® E823-C 25G 4P LOM |
OS | Dell Customized Image of VMware ESXi 7.0U3 A08, Build# 20328353 |
The metrics of the application workloads within each tile are computed and aggregated into a score for that tile. This aggregation is performed by first normalizing the different performance metrics (such as actions/minute and operations/minute) for a reference platform. Then, a geometric mean of the normalized scores is computed as the final score for the tile. The resulting per-tile scores are then summed to create the application workload portion of the final metric. The metrics for the infrastructure workloads are aggregated separately. The final benchmark score is computed as a weighted average: 80 percent to the application workload component and 20 percent to the infrastructure workload component.
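The aggregation described above can be sketched as follows (the baselines and per-tile metric values are made up for illustration; the real reference-platform values and run rules live in the VMmark 3.1.1 documentation):

```python
from math import prod

def tile_score(metrics: dict, baselines: dict) -> float:
    """Normalize each workload metric to the reference platform,
    then take the geometric mean of the normalized values."""
    normalized = [metrics[k] / baselines[k] for k in metrics]
    return prod(normalized) ** (1 / len(normalized))

# Hypothetical per-tile workload throughputs vs. a reference platform
baselines = {"web_ops_min": 1000.0, "db_tx_min": 500.0}
tiles = [
    {"web_ops_min": 1200.0, "db_tx_min": 550.0},
    {"web_ops_min": 1150.0, "db_tx_min": 540.0},
]

# Per-tile scores are summed to form the application component...
app_score = sum(tile_score(t, baselines) for t in tiles)
# ...and combined with the separately aggregated infrastructure
# component in the 80/20 weighting the text describes.
infra_score = 2.15  # placeholder (vMotion, deploy, etc.)
final = 0.8 * app_score + 0.2 * infra_score
```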
When power is to be measured using the PTDaemon, either for the server only or for both server and storage, the VMmark harness starts the PTDaemon, which initiates a connection between the PTD client (or clients) specified in the VMmark 3.1.1 properties file and the power meter (or meters) they are configured to monitor. Once the required connections are established and the benchmark run is underway, the harness captures each power meter's results into a single unified data stream. This data, like that from other VMmark workloads, is broken up into sections (ramp up, three 40-minute steady-state phases, and ramp down). The reported VMmark 3.1.1 power consumption is the total average watts consumed during the steady-state phase of the benchmark run that resulted in the median score, where the total average watts is the sum of the average watts reported by each power meter used in the run. The final VMmark Performance Per Kilowatt (PPKW) score is the VMmark 3.1.1 score divided by the average power consumption in kilowatts. The results below are based on performance testing conducted in the Dell Solution Performance Analytics (SPA) Lab on 9/30/2022.
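The PPKW arithmetic works out as follows, using the reported scores (the small difference from the published 4.0285 comes from rounding in the reported inputs; the unrounded run data yields the official figure):

```python
vmmark_score = 4.37   # reported VMmark 3.1.1 score @ 4 tiles
avg_watts = 1085.50   # steady-state average power for the median run
ppkw = vmmark_score / (avg_watts / 1000)  # score per kilowatt, ~4.03
```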
The published result met all QoS thresholds and is compliant with VMmark 3.1.1 run and reporting rules. The following table shows the scores of the submitted test results.
Metric | Result |
vMotion (number of operations per hour) | 57.00 |
SVMotion (number of operations per hour) | 44.00 |
XVMotion (number of operations per hour) | 34.00 |
Deploy (number of operations per hour) | 17.00 |
Unreviewed_VMmark3_Applications_Score | 4.93 |
Unreviewed_VMmark3_Infrastructure_Score | 2.15 |
Unreviewed_VMmark3_Avg_Watts | 1085.50 |
Unreviewed_VMmark3_Score | 4.37 @ 4 Tiles |
Unreviewed_VMmark3_PPKW | 4.0285 @ 4 Tiles |
Virtualization is essential for edge applications: without it, it is very difficult to fully utilize the power of a modern server. In a virtualized environment, a software layer lets users create multiple independent VMs on a single physical server, taking full advantage of the hardware resources. A single-socket Dell PowerEdge XR4000 server equipped with the Intel Xeon D-2776NT has a VMmark Power Performance Score of 4.0285 @ 4 Tiles[1] and a VMmark Score of 4.37 @ 4 Tiles. This is representative of the different virtualization workloads that can run optimally within the latency constraints important for the edge, with a strong level of performance, making the XR4000 an excellent choice for edge customers who want to take advantage of the benefits that virtualization has to offer.
Mon, 16 Jan 2023 19:50:52 -0000
|Read Time: 0 minutes
Summary
This document is a brief summary of the performance advantages that customers can gain when using the PowerEdge XE8545 acceleration server. All performance results and characteristics discussed are based on testing conducted in the Americas Data Center (CET) labs. Results accurate as of 3/15/2021. Ad Ref #G21000042
The PowerEdge XE8545 is Dell EMC's response to the needs of high-performance machine learning customers who immediately want all the innovation and horsepower provided by the latest NVIDIA GPU technology, without the need to make major cooling-related changes to their data center. Its specifically air-cooled design delivers four A100 40GB/400W GPUs with a low-latency, switchless SXM4 NVLink interconnect, while letting the data center maintain an energy-efficient 35°C. It also has an 80GB/500W GPU option that has been shown to deliver 13-15% higher performance than the 400W GPUs at only a slightly lower ambient input temperature (28°C).
Unlike competitors, Dell worked with NVIDIA early in the design process to ensure that the XE8545 could run at 500 watts of power when using the high-capacity 80GB A100 GPUs, and still be air-cooled. This 80GB/500W GPU option allows the XE8545 to drive harder and derive more performance from each of the GPUs. Using the common industry benchmark ResNet50 v1.5 model to measure image classification speed with a standard batch size, the 500W GPU took 67.78 minutes to train, compared to 73.32 minutes for the 400W GPU: 7.56% faster. And when batch size is doubled, the result is up to 13-15% better performance! When speed of results is a customer's primary concern, the XE8545 can deliver the power needed to get those results faster.
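A quick check of the arithmetic behind those training numbers (the 7.56% figure is the reduction in training time; the equivalent throughput gain works out slightly higher):

```python
# ResNet50 v1.5 training times from the comparison above, in minutes.
t_400w, t_500w = 73.32, 67.78
time_reduction = (t_400w - t_500w) / t_400w * 100  # ~7.56% less time
throughput_gain = (t_400w / t_500w - 1) * 100      # ~8.17% more images/s
```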
It is clear from the chart above that an XE8545 with 40GB GPUs is more than twice as fast as the previous-generation C4140 when training an image classification model; in fact, faster than two C4140s running in parallel! And the 80GB GPU option is even faster! This is a great illustration of the combined power of the new technologies packed into the XE8545: the latest NVIDIA GPUs, the latest AMD CPUs, and the latest generation of PCIe I/O fabric. Further gains in performance can be achieved by workloads that take advantage of the improvements in how the A100 performs the matrix multiplication involved in machine learning, by better accounting for "sparsity": that is, the occurrence of many zeros in the matrix, which previously resulted in many time-consuming "multiply-by-zero" operations that had no effect on the final result.
And as with all operations for the XE8545, it delivers the very top-level performance using only air-cooling. It does not require liquid cooling.
Inference tends to scale linearly, as there is no peer-to-peer GPU communication involved, and the XE8545 has proven to have exceptional linear scalability. So it is not surprising that the XE8545 produces excellent high-performance inference results. As with training, the 80GB/500W A100 GPU has a performance edge: 10% faster than the 400W GPU (at a proportional power increase).
The innovative Multi-Instance GPU (MIG) technology introduced with the A100 GPU allows the XE8545 to partition each A100 GPU into as many as seven "slices," each fully isolated with its own high-bandwidth memory, cache, and compute cores. So, if fully utilized, an XE8545 server can be running 28 separate high-performance instances of inferencing. Each of those instances has been determined by NVIDIA to provide performance equivalent to the previous-generation V100. So, an A100 GPU can be thought of as 7 times faster than the previous generation, specifically for inferencing, where peer-to-peer communication does not come into play.
The XE8545 has undergone NVIDIA's comprehensive certification program for data center AI: NVIDIA GPU Cloud (NGC). It is now certified to run at the latest Gen4 networking speeds and can take advantage of the NGC catalog that hosts frameworks and containers for the top AI, ML, and HPC software, already tuned, tested, and optimized. With NGC certification, data centers can quickly and easily deploy machine learning environments with confidence and get results faster. For more details, see NVIDIA-Certified Systems.
The PowerEdge XE8545 introduces the latest industry technologies in a combination that delivers the kind of high-performance, accelerated computing that can handle even the most demanding Artificial Intelligence and Machine Learning workloads or scientific high-performance computing analysis. It provides the highest levels of power and performance in an air-cooled environment, simplifying operational continuity in enterprise data centers.
Mon, 16 Jan 2023 19:50:53 -0000
|Read Time: 0 minutes
Dell Technologies is helping to shape the future of Open RAN solutions with our partnerships and our high performance, purpose-built XR11 and XR12 PowerEdge servers designed for Open RAN and edge deployments.
Introduction
The future of telecommunications includes an open, cloud-native architecture within an open ecosystem of vendors working together to build this new architecture. One of the more exciting aspects of this open future is Open Radio Access Networks (Open RAN). Open RAN is an industry-wide movement that promotes the adoption of open and interoperable solutions at the RAN.
Open RAN provides opportunities to replace the proprietary, purpose-built RAN equipment of the past with standardized, virtualized hardware that can be deployed anywhere—at the far edge, regional edge, or centralized data centers. Also, intelligent controllers can provide optimized performance and enhanced automation capabilities to improve operational efficiency.
In the O-RAN frameworks, you can separate the baseband unit (BBU) of the traditional RAN into virtualized distributed unit (vDU) and virtualized centralized unit (vCU) components. You can also scale these components independently as control- and user-plane traffic requirements dictate. When building an open-hardware platform for a vRAN architecture, you must consider six critical factors:
Form factor | Environment | Components |
Security | Automation and management | Supply chain |
With the growing number of edge deployments required to support 5G O-RAN services, edge-optimized cloud infrastructure is essential. These six factors ensure that telco providers build their 5G RAN on a scalable, highly available, and long-term sustainable foundation. Dell Technologies considered each of these factors when designing their PowerEdge XR11 and XR12 servers. These servers are built specifically for O-RAN and edge environments, including multi-access edge computing (MEC) and content delivery network (CDN) applications. The following sections examine how the XR11 and XR12 servers meet, and in many cases exceed, the criteria for O-RAN and edge deployments across these six critical factors.
Unlike data centers, which are carefully controlled environments, RAN components are often subject to extreme temperature changes and less-than-ideal conditions such as humidity, dust, and vibration. For years, the telecommunications industry has used the Network Equipment-Building System (NEBS) as a standard for telco-grade equipment design. The PowerEdge XR11 and XR12 are designed to exceed NEBS Level 3 compliance (meets or exceeds the GR-63-CORE and GR-1089-CORE standards). They also meet military and marine standards for shock, vibration, sand, dust, and other environmental challenges.
Because they are fully operational within extreme temperature ranges from -5°C (23°F) to 55°C (131°F), XR11/12 servers can be deployed in almost any environment, even where exposure to heat, dust, and humidity are factors. The XR11/12 series is designed to withstand earthquakes and is fully tested to NEBS Seismic Zone 4 levels. As a result, you can trust Dell PowerEdge servers to keep working no matter where they are deployed.
The PowerEdge XR11 and XR12 provide significant flexibility over purpose-built, all-in-one appliances by using the industry’s most-advanced, best-of-breed components. Also, by providing multiple CPU, storage, peripheral, and acceleration options, PowerEdge XR11/12 servers enable telecommunications providers to deploy their vRAN systems in many different environments.
Both models feature the following components:
One example test shows the performance possibilities that the PowerEdge XR12 enabled by 3rd Gen Intel® Xeon® Scalable processors offers: The solution delivered 2x the massive MIMO throughput for a 5G vRAN deployment compared to the previous generation.1
PowerEdge XR11/12 servers are designed with a security-first approach to deliver proactive safeguards through integrated hardware and software protection. This security extends from a hardware-based silicon root of trust to asset retirement across the entire supply chain. From the moment a PowerEdge server leaves our factory, we can detect and verify whether a server has been tampered with, providing a foundation of trust that continues for the life of the server. The Integrated Dell Remote Access Controller (iDRAC) is the source of this day-zero trust. iDRAC checks the firmware against the factory configuration down to the smallest detail after the XR11/12 server is plugged in. If you change the memory, iDRAC detects it. If you change the firmware, iDRAC detects it. Also, we build every PowerEdge server with a cyber-resilient architecture2 that includes firmware signatures, drift detection, and BIOS recovery.
Besides providing proactive and comprehensive security, PowerEdge XR11/12 servers combine ease-of-management with automation to reduce operational complexity and cost while accelerating time-to-market for new services. Dell OpenManage provides a single systems-management platform across all Dell components. This platform makes it easier for telecommunications providers to manage their hardware components remotely, from configuration to security patches. Also, Dell delivers powerful analytics capabilities to help manage server data and cloud storage. The iDRAC agent-less server monitoring also allows telecommunications providers to proactively detect and mitigate potential server issues before they impact production traffic. By analyzing telemetry data, iDRAC can detect the root cause for poor server performance and identify cluster events that can predict hardware failure in the future.
In the last year, the importance of a secure and stable supply chain has become apparent while many manufacturers struggle to adapt to widespread supply-chain disruption. As telecommunications providers look to ramp up 5G services, they require partners they can depend on to deliver, innovate, scale, and support their plans for the future. Because we are the world’s largest supplier of data-center servers, telecommunications providers can depend on Dell Technologies. We operate in 180 countries worldwide, including 25 unique manufacturing locations, 50 distribution and configuration centers, and over 900 parts-distribution centers. Our global, secure supply chain means that telecommunications providers can grow their business with confidence.
Dell Technologies does not stop at the server. We work closely with our open partner ecosystem to integrate and validate our technology in multivendor solutions that provide a best-of-breed, end-to-end vRAN system. You will find this partnership at work in our latest technology preview of the Dell Open RAN reference architecture featuring VMware Telco Cloud Platform (TCP) 1.0, Intel FlexRAN technology, and vRAN software from Mavenir. Our O-RAN solution architecture delivers the disaggregated components that compose the RAN network: vRU, vCU, and vDU. You can also deploy it in hybrid (private and public) clouds as well as bare-metal server environments. Having a pre-built, integrated solution allows telecommunications providers to deploy O-RAN solutions quickly and confidently, knowing that they have the power of our global supply chain and expert services behind them.
With many initial 5G core network transformations complete, telecommunications providers are now turning their attention to the RAN. For them, there are several paths to choose. They can continue to work with legacy vendors by growing out their proprietary RAN systems, missing out on the opportunity to build a best-of-breed RAN solution from multiple partners. Or, they can follow the path of Open RAN with Dell Technologies as a trusted partner to assemble and manage the right pieces from the industry’s O-RAN leaders.
Dell PowerEdge XR11/12 servers are the latest examples of our commitment to open 5G solutions. These servers are built by telco experts specifically for telco edge applications, using a security-first approach and featuring high- performance compute, storage, and analytics components. Also, they have been bundled with our broader Open RAN reference architecture to form the foundation of a seamless, complete vRAN solution that includes hardware, software, and services.
O-RAN is more than the edge of the future. It is a competitive edge for telecommunications providers that must quickly deliver and monetize 5G services, from private mobile networks to high-performance computing applications. Make Dell Technologies your competitive edge, and ask your Dell representative about our portfolio of telco-grade edge solutions.
1 Bringing high performance and reliability to the edge with rugged Dell EMC PowerEdge XR servers
2 PowerEdge Cyber Resilient Architecture Infographic
Automation