PowerEdge R750xa and NVIDIA H100 PCIe GPU: 66% Increase in HPC Performance per Watt
Mon, 16 Jan 2023 19:49:21 -0000
Summary
Using the PowerEdge R750xa, the Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation NVIDIA A100 PCIe GPU, using the supercomputer benchmark HPL. Results showed:
- 66% increase in performance per watt
- 67% increase in raw performance (TFLOPS), using four GPUs
The Dell PowerEdge R750xa, powered by 3rd Gen Intel Xeon Scalable processors, is a dual-socket/2U rack server that delivers outstanding performance for the most demanding emerging and intensive GPU workloads. It supports 8 channels per CPU and up to 32 DDR4 DIMMs with speeds up to 3200 MT/s. In addition, the PowerEdge R750xa supports PCIe Gen 4 and up to 8 SAS/SATA SSDs or NVMe drives. The PowerEdge R750xa, the one PowerEdge portfolio platform that supports all the PCIe GPUs, is the ideal server for virtualization environments and workloads such as high performance computing and AI-ML/DL training and inferencing. The PowerEdge R750xa includes all the core benefits of PowerEdge: serviceability, consistent systems management with iDRAC, and the latest in extreme acceleration.
The new NVIDIA H100 PCIe GPU is optimal for delivering the fastest business outcomes with the latest accelerated servers in the Dell PowerEdge portfolio, starting with the R750xa. The PowerEdge R750xa boosts workloads to new performance heights with GPU and accelerator support for demanding workloads, including enterprise AI. With its enhanced, air-cooled design and support for up to four NVIDIA double-width GPUs, the PowerEdge R750xa server is purpose-built for optimal performance for the entire spectrum of HPC, AI-ML/DL training, and inferencing workloads.
Next-generation GPU performance analysis
The Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation NVIDIA A100 PCIe GPU in the Dell PowerEdge R750xa. The team used HPL, a popular benchmark used to rank supercomputers on the TOP500 list. The comparison covered HPL performance and server power consumption throughout the benchmark. Here are the results:
Performance/watt
The performance per watt calculation is the HPL benchmark score divided by the average server power over the duration of the HPL benchmark. The PowerEdge R750xa with the NVIDIA H100 PCIe GPUs delivered a 66% increase in performance/watt compared to the PowerEdge R750xa with the NVIDIA A100 PCIe GPUs, as shown in the following figure.
PowerEdge R750xa - HPL Benchmark and Server Power
Figure 1. Performance/watt comparison
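The performance per watt calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's tooling: the function name and all sample values are hypothetical placeholders, and the power samples are assumed to be evenly spaced readings taken over the duration of the HPL run.

```python
from statistics import mean


def performance_per_watt(hpl_tflops: float, power_samples_w: list[float]) -> float:
    """Return the HPL score divided by the average server power (TFLOPS per watt)."""
    if not power_samples_w:
        raise ValueError("need at least one power sample")
    return hpl_tflops / mean(power_samples_w)


# Hypothetical placeholder values, NOT the measurements behind Figure 1.
baseline = performance_per_watt(50.0, [1900.0, 2000.0, 2100.0])
candidate = performance_per_watt(80.0, [1950.0, 2000.0, 2050.0])

# Relative gain of the candidate configuration over the baseline.
gain_percent = (candidate / baseline - 1.0) * 100.0
print(f"Performance per watt gain: {gain_percent:.0f}%")
```

Averaging the power samples first, rather than dividing by peak power, matches the definition used in this section.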
HPL benchmark performance
Figure 2 shows the raw HPL performance of each configuration. The PowerEdge R750xa with four NVIDIA H100 PCIe GPUs achieved a 67% increase in TFLOPS compared to the configuration with four NVIDIA A100 PCIe GPUs.
Figure 2. Raw performance comparison
Server power
Figure 3 shows the server power over the duration of the HPL benchmark. The NVIDIA H100 PCIe GPU configuration delivered better performance with slightly lower server power and finished the workload faster.
Figure 3. HPL server power
Configuration information
The following table shows the two test configurations.
Table 1. R750xa test configurations
Component | R750xa with four NVIDIA H100 | R750xa with four NVIDIA A100 |
Server | PowerEdge R750xa | |
CPU | 2 x Intel Xeon Gold 6338 CPU | |
Memory | 512 GB system memory | |
Storage | 1 x 3.5T SSD | |
BIOS/iDRAC | 1.9.0/6.0.0.0 | |
HPL version | HPL for H100 (Alpha version, results subject to change) | |
Operating system | Ubuntu 20.04 LTS | |
GPU | NVIDIA H100-PCIe-80GB (310 W) | NVIDIA A100-PCIe-80GB (300 W) |
Driver | CUDA 11.8 | CUDA 11.8 |
Conclusion
Using the PowerEdge R750xa, the Dell HPC & AI Innovation Lab compared the performance of the new NVIDIA H100 PCIe 310 W GPU to the previous-generation NVIDIA A100 PCIe GPU. HPL benchmark results showed a 66 percent increase in performance/watt and a 67 percent increase in TFLOPS.
The PowerEdge R750xa supports up to four NVIDIA H100 PCIe GPUs and is available with new orders or as a customer upgrade kit for existing deployments. To learn more, reach out to your account executive or visit www.dell.com.
Introducing the PowerEdge T360 & R360: Gain up to Double the Performance with Intel® Xeon® E-Series Processors
Thu, 04 Jan 2024 22:08:42 -0000
Summary
The launch of the PowerEdge T360 and R360 is a prominent addition to the Dell Technologies PowerEdge portfolio. These cost-effective 1-socket servers deliver powerful performance with the latest Intel® Xeon® E-series processors, added GPU support, DDR5 memory, and PCIe Gen 5 I/O slots. They are designed to meet evolving compute demands in Small and Medium Businesses (SMB), Remote Office/Branch Office (ROBO) and Near-Edge deployments.
Both the T360 and R360 boost compute performance by up to 108% compared to the prior-generation servers. Consequently, customers gain up to 1.8x the performance per dollar spent on the new E-series CPUs [1]. The rest of this document covers key product features and differentiators, as well as the details behind the performance testing conducted in our labs.
Feature Additions and Upgrades
We break down the new features that are common across both the rack and tower form factors as shown in the table below. Perhaps the most salient upgrades over the prior generation servers – the PowerEdge T350 and R350 – are the significantly more performant CPUs, added entry GPU support, and up to nearly 1.4x faster memory.
Table 1. T360 and R360 key feature additions
Feature | Prior-Gen PowerEdge T350, R350 | New PowerEdge T360, R360 |
CPU | 1x Intel Xeon E-2300 Processor, up to 8 cores | 1x Intel Xeon E-2400 Processor, up to 8 cores |
Memory | 4x UDDR4, up to 3200 MT/s DIMM speed | 4x UDDR5, up to 4400 MT/s DIMM speed |
Storage | Hot Plug SATA BOSS S-2 | Hot Plug NVMe BOSS N-1 |
GPU | Not supported | 1 x NVIDIA A2 entry GPU |
- From left to right, PowerEdge R360 and T360
Entry GPU Support
We have seen a growing demand for video and audio computing, particularly in the retail, manufacturing, and logistics industries. To meet this demand, the PowerEdge T360 and R360 now support one NVIDIA A2 entry data center GPU that accelerates these media-intensive workloads, as well as emerging AI inferencing workloads. The A2 is a single-width GPU with 16 GB of GPU memory and a configurable thermal design power (TDP) of 40-60 W. Read more about the A2 GPU's features and its up to 20x inference speedup here: A2 Tensor Core GPU | NVIDIA.
This upgrade is well timed for businesses looking to scale up and explore entry AI use cases. IDC projects $154 billion in global AI spending in 2023, with retail and banking topping the industries with the greatest AI investment. For example, a retailer could use the A2 GPU and the latest CPUs to stream video of store aisles for inventory management and customer behavior analytics.
Product Differentiation – Rack vs Tower Form Factor
The biggest differentiator between T360 and R360 is their form factors. The T360 is a tower server that can fit under a desk or even in a storage closet, while maintaining office-friendly acoustics. The R360 is a traditional 1U rack server. The table below further details the differences in the product specifications. Namely, the PowerEdge T360 has greater drive capacity for customers with data-intensive workloads or those who anticipate growing storage demand.
Table 2. T360 and R360 differentiators
Feature | PowerEdge R360 | PowerEdge T360 |
Storage | Up to 4 x 3.5-inch or 8 x 2.5-inch SATA/SAS, max 64 TB | Up to 8 x 3.5-inch or 8 x 2.5-inch SATA/SAS, max 128 TB |
PCIe Slots | 2 x PCIe Gen 5 (QNS) or 2 x PCIe Gen4 | 3x PCIe Gen 4 + 1x PCIe Gen 5 |
Dimensions & Form Factor | H x W x D: 1U x 17.08 in x 22.18 in; 1U rack server | H x W x D: 14.54 in x 6.88 in x 22.06 in; 4.5U tower server |
Processor Performance Testing
The Dell Solutions Performance Analysis Lab (SPA) ran the SPEC CPU® 2017 benchmark on both the PowerEdge T360 and R360 servers with the latest Intel Xeon E-2400 series processors. SPEC CPU is an industry-standard benchmark that measures compute performance for both floating point (FP) and integer operations. We compare these new results with the prior-generation PowerEdge T350 and R350 servers that have Intel Xeon E-2300 series processors.
The following gen-over-gen comparisons represent common Intel CPU configurations for R350/T350 and R360/T360 customers, respectively:
Table 3. Selected CPUs for T/R350 vs T/R360 comparison
Comparison # | PowerEdge R350/T350 | PowerEdge R360/T360 |
1 | E-2388G, 8 cores, 3.2 GHz base frequency | E-2488, 8 cores, 3.2 GHz base frequency |
2 | E-2374G, 4 cores, 3.7 GHz base frequency | E-2456, 6 cores, 3.3 GHz base frequency |
3 | E-2334, 4 cores, 3.4 GHz base frequency | E-2434, 4 cores, 3.4 GHz base frequency |
4 | E-2324G, 4 cores, 3.1 GHz base frequency | E-2414, 4 cores, 2.6 GHz base frequency |
5 | E-2314, 4 cores, 2.8 GHz base frequency | E-2414, 4 cores, 2.6 GHz base frequency |
Results
We report SPEC CPU’s FP rate and integer rate metrics, which measure throughput in terms of work per unit of time (so higher results are better).[1] Across all CPU comparisons and for both FP and Int rates, there was a 20% or greater uplift in performance gen-over-gen. Overall, customers can expect up to 108% better CPU performance when upgrading from the PowerEdge T/R350 to the T/R360.[2] Figure 1 below displays the results for the FP base metric, and Table 4 details the results for the integer rates and the FP peak metric.
Figure 1. SPEC CPU results gen-over-gen
Table 4. Results for each CPU comparison
Comparison # | Processor | Int Rate (Base) | Int Rate (Peak) | FP Rate (Base) | FP Rate (Peak) |
1 | E-2388G | 68.1 | 71.2 | 55.9 | 60.3 |
1 | E-2488 | 95.1 | 99.2 | 110 | 110 |
1 | % Increase | 39.65% | 39.33% | 96.78% | 82.42% |
2 | E-2374G | 42.3 | 43.8 | 43.2 | 45.3 |
2 | E-2456 | 68.3 | 71.1 | 90.1 | 90.3 |
2 | % Increase | 61.47% | 62.33% | 108.56% | 99.34% |
3 | E-2334 | 39.8 | 41.2 | 41.5 | 43.4 |
3 | E-2434 | 50.8 | 52.6 | 68.7 | 68.9 |
3 | % Increase | 27.64% | 27.67% | 65.54% | 58.76% |
4 | E-2324G | 33 | 34 | 40.9 | 41.4 |
4 | E-2414 | 39.7 | 41.1 | 65.2 | 65.7 |
4 | % Increase | 20.30% | 20.88% | 59.41% | 58.70% |
5 | E-2314 | 29.4 | 30.2 | 38.6 | 39 |
5 | E-2414 | 39.7 | 41.1 | 65.2 | 65.7 |
5 | % Increase | 35.03% | 36.09% | 68.91% | 68.46% |
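The "% Increase" rows can be reproduced directly from the scores in the table. The short Python sketch below does this for the FP Rate (Base) column; the helper name `percent_increase` and the dictionary layout are illustrative choices, while the scores themselves are copied from the table above.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the gen-over-gen increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# (prior-generation score, new-generation score) for FP Rate (Base),
# copied from the results table above.
fp_rate_base = {
    "1: E-2388G -> E-2488": (55.9, 110.0),
    "2: E-2374G -> E-2456": (43.2, 90.1),
    "3: E-2334 -> E-2434": (41.5, 68.7),
    "4: E-2324G -> E-2414": (40.9, 65.2),
    "5: E-2314 -> E-2414": (38.6, 65.2),
}

for label, (old, new) in fp_rate_base.items():
    print(f"Comparison {label}: {percent_increase(old, new):.2f}%")
```

Comparison 2 yields the largest uplift, which is the source of the "up to 108%" headline figure.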
Beyond the performance gains, Figure 2 below illustrates the high return on investment associated with the new Intel Xeon E-2400 series processors. Specifically, customers gain up to 1.8x the performance per dollar spent on CPUs [1]. We calculated performance per dollar by dividing the FP base results reported in Table 4 by the US list price for the corresponding CPU. Note that pricing varies by region and is subject to change.
Figure 2. Performance per Dollar gen-over-gen
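As a rough sketch of the performance per dollar calculation, the Python below divides an FP base score by a CPU price and compares two generations. The function names are hypothetical, and the prices are placeholders only: this document does not list the actual US list prices, so the printed factor is not expected to match the 1.8x figure.

```python
def performance_per_dollar(fp_rate_base: float, list_price_usd: float) -> float:
    """Return the FP Rate (Base) score per US dollar of CPU list price."""
    if list_price_usd <= 0:
        raise ValueError("list price must be positive")
    return fp_rate_base / list_price_usd


def gain_factor(old_score: float, old_price: float,
                new_score: float, new_price: float) -> float:
    """Return how many times more performance per dollar the new CPU delivers."""
    return (performance_per_dollar(new_score, new_price)
            / performance_per_dollar(old_score, old_price))


# Scores are the FP Rate (Base) results for comparison 2 (E-2374G vs E-2456).
# Prices are HYPOTHETICAL placeholders, not Dell list prices.
OLD_PRICE_USD = 400.0
NEW_PRICE_USD = 400.0

factor = gain_factor(43.2, OLD_PRICE_USD, 90.1, NEW_PRICE_USD)
print(f"Performance per dollar gain: {factor:.2f}x")
```

With equal placeholder prices the factor reduces to the raw score ratio; substituting real list prices for both CPUs gives the published comparison.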
Conclusion
The PowerEdge T360 and R360 are impressive upgrades from the prior-generation servers, especially considering the performance gains with the latest Intel Xeon E-series CPUs and added GPU support. These highly cost-effective servers empower businesses to accelerate their traditional use cases while exploring the realm of emerging AI workloads.
References
- A2 Tensor Core GPU | NVIDIA
- Worldwide Spending on AI-Centric Systems Forecast to Reach $154 Billion in 2023, According to IDC
- Overview - CPU 2017 (spec.org)
Legal Disclosures
[1] Based on SPEC CPU® 2017 benchmarking of the E-2456 and E-2374G Intel Xeon E-series processors in the PowerEdge R360 and R350, respectively. Testing was conducted by Dell Performance Analysis Labs in October 2023, available on spec.org/cpu2017/. Actual results will vary. Pricing is based on Dell US list prices for Intel Xeon E-series processors and varies by region. Please contact your local sales representative for more information.
PowerEdge XE9680 Rack Integration
Mon, 29 Apr 2024 16:12:31 -0000
Introduction
Proper server rack integration is crucial for a data center's efficient and reliable operation. Optimizing space, power, and cooling can reduce downtime, simplify fleet management, improve serviceability, and lower overall costs. However, successful server rack integration requires careful planning, attention to detail, and expertise in server hardware, networking, and system administration.
This paper focuses on the critical aspects of deploying the PowerEdge XE9680 server in your data center. It describes key factors such as selecting the appropriate rack type, sizing the rack to meet current and future needs, installing and configuring the server hardware and related components, and ensuring proper power and cooling.
At Dell Technologies, we understand the importance of meeting our customers where they are. Whether you require full-service rack integration and deployment services or expert advice, we are committed to providing the support you need to achieve your goals. By leveraging our expertise and resources, you can be confident in your ability to implement the server rack integration that meets your unique needs and requirements.
The PowerEdge XE9680
The Dell PowerEdge XE9680 is a high-performance server designed to deliver exceptional performance for machine learning workloads, AI inferencing, and high-performance computing. Table 1 lists key specifications to consider when installing it in a rack.
Table 1. Server specifications
Feature | Technical Specifications |
Form Factor | 6U Rack Server |
Dimensions and Weight | Height: 263.2 mm / 10.36 inches; Width: 482.0 mm / 18.98 inches; Depth: 1008.77 mm / 39.72 inches with bezel, 995 mm / 39.17 inches without bezel, 1075 mm / 42.32 inches with Cable Management Arm (CMA); Weight: 107 kg / 236 lbs. |
Cooling Options | Air Cooling |
XE9680 rack integration – critical factors
Server operating environment
The American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) data center specifications focus on temperature and humidity control, optimized air distribution, airflow management, air quality, and energy efficiency. Key recommendations include maintaining appropriate temperature and humidity ranges, implementing hot aisle/cold aisle configurations and containment systems, managing airflow effectively, ensuring high indoor air quality, and adopting energy-efficient technologies.
The Dell PowerEdge XE9680 complies with the A2 Class ASHRAE specifications in Table 2.
Table 2. Operating environment specifications
Product Operation: Dry-Bulb Temp, °C | Product Operation: Humidity Range, Noncondensing | Product Operation: Max Dew Point, °C | Product Operation: Max Elevation, meters | Product Operation: Max. Rate of Change, °C/hour | Product Power Off: Dry-Bulb Temp, °C | Product Power Off: Relative Humidity, % |
10 to 35 | -12°C DP and 8% RH to 21°C DP and 80% RH | 21 | 3050 | 20 | 5 to 45 | 8 to 80 |
Note: The maximum operating temperature is derated by 1°C per 300m above 900m in altitude.
For optimal performance and reliability, Dell recommends operating the server within the specified ranges. Operating at the edge of these ranges is possible, but Dell does not recommend continuous operation under such conditions because of the potential impact on performance and reliability.
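The altitude derating note can be turned into a quick site check. The sketch below assumes the derating is applied linearly (1°C for every 300 m above 900 m) rather than in discrete steps, which the note does not specify; the function name and parameters are illustrative.

```python
def max_operating_temp_c(
    altitude_m: float,
    base_max_c: float = 35.0,         # upper dry-bulb limit from Table 2
    derate_start_m: float = 900.0,    # derating begins above this altitude
    max_elevation_m: float = 3050.0,  # maximum elevation from Table 2
) -> float:
    """Return the derated maximum dry-bulb temperature at a given altitude.

    Assumes linear derating of 1 degree C per 300 m above `derate_start_m`.
    """
    if altitude_m > max_elevation_m:
        raise ValueError("altitude exceeds the maximum supported elevation")
    excess_m = max(0.0, altitude_m - derate_start_m)
    return base_max_c - excess_m / 300.0


for altitude in (0, 900, 1500, 3050):
    print(f"{altitude} m: {max_operating_temp_c(altitude):.1f} °C")
```

Under this assumption, a site at 1500 m would have a maximum operating temperature of 33°C instead of 35°C.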
Cabinet recommendations
When choosing a cabinet, it is important to consider factors such as size, ventilation, cable management, and security. The right cabinet should provide ample space for equipment, efficient airflow to prevent overheating, organized cable routing, and robust physical protection for valuable server hardware. Careful consideration of these factors ensures optimal performance, reliability, and ease of maintenance for your server infrastructure. We recommend the following cabinet specifications for optimal XE9680 installation:
- Minimum width of 600mm / 23.62 inches
- Minimum depth of 1200mm / 47.24 inches
- 42 or 48RU height
- Rear cable management support
- Support for rear facing horizontal or vertical PDUs
- To accommodate the depth of server and IO cables, it may be necessary to utilize cabinet extensions depending on cabinet vendor
- Side panels for single cabinets
Rack and stack
Installing servers in a rack is a crucial aspect of server management. Proper placement within the rack ensures efficient use of space, ease of access, and optimal airflow. Each server should be securely mounted in the rack, taking into account factors such as weight distribution and cable management. Strategic placement allows for better cooling, reducing the risk of overheating, and prolonging the lifespan of the equipment. Additionally, thoughtful placement enables easy maintenance, troubleshooting, and scalability as the server environment evolves. By giving careful consideration to the placement of servers in a rack, you can create a well-organized and functional setup that maximizes performance and minimizes downtime. We recommend the following:
- The PowerEdge XE9680 has a maximum chassis weight of 107kg/236 lbs. It is recommended to install the first XE9680 server in the 1RU location, and to install any additional servers directly above it. This configuration helps maintain a low center of gravity, reducing the risk of cabinet tipping.
- For ease of assembly and seismic bracket installation, we recommend starting at the 3RU position when using seismic hardware. It is important for customers to adhere to local building codes, and to ensure that all necessary facility accommodations are in place and that the seismic brackets are correctly installed.
Figure 1. 4x PowerEdge XE9680 servers in a rack
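The stacking recommendations above can be expressed as a small layout helper. This is a sketch under simplifying assumptions: it considers only the 6U server height and the starting position, and ignores switches, horizontal PDUs, ducts, and any cabinet weight rating. The function name is hypothetical.

```python
SERVER_HEIGHT_RU = 6    # the XE9680 is a 6U rack server
SERVER_WEIGHT_KG = 107  # maximum chassis weight


def plan_positions(num_servers: int, cabinet_height_ru: int = 42,
                   seismic: bool = False) -> list[tuple[int, int]]:
    """Return (bottom RU, top RU) for each server, stacked from the bottom up.

    Starts at 1RU, or at 3RU when seismic hardware is used.
    """
    start_ru = 3 if seismic else 1
    positions = []
    for index in range(num_servers):
        bottom = start_ru + index * SERVER_HEIGHT_RU
        top = bottom + SERVER_HEIGHT_RU - 1
        if top > cabinet_height_ru:
            raise ValueError("servers do not fit in the cabinet")
        positions.append((bottom, top))
    return positions


layout = plan_positions(4)
print(layout)
print(f"Total server weight: {len(layout) * SERVER_WEIGHT_KG} kg")
```

Stacking contiguously from the lowest position keeps the heaviest equipment at the bottom of the cabinet, in line with the low center of gravity guidance.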
Power distribution recommendations
The PowerEdge XE9680, equipped with H100 GPUs, has an approximate maximum power draw of 11.5kW. It comes with six 2800W Mixed Mode power supply units (PSUs) that feature a C22 input socket.
The XE9680 currently supports 5+1 fault-tolerant redundancy (FTR). (An additional 3+3 FTR configuration will be introduced in the Fall of 2023.) It is important to note that in 3+3 mode, system performance may throttle upon power supply failure to prevent overloading the remaining power supplies.
Figure 2. PowerEdge XE9680 with PDU
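A quick capacity check shows why 3+3 mode may throttle after a power supply failure while 5+1 mode does not. The sketch below assumes each PSU delivers its full 2800 W nameplate rating (actual output can depend on input voltage) and that an N+M mode means N supplies must carry the load on their own in the worst case; the helper names are illustrative.

```python
PSU_COUNT = 6
PSU_RATING_W = 2800         # nameplate rating per power supply
MAX_SYSTEM_DRAW_W = 11_500  # approximate maximum draw with H100 GPUs


def surviving_capacity_w(active_psus: int) -> int:
    """Return the combined nameplate capacity of the remaining PSUs."""
    return active_psus * PSU_RATING_W


def has_headroom(active: int, redundant: int,
                 load_w: int = MAX_SYSTEM_DRAW_W) -> bool:
    """Return True if `active` PSUs alone can carry the given load."""
    if active + redundant != PSU_COUNT:
        raise ValueError("active + redundant must equal the PSU count")
    return surviving_capacity_w(active) >= load_w


for active, redundant in [(5, 1), (3, 3)]:
    capacity_kw = surviving_capacity_w(active) / 1000
    verdict = "covers" if has_headroom(active, redundant) else "is below"
    print(f"{active}+{redundant}: {capacity_kw:.1f} kW {verdict} the maximum draw")
```

Five supplies provide 14.0 kW against an 11.5 kW maximum draw, whereas three provide 8.4 kW, so the system would need to reduce power to stay within the remaining capacity.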
For the XE9680, we recommend the following PDU specifications:
- Vertical or horizontal PDUs
- One circuit breaker per power supply
- C19 receptacles
Table 3. PDU specifications
PDU Input Voltage | XE9680s Per Cabinet | PDUs Per Cabinet | Circuit Breakers Per PDU (Min) | Single PDU Requirement (Min) |
208V | 2 | 2 | 6 | 60A (48A Rated) 17.3kW |
208V | 2 | 4 | 3 | 30A (24A Rated) 8.6kW |
208V | 4 | 2 | 12 | 100A (80A Rated) 28.8kW |
208V | 4 | 4 | 6 | 60A (48A Rated) 17.3kW |
400/415V | 2 | 2 | 6 | |
400/415V | 2 | 4 | 3 | 20A (16A Rated) 11.1kW@400 / 11.5kW@415V |
400/415V | 4 | 2 | 12 | |
400/415V | 4 | 4 | 6 |
Note: Single PDU Power Requirement = Input Voltage * Current Rating * 1.73.
The factor of 1.73 (the square root of 3) is used to account for three-phase power systems commonly used in data centers and industrial settings. By multiplying the input voltage, current rating, and 1.73, you can determine the power capacity needed for a single PDU to adequately support the connected equipment. This calculation helps ensure that the PDU can handle the power load and prevent overloading or electrical issues.
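The formula in the note can be checked against the populated rows of Table 3. The Python sketch below applies it as written, using 1.73 for the three-phase factor; it also reflects the observation that the rated current in each row is 80% of the breaker size. The function names are illustrative.

```python
THREE_PHASE_FACTOR = 1.73  # approximation of the square root of 3


def rated_current_a(breaker_a: float) -> float:
    """Return the rated current, which is 80% of the breaker size in Table 3."""
    return breaker_a * 80 / 100


def pdu_power_kw(input_voltage_v: float, rated_a: float) -> float:
    """Return the single PDU power requirement in kW."""
    return input_voltage_v * rated_a * THREE_PHASE_FACTOR / 1000.0


# (input voltage in V, breaker size in A) for the populated rows of Table 3.
rows = [(208, 60), (208, 30), (208, 100), (400, 20), (415, 20)]

for voltage, breaker in rows:
    rated = rated_current_a(breaker)
    print(f"{voltage} V, {breaker} A ({rated:.0f} A rated): "
          f"{pdu_power_kw(voltage, rated):.1f} kW")
```

The results round to the 17.3 kW, 8.6 kW, 28.8 kW, 11.1 kW, and 11.5 kW figures listed in the table.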
Optimal thermal management for performance and reliability
Thermal management is important in data centers to ensure equipment reliability, optimize performance, improve energy efficiency, prolong equipment lifespan, and reduce environmental impact. By maintaining appropriate temperature levels, data centers can achieve a balance between operational reliability, energy efficiency, and cost-effectiveness.
Dell Technologies recommends the following best practices for thermal management:
- Ensure a cold aisle inlet airflow of 1200 CFM.
- If additional equipment is rear-facing, consider using 1 or 2 RU ducts.
- Use filler panels for all open front U spaces.
- For stand-alone racks, install cabinet side panels to optimize airflow.
The XE9680 is engineered to operate efficiently within ambient temperature conditions of up to 35°C. Although it is technically capable of functioning in such environments, maintaining lower temperatures is highly recommended to ensure the device's optimal performance and reliability. By operating the XE9680 in a cooler environment, the risk of overheating and potential performance degradation can be mitigated, resulting in a more stable and reliable operation overall.
Cabinet cable management
Proper cable management in a server rack improves organization, airflow, accessibility, safety, and scalability. It enhances the reliability, performance, and maintainability of the entire IT infrastructure.
The PowerEdge XE9680 supports Ethernet and InfiniBand network adaptors, which are installed at the front of the server for easy access in cold aisles. To ensure proper cable management, the chosen cabinet should provide a minimum clearance of 93.12 mm from the face of the network adaptor to the cabinet door. This clearance accommodates the bend radius of a typical direct attach cable (DAC) (see Figure 3).
Figure 3. DAC clearance recommendations
The maximum cable length shown in the figure is 2.07 meters (81.49 inches).
With adjacent racks, it is possible to improve cable management by removing the inner side panels. This alteration provides an open space along the sides of the racks, allowing cables to be conveniently routed between adjacent racks. By eliminating the inner side panels, technicians or IT professionals gain unobstructed access to the interconnecting cables, making it simpler to establish and maintain organized cabling infrastructure.
The following two figures show power cables routed through the optional cable management arm (CMA). The CMA can be mounted to either side of the sliding server rails.
Figure 4. Power cables in cable arm
Network switch
AI server network switches play a crucial role in supporting high-performance and data-intensive artificial intelligence workloads. These switches handle the demanding requirements of AI applications, providing high bandwidth, low latency, and efficient data transfer. They facilitate seamless communication and data exchange between AI servers, to ensure optimal performance and to minimize bottlenecks.
Installing a switch in a rack for servers is vital for establishing a robust and efficient network infrastructure, enabling seamless communication, centralized management, scalability, and optimal performance for the server environment.
The network switch may require offsetting within the rack to accommodate the bend radius of specific networking cables. To achieve this, a bracket can be utilized to push the network switch towards the rear of the rack, creating space for the necessary cable bend radius while ensuring proper installation of the front door. The accompanying images demonstrate the process of using the bracket to adjust the network switch position within the rack. This allows for optimal cable management and ensures the smooth operation of the network infrastructure.
Figure 6. Switch offset brackets
Enterprise Infrastructure Planning Tool (EIPT)
The Dell Enterprise Infrastructure Planning Tool (EIPT) helps IT professionals plan and tune their computer and infrastructure equipment for maximum efficiency. With a wide range of configuration options and environmental inputs, the tool can help right-size your IT environment. EIPT is a model-driven tool that supports many products and configurations for infrastructure sizing purposes. EIPT models are based on hardware measurements taken under operating conditions representative of typical use cases. Workloads can greatly affect power consumption. For example, different workloads at the same percent CPU utilization can lead to widely different power consumption. It is not possible to cover all workload, environmental, and customer data center factors in a model and provide a percent accuracy figure with any degree of confidence. With that said, Dell Technologies would anticipate (NOT guarantee or claim) a potential for some variation. Customers are always advised to confirm EIPT estimates with actual measurements under their own workloads.
Figure 7. Dell EIPT tool
Dell Deployment Services
Leading edge technologies bring implementation challenges that can be reduced or eliminated with Dell Rack Integration Services. We have the experience and expertise to engineer, integrate, and install your Dell storage, server, or networking solution. Our proven integration methodology will take you step by step from a plan to a ready-to-use solution:
- Project management
- Solutioning and rack layout engineering
- Physical integration and validation
- Logistics and installation
To learn more, contact your account manager or go to Custom deployment services.