Which Server Platform Can Better Support Massive Cloud Infrastructure Needs?
Thu, 14 Mar 2024 16:42:38 -0000
Prowess Consulting examined two cloud-scale server platforms from Dell Technologies and Supermicro to see how these platforms meet the high infrastructure demands of cloud service providers (CSPs).
The cloud is a popular place to be. That’s because even organizations that do not traditionally operate in the virtualized digital realm, such as auto manufacturers and supermarkets, want what the cloud provides: agility, efficiency, and the ability to glean intelligence like never before. With the cloud, these organizations can achieve their goals thanks to AI and data analytics, making use of large amounts of data stored in different locations.
This trend toward cloud adoption has led to the rise of cloud service providers (CSPs), who serve not only as public cloud hosts (including providing hyperscale solutions like Microsoft Azure® and Amazon Web Services® [AWS®]), but who also offer “as-a-service” products like business-to-business or consumer software-as-a-service (SaaS), infrastructure-as-a-service (IaaS), and platform-as-a-service (PaaS) solutions. These CSPs face immense, unique challenges in managing servers in their data centers around the world. Operating and managing data centers is difficult enough for typical enterprises, and CSPs’ needs are multiplied due to the size and number of their facilities and the components housed within those facilities.
As a result, CSPs must rely on high-quality, high-performing products that can provide a high degree of serviceability, interoperability, and sustainability benefits. This is true for CSPs of all sizes, especially those that do not have the massive resources of hyperscalers. Failure to meet these requirements can cost a CSP in revenue, business opportunities, and customers.
Because increasing numbers of enterprises depend on CSP platforms to operate day to day, Prowess Consulting conducted a study to examine how two major platforms from Dell Technologies and Supermicro compare in helping CSPs run efficiently while meeting incredible demands:
- Dell™ PowerEdge™ HS5610 (1U, with cold-aisle support) and HS5620 (2U) servers with Broadcom® BCM5720 network interface controllers (NICs)
- Supermicro® SuperServer® SYS-121C-TN10R (1U) and SYS-621C-TN12R (2U) servers with Supermicro® AOC-A25G-i2SM NICs
We first compared the two platforms in quantitative benchmark testing using the HammerDB online transaction processing (OLTP) benchmark on MySQL®. We also compared virtual machine (VM) workload performance, and we then conducted qualitative testing using other criteria besides performance that are key to meeting the specialized needs of CSPs:
- Ease of acquisition and deployment
- Ease of management and serviceability
More Requirements on a Larger Scale
A typical CSP can operate tens of thousands of servers, as opposed to the hundreds in a typical mid-sized data center, and these servers could be spread across several facilities, with some hyperscale data centers reaching more than 10,000 square feet.[1] These data centers use high-density racks to squeeze as many servers as possible into a given space, and they consume increasing amounts of energy as demand for cloud services continues to grow.
CSPs differ from typical data center operators in several ways. They favor disaggregated, modular server components and open-source code across the software stack. They also use industry-standard NICs and storage firmware for low costs and easy manageability of updates and migration paths. These providers lean heavily on automated features that allow them to make quick and efficient updates across many server systems at the same time. Automation is key to CSP colocation facilities, which can be sparsely staffed, and equipment needs to be reliable and remotely manageable.
Business-related challenges accompany these physical challenges. CSPs serve a wide variety of customer needs, and they need to deploy new applications and services to customers on demand. They need to deliver these solutions with easy deployment, and they need to manage heterogeneous environments with servers from multiple vendors seamlessly. They also must balance different consumption models, from low-cost solutions (buy what you need) to right-sized, streamlined products that avoid too much processor or memory utilization.
Benchmark Testing
Prowess Consulting first compared the performance of the two platforms with benchmark testing. For a step-by-step description of the testing procedure, view the Methodology. We didn’t expect the results to be dramatically different, as both platforms are built on the same CPUs and similar components (see Table 1). We used Broadcom 100 Gb NICs on all four systems, and we chose solid-state drives (SSDs) that were available for selection at the time of order.
Table 1. Specifications of Dell Technologies and Supermicro cloud-scale servers compared in our study
Model Name | Dell™ PowerEdge™ HS5610 | Dell™ PowerEdge™ HS5620 | Supermicro® SuperServer® SYS-121C-TN10R | Supermicro® SuperServer® SYS-621C-TN12R |
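CPU | 2 x Intel® Xeon® Gold 6448Y processor, 32/64 cores/threads per CPU, 2,100 MHz frequency (base/SCT/MCT) | 2 x Intel® Xeon® Gold 6448Y processor, 32/64 cores/threads per CPU, 2,100 MHz frequency (base/SCT/MCT) | 2 x Intel® Xeon® Gold 6448Y processor, 32/64 cores/threads per CPU, 2,100 MHz frequency (base/SCT/MCT) | 2 x Intel® Xeon® Gold 6448Y processor, 32/64 cores/threads per CPU, 2,100 MHz frequency (base/SCT/MCT) |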
Storage Controller 01 | PCIe® SSD backplane | Dell™ PowerEdge™ RAID Controller 11 (PERC 11) H755 Front | Broadcom® MegaRAID® 12 Gb Serial-Attached SCSI (SAS)/PCIe® secure SAS29xx | Supermicro® AOC-S3916L-H16iR-32DD SAS3/Serial ATA (SATA) |
Disk | 3.84 TB Dell™ NVM Express® (NVMe®) CM6 read-intensive | 1.75 TB Samsung® PM897 Dell™ MZ7L31T9HBNAAD3 SATA | 960 GB Samsung® MZQL2960HCJR-00A07 | KIOXIA® CD6-R Series U.3 KDC6XLUL960G |
Number of Disks | 4 | 4 | 6 | 6 |
Storage Controller 02 | Dell™ Boot Optimized Server Storage (BOSS)-N1 | Dell™ Boot Optimized Server Storage (BOSS)-N1 | Broadcom® SAS 3908 AOC-S3908L-H8iR | Broadcom® SAS 3916 |
Disk | 480 GB Dell™ EC NVMe® ISE 7400 read-intensive M.2 | 480 GB Dell™ NVMe® PE8010 read-intensive M.2 | 480 GB Micron® 7450 PRO Series M.2 | 480 GB Micron® 7450 PRO Series M.2 |
Number of Disks | 2 | 2 | 2 | 2 |
Storage Controller 03 | Not applicable (N/A) | MegaRAID® 12GSAS/PCIe® Secure SAS29xx | N/A | N/A |
Disk | N/A | 3.84 TB Dell™ NVMe® P5500 RI U.2 | N/A | N/A |
Number of Disks | N/A | 4 | N/A | N/A |
Installed Memory | 16 x 32 GB SK hynix® HMCG88AEBRA107N Pc5-38400 DDR5 (4,800 megatransfers per second [MT/s] 2RX4 error correction code [ECC]) | 16 x 32 GB SK hynix® HMCG88AEBRA107N Pc5-38400 DDR5 (4,800 MT/s 2RX4 ECC) | 16 x 32 GB Samsung® M321R4GA3BB6-CQKMS DDR5 single-bit ECC | 16 x 32 GB Samsung® M321R4GA3BB6-CQKMG DDR5 single-bit ECC |
Memory Speed | 4,800 MT/s | 4,800 MT/s | 4,800 MT/s | 4,800 MT/s |
NIC | Broadcom® NetXtreme® BCM5720 gigabit | Broadcom® NetXtreme® BCM5720 gigabit | AOC-A100G-b2CM and AOC-A25G-I2SM | 1 x Intel® Ethernet Controller E810-XXV for SFP 1 x Supermicro® AIOM 25 Gb SFP28 Ethernet controller |
Operating System (OS) | Red Hat® Enterprise Linux® 9.2 | Red Hat® Enterprise Linux® 9.2 | Red Hat® Enterprise Linux® | Red Hat® Enterprise Linux® 9.2 |
BIOS Version | 1.1.2 | 1.1.2 | 1.1 | 1.1 |
Baseboard Management Controller Firmware | 2.1.0 | 2.1.0 | 01.00.52 | 01.00.45 |
MySQL® Performance
Our engineers tested MySQL database performance on all four systems using HammerDB, simulating different numbers of virtual users in a 500 warehouse/e-store environment. As expected, both 1U and 2U servers performed comparably. Figure 1 shows that the 1U Dell™ system performed slightly better than the 1U Supermicro® system, with 6% more new orders per minute (NOPM). In Figure 2, which compares the 2U servers, the Supermicro system comes out slightly ahead with 8% more NOPM.
Figure 1. Comparison of MySQL® performance for five virtual users on 1U Dell™ and Supermicro® servers
Figure 2. Comparison of MySQL® performance for five virtual users on 2U Dell™ and Supermicro® servers
However, there is a marked difference in performance at higher numbers of virtual users. As Figure 3 shows, the 1U Dell server performed better than the 1U Supermicro server with 50 virtual users (46% more NOPM).
Figure 3. Comparison of MySQL® performance for 50 virtual users on 1U Dell™ and Supermicro® servers
The results were similar for the 2U comparison, with the Dell system delivering 39% more NOPM than the Supermicro system. When we increased the number of virtual users to 100, the Dell servers again outperformed their Supermicro counterparts in both form factors (39% more NOPM for 1U and 38% more NOPM for 2U).
With the Dell system showing better overall performance, we decided to compare the 1U Dell server with the 2U Supermicro server. Again, Dell Technologies came out ahead in performance (see Figure 4).
Figure 4. Comparison of MySQL® performance for 100 virtual users on a 1U Dell™ server and a 2U Supermicro® server
VM Workload Performance
To test VM capacity and performance, our engineers created a few hundred VMs on each server. To simulate a virtual desktop infrastructure (VDI) workload typical for a CSP, each VM generated a small load every few minutes. Every five minutes, we spun up a new VM, until the system reported capacity.
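The ramp-up procedure described above can be sketched as a simple loop. This is only an illustration, not the harness our engineers used: the `can_host_another` callback, the memory-limited host model, and the 2 GB-per-VM figure are all hypothetical, and elapsed time is counted rather than actually waited. The 512 GB total matches the 16 x 32 GB of installed memory listed in Table 1.

```python
from typing import Callable, Tuple


def ramp_until_capacity(
    can_host_another: Callable[[int], bool],
    interval_minutes: int = 5,
    max_vms: int = 10_000,
) -> Tuple[int, int]:
    """Add one VM per interval until the host reports that it is at capacity.

    Returns (vms_running, simulated_elapsed_minutes). No real waiting occurs.
    """
    vms = 0
    elapsed = 0
    while vms < max_vms and can_host_another(vms):
        vms += 1                      # spin up one more VM
        elapsed += interval_minutes   # the next attempt happens one interval later
    return vms, elapsed


def make_memory_limited_host(total_gb: int, per_vm_gb: int) -> Callable[[int], bool]:
    """Hypothetical capacity check: the host is full once memory is exhausted."""
    return lambda running: (running + 1) * per_vm_gb <= total_gb


# Assumed values for illustration only: 512 GB of RAM, 2 GB per VM.
vms, minutes = ramp_until_capacity(make_memory_limited_host(512, 2))
print(f"{vms} VMs after {minutes} simulated minutes")
```

In a real test, the capacity check would come from the hypervisor or the management stack rather than from a fixed memory budget, which is why the sketch takes it as a callback.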
The Dell servers outperformed the Supermicro servers in this testing, as shown in Figures 5–7.
Figure 5. The 1U Dell™ server ran 118% more VMs than the 1U Supermicro® server
Figure 6. The 1U Dell™ server supported 22% more VMs than the 2U Supermicro® server
Figure 7. The 2U Dell™ server was able to run 157% more VMs than its 2U Supermicro® counterpart
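To put the three VM figures on a common footing, it can help to convert each "X% more" statement into a ratio and chain the ratios. The short sketch below does that using only the percentages reported in Figures 5–7; the derived 2U-versus-1U ratios are implied values, not measurements from the study, and they are approximate because the published percentages are rounded.

```python
def ratio_from_percent_more(percent_more: float) -> float:
    """Convert 'A ran X% more than B' into the ratio A / B."""
    return 1.0 + percent_more / 100.0


# Ratios stated in the figure captions.
dell_1u_vs_sm_1u = ratio_from_percent_more(118)  # Figure 5
dell_1u_vs_sm_2u = ratio_from_percent_more(22)   # Figure 6
dell_2u_vs_sm_2u = ratio_from_percent_more(157)  # Figure 7

# Ratios implied by chaining the published figures (approximate).
sm_2u_vs_sm_1u = dell_1u_vs_sm_1u / dell_1u_vs_sm_2u
dell_2u_vs_dell_1u = dell_2u_vs_sm_2u / dell_1u_vs_sm_2u

print(f"Supermicro 2U vs. Supermicro 1U: {sm_2u_vs_sm_1u:.2f}x")
print(f"Dell 2U vs. Dell 1U: {dell_2u_vs_dell_1u:.2f}x")
```

Read this way, "118% more" means roughly 2.18 times as many VMs, not 1.18 times.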
Analysis
As shown in the results, despite the similarity in hardware components across all four servers, the Dell servers performed better than the Supermicro servers, especially as workloads increased and placed more stress on the hardware.
We also ran network performance testing using iPerf, but differences between the servers were negligible. Overall, the 100 Gb Broadcom NICs performed well, with strong, consistent bandwidth performance on both the Dell and Supermicro servers. Broadcom NICs have been shown to provide 100% bandwidth consistency at four Transmission Control Protocol (TCP) streams, and they can achieve maximum throughput twice over 15 runs.[i]
Broadcom® Dual-Port 100 Gb PCIe® Ethernet NICs
Broadcom BCM5720 NICs are widely used in the industry for cloud-scale networking and storage applications, such as high-performance computing, telecommunications, machine learning (ML), storage disaggregation, and data analytics. Broadcom NIC architecture combines a high-bandwidth Ethernet controller with optimized hardware-acceleration engines to enhance network performance and server efficiency.
Standard features include the following:
- Standards-compliant 200/100/50/40/25/10 Gb dual-port QSFP56 PCIe NICs with line-rate, full-duplex throughput
- NIC partitioning supporting 16 physical functions (PFs)
- TruFlow™ engine for intelligent flow processing to increase server VM density and accelerate vSwitch processing
- Broadcom BroadSAFE® technology for platform security via a silicon root of trust
- End-to-end congestion avoidance and management that anticipates and eliminates congestion before it happens
- Support for advanced networking technologies (including remote direct memory access [RDMA] over converged Ethernet [RoCE], software-defined networking [SDN], network functions virtualization [NFV], and virtualization)
- TruManage™ technology for server manageability and security for data center deployments
Ease of Acquisition and Deployment
The hardware acquisition process and management needs for a CSP data center are quite different from those of other commercial enterprises. CSPs typically buy far more than one server at a time, often purchasing on a per-rack basis. Ease of acquisition is therefore important to CSPs, because they are constantly upgrading and replacing parts and servers in their data centers.
Fulfillment
Every organization in every industry continues to face challenges in getting the products it needs in a reasonable amount of time. Not being able to deploy servers or replace components due to shipment delays can lead to lost revenue, business opportunities, and customers, along with delays in launching new applications and services. Due to its size and extensive experience in the industry, and specifically in data centers, Dell Technologies has an established, robust supply chain and logistics system that provides short turnaround times. Dell Technologies delivery times, according to one study, are as short as six weeks, compared to other vendors, who might take as long as one year.[3] Deploying a Dell Technologies infrastructure stack with an accelerated delivery schedule could deliver a business up to 107% more revenue for an application, versus using equipment from a vendor with a 12-month lead time.[3]
Supermicro has historically operated under an original design manufacturer (ODM) model, in which it works with original equipment manufacturers (OEMs) to build servers for them. More recently, Supermicro has taken a more direct approach in building solutions and pricing them aggressively. Some in the industry have noted that while such “bare-bones platforms at razor-thin margins” provide CSPs with initial cost savings, they might come at the cost of quality.[4] This might be the case for Supermicro: for our testing, we had one part on back order that required another few weeks to arrive at our testing lab.
Deployment
Table 2 presents the amount of time that it took our engineers to complete each step for each server.
Table 2. Time to complete deployment steps, in seconds
Deployment Step | Dell™ PowerEdge™ HS5610 | Dell™ PowerEdge™ HS5620 | Supermicro® SuperServer® SYS-121C-TN10R | Supermicro® SuperServer® SYS-621C-TN12R |
1. Inspect the server for any damages caused by shipping. | 240 | 240 | 360 | 600 |
2. Install rails onto the server and rack. | 60 | 60 | 275 | 117 |
3. Install the server into the rack using previously installed rails. | 29 | 30 | 45 | 1,943 |
4. Connect power and network to the server. | 60 | 60 | 60 | 49 |
5. Connect a keyboard, mouse, and monitor to the server. | 35 | 35 | 117 | 90 |
6. Power on the server. | 1 | 1 | 41 | 15 |
7. Install the latest firmware. | 85 | 76 | 267 | 140 |
8. Install an operating system (Red Hat® Enterprise Linux®). | 19 | 75 | 66 | 66 |
Total deployment time | 529 | 577 | 1,231 | 3,020 |
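As a cross-check on Table 2, the per-step times can be summed and compared programmatically. The sketch below uses only the values reported in the table; the variable names and the shortened step and server labels are ours, chosen for illustration.

```python
# Per-step deployment times in seconds, transcribed from Table 2.
SERVERS = [
    "Dell HS5610",
    "Dell HS5620",
    "Supermicro SYS-121C-TN10R",
    "Supermicro SYS-621C-TN12R",
]

STEP_SECONDS = [
    ("Inspect for shipping damage", [240, 240, 360, 600]),
    ("Install rails", [60, 60, 275, 117]),
    ("Install server into rack", [29, 30, 45, 1943]),
    ("Connect power and network", [60, 60, 60, 49]),
    ("Connect keyboard, mouse, monitor", [35, 35, 117, 90]),
    ("Power on", [1, 1, 41, 15]),
    ("Install latest firmware", [85, 76, 267, 140]),
    ("Install operating system", [19, 75, 66, 66]),
]

# Sum each server's column to reproduce the "Total deployment time" row.
totals = [sum(times[i] for _, times in STEP_SECONDS) for i in range(len(SERVERS))]

for name, total in zip(SERVERS, totals):
    print(f"{name}: {total} s ({total / 60:.1f} min)")

# Ratio of Supermicro to Dell deployment time within each form factor.
ratio_1u = totals[2] / totals[0]
ratio_2u = totals[3] / totals[1]
print(f"1U: Supermicro took {ratio_1u:.1f}x as long as Dell")
print(f"2U: Supermicro took {ratio_2u:.1f}x as long as Dell")

# Share of the Supermicro 2U total spent on the rack-installation step.
rack_share_2u = STEP_SECONDS[2][1][3] / totals[3]
print(f"Rack installation share of Supermicro 2U total: {rack_share_2u:.0%}")
```

The sums match the totals row of the table, and the breakdown shows that a single step (installing the server into the rack) accounts for most of the Supermicro 2U deployment time.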
Ease of Management and Serviceability
While all the systems examined in this study are cloud-scale servers, the Dell Technologies and Supermicro systems differ in their management tools and in their design architectures, which can impact management and serviceability.
Management Tools
Dell Technologies offers CSPs several management tools to reduce the time to configure and automate server updates, including Dell Open Server Manager built on OpenBMC. Instead of searching for firmware updates for the Broadcom NICs, for instance, our engineers were able to download updates from one source and get the NICs up and running at full speed within minutes. CSPs can also use Dell™ OpenManage™ Server Administrator (OMSA), built on OpenBMC, as a baseboard management controller (BMC) firmware stack designed to run on heterogeneous infrastructures. Dell Open Server Manager enables open, embedded systems management on PowerEdge cloud-scale servers. The simplicity of a single system-management stack allows for scalable operations and easier migration paths across different or newer infrastructure.
Supermicro also offers a BMC tool that allows IT admins to track and monitor power, but with less overall information than the Dell server. Supermicro IPMI utilities include an in-band configuration tool called IPMICFG. IPMIView allows administrators to manage multiple target systems through BMC. For firmware updates, the Supermicro servers required our engineers to manually download updates from multiple sources, which takes time and persistence as the Supermicro website does not consolidate everything on one page.
Access to Components
Replacing components in servers is a routine practice in data centers, so easy access is key for hyperscalers and other CSPs. PowerEdge servers, which are designed for the data center, can be serviced without removing them from the rack. After removing the manifold from the server, data center admins can easily replace components without having to remove other nearby sections due to the internal layout, with minimal overlapping or covered parts.
The Supermicro servers are built differently, requiring that the unit be removed from the rack first, and with less accessibility to parts. For instance, to service a graphics processing unit (GPU) or even change a DIMM, a technician would need to remove the server from the rack, take it to the servicing lab, remove both risers in order to then remove the shroud, and then access the motherboard.
Figure 8. A well-designed layout with minimal overlapping parts allows easy access inside the Dell™ PowerEdge™ HS5610 server (left); in contrast, the Supermicro® server (right) requires the removal of risers and other parts to access the underlying components
Hot and Cold Aisle Access
Hot and cold aisles are used in large-scale data centers to regulate the ambient temperature, with cold air intake at the front of servers that are facing each other (known as the “cold aisle”) and hot air exhausts at the back of servers facing each other (known as the “hot aisle”). Although this design enhances temperature control and power-consumption management, it can pose safety and efficiency concerns for data center technicians.
Benefits of Cold-Aisle Serviceability
Cold-aisle-optimized designs for data centers aim to improve energy savings and efficiency in operations. The adoption of such designs can lead to several positive outcomes for CSPs, including:
- Reduced energy consumption/smaller carbon footprint: By containing and directing cold air efficiently to the equipment, cold-aisle optimization minimizes energy waste. This leads to lower energy consumption, reduced greenhouse gas emissions, and lower operating costs.
- Enhanced performance: Proper cooling helps maintain optimal temperatures for the data center’s equipment, which can extend the life of the hardware.
- Improved reliability and uptime: Efficient cooling reduces the chance of equipment failure and helps maintain consistent performance levels. This increases the reliability and uptime of the cloud services.
- Regulatory compliance: Adopting cold-aisle-optimized designs can help CSPs comply with strict energy efficiency and environmental regulations and avoid potential fines or penalties.
The PowerEdge HS5610 servers offer a front input/output (I/O) configuration for cold-aisle access, and like other Dell servers, the PowerEdge HS5610 provides cold-aisle crash cart access. Data center admins can access the servers’ storage and NVMe drives, OCP®, dedicated BMC, and serial port from the cold aisle using USB and VGA ports or Bluetooth®. In addition, Dell servers use “blind-mate” power rails, which are static rails that include extra power pass-through bracket assemblies. These bracket assemblies allow technicians to connect power and then remove or service the PowerEdge HS5610 server without requiring hot-aisle access.
In contrast, the Supermicro SuperServer SYS-121C-TN10R is designed for only hot-aisle access.
Figure 9. The Supermicro (top) and Dell Technologies (bottom) systems differ in how ports and components are accessed: the Dell™ servers are built with ports in the front for serviceability from the cold aisle; Supermicro uses a more traditional design that requires service technicians to enter the hot aisle
Diagnostic Tools
Dell Technologies and Supermicro offer tools that CSPs can use to monitor server performance and diagnose problems using the BMC to capture data. The integrated Dell™ Remote Access Controller (iDRAC) tool offers a rich display of information, while the Supermicro tool is less robust. A visual comparison is presented in Figure 10.
For additional management reach, Dell™ CloudIQ is a Dell Technologies tool that combines monitoring, ML, and predictive analytics to simplify the operation of on-premises infrastructure and data protection in the cloud. The tool supports a broad range of Dell Technologies products, including PowerEdge servers, data protection, converged and hyperconverged infrastructure, and networking.
Figure 10. Dell Technologies (left) and Supermicro (right) system diagnostic tools
Other Considerations
Other Dell Technologies offerings can support customers operating cloud data centers so they can deploy servers quickly and consistently. Dell™ ProDeploy Factory Configuration preconfigures servers in the factory before shipping with system settings, the customer-supplied system image, asset tagging, and card placement, so that servers are ready to deploy out of the box. Dell Technologies can also build entire integrated rack solutions and can include factory stress testing.
Customers can quickly get replacement parts through several different routes. Dell ProSupport™ for Infrastructure for CSPs provides 24/7 support and a four-hour exchange timeframe. For instance, if a customer has a bad hard drive, they can set up a support ticket and Dell Technologies can validate the condition of the drive and immediately ship a replacement.
In addition, the Dell Technologies Logistics Online Inventory Solution (LOIS) is an onsite parts locker that provides self-maintainers with a local inventory of common replacement components. Having access to these parts lockers allows CSPs to replace a failed component immediately without delay. A parts installer can pull a new fan or memory DIMM out of the repair cabinet and install it immediately without an acquisition process or delay due to shipping. Each replacement part automatically initiates a replenishment of the parts inventory.
Dell Technologies also affixes an express service tag to each server to expedite troubleshooting and part replacement. This tag includes a QR code that can be scanned for more information using Dell™ OpenManage™ Mobile to discern which fan or memory DIMM to purchase and how to run diagnostics.
Figure 11. Dell™ and Supermicro® server labels differ in the amount of information and level of detail to help customers when installing systems and troubleshooting problems
Conclusion
CSPs depend on top-tier, high-performance servers capable of delivering exceptional performance to handle a wide range of workloads. This holds particularly true for smaller CSPs that lack the extensive resources available to hyperscale companies and that must carefully scrutinize the total cost of ownership (TCO). Our testing shows that PowerEdge HS5610 and HS5620 servers and Supermicro SuperServer SYS-121C-TN10R and SYS-621C-TN12R servers are capable of delivering the performance needed for a typical cloud workload, but the Dell servers perform much better as workloads increase.
Looking beyond the features of a server, our engineers examined various qualitative factors that can affect TCO. Anything that the server vendor does to provide additional benefits and operational improvements can translate to cost savings and a higher return on investment (ROI). This includes the time it takes to deploy servers, issue resolution on Day-2 operations, and support for large organizations that tend to manage support in-house. Dell Technologies provides comprehensive services and tools that help CSPs achieve a better TCO; these should be considered as part of a server purchase to maintain high levels of productivity and efficiency.
Learn more at “PowerEdge Cloud Scale Servers”: https://www.dell.com/en-us/dt/solutions/cloud/poweredge-and-cloud.htm
The analysis in this document was done by Prowess Consulting and commissioned by Dell Technologies.
Results have been simulated and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
Prowess and the Prowess logo are trademarks of Prowess Consulting, LLC.
Copyright © 2023 Prowess Consulting, LLC. All rights reserved.
Other trademarks are the property of their respective owners.
Author: Prowess Consulting, LLC
[1] TechTarget. “A primer on hyperscale data centers.” February 2023. https://www.techtarget.com/searchdatacenter/tip/A-primer-on-hyperscale-data-centers.
[3] Enterprise Strategy Group (ESG). “Dell Technologies Supply Chain Advantage.” January 2023. https://www.delltechnologies.com/asset/en-us/products/networking/industry-market/esg-showcase-dell-supply-chain-advantage.pdf.
[4] Forbes. “How Are Server Vendors Embracing Intel’s Sapphire Rapids?” January 2023. https://www.forbes.com/sites/moorinsights/2023/01/24/how-are-server-vendors-embracing-intels-sapphire-rapids/?sh=3fec911870d5.