Save Time, Rack Space, and Money—5:1 Server Consolidation Made Possible with the Latest AMD EPYC Processors
Thu, 20 Apr 2023 17:41:37 -0000
Summary
The latest Dell PowerEdge servers with AMD EPYC 4th Generation processors, each with up to 96 cores, deliver exceptional value to our customers. The large number of cores coupled with the high-speed DDR5 memory and very high-speed PCIe Gen5 devices makes for servers that can run almost any workload with ease. These servers are especially well suited for virtualization workloads. These unprecedented performance enhancements enabled Dell Technologies to achieve multiple virtualization world records. The cluster-level benchmarks for virtualized workloads are an excellent example of the performance and power-performance world record gains that are achievable.
Running a mixture of architectures in your data center can be cause for some concern—especially if you are looking to upgrade to the latest AMD servers and you are currently running the workloads on legacy Intel® based servers. Even with the greatest level of planning, there is always the fear that some unexpected variable might turn everything upside down during the migration process. Now, there is a new tool for your toolbox to make such migrations easier. The VMware Architecture Migration Tool1 is a PowerShell script that uses VMware PowerCLI to eliminate the guesswork and complexity involved in migrating a virtual machine from one hardware architecture to another.
To fully test the tool, Dell ran a full migration scenario. We were able to consolidate 380 VMs running on five legacy Intel platform servers into one Dell PowerEdge R7625 with AMD EPYC 4th Gen processors. We describe our testing in more detail later in this paper.
Why migrate?
In today’s IT departments, workloads are always evolving. There is increasing pressure to support new workloads while keeping existing workloads to support existing business needs—all while also trying to reduce costs and meet corporate goals.
The latest technology tends to bring multiple advantages, driving the need to upgrade. Some of these advantages are:
- Higher performance
The latest Dell PowerEdge servers with 4th Gen AMD EPYC processors have class-leading performance with up to 121 percent higher scores than prior generations.2
- Better efficiency
The Dell PowerEdge servers with 4th Gen AMD EPYC processors are some of the first to achieve the EPEAT silver rating, indicating the highest level of environmental responsibility and efficiency. Dell has achieved 159 percent higher performance per kilowatt on the VMmark benchmark with the R7625 compared to the prior-generation model server.3
- More security
With Dell’s Cyber Resilient Architecture and AMD’s Infinity Guard, the PowerEdge servers with 4th Gen AMD EPYC processors offer top-class security to ensure that your data and infrastructure are protected.4
- Workload optimizations
The 4th Generation AMD EPYC processors have several optimizations, such as support for AVX-512, INT8, and BFLOAT16. The processors can deliver exceptional performance for workloads that can take advantage of such optimizations.
VMware Architecture Migration Tool
The VMware Architecture Migration Tool (VAMT) was developed jointly by AMD and VMware to automate the migration of legacy VMs from Intel architecture to AMD architecture, with the goal of delivering a better user experience and better business value. Freely available on GitHub, VAMT offers several key features:
- Architecture agnostic and open source
- Fully automated cold migration
- VM success validation
- Process throttling
- Change window support
- Email and syslog support
- Audit trail
- Rollback
The tool streamlines and simplifies the migration process in a trustworthy fashion.
Benchmarking
Dell used VAMT and the VMmark benchmark to demonstrate server consolidation on the PowerEdge R7625.
The VMmark benchmark allowed us to set up a workload in the form of tiles within each hardware cluster. Each tile consisted of 19 different VMs running a workload internally. The benchmark was deployed across five legacy Intel-based servers and then migrated to a single AMD-based PowerEdge server. A Dell PowerMax 2000 SAN was used for data storage. The following table shows the configuration details:
Table 1. Configuration of source and target servers
Component/specification | Source | Target |
---|---|---|
Number of servers | 5 | 1 |
Processor | Intel 8180 | AMD EPYC 9654 |
Cores per server | 56 | 192 |
Memory | 768 GB | 3 TB |
Tiles | 4 | 20 |
VMs per server | 76 | 380 |
Server | Server vendor A | Dell PowerEdge R7625 |
Storage | PowerMax 2000; 30 TB spread across 6 LUNs | |
Network | 32 Gb FC network for storage, 25 GbE for data network on VMs through a 4-way splitter, 100 Gb switch | |
We were able to run four tiles per legacy server for a total of 380 VMs. The VAMT was then used to migrate the VMs across to the target PowerEdge server.
The tool completed a cold migration of all 380 VMs to the target server in 57 minutes!
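The arithmetic behind these figures can be sketched in a few lines. This is only an illustration using the numbers from Table 1 and the text above; the function and constant names are hypothetical and are not part of VAMT or VMmark.

```python
# Figures from Table 1 and the surrounding text; all names are illustrative.
VMS_PER_TILE = 19
TILES_PER_SOURCE_SERVER = 4
SOURCE_SERVERS = 5
MIGRATION_MINUTES = 57


def total_vms(servers, tiles_per_server, vms_per_tile=VMS_PER_TILE):
    """Number of VMs running across a group of identical servers."""
    return servers * tiles_per_server * vms_per_tile


def migration_rate(vms, minutes):
    """Average number of VMs migrated per minute."""
    return vms / minutes


if __name__ == "__main__":
    vms = total_vms(SOURCE_SERVERS, TILES_PER_SOURCE_SERVER)
    print("VMs on source cluster:", vms)
    print("Tiles on target server:", SOURCE_SERVERS * TILES_PER_SOURCE_SERVER)
    print("Average VMs migrated per minute:", round(migration_rate(vms, MIGRATION_MINUTES), 2))
```

The average rate is a simple quotient. It says nothing about how VAMT throttles or batches the individual migrations.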
Achieving value
The Dell PowerEdge R7625 with 4th Gen AMD EPYC processors delivers significant technology advancements that can deliver value in any virtualized deployment. Consolidating from five servers to a single server is an example of the extent of savings possible. This kind of consolidation allows for significant license cost savings and fewer hours spent on system management. Decommissioning the five legacy systems also reduces power draw and operational costs by as much as 64 percent,5 while also running workloads on the latest architecture with security features like Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV). AMD SEV helps safeguard privacy and integrity by encrypting each virtual machine.
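As a quick check, the percentages quoted in this paper can be recomputed from the raw figures given in footnotes 2, 3, and 5. The sketch below is only that, a recomputation. The helper names are hypothetical, and the power figure is a ratio of rated CPU TDP, not of measured wall power.

```python
def percent_higher(new, old):
    """Relative gain of new over old, in percent."""
    return (new / old - 1.0) * 100.0


def percent_reduction(old, new):
    """Relative reduction from old to new, in percent."""
    return (1.0 - new / old) * 100.0


if __name__ == "__main__":
    # Footnote 2: SPEC floating-point rate scores, R7625 vs. R7525
    print(round(percent_higher(1410, 636), 1))        # 121.7
    # Footnote 3: VMmark server power-performance scores
    print(round(percent_higher(21.0179, 8.1263), 1))  # 158.6
    # Footnote 5: total CPU TDP in watts, five legacy servers vs. one R7625
    print(round(percent_reduction(2050, 720), 1))     # 64.9
```

The results line up with the "up to 121 percent", "159 percent", and "as much as 64 percent" figures quoted in the text.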
1 https://github.com/vmware-samples/vmware-architecture-migration-tool
2 Based on Dell analysis of submitted SPECFPRate score of 1410 achieved on a Dell PowerEdge R7625 with AMD EPYC 9654s compared to the previous high score of 636 on a Dell PowerEdge R7525 with AMD EPYC 7763 processors as of 11/3/2022. Actual performance might vary.
3 Based on Dell analysis of published VMmark Server Power-Performance score of 21.0179@21 tiles achieved on a Dell PowerEdge R7615 cluster with AMD EPYC 9654P processors compared to the score of 8.1263@12 tiles achieved on a Dell PowerEdge R7515 cluster with the AMD EPYC 7763 processors as of 4/13/2023. Actual performance might vary.
5 Based on Dell internal analysis comparing the total CPU TDP of 2,050 W from five dual-socket servers with the Intel Xeon 8180 processors compared to the total CPU TDP of 720 W from a single dual-socket Dell PowerEdge server with AMD EPYC 9654 processors as of 4/13/2023. Actual performance might vary.
Related Documents
DDR5 Memory Bandwidth for Next-Generation PowerEdge Servers Featuring 4th Gen AMD EPYC Processors
Wed, 03 May 2023 15:49:23 -0000
Summary
Dell Technologies has announced some exciting new servers featuring the latest 4th Gen AMD EPYC processors. These servers come in 1- and 2-socket versions in 1U and 2U form factors. Each socket supports up to 12 DIMMs at speeds of up to 4,800 MT/s. This document compares the memory bandwidth readings observed with these new servers against previous-generation servers running 3rd Gen AMD EPYC processors.
4th Gen AMD EPYC memory architecture
The 4th Gen AMD EPYC processors are the first AMD x86 server processors to support DDR5 memory. The CPUs themselves still have a chiplet design with a central I/O chiplet surrounded by compute chiplets. The memory runs at speeds of up to 4,800 MT/s, which is 50 percent faster than the 3,200 MT/s that the previous 3rd Gen AMD EPYC processors supported.
One other significant difference is in the number of populated slots. The 3rd Gen AMD EPYC processors supported up to 16 DIMMs per socket in a 2 DIMMs per channel configuration or 8 DIMMs per socket in a 1 DIMM per channel configuration. The 2 DIMMs per channel configuration supported a maximum speed of 2,933 MT/s.
Memory bandwidth test
To quantify the impact of this increase in memory support, we performed two studies.1 The first study (see Figure 1) measured memory bandwidth determined by the number of DIMMs per CPU populated. To measure the memory bandwidth, we used the STREAM Triad benchmark. STREAM Triad is a synthetic benchmark that is designed to measure sustainable memory bandwidth (in MB/s) and a corresponding computation rate for four simple vector kernels. Of all the vector kernels, Triad is the most complex scenario. We ran the benchmark on the following systems:
- Previous-generation Dell PowerEdge R7525 powered by AMD’s 3rd Gen EPYC CPUs populated with up to 16 DDR4 3,200 MT/s DIMMs per socket
- Latest-generation Dell PowerEdge R7625 powered by AMD’s 4th Gen EPYC CPUs populated with up to 12 DDR5 4,800 MT/s DIMMs per socket
We used default BIOS configurations for this test.
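For readers unfamiliar with STREAM, the Triad kernel and its byte accounting can be sketched as follows. This is a pure-Python illustration, not the benchmark used in our tests: the real STREAM benchmark is compiled code run across all cores, and an interpreted loop like this one measures interpreter overhead rather than memory bandwidth. The function names and array size are assumptions chosen for the example.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, word_bytes=8):
    """Bytes counted for Triad: two reads and one write per element."""
    return 3 * n * word_bytes


def run_triad(n=200_000, scalar=3.0):
    """Run one Triad pass and return the result array and the rate in MB/s."""
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return a, triad_bytes(n) / elapsed / 1e6          # MB/s with 1 MB = 10**6 bytes


if __name__ == "__main__":
    result, mb_per_s = run_triad()
    print("a[0] =", result[0])
    print("Illustrative rate: %.1f MB/s (not a valid bandwidth measurement)" % mb_per_s)
```

The point of the sketch is the accounting: each element of the result costs 24 bytes of memory traffic with 8-byte values, which is how a run time is converted into a bandwidth figure.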
The following figures show the system aggregate memory bandwidth across two CPUs:
Figure 1. System aggregate memory bandwidth trends with DIMM population for 4th Gen AMD EPYC processor-based PowerEdge servers with default BIOS settings
Figure 2. System aggregate memory bandwidth trends with DIMM population for 3rd Gen AMD EPYC processor-based PowerEdge servers with default BIOS settings
Consider that a fully balanced configuration requires all DIMM channels to be populated—that is 8 DIMMs for the 3rd Gen and 12 DIMMs for the 4th Gen. Given these differences, it is challenging to do a direct comparison. However, if we compare the numbers for a balanced configuration with 1 DIMM per channel, we see a 112 percent increase in bandwidth. With just 8 channels populated in both cases, we see a 45 percent increase in bandwidth. Despite this not being a balanced configuration, we still see a significant performance increase at this point.
Figure 3. System aggregate memory bandwidth trends with DIMM population for 4th Gen AMD EPYC processor-based PowerEdge servers with tuned BIOS settings
We collected a second series of data points on the R7625 with BIOS settings adjusted for best memory performance. This included setting NUMA nodes per socket (NPS) to NPS4 and disabling CCX as NUMA. With these settings, the maximum bandwidth of the R7625 increases by another 14.5 percent to a class-leading 789 GB/s.
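To put the measured numbers in context, the theoretical peak bandwidth of each platform can be estimated from the channel count and transfer rate. The sketch below assumes 8 bytes (64 data bits) per transfer per memory channel, which is not stated in this document, and the function names are hypothetical. The 789 GB/s, 14.5 percent, 112 percent, and 45 percent figures are the measured results reported above.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per memory channel


def peak_bandwidth_gbs(channels_per_socket, mt_per_s, sockets=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels_per_socket * mt_per_s * BYTES_PER_TRANSFER * sockets / 1000


def percent_higher(new, old):
    """Relative gain of new over old, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    gen3 = peak_bandwidth_gbs(8, 3200)    # 3rd Gen: 8 channels of DDR4-3200
    gen4 = peak_bandwidth_gbs(12, 4800)   # 4th Gen: 12 channels of DDR5-4800
    gen4_8ch = peak_bandwidth_gbs(8, 4800)

    print("Theoretical peak, 3rd Gen:", gen3, "GB/s")
    print("Theoretical peak, 4th Gen:", gen4, "GB/s")
    print("Theoretical gain, balanced:", round(percent_higher(gen4, gen3), 1), "percent")
    print("Theoretical gain, 8 channels each:", round(percent_higher(gen4_8ch, gen3), 1), "percent")
    print("Tuned result as share of peak:", round(789 / gen4 * 100, 1), "percent")
    print("Implied default-BIOS maximum:", round(789 / 1.145, 1), "GB/s")
```

Under these assumptions, the measured gains of 112 percent and 45 percent sit a little below the theoretical ceilings of 125 percent and 50 percent. That is the expected pattern for a sustained-bandwidth benchmark.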
Conclusion
With up to 96 cores per socket and significant increases in memory bandwidth, Dell PowerEdge servers with 4th Gen AMD EPYC processors continue to provide best-in-class features and specifications to satisfy the most demanding workloads.
1 Tests were performed in January 2023 at the Solutions and Performance Analysis Lab at Dell Technologies.
Dell PowerEdge R7625 Rack Server & Emulex LPe36002 Host Bus Adapter: 64G Fibre Channel Microsoft SQL Server
Fri, 29 Mar 2024 16:19:02 -0000
Dell PowerEdge R7625 Rack Server & Emulex LPe36002 Host Bus Adapter
64G Fibre Channel Microsoft SQL Server Performance – NVMe/FC vs. SCSI/FC
Tolly Report #224107
Tolly test report demonstrating that Dell PowerEdge R7625 Rack Server outfitted with the Emulex LPe36002 Host Bus Adapter using NVMe/FC can improve application performance vs older generation SCSI/FC.
Executive Summary
New generation servers can bring higher performance across a range of areas. This is certainly the case with Dell’s 16th-generation server line. Similarly, newer protocols like NVM Express (NVMe) over Fibre Channel (FC) can provide greater throughput and efficiency than older SCSI over FC. Dell is unique in offering an end-to-end NVMe/FC connectivity solution in the mid-range storage marketplace with the PowerStore line.
Dell commissioned Tolly to benchmark the performance of the Broadcom Emulex LPe36002 64G Fibre Channel dual-port host bus adapter (HBA) running in the Dell PowerEdge R7625 Rack Server with AMD EPYC processors by testing using actual database applications rather than simulated I/O microbenchmarks. Testing focused on evaluating the database throughput, latency, and CPU efficiency of accessing Microsoft SQL Server 2019 for Linux systems over older SCSI/FC and newer NVMe/FC. Databases were stored on a Dell PowerStore 9200T storage appliance.
Tests showed significant improvements in transaction throughput, latency reduction, and CPU efficiency. See Figure 1 for a summary of relative improvements.
The Bottom Line | |
Dell PowerEdge R7625 with AMD EPYC processors & Emulex LPe36002 64G HBA using NVMe/FC: | |
1 | Improved database transactions by 38% |
2 | Reduced database stored procedure latency by 35% |
Overview
The goal of this test was to illustrate the performance benefits of using the newer, more-efficient NVMe/FC protocol in lieu of the older, less-efficient SCSI/FC protocol in conjunction with Emulex 64G FC HBAs running under Linux in a Dell PowerEdge R7625 Rack Server. (Dell sells the Emulex 64G FC HBA for the same price as the Emulex 32G FC HBA.)
The test was run using Microsoft SQL Server 2019 for Linux accessing the database via SCSI and then via NVMe.
While low-level component benchmarks are instructive, ultimately system architects are rightly most interested in how network-level improvements can translate into application performance improvements. This benchmarking was done with HammerDB, which generates actual user transactions against an actual database. The test focused on TPROC-C, which is the HammerDB database-oriented implementation of the de facto standard TPC-C online transaction processing benchmark.
Tests showed significant improvements in key benchmarks.
Test Results
Microsoft SQL Server 2019 for Linux
Transaction Processing. The NVMe/FC results were significantly better than the SCSI/FC results. When run over NVMe/FC, 38% more transactions per minute were processed.
CPU Efficiency. The NVMe/FC results were significantly better than the SCSI/FC results. When run over NVMe/FC, the CPU efficiency was improved by 50%.
P95 Stored Procedure Latency. Similarly, the NVMe/FC results were significantly better than the SCSI/FC results. When run over NVMe/FC, the latency was reduced by 35%.
Test Setup & Methodology
The HBA under test used current production drivers that are publicly available. Default settings were used. Details of the test environment and systems under test are found in Tables 1-5. Figure 2 shows a composite test environment.
Database Test
The goal of this test was to benchmark the database transaction performance of the HBA running the HammerDB “TPROC-C” workload which, as noted earlier, is the HammerDB database version of the Transaction Processing Performance Council’s TPC-C OLTP benchmark.
A Dell PowerEdge R7625 server, powered by AMD EPYC processors, was configured with the HBA under test. The Broadcom Emulex LPe36002 64G HBA connected to a Dell PowerStore 9200T via a Dell Connectrix 64G Fibre Channel switch. The test utilized a single 64G FC port of the Emulex HBA.
The server ran RHEL 8.9. SCSI Device Mapper and NVMe native multipath were enabled for the respective devices. NUMA was set to off and “transparent huge pages” was disabled.
For storage, the path selection policy for NVMe native multipath was set to “round-robin”. For SCSI, the Device Mapper multipath path selector was set to “queue-length 0”.
This test was run using Microsoft SQL Server 2019 for Linux.
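Readers who want to confirm the NVMe native multipath policy on their own Linux host could use a check along these lines. This is a minimal sketch, not part of the Tolly test procedure. It assumes the kernel exposes the policy in an `iopolicy` file under `/sys/class/nvme-subsystem/nvme-subsys*/`, and the function names are hypothetical.

```python
from pathlib import Path


def nvme_iopolicies(sysfs_root="/sys/class/nvme-subsystem"):
    """Return {subsystem name: I/O policy string} for each NVMe subsystem found."""
    policies = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return policies  # no NVMe subsystems, or not a Linux host
    for policy_file in sorted(root.glob("nvme-subsys*/iopolicy")):
        policies[policy_file.parent.name] = policy_file.read_text().strip()
    return policies


def all_round_robin(policies):
    """True only if at least one subsystem exists and all use round-robin."""
    return bool(policies) and all(p == "round-robin" for p in policies.values())


if __name__ == "__main__":
    found = nvme_iopolicies()
    for name, policy in found.items():
        print(name, "->", policy)
    print("All subsystems round-robin:", all_round_robin(found))
```

The root directory is a parameter so that the function can be pointed at a copy of the sysfs tree for offline inspection.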
The open source HammerDB test tool was used to populate the database schema and run the workload.
Table 1. HBA Under Test
Vendor | Product Name | Firmware | Driver |
Broadcom | Emulex LPe36002 (64G) (PCIe 4.0) | 14.0.539.26 | 14.0.0.15 |
Table 2. Server Configuration
Vendor/System | Dell PowerEdge R7625 |
CPU | 2 socket AMD EPYC 9374F 32-Core Processor @ 3.8 GHz |
Number of CPUs | 128 logical processors. Profile: Performance, Logical Processors: Enabled, Sub Numa Clustering: Disabled |
Memory (RAM) | 256 GB |
Power Mode | Performance |
OS | Red Hat Ent. Linux 8.9 (RHEL8) |
Kernel | 4.18.0-425.3.1 |
Table 3. Microsoft Database Configuration
Database | Microsoft SQL Server 2019 for Linux |
Storage | Single volume, XFS |
Dataset Size | 100 GB |
DB Memory Allocation | 10G |
Table 4. Database Test Tool
Vendor | Open Source |
Application | HammerDB 4.7 |
TPROC-C settings | Total # of warehouses = 1,000; transactions per user = 1 million; ramp-up time: 2 minutes; run time: 5 minutes |
Table 5. Storage Configuration
Vendor/Device | Dell PowerStore 9200T v3.5 |
Ports | 8 x 32G FC |
Volume Size | 1,024GB volume each for NVMe/FC and SCSI/FC |
Namespace/LUN | 8 x 32G target ports (single namespace) |
Network Fabric | Dell Connectrix 64G FC switch v9.0.1a |
About AMD
For over 50 years, AMD has been at the forefront of driving innovation in high-performance computing, graphics, and visualization technologies. Their products are relied upon by billions of people, leading Fortune 500 businesses, and cutting-edge scientific research institutions worldwide. AMD's mission is to build exceptional products that accelerate next-generation computing experiences and power solutions for the world's most important challenges. Visit http://www.amd.com for more information about AMD.
Broadcom Emulex LPe36002
The Broadcom Emulex LPe36000-series Gen 7 Fibre Channel HBAs are designed for demanding mission-critical workloads and emerging applications. The family of adapters features Silicon Root of Trust security, designed to thwart firmware attacks aimed at enterprises and governments.
Gen 7 64G provides seamless backward compatibility to 32G and 16G networks.
Dell sells the LPe36002 64G HBA for the same price as the 32G model.
About Tolly
The Tolly Group companies have been delivering world-class IT services for over 30 years. Tolly is a leading global provider of third-party validation services for vendors of IT products, components and services.
You can reach the company by E-mail at sales@tolly.com, or by telephone at +1 561.391.5610.
Visit Tolly on the Internet at: http://www.tolly.com
Tolly Terms Of Usage
This document is provided, free-of-charge, to help you understand whether a given product, technology, or service merits additional investigation for your particular needs. Any decision to purchase a product must be based on your own assessment of suitability based on your needs. The document should never be used as a substitute for advice from a qualified IT or business professional. This evaluation was focused on illustrating specific features and/or performance of the product(s) and was conducted under controlled, laboratory conditions. Certain tests may have been tailored to reflect performance under ideal conditions; performance may vary under real-world conditions. Users should run tests based on their own real-world scenarios to validate performance for their own networks.
Reasonable efforts were made to ensure the accuracy of the data contained herein but errors and/or oversights can occur. The test/audit documented herein may also rely on various test tools the accuracy of which is beyond our control. Furthermore, the document relies on certain representations by the sponsor that are beyond our control to verify. Among these is that the software/hardware tested is production or production track and is, or will be, available in equivalent or better form to commercial customers. Accordingly, this document is provided "as is," and Tolly Enterprises, LLC (Tolly) gives no warranty, representation or undertaking, whether express or implied, and accepts no legal responsibility, whether direct or indirect, for the accuracy, completeness, usefulness, or suitability of any information contained herein. By reviewing this document, you agree that your use of any information contained herein is at your own risk, and you accept all risks and responsibility for losses, damages, costs, and other consequences resulting directly or indirectly from any information or material available on it. Tolly is not responsible for, and you agree to hold Tolly and its related affiliates harmless from any loss, harm, injury, or damage resulting from or arising out of your use of or reliance on any of the information provided herein.
Tolly makes no claim as to whether any product or company described herein is suitable for investment. You should obtain your own independent professional advice, whether legal, accounting or otherwise, before proceeding with any investment or project related to any information, products or companies described herein. When foreign translations exist, the English document is considered authoritative. To assure accuracy, only use documents downloaded directly from Tolly.com. No part of any document may be reproduced, in whole or in part, without the specific written permission of Tolly. All trademarks used in the document are owned by their respective owners. You agree not to use any trademark in or as the whole or part of your own trademarks in connection with any activities, products or services which are not ours, or in a manner which may be confusing, misleading, or deceptive or in a manner that disparages us or our information, projects or developments.