Dell PowerEdge C6615 Delivers World Record VMmark Power-Performance Results
Download PDFThu, 18 Apr 2024 16:24:02 -0000
|Read Time: 0 minutes
Abstract
The Dell PowerEdge C6615 has set a new VMmark 3 world record. This result is the world’s highest power-performance score achieved on VMmark 3 using vSAN storage. This configuration also achieved the highest 4-node server power-performance score and the highest server-storage power-performance score. All of these records illustrate the sheer efficiency, scalability, and performance-density of the C6615 modular platform.
This document summarizes the VMmark 3 benchmark results for the PowerEdge C6615 published on 1/23/2024. It lists the results, summarizes the major configuration details, and links to the results on the VMmark website.
Benchmark results
- Performance Score: 17.42 @ 16tiles
- Server Power-Performance: 18.2000 @ 16 tiles
- Server Storage Power-Performance: 11.0600 @ 16 tiles
What do these scores mean?
The following world records were achieved:
- highest power-performance score achieved on VMMark 3 using vSAN storage.
- highest 4-node server power-performance score
- highest server-storage power-performance score
The Dell PowerEdge C6615 is a modular server node in the C6600 chassis with the AMD 4th Gen EPYC 8004 series of processors. This configuration allows for four 1-S nodes in a single 2U chassis, enabling high performance density and power efficiency, especially when using local storage for vSAN.
The configuration also leveraged the latest 96GB DDR5 DIMMs that offer exceptional value and capacity for a variety of use cases.
These results showcase the great performance density, power efficiency, and scalability of the Dell PowerEdge C6615 servers for virtualization use cases. VMmark is a benchmark standard for today’s virtualized applications in the datacenter.
Notes
- Results referenced are current as of April 3rd. 2024.
- To view all VMmark 3 results, see VMmark 3.x Results (vmware.com).
- This benchmark was run by Dell in the Dell SPA lab and audited by VMware.
Key configuration details
We used 4x Dell PowerEdge C6615 server nodes in a single C6600 chassis.
Each node had the following configuration:
- 1x AMD EPYC 8534P 64 core processors at 2.3GHz (AMD EPYC 8534P | AMD)
- 6x 96 GB Dual Rank x4 DDR5 4800MT/s RDIMM (576GB total)
- Dell Customized Image of VMware ESXi 8.0 U2, Build 22380479
- VMware vCenter Server 8.0 U2a
- Boot Storage: Dell EC NVMe ISE 7400 RI M.2 960GB
- vSAN Storage:
- 1x Disk Group per Host
- 4x Dell Ent NVMe v2 AGN MU U.2 6.4TB
- Networking:
- 1x Mellanox CX-6 Dx Dual Port 100GbE QSFP56 Adapter
- 1x Broadcom Advanced Dual 25Gb Ethernet Adapter
About the Dell PowerEdge C6615
The Dell PowerEdge C6615 node with AMD EPYC 8004 series of processors is designed to maximize value and minimize TCO for scale-out workloads, using a scalable dense compute infrastructure focused on performance per watt, per dollar.
Some of the key features include:
- AMD 4th Gen EPYC up to 64 cores/CPU
- Single-socket only
- Enables DDR5 at 4800 MT/s memory and PCIe Gen5 with double the speed of previous PCIe Gen4 for faster access and transport of data, to optimize application output
- Fully featured systems management with iDRAC, and OpenManage Enterprise, and CloudIQ
These features allow you to maximize security with the PowerEdge Cyber Resilient Architecture.
Figure 1. The PowerEdge C6615 sled and the PowerEdge C6600 chassis
What is VMmark 3.0?
VMmark is a free tool used by hardware vendors and others to measure the performance, scalability, and power consumption of virtualization platforms.
Figure 2. Outline of the VMmark benchmark components
The VMmark benchmark combines commonly-virtualized applications into predefined bundles called "tiles". This version of the benchmark has 19 unique virtual machines per tile. The number of VMmark tiles a virtualization platform can run, as well as the cumulative performance of those tiles and of a variety of platform-level workloads, determine the VMmark 3 score.
References
- For details about VMmark, see VMmark (vmware.com).
- PowerEdge C6615 Server Node
Related Documents
Dell PowerEdge 16G Server BIOS Settings for Optimized Performance: R7625, R6625, R7615, R6615, C6615
Tue, 26 Mar 2024 22:46:05 -0000
|Read Time: 0 minutes
BIOS setting recommendations
The following tables provide the BIOS setting recommendations for the latest generation of PowerEdge servers:
Table 1. BIOS setting recommendations - System profile settings
System setup screen | Setting | BIOS Defaults | SPEC cpu2017 int rate (General Purpose Performance) | SPEC cpu2017 fp rate | SPEC cpu2017 int speed | SPEC cpu2017 fp speed | Memory Throughput | HPC | Latency |
System profile setting | System profile | Performance Per Watt | Custom | Custom | Custom | Custom | Custom | Custom | Custom |
System profile setting[*] | CPU Power Management | OS DBPM | OS DBPM | OS DBPM | OS DBPM | OS DBPM | OS DBPM | Max Performance | Max Performance |
System profile setting | Memory Frequency | Max Performance | Max Performance | Max Performance | Max Performance | Max Performance | Max Performance | Max Performance | Max Performance |
System profile setting | Turbo Boost | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
System profile setting | C-States | Enabled | Enabled | Enabled | Disabled | Disabled | Disabled | Disabled | Disabled |
System profile setting | Write Data CRC | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Enabled | Disabled |
System profile setting | Memory Patrol Scrub | Standard | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
System profile setting | Memory Refresh Rate | 1x | 1x | 1x | 1x | 1x | 1x | 1x | 1x |
System profile setting | Workload Profile | not configured | not configured | not configured | not configured | not configured | not configured | HPL | not configured |
System profile setting | PCI ASPM L1 Link Power Management | Enabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
System profile setting | Determinism Slider | Performance Determinism | Power Determinism | Power Determinism | Power Determinism | Power Determinism | Power Determinism | Power Determinism | Power Determinism |
System profile setting | Power Profile Select | High Performance Mode | High Performance Mode | High Performance Mode | High Performance Mode | High Performance Mode | High Performance Mode | High Performance Mode | High Performance Mode |
System profile setting | PCIE Speed PMM Control | Auto | Auto | Auto | Auto | Auto | Auto | Auto | (GEN 5) |
System profile setting | EQ Bypass To Highest Rate | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
System profile setting | DF PState Frequency Optimizer | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
System profile setting | DF PState Latency Optimizer | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
System profile setting | DF CState | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
System profile setting | Host System Management Port (HSMP) Support | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
System profile setting | Boost FMax | 0-Auto | 0-Auto | 0-Auto | 0-Auto | 0-Auto | 0-Auto | 0-Auto | 0-Auto |
System profile setting | Algorithm Performance Boost Disable (ApbDis) | Disabled | Disabled | Disabled | Enabled | Enabled | Disabled | Disabled | Enabled |
System profile setting | ApbDis Fixed Socket P-State[2] | P0 | P0 | P0 | |||||
System profile setting | Dynamic Link Width Management (DLWM) | Unforced | Unforced | Unforced | Unforced | Unforced | Unforced | Unforced | Forced x16 |
[*] For C6615, apply setting from Table 3.
[1] Depends on how system was ordered. Other System Profile defaults are driven by this choice and may be different than the examples listed. Select Performance Profile first, and then select Custom to load optimal profile defaults for further modification.
[2] Pstate field is dependent on Algorithm Performance Boost Disable (ApbDis) and is visible only when it is enabled.
Table 2. BIOS setting recommendations – Memory, processor, and iDRAC settings
System setup screen | Setting | BIOS Defaults | SPEC cpu2017 int rate (General Purpose Performance) | SPEC cpu2017 fp rate | SPEC cpu2017 int speed | SPEC cpu2017 fp speed | Memory Throughput | HPC | Latency |
Memory settings | System Memory Testing | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Memory settings | DRAM Refresh Delay | Minimum | Performance | Performance | Performance | Performance | Performance | Performance | Performance |
Memory settings | Correctable memory ECC SMI | Enabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Memory settings | Uncorrectable Memory Error (DIMM Self healing on uncorrectable memory) | Enabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Memory settings | Correctable Error Logging | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | Logical Processor | Enabled | Enabled | Disabled[1] | Disabled[1] | Disabled[1] | Disabled[1] | Disabled[1] | Disabled[1] |
Processor settings | Virtualization Technology | Enabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | IOMMU Support | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
Processor settings | Kernel DMA Protection | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | L1 Stream HW Prefetcher | Enabled | Enabled | Disabled | Enabled | Enabled | Enabled | Enabled | Enabled |
Processor settings | L2 Stream HW Prefetcher | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
Processor settings | L1 Stride Prefetcher | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
Processor settings | L1 Region Prefetcher | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
Processor settings | L2 Up Down Prefetcher | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
Processor settings | MADT Core Enumeration | Linear | Linear | Linear | Linear | Linear | Linear | Linear | Linear |
Processor settings[*] | NUMA Node Per Socket | 1 | 4 | 4 | 4 | 1 | 4 | 4 | 4 |
Processor settings | L3 cache as NUMA | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | Secure Memory Encryption | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | Minimum SEV no-ES ASID | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Processor settings | SNP Memory Coverage | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | Secure Nested Paging | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | Transparent Secure Memory Encryption | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled | Disabled |
Processor settings | ACPI CST C2 Latency | 800 | 18 | 18 | 18 | 18 | 800 | 18 | 800 |
Processor settings | Configurable TDP | Maximum | Maximum | Maximum | Maximum | Maximum | Maximum | Maximum | Maximum |
Processor settings | x2APIC Mode | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled | Enabled |
Processor settings | Number of CCDs per Processor | All | All | All | All | All | All | All | All |
Processor settings | Number of Cores per CCD | All | All | All | All | All | All | All | All |
iDRAC settings | Thermal Profile | Default | Maximum Performance | Maximum Performance | Maximum Performance | Maximum Performance | Maximum Performance | Maximum Performance | Maximum Performance |
[*] For C6615, apply setting from Table 3.
[1] Logical Processor (Hyper Threading) tends to benefit throughput-oriented workloads such as SPEC CPU2017. Many HPC workloads disable this option.
Table 3. BIOS setting recommendations specific to C6615 (apply remaining settings from Table 1 and 2)
System setup screen | Setting | BIOS Defaults | SPEC cpu2017 int rate (General Purpose Performance) | SPEC cpu2017 fp rate | SPEC cpu2017 int speed | SPEC cpu2017 fp speed | Memory Throughput | HPC | Latency |
Processor settings | NUMA Node per Socket | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 2 |
System profile setting | CPU Power Management | OS DBPM | OS DBPM | OS DBPM | OS DBPM | OS DBPM | OS DBPM | OS DBPM | OS DBPM |
iDRAC recommendations
Following are what we would recommend for an iDRAC environment:
- Thermally challenged environments should increase fan speed through iDRAC Thermal Section.
- All Power Capping should be removed in performance-sensitive environments.
Glossary
System profile: (Default="Performance Per Watt")
To assist the average customer in setting each individual power/performance feature for their specific environment, a menu option is provided that can help a customer optimize the system for factors such as minimum power usage/acoustic levels, maximum efficiency, Energy Star optimization, and maximum performance.
Performance Per Watt OS mode optimizes the performance/watt efficiency with a bias towards performance. It is the favored mode for Energy Star. Note that this mode is slightly different than Performance Per Watt DAPC mode. In this mode, no bus speeds are derated, leaving the OS in charge of making those changes.
Custom allows the user to individually modify any of the low-level settings that are preset and unchangeable in any of the other four preset modes.
C-States
C-states reduce CPU idle power. There are three options in this mode: Legacy, Autonomous, and Disable.
Enabled: When “Enabled” is selected, the operating system initiates the C-state transitions. Some OS SW may defeat the ACPI mapping, such as intel_idle driver.
Autonomous: When "Autonomous" is selected, HALT and C1 requests get converted to C6 requests in hardware.
Disable: When "Disable" is selected, only C0 and C1 are used by the OS. C1 gets enabled automatically when an OS autohalts.
CPU Power Management
CPU Power Management allows the selection of CPU power management methodology. Maximum Performance is typically selected for performance-centric workloads where it is acceptable to consume additional power to achieve the highest possible performance for the computing environment. This mode drives processor frequency to the maximum across all cores (although idled cores can still be frequency-reduced by C-States enforcement through BIOS or OS mechanisms if enabled). This mode also offers the lowest latency of the CPU Power Management Mode options, so it is always preferred for latency-sensitive environments. OS DBPM is another Performance Per Watt option that relies on the operating system to dynamically control individual core frequency. Both Windows and Linux can take advantage of this mode to reduce the frequency of idled or underutilized cores in order to save power. This will be Read-only unless System Profile is set to Custom.
Memory Frequency
Memory Frequency governs the BIOS memory frequency. The variables that govern maximum memory frequency include the maximum rated frequency of the DIMMs, the DIMMs per channel population, the processor choice, and this BIOS option. Additional power savings can be achieved by reducing the memory frequency at the expense of reduced performance. This will be Read-only unless System Profile is set to Custom.
Turbo Boost
Turbo Boost governs the Boost Technology. This feature allows the processor cores to be automatically clocked up in frequency beyond the advertised processor speed. The amount of increased frequency (or 'turbo upside') one can expect from an EPYC processor depends on the processor model, thermal limitations of the operating environment, and in some cases power consumption. In general terms, the fewer cores being exercised with work, the higher the potential turbo upside. The potential drawbacks for Boost are mainly centered on increased power consumption and possible frequency jitter that can affect a small minority of latency-sensitive environments. This will be Read-only unless System Profile is set to Custom.
Memory Patrol Scrub
Memory Patrol Scrubbing searches the memory for errors and repairs correctable errors to prevent the accumulation of memory errors. When set to Disabled, no patrol scrubbing will occur. When set to Standard Mode, the entire memory array will be scrubbed once in a 24-hour period. When set to Extended Mode, the entire memory array will be scrubbed more frequently to further increase system reliability. This will be Read-only unless System Profile is set to Custom.
Memory Refresh Rate
The memory controller will periodically refresh the data in memory. The frequency at which memory is normally refreshed is referred to as 1X refresh rate. When memory modules are operating at a higher-than-normal temperature or to further increase system reliability, the refresh rate can be set to 2X, however this may have a negative impact on memory subsystem performance under certain circumstances. This will be Read-only unless System Profile is set to Custom.
PCI ASPM L1 Link Power Management
When enabled, PCIe Advanced State Power Management (ASPM) can reduce overall system power while slightly reducing system performance.
Note: Some devices may not perform properly (they may hang or cause the system to hang) when ASPM is enabled; for this reason, L1 will only be enabled for validated qualified cards.
Determinism Slider
The Determinism Slider controls whether BIOS will enable determinism to control performance.
Performance Determinism: BIOS will enable 100% deterministic performance control.
Power Determinism: BIOS will not enable deterministic performance control.
Power Profile Select
High Performance Mode (default): Favors core performance. All DF P-States are available in this mode, and the default DF P-State and DLWM algorithms are active.
Efficiency Mode: Configures the system for power efficiency. Limits boost frequency available to cores and restricts DF P-States available in the system. Maximum IO.
Performance Mode: Sets up Data Fabric to maximize IO sub-system performance.
Algorithm Performance Boost Disable (ApbDis)
When enabled, a specific hard-fused Data Fabric (SoC) P-state is forced for optimizing workloads sensitive to latency or throughput. For higher performance and power savings, when disabled, P-states will be automatically managed by the Application Power Management, allowing the processor to provide maximum performance while remaining within a specified power-delivery and thermal envelope.
ApbDis Fixed Socket P-State
This value defines the forced P-state when ApbDis is enabled.
Dynamic Link Width Management (DLWM)
DLWM reduces the XGMI link width between sockets from x16 to x8 (default) when no traffic is detected on the link. As with Data Fabric and Memory P-states, this feature is optimized to trade power between core and high IO/memory bandwidth workloads.
Forced: Force link width to x16, x8, or x2.
Unforced: Link width will be managed by DLWM engine.
System Memory Testing
System Memory Testing indicates whether or not the BIOS system memory tests are conducted during POST. When set to Enabled, memory tests are performed.
Note: Enabling this feature will result in a longer boot time. The extent of the increased time depends on the size of the system memory.
Dram Refresh Delay
By enabling the CPU memory controller to delay running the REFRESH commands, you can improve the performance for some workloads. By minimizing the delay time, it is ensured that the memory controller runs the REFRESH command at regular intervals. For Intel-based servers, this setting only affects systems configured with DIMMs which use 8 Gb density DRAMs.
Correctable Memory ECC SMI
Allows the system to log ECC-corrected DRAM errors into the SEL log. Logging these rare errors can help identify marginal components, however the system will pause for a few milliseconds after an error while the log entry is created. Latency-conscious customers may want to disable the feature. Spare Mode and Mirror mode require this feature to be enabled.
DIMM Self Healing (Post Package Repair) on Uncorrectable Memory Error Enable/Disable Post Package Repair (PPR) on Uncorrectable Memory Error.
Correctable Error Logging
Enable/Disable logging of correctable memory threshold error.
Logical Processor
Each processor core supports up to two logical processors. When set to Enabled, the BIOS reports all logical processors. When set to Disabled, the BIOS only reports one logical processor per core. Generally, a higher processor count results in increased performance for most multi-threaded workloads, and the recommendation is to keep this enabled. However, there are some floating point/scientific workloads, including HPC workloads, where disabling this feature may result in higher performance.
Virtualization Technology
When set to Enabled, the BIOS will enable processor Virtualization features and provide the virtualization support to the Operating System (OS) through the DMAR table. In general, only virtualized environments such as VMware(r) ESX(tm), Microsoft Hyper-V(r), Red Hat(r) KVM, and other virtualized operating systems will take advantage of these features. Disabling this feature is not known to significantly alter the performance or power characteristics of the system, so leaving this option Enabled is advised for most cases.
IOMMU Support
Enable or Disable IOMMU support. Required to create IVRS ACPI Table.
Kernel DMA Protection
When set to Enabled, using IOMMU, BIOS & Operating System will enable direct memory access protection for DMA-capable peripheral devices. Enable IOMMU Support to use this option.
L1 Stream HW Prefetcher
When set to Enabled, the processor provides advanced performance tuning by controlling the L1 stream HW prefetcher setting. Use the recommended setting, and this option will allow for optimizing overall workloads.
L2 Stream HW Prefetcher
When set to Enabled, the processor provides advanced performance tuning by controlling the L2 stream HW prefetcher setting. Use the recommended setting, and this option will allow for optimizing overall workloads.
L1 Stride Prefetcher
When set to Enabled, the processor provides additional fetch to the data access for an individual instruction for performance tuning by controlling the L1 stride prefetcher setting. Use the recommended setting, and this option will allow for optimizing overall workloads.
L1 Region Prefetcher
When set to Enabled, the processor provides additional fetch to data along with the data access to the given instruction for performance tuning by controlling the L1 region prefetcher setting. Use the recommended setting, and this option will allow for optimizing overall workloads.
L2 Up Down Prefetcher
When set to Enabled, the processor uses memory access to determine whether to fetch next or previous for all memory accesses for advanced performance tuning by controlling the L2 up/down prefetcher setting. Use the recommended setting, and this option will allow for optimizing overall workloads.
MADT Core Enumeration
This field determines how BIOS enumerates processor cores in the ACPI MADT table. When set to Round Robin, processor cores are enumerated in a Round Robin order to evenly distribute interrupt controllers for the OS across all Sockets and Dies. When set to Linear, processor cores are enumerated across all Dies within a Socket before enumerating additional Sockets for a linear distribution of interrupt controllers for the OS.
NUMA Nodes Per Socket
This field specifies the number of NUMA nodes per socket. The Zero option is for 2 socket configurations.
L3 cache as NUMA Domain
This field specifies that each CCX within the processor will be declared as a NUMA Domain.
Secure Memory Encryption
This field enables or disables AMD secure encryption features such as Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV). In addition to enabling this option, SME must be supported and activated by the operating system. Similarly, SEV must be supported and activated by the hypervisor. This option also determines if other secure encryption feature such as TSME and SEV-SNP features can be enabled.
Minimum SEV non-ES ASID
This field determines the number of Secure Encrypted Virtualization (SEV) Encrypted States (ES) and non-ES available Address Space IDs. The number specified is the dividing line between ES and non-ES ASIDs. The register save state area is also encrypted along with the entire guest memory area. The maximum number of ASIDs available depends on installed CPU and memory configuration which can either be 15, 253, or 509. The default value is 1, and the value entered by user means the number of non-ES ASIDs starts from the value entered and ends at the maximum number of ASIDs available. A value of 1 means there are only non-ES ASIDs available. For example, if the maximum number of ASIDs is 15, the default value 1 means there are 15 SEV non-ES ASIDs and 0 SEV ES ASIDs. Alternatively, if the maximum number of ASIDs is 15, the value 4 means there are 12 SEV non-ES ASIDs and 3 SEV ES ASIDs. Further, if the maximum number of ASIDs is 509, the value 40 means there are 470 SEV non-ES ASIDs and 39 SEV ES ASIDs.
Secure Nested Paging
This option enables or disables SEV-SNP, a set of additional security protections.
SNP Memory Coverage
This option selects the operating mode of the Secure Nested Paging (SNP) Memory and the Reverse Map Table (RMP). The RMP is used to ensure a one-to-one mapping between system physical addresses and guest physical addresses.
Transparent Secure Memory Encryption
This field enables or disables Transparent Secure Memory Encryption (TSME). TSME is always-on memory encryption that does not require operating system or hypervisor support. If the operating system supports SME, this field does not need to be enabled. If the hypervisor supports SEV, this field does not need to be enabled. Enabling TSME affects system memory performance.
ACPI CST C2 Latency
Enter in 18 - 1000 microseconds (decimal value). Larger C2 latency values will reduce the number of C2 transitions and reduce C2 residency. Fewer transitions can help when performance is sensitive to the latency of C2 entry and exit. Higher residency can improve performance by allowing higher frequency boost and reduce idle core power. With Linux kernel 6.0 or later, the C2 transition cost is significantly reduced. The best value will be dependent on kernel version, use case, and workload.
Configurable TDP
Configurable TDP allows the reconfiguration of the processor Thermal Design Power (TDP) levels based on the power and thermal delivery capabilities of the system. TDP refers to the maximum amount of power the cooling system is required to dissipate.
Note: This option is only available on certain SKUs of the processors, and the number of alternative levels varies as well.
x2APIC Mode
Enable or Disable x2APIC mode. Compared to the traditional xAPIC architecture, x2APIC extends processor addressability and enhances interrupt delivery performance.
Number of CCDs per Processor
This field enables the number of CCDs per Processor.
Number of Cores per CCD
This field enables the number of Cores per CCD.
Authors: Charan Soppadandi, Chris Cote, Donald Russell, Kavya AR
Benchmark Performance of AMD EPYC™ Processors
Mon, 16 Jan 2023 23:40:30 -0000
|Read Time: 0 minutes
Summary
The Dell PowerEdge portfolio of servers with AMD EPYC processors have achieved several world record benchmark scores including VMMark, TPCx-HS and SAP-SD. These scores demonstrate the advantage of these servers for key business workloads.
AMD 3rd Generation AMD EPYC™ Processors
The 3rd Generation of AMD EPYC™ Processors builds on the AMD Infinity Architecture to provide full features and functionality for both one-socket and two-socket x86 server options. The processor retains the chiplet design from the 2nd generation, with a 12nm based IO die surrounded by 7nm based compute dies and is a drop-in replacement option. With a range of options, from 8 cores all the way to 64 cores, and TDPs of up to 280W, these processors can target a wide variety of workloads. Configurations support up to 160 lanes of PCIe Gen4 allowing for options like 24x direct attach NVMe drives and dual port 100Gbps NICs that can run at line rate.
Key New Features
The AMD 3rd Generation of processors builds on the previous generation but adds a few key optimizations that deliver significant performance improvements. The L3 cache in each CCD is now shared across all 8 cores instead of just 4. Thus, each core has up to 32MB of L3 cache allowing for flexibility, lower inter-core latency, and improved cache performance. The DDR memory latency has been further reduced along with a new 6-channel memory interleaving option. The IO memory management unit has been optimized to better handle 200Gbps line rate. There is improved support for Hot Plug surprise remove following PCIe-SIGs new implementation guideline. New features like SEV-SNP (Secure Nested Paging) provide enhanced virtualization security. There are a few other enhancements and optimizations targeting workloads around HPC, etc.
What does this mean?
Technical specifications can only explain part of the story. Key workload-based benchmarks can explain some of the real-world applicability and real-world performance. Dell has worked to publish multiple key benchmarks to help customers gauge the real-world performance of the various Dell PowerEdge servers with AMD EPYC Processors
Key Benchmarks
Some of the key benchmarks that are relevant to typical use cases are listed below:
VMMark:
VMMark is a benchmark from VMWare that highlights virtualization. The VMMark benchmark runs multiple tiles on the system under test. Each tile consists of 19 different virtual machines, with each running a typical workload. This benchmark is great at identifying the capabilities for a typical IT server where the workloads are virtualized, and multiple workloads are running on a single server.
For AMD, EPYC, the large number of cores, the high speed memory, the high speed PCIe Gen4 for networking, and, when used, storage, all contribute towards a very positive result.
As of 3/15/2021, Dell has top scores on 4-node vSAN configurations with the R7515, the R6525 and the C6525. The R7515 and R6525 are 1 and 2-Socket servers with scores of 15.18@16 tiles and 24.08@28 tiles respectively. The C6525 is a modular server with 4-nodes in a single 2-U system with a score of 13.74@16 tiles and highlights the sheer density of compute possible in 2U of rack space.
Dell also has a leading score for a matched pair 2-socket configuration connected to a Dell EMC PowerMax. This configuration managed a score of 19.4@22 tiles from just 2 servers, achieving the maximum VM density for such a configuration. This score highlights the advantages of leveraging an excellent external storage array like the Dell EMC PowerMax to maximize reliability and performance.
Reference: https://www.vmware.com/products/vmmark/results3x.html
TPCx-HS
The TPCx-HS benchmark is built to showcase the performance of a Hadoop cluster doing data analytics. In today’s world, where data is critical, the ability to analyze and manage this data becomes very important. The benchmark can do batch processing with MapReduce or data analytics on Spark
As of 3/15/2021, Dell PowerEdge servers with AMD EPYC 3rd generation of processors have multiple world record scores for TPCx-HS at both the 1TB and 3TB database sizes. These include performance improvements of as much as 60% over the previous world records with as much as 40% lower $/HSph.
Reference: http://tpc.org/tpcx-hs/results/tpcxhs_perf_results5.asp?version=2
SAP-SD
SAP-Sales and Distribution is a core functional module in SAP ERP Central Component that allows organizations to store and manage customer and product related data. The ability to access and manage this data at high speed, and with minimal latency is a very critical requirement of the business architecture.
For this benchmark, Dell PowerEdge servers have world record scores on Windows and Linux for both 1-S and 2-S platforms. The 2-S Linux configuration score of 75000 benchmark users is higher than even the best 4-S score for this benchmark, highlighting the significant advantage of this architecture for database use cases.
Reference: https://www.sap.com/dmc/exp/2018-benchmark-directory/#/sd
Conclusion
Dell PowerEdge servers with AMD EPYC processors have industry leading performance numbers. Benchmarks like VMMark, TPCx-HS and SAP-SD show that these platforms are excellent for the most common workloads and provide excellent business value.