Siemens’ Simcenter STAR-CCM+ Performance with AMD EPYC 7003 Series Processors
Thu, 18 Mar 2021 16:39:54 -0000
Introduction
This blog discusses the performance of Siemens’ Simcenter STAR-CCM+ on the Dell EMC Ready Solutions for HPC Digital Manufacturing with AMD EPYC 7003 series processors. This Ready Solution was designed and configured specifically for digital manufacturing workloads, where computer-aided engineering (CAE) applications are critical for virtual product development. It uses a flexible building block approach to HPC system design, in which individual building blocks can be combined to build HPC systems that are optimized for customer-specific workloads and use cases.
The Dell EMC Ready Solutions for HPC Digital Manufacturing is one of many solutions in the Dell EMC HPC solution portfolio. Please visit www.dellemc.com/hpc for a comprehensive overview of the HPC solutions offered by Dell EMC.
Benchmark System Configuration
Performance benchmarking was performed using dual-socket Dell EMC PowerEdge servers with 7002 and 7003 series AMD EPYC processors. Every server was populated with two processors and a one-DIMM-per-channel memory configuration. The system configurations used for the performance benchmarking are shown in Table 1 and Table 2. The BIOS configuration used for the benchmarking systems is shown in Table 3.
Table 1. 7002 Series AMD EPYC System Configuration
Server | Dell EMC PowerEdge C6525 |
Processor | 2x AMD EPYC 7532 32-core Processors |
Memory | 16 x 16 GB 3200 MT/s RDIMMs |
BIOS Version | 1.4.8 |
Operating System | Red Hat Enterprise Linux Server release 7.6 |
Kernel Version | 3.10.0-957.27.2.el7.x86_64 |
Table 2. 7003 Series AMD EPYC System Configuration
Server | Dell EMC PowerEdge R6525 |
Processors | 2x AMD EPYC 7713 64-Core Processors, or 2x AMD EPYC 7543 32-Core Processors |
Memory | 16 x 16 GB 3200 MT/s RDIMMs |
BIOS Version | 2.0.1 |
Operating System | Red Hat Enterprise Linux Server release 8.3 |
Kernel Version | 4.18.0-240.el8.x86_64 |
Table 3. BIOS Configuration
System Profile | Performance Optimized |
Logical Processor | Disabled |
Virtualization Technology | Disabled |
NUMA Nodes Per Socket | 4 |
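As a rough illustration of what these settings imply, the sketch below derives the core count, NUMA layout, and memory per core for the two 7003 series configurations in Table 2. The `ServerConfig` class and `describe` function are hypothetical helpers written for this example; the input values come from Tables 2 and 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the settings listed in Tables 2 and 3."""
    sockets: int
    cores_per_socket: int
    numa_nodes_per_socket: int  # BIOS "NUMA Nodes Per Socket"
    logical_processor: bool     # BIOS "Logical Processor" (SMT)
    dimm_count: int
    dimm_size_gb: int


def describe(cfg: ServerConfig) -> dict:
    """Derive the per-server layout implied by a configuration."""
    cores = cfg.sockets * cfg.cores_per_socket
    numa_nodes = cfg.sockets * cfg.numa_nodes_per_socket
    # With Logical Processor disabled, each core exposes one hardware thread.
    threads_per_core = 2 if cfg.logical_processor else 1
    memory_gb = cfg.dimm_count * cfg.dimm_size_gb
    return {
        "cores": cores,
        "hardware_threads": cores * threads_per_core,
        "numa_nodes": numa_nodes,
        "cores_per_numa_node": cores // numa_nodes,
        "memory_gb": memory_gb,
        "memory_gb_per_core": memory_gb / cores,
    }


EPYC_7713 = ServerConfig(2, 64, 4, False, 16, 16)
EPYC_7543 = ServerConfig(2, 32, 4, False, 16, 16)

for name, cfg in (("EPYC 7713", EPYC_7713), ("EPYC 7543", EPYC_7543)):
    print(name, describe(cfg))
```

With Logical Processor disabled, a benchmark run that uses all processor cores maps to one process per physical core, spread evenly over eight NUMA nodes per server.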
Software Versions
Application software versions are as described in Table 4.
Table 4. Software Versions
Simcenter STAR-CCM+ | 2020.3.1 mixed precision with Open MPI 4 |
Siemens’ Simcenter STAR-CCM+ Performance
Simcenter STAR-CCM+ is a multiphysics software application used to simulate a wide range of products and designs under a variety of conditions. The benchmarks reported here mainly use the computational fluid dynamics (CFD) and heat transfer features of STAR-CCM+. CFD applications typically scale well across multiple processor cores and servers, have modest memory capacity requirements, and typically perform minimal disk I/O while solving. However, some simulations may have greater I/O demands, such as transient analysis.
The benchmark cases from the standard STAR-CCM+ benchmark suite were evaluated on the systems. The benchmark results reported here are single-server performance results, with each benchmark run using all processor cores available in the server. STAR-CCM+ benchmark performance is measured using the Average Elapsed Time metric, which is the average elapsed time per solver iteration. A smaller elapsed time represents better performance. Figure 1 shows the relative performance results for a selection of the STAR-CCM+ benchmarks.
Figure 1. Simcenter STAR-CCM+ Single Server Performance
The results in Figure 1 are plotted relative to the performance of a single server configured with AMD EPYC 7532 processors. Larger values indicate better overall performance, so a value above 1.0 means the server is faster than the baseline. These results show the performance improvement available with 7003 series AMD EPYC processors. The 32-core AMD EPYC 7543 processor provides good performance for these benchmarks. Per server, the 64-core AMD EPYC 7713 provides a significant performance advantage over the 32-core processors.
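Because Average Elapsed Time is a smaller-is-better metric while Figure 1 is larger-is-better, the relative values are presumably obtained by dividing the baseline time by the candidate time. A minimal sketch of that conversion follows; the function name and the timing values are hypothetical placeholders, not measured results.

```python
def relative_performance(baseline_seconds: float, candidate_seconds: float) -> float:
    """Convert two elapsed times per iteration into a larger-is-better ratio.

    A result of 1.0 means parity with the baseline; 2.0 means the candidate
    completes an iteration in half the baseline time.
    """
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return baseline_seconds / candidate_seconds


# Hypothetical elapsed times per iteration, for illustration only.
example_times = {
    "baseline server": 10.0,
    "candidate server": 5.0,
}

baseline = example_times["baseline server"]
for label, seconds in example_times.items():
    print(f"{label}: {relative_performance(baseline, seconds):.2f}")
```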
Conclusion
The results presented in this blog show that 7003 series AMD EPYC processors offer a significant performance improvement for Siemens’ Simcenter STAR-CCM+ relative to 7002 series AMD EPYC processors.
Related Blog Posts
The Dell PowerEdge C6615: Maximizing Value and Minimizing TCO for Dense Compute and Scale-out Workloads
Mon, 02 Oct 2023 21:35:01 -0000
In the ever-evolving landscape of data centers and IT infrastructure, meeting the demands of scale-out workloads is a continuous challenge. Organizations seek solutions that not only provide superior performance but also optimize Total Cost of Ownership (TCO).
Enter the new Dell PowerEdge C6615, a modular node designed to address these challenges with innovative solutions. Let's delve into the key features and benefits of this groundbreaking addition to the Dell PowerEdge portfolio.
Industry challenges
- Maximizing rack utilization: One of the primary challenges in the data center world is maximizing rack utilization. The Dell PowerEdge C6615 addresses this by offering dense compute options.
- Cutting-edge processors: High-performance processors are crucial for scalability and security. The C6615 is powered by a 4th Generation AMD EPYC 8004 series processor, ensuring top-tier performance.
- Total Cost of Ownership (TCO): TCO is a critical consideration that encompasses power and cooling efficiency, licensing costs, and seamless integration with existing data center infrastructure. The C6615 is designed to reduce TCO significantly.
Introducing the Dell PowerEdge C6615
The Dell PowerEdge C6615 is a modular node designed to revolutionize data center infrastructure. Here are some key highlights:
- Price-performance ratio: The C6615 offers outstanding price per watt for scale-out workloads, with up to a 315% improvement compared to a one-socket (1S) server with AMD EPYC 9004 Series server processor.
- Optimized thermal solution: It features an optimized thermal solution that allows for air-cooling configurations with up to 53% improved cooling performance compared to the previous generation chassis.
- Density-optimized compute: The C6615's architecture is tailored for scale-out WebTech workloads, offering exceptional performance with reduced TCO.
- High-speed NVMe storage: It provides high-speed NVMe storage for applications with intensive IOPS requirements, ensuring efficient performance.
- Efficient scalability: With 40% more cores per rack compared to the AMD EPYC 9004 Series server processors, the C6615 allows for quicker and more efficient scalability.
- SmartNIC: It includes a SmartNIC with hardware-accelerated networking and storage, saving CPU cycles and enhancing security.
Key features
To maximize efficiency and reduce environmental impact, the PowerEdge C6615 incorporates several key features:
- Power and thermal efficiency: The 2U chassis with four nodes enhances power and thermal efficiency, eliminating the need for liquid cooling.
- Flexible I/O options: It supports up to two PCIe Gen5 slots and one x16 PCIe Gen5 OCP 3.0 slot for network cards, ensuring versatile connectivity.
- Security: Security is integrated at every phase of the PowerEdge lifecycle, from supply chain protection to Multi-Factor Authentication (MFA) and role-based access controls.
Accelerating performance
In benchmark testing, the C6615 outperforms the competition:
- HPL Benchmark: It showcases up to a 335% improvement in performance per watt per dollar and a 210% increase in performance per CPU dollar compared to other 1S systems with the AMD EPYC 9004 Series server processor.
Figure 1. HPL benchmark performance
- SPEC_CPU2017 Benchmark: Results demonstrate up to a 205% improvement in performance per watt per dollar and a remarkable 128% increase in performance per CPU dollar compared to similar systems.
Figure 2. SPEC_CPU2017 benchmark performance
Final thoughts
The seamless integration of the Dell PowerEdge C6615 into existing processes and toolsets is facilitated by comprehensive iDRAC9 support for all components. This ensures a smooth transition while leveraging the full potential of your server infrastructure.
Dell's commitment to environmental sustainability is evident in its use of recycled materials and energy-efficient options, helping to reduce carbon footprints and operational costs.
In conclusion, the Dell PowerEdge C6615 emerges as a leading dense compute solution, delivering exceptional value and unmatched performance. For more information, visit the PowerEdge Servers Powered by AMD site and explore how this innovative solution can transform your data center operations.
Note: Performance results may vary based on specific configurations and workloads. It's recommended to consult with Dell or an authorized partner for tailored solutions.
Author: David Dam
HPC Application Performance on Dell PowerEdge R7525 Servers with the AMD Instinct™ MI210 GPU
Mon, 12 Sep 2022 12:11:52 -0000
PowerEdge support and performance
The PowerEdge R7525 server can support three AMD Instinct™ MI210 GPUs, which makes it well suited to HPC workloads. Furthermore, using the PowerEdge R7525 server to power AMD Instinct MI210 GPUs (built with the 2nd Gen AMD CDNA™ architecture) offers improvements on FP64 operations along with the robust capabilities of the AMD ROCm™ 5 open software ecosystem. Overall, the PowerEdge R7525 server with the AMD Instinct MI210 GPU delivers exceptional double-precision performance and leading total cost of ownership.
Figure 1: Front view of the PowerEdge R7525 server
We performed and observed multiple benchmarks with AMD Instinct MI210 GPUs populated in a PowerEdge R7525 server. This blog shows the performance of LINPACK and the OpenMM customizable molecular simulation libraries with the AMD Instinct MI210 GPU and compares the performance characteristics to the previous generation AMD Instinct MI100 GPU.
The following table provides the configuration details of the PowerEdge R7525 system under test (SUT):
Table 1. SUT hardware and software configurations
Component | Description |
Processor | AMD EPYC 7713 64-Core Processor |
Memory | 512 GB |
Local disk | 1.8 TB SSD |
Operating system | Ubuntu 20.04.3 LTS |
GPU | 3xMI210/MI100 |
Driver version | 5.13.20.22.10 |
ROCm version | ROCm-5.1.3 |
Processor Settings > Logical Processors | Disabled |
System profiles | Performance |
NUMA node per socket | 4 |
HPL | rochpl_rocm-5.1-60_ubuntu-20.04 |
OpenMM | 7.7.0_49 |
The following table contains the specifications of AMD Instinct MI210 and MI100 GPUs:
Table 2: AMD Instinct MI210 and MI100 PCIe GPU specifications
Specification | AMD Instinct MI210 | AMD Instinct MI100 |
Peak Engine Clock (MHz) | 1700 | 1502 |
Stream processors | 6656 | 7680 |
Peak FP64 (TFLOPS) | 22.63 | 11.5 |
Peak FP64 Tensor DGEMM (TFLOPS) | 45.25 | 11.5 |
Peak FP32 (TFLOPS) | 22.63 | 23.1 |
Peak FP32 Tensor SGEMM (TFLOPS) | 45.25 | 46.1 |
Memory size (GB) | 64 | 32 |
Memory Type | HBM2e | HBM2 |
Peak Memory Bandwidth (GB/s) | 1638 | 1228 |
Memory ECC support | Yes | Yes |
TDP (Watt) | 300 | 300 |
High-Performance LINPACK (HPL)
HPL measures the floating-point computing power of a system by solving a uniformly random system of linear equations in double-precision (FP64) arithmetic. The HPL binary used to collect results was compiled with ROCm 5.1.3. The following figure shows the HPL results:
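For readers unfamiliar with how an HPL score is derived, the sketch below applies the operation count that HPL conventionally credits to a solve of order N, namely 2/3 N^3 + 3/2 N^2 floating-point operations, and divides it by the wall-clock time. The function names and the example N and time are assumptions chosen for illustration; they are not the values used in this benchmark.

```python
def hpl_flop_count(n: int) -> float:
    """Floating-point operations credited to an HPL solve of order n."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def hpl_tflops(n: int, wall_seconds: float) -> float:
    """Reported HPL rate in TFLOPS for problem size n and solve time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return hpl_flop_count(n) / wall_seconds / 1e12


# Hypothetical inputs, for illustration only.
example_n = 60_000
example_seconds = 10.0
print(f"N={example_n}: {hpl_tflops(example_n, example_seconds):.2f} TFLOPS")
```

Because the operation count grows with the cube of N while the matrix storage grows with the square, larger problem sizes generally give the hardware more work per byte moved.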
Figure 2: LINPACK performance with AMD Instinct MI100 and MI210 GPUs
The following figure shows the power consumption during a single HPL run:
Figure 3: LINPACK power consumption with AMD Instinct MI100 and MI210 GPUs
We observed a significant improvement in HPL performance with the AMD Instinct MI210 GPU over the AMD Instinct MI100 GPU. A single MI210 GPU reaches 18.2 TFLOPS, which is approximately 2.7 times the single-GPU MI100 result (6.75 TFLOPS). This improvement is due to the AMD CDNA2 architecture on the AMD Instinct MI210 GPU, which has been optimized for FP64 matrix and vector workloads. Also, the MI210 GPU has more memory, so the problem size (N) used for it is larger than the one used for the AMD Instinct MI100 GPU.
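To make the link between GPU memory and problem size concrete, here is a rough sizing sketch. It assumes the N x N FP64 matrix (8 bytes per element) must fit within a fraction of GPU memory and that N is rounded down to a multiple of a block size. The 0.8 fill fraction, the block size of 512, and the treatment of the Table 2 capacities as GiB are all assumptions made for this example, not settings reported by the benchmark.

```python
import math

BYTES_PER_FP64 = 8
GIB = 1024**3


def max_problem_size(memory_gib: int, fill_fraction: float = 0.8,
                     block_size: int = 512) -> int:
    """Largest N (a multiple of block_size) whose N x N FP64 matrix
    fits in fill_fraction of the given memory."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    usable_elements = int(fill_fraction * memory_gib * GIB // BYTES_PER_FP64)
    n = math.isqrt(usable_elements)
    return (n // block_size) * block_size


n_mi210 = max_problem_size(64)  # memory size from Table 2
n_mi100 = max_problem_size(32)  # memory size from Table 2
print(n_mi210, n_mi100, round(n_mi210 / n_mi100, 3))
```

Doubling the memory raises the attainable N by a factor of about the square root of two, which roughly corresponds to 2.8 times as many floating-point operations per solve.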
As shown in Figure 2, the AMD Instinct MI210 GPU shows almost linear scalability in HPL performance on single-node, multi-GPU runs, and it scales better than the previous-generation AMD Instinct MI100 GPU. Both GPUs have the same 300 W TDP, yet the AMD Instinct MI210 GPU delivers approximately three times the HPL performance, so the performance per watt of the PowerEdge R7525 system is also approximately three times higher. Figure 3 shows the power consumption characteristics during one HPL run cycle.
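The ratios discussed here can be reproduced with a few lines of arithmetic. The sketch below uses the single-GPU HPL results quoted above and the 300 W TDP from Table 2. Using TDP as the power figure is a simplifying assumption, since measured power during a run differs from TDP, and the multi-GPU value passed to `scaling_efficiency` is a hypothetical placeholder rather than a measured result.

```python
def speedup(new_tflops: float, old_tflops: float) -> float:
    """Ratio of two larger-is-better results."""
    return new_tflops / old_tflops


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Performance per watt for a given power figure."""
    return tflops / watts


def scaling_efficiency(multi_gpu_tflops: float, single_gpu_tflops: float,
                       gpu_count: int) -> float:
    """1.0 means perfectly linear scaling across gpu_count GPUs."""
    return multi_gpu_tflops / (gpu_count * single_gpu_tflops)


MI210_TFLOPS, MI100_TFLOPS = 18.2, 6.75  # single-GPU HPL results from the text
TDP_WATTS = 300.0                        # Table 2; assumed stand-in for power

print(f"speedup: {speedup(MI210_TFLOPS, MI100_TFLOPS):.2f}x")
print(f"MI210: {tflops_per_watt(MI210_TFLOPS, TDP_WATTS):.4f} TFLOPS/W")
print(f"MI100: {tflops_per_watt(MI100_TFLOPS, TDP_WATTS):.4f} TFLOPS/W")

# Hypothetical three-GPU result, for illustration only.
print(f"efficiency: {scaling_efficiency(50.0, MI210_TFLOPS, 3):.2f}")
```

When both GPUs are assigned the same power figure, the performance-per-watt ratio equals the raw speedup, which is why the two claims move together.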
OpenMM
OpenMM is a high-performance toolkit for molecular simulation. It can be used as a library or as an application. It includes extensive language bindings for Python, C, C++, and even Fortran. The code is open source and actively maintained on GitHub and licensed under MIT and LGPL.
Figure 4: OpenMM double-precision performance with AMD Instinct MI100 and MI210 GPUs
Figure 5: OpenMM single-precision performance with AMD Instinct MI100 and MI210 GPUs
Figure 6: OpenMM mixed-precision performance with AMD Instinct MI100 and MI210 GPUs
We tested OpenMM with seven datasets to validate double, single, and mixed precision. We observed exceptional double precision performance with OpenMM on the AMD Instinct MI210 GPU compared to the AMD Instinct MI100 GPU. This improvement is due to the AMD CDNA2 architecture on the AMD Instinct MI210 GPU, which has been optimized for FP64 matrix and vector workloads.
Conclusion
The AMD Instinct MI210 GPU shows an impressive performance improvement in FP64 workloads. These workloads benefit because AMD has doubled the width of its ALUs to a full 64 bits. This change allows FP64 operations to run at full speed on the 2nd Gen AMD CDNA architecture. Applications and workloads that rely on FP64 operations are expected to take full advantage of the hardware.