Analyzing How Gen4 NVMe Drive Performance Scales on the PowerEdge R7525
Mon, 16 Jan 2023 13:44:23 -0000
Summary
PCIe Gen4 doubles per-lane bandwidth over Gen3, from roughly 1GB/s to roughly 2GB/s, which doubles the performance ceiling of an NVMe drive that uses the same number of lanes. However, users also need to understand how Gen4 NVMe performance scales when a server running workloads is populated with more than one drive. This DfD analyzes how various IO profiles scale as more Gen4 NVMe drives are loaded into a PowerEdge R7525.
PCIe 4.0 History and Gen4 NVMe Scaling
PCIe 4.0 was released in 2019, following its predecessor with double the bandwidth (up to 64GB/s), bit rate (up to 16GT/s) and frequency (up to 16GHz). AMD released the first motherboards to support PCIe 4.0 in early 2020, while Intel motherboards with PCIe 4.0 support are scheduled to begin releasing by the end of 2020. Gen4 NVMe drives were introduced shortly after the release of PCIe 4.0 to capitalize on its specification improvements, allowing performance metrics to double (if the same number of lanes is used).
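The per-lane doubling can be reproduced with a short calculation. The Python sketch below assumes the 128b/130b line encoding used by both PCIe 3.0 and 4.0 and, as an illustrative assumption that this note does not state, a x4 link for each NVMe drive. The function and variable names are hypothetical.

```python
def lane_bandwidth_gb_s(transfer_rate_gt_s, payload_bits=128, encoded_bits=130):
    """Usable one-direction bandwidth of a single PCIe lane in GB/s (decimal).

    Each transfer carries one bit on the wire; 128b/130b encoding means
    128 of every 130 bits are payload. Dividing by 8 converts bits to bytes.
    """
    return transfer_rate_gt_s * payload_bits / encoded_bits / 8


GEN3_GT_S = 8.0   # PCIe 3.0 transfer rate per lane
GEN4_GT_S = 16.0  # PCIe 4.0 transfer rate per lane
LANES_PER_DRIVE = 4  # assumption: x4 link per NVMe drive

gen3_lane = lane_bandwidth_gb_s(GEN3_GT_S)
gen4_lane = lane_bandwidth_gb_s(GEN4_GT_S)
gen4_drive_link = LANES_PER_DRIVE * gen4_lane

print(f"Gen3 per lane: {gen3_lane:.3f} GB/s")
print(f"Gen4 per lane: {gen4_lane:.3f} GB/s")
print(f"Gen4 x{LANES_PER_DRIVE} link: {gen4_drive_link:.2f} GB/s")
```

Under these assumptions the result is roughly 0.98 GB/s per Gen3 lane and 1.97 GB/s per Gen4 lane, which is where the rounded 1GB/s and 2GB/s figures come from.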
Although these numbers look enticing at first glance, very little data has been gathered around how Gen4 NVMe drives perform when scaled in a datacenter server running workloads. What is the sweet spot? When does the performance curve begin to plateau? The Dell Technologies engineering team constructed an in-house test setup to obtain data points that will help users understand IOPS and bandwidth trends when scaling Gen4 NVMe drives.
Test Setup
The PowerEdge R7525 was used as the host server, as it is one of the first Dell EMC servers to support PCIe 4.0.
Figure 1 - Samsung PM1733 Gen4 NVMe
The Samsung PM1733 Gen4 NVMe drive was connected using CPU direct attach and then scaled. Measurements were taken for 1, 2, 4, 8, 12 and 24 drives. The IOmeter benchmark was used to simulate data center workloads running on NVMe drives and to obtain maximum raw performance data. FIO was used as a supplemental benchmark. Note that these benchmark results are not directly applicable to file systems or application workloads.
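This note does not publish the IOmeter or FIO parameters that were used, so the following Python sketch only illustrates how a comparable FIO job file could be generated for each IO profile. The queue depth, job count, runtime, the 70/30 read/write mix for OLTP, and the device paths are all assumptions, and the helper name `fio_job_file` is hypothetical. The 4K and 128K block sizes follow the sizes quoted with the results in this note.

```python
# Assumed mapping from this note's IO profiles to standard fio options.
PROFILES = {
    "RR": {"rw": "randread", "bs": "4k"},
    "RW": {"rw": "randwrite", "bs": "4k"},
    "OLTP": {"rw": "randrw", "bs": "4k", "rwmixread": "70"},  # mix is an assumption
    "SR": {"rw": "read", "bs": "128k"},
    "SW": {"rw": "write", "bs": "128k"},
}


def fio_job_file(profile, devices, iodepth=32, numjobs=4, runtime_s=60):
    """Build fio job-file text with one job section per NVMe device."""
    lines = [
        "[global]",
        "ioengine=libaio",   # Linux asynchronous IO engine
        "direct=1",          # bypass the page cache for raw device numbers
        "time_based",
        f"runtime={runtime_s}",
        f"iodepth={iodepth}",
        f"numjobs={numjobs}",
        "group_reporting",
    ]
    lines += [f"{key}={value}" for key, value in PROFILES[profile].items()]
    for dev in devices:
        section = dev.rsplit("/", 1)[-1]  # e.g. "nvme0n1"
        lines += ["", f"[{profile.lower()}-{section}]", f"filename={dev}"]
    return "\n".join(lines) + "\n"


# Example: a random-read job file for two drives (device paths are placeholders).
print(fio_job_file("RR", ["/dev/nvme0n1", "/dev/nvme1n1"]))
```

Write profiles run against raw devices destroy the data on them, so a job file like this should only be pointed at drives reserved for testing.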
Random reads (RR) and writes (RW) were measured in input/output operations per second (IOPS). Online Transaction Processing (OLTP), a profile useful for representing database workloads, was also measured in IOPS. Sequential reads (SR) and writes (SW) were measured in mebibytes per second (MiB/s).
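Because the random profiles are reported in IOPS and the sequential profiles in MiB/s, it can help to convert between the two. The sketch below relies only on the identity throughput = IOPS x block size. The function names are hypothetical, and the example input is the 17M 4KiB RR IOPS figure reported for 12 drives in this note.

```python
MIB = 2**20  # bytes in one mebibyte


def iops_to_mib_s(iops, block_size_bytes):
    """Convert an IOPS figure at a fixed block size to MiB/s."""
    return iops * block_size_bytes / MIB


def mib_s_to_gb_s(mib_s):
    """Convert binary MiB/s to decimal GB/s (1 GB = 10**9 bytes)."""
    return mib_s * MIB / 1e9


rr_mib_s = iops_to_mib_s(17_000_000, 4096)
rr_gb_s = mib_s_to_gb_s(rr_mib_s)
print(f"{rr_mib_s:.2f} MiB/s, or {rr_gb_s:.2f} GB/s")
```

For 17M IOPS at 4KiB this works out to about 66,406 MiB/s, or roughly 69.6 GB/s.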
Test Results
Figure 2 - Gen4 NVMe RR perf scaling for up to 24 drives
Figure 3 - Gen4 NVMe RW perf scaling for up to 24 drives
Figure 4 - Gen4 NVMe OLTP perf scaling per drive for up to 24 drives
Figure 5 - Gen4 NVMe SR perf scaling for up to 24 drives
Figure 6 - Gen4 NVMe SW perf scaling for up to 24 drives
As seen in Figures 2-6, the Gen4 NVMe drives deliver remarkable performance. One Gen3 NVMe drive commonly delivers 4K RR performance in the triple-digit KIOPS range, whereas one Gen4 NVMe drive reaches four-digit KIOPS for 4K RR. Scaling to 12 Gen4 NVMe drives yields 17M 4KiB RR IOPS, allowing extraordinary amounts of data to be read randomly from the drives at one time. Scaling to 12 Gen4 NVMe drives also yields a notable 80.41 GiB/s for 128KiB SR, close to the theoretical maximum line rate of 94.5 GB/s. Lastly, 4K OLTP benchmark results are nearly two times faster than those of Gen3 NVMe drives.
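The relationship between the measured 128KiB SR figure and the line rate can be checked with a few lines of arithmetic. The sketch below assumes each drive uses a x4 Gen4 link with 128b/130b encoding, which this note does not state but which reproduces the 94.5 GB/s figure for 12 drives. It also converts the measured GiB/s value to GB/s before comparing. All variable names are hypothetical.

```python
DRIVES = 12
LANES_PER_DRIVE = 4                      # assumption: x4 link per drive
GEN4_LANE_GB_S = 16.0 * 128 / 130 / 8    # 16 GT/s with 128b/130b encoding

# Aggregate one-direction line rate across all drive links, in GB/s.
line_rate_gb_s = DRIVES * LANES_PER_DRIVE * GEN4_LANE_GB_S

# Measured 128KiB SR result from this note, converted from GiB/s to GB/s.
measured_gib_s = 80.41
measured_gb_s = measured_gib_s * 2**30 / 1e9

efficiency = measured_gb_s / line_rate_gb_s

# Per-drive random-read rate implied by 17M IOPS across 12 drives.
per_drive_kiops = 17_000_000 / DRIVES / 1000

print(f"Line rate: {line_rate_gb_s:.1f} GB/s")
print(f"Measured:  {measured_gb_s:.2f} GB/s ({efficiency:.1%} of line rate)")
print(f"Per-drive 4K RR: {per_drive_kiops:.0f} KIOPS")
```

Under these assumptions the measured result is about 86.3 GB/s, or roughly 91% of the line rate, and the per-drive random-read rate is about 1,417 KIOPS, consistent with the four-digit KIOPS observation.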
Furthermore, these bar graphs demonstrate that each profile scales linearly for up to 12 drives, and each performance readout also stays very close to its theoretical maximum. However, once the jump from 12 to 24 drives is made, two of the IO profiles (the RR and SR profiles) stop scaling linearly. Given that CPU utilization is at 90%, it is to be expected that scaling beyond 12 drives will not give linear performance increases for all IO profiles.
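One way to quantify "scales linearly" is to compare each measurement with the single-drive result multiplied by the drive count. The sketch below does that. The 0.9 threshold, the function names, and the input values are all assumptions: the numbers are arbitrary placeholders in arbitrary units, not measurements from this study, and should be replaced with readings taken from the figures.

```python
def scaling_efficiency(results):
    """Return measured / ideal-linear ratio for each drive count.

    `results` maps drive count to the aggregate metric (IOPS or MiB/s)
    and must include the single-drive baseline under key 1.
    """
    baseline = results[1]
    return {n: value / (n * baseline) for n, value in sorted(results.items())}


def last_linear_point(results, threshold=0.9):
    """Largest drive count up to which every point meets the threshold."""
    last = None
    for n, eff in scaling_efficiency(results).items():
        if eff < threshold:
            break
        last = n
    return last


# Placeholder values only (arbitrary units), NOT data from this note.
placeholder = {1: 100.0, 2: 199.0, 4: 396.0, 8: 790.0, 12: 1180.0, 24: 1700.0}

for drives, eff in scaling_efficiency(placeholder).items():
    print(f"{drives:>2} drives: {eff:.1%} of ideal linear scaling")
print("Linear up to:", last_linear_point(placeholder), "drives")
```

With a threshold like this, a profile that plateaus between 12 and 24 drives shows up as a drop in efficiency at the 24-drive point, while the earlier points stay near 100%.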
Conclusion
Customers seeking to scale their Gen4 NVMe drives will be pleased to know that all IO profiles scaled linearly for up to 12 drives, while only some of them continued to scale linearly up to 24 drives. Servers and systems running workloads like data analytics, AI, ML, DL and databases can greatly benefit from this increase in IOPS and throughput when scaling Gen4 NVMe devices up to 12 drives.