Dell Validated Design for HPC NG-Stor Storage - Joint Solution with Kalray: NVMe Tier
The NVMe Tier modules in this release (see Figure 2) were updated to include the PowerEdge R660 and PowerEdge R760 servers. For high availability, GPFS replication is implemented across each NVMe server pair; alternatives are under investigation and might replace it in a future version of the solution. This approach maintains good performance for the NVMe tier, avoids dependencies on third-party software, and removes the complexity of an additional software layer. The servers approved for this tier support PCIe5 NVMe devices. Mixing PCIe5 NVMe devices with lower-performing PCIe4 (or slower) devices from previous releases is not supported within the same NVMe tier and is not recommended for the solution. Additional pairs of NVMe nodes scale out both performance and capacity for this tier; capacity can also be increased by selecting higher-capacity NVMe devices supported on the servers.
Pairs of PowerEdge servers in HA (failover domains) provide a high-performance, flash-based tier for the NG-Stor solution. Two PowerEdge servers were benchmarked as part of the NVMe tier: a PowerEdge R660 with 14 direct-attached NVMe drives and a PowerEdge R760 with 16 direct-attached NVMe devices. The PowerEdge R7625 with direct-attached drives is also supported and can be used, but its performance with PCIe4 devices was not characterized; performance with PCIe5 devices can be characterized in future work. To maintain homogeneous performance across all NVMe nodes and allow data to be striped across every node in the tier, do not mix different server models in the same NVMe tier: performance differences hold back faster nodes, which must wait for stripes to complete on slower nodes. However, a solution can include multiple NVMe tiers to accommodate different sets of servers, each accessed through its own fileset.
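As an illustrative sketch of the fileset approach (the file system name, fileset name, pool name, and junction path below are hypothetical, not part of this solution's documented configuration), each NVMe tier can be exposed through its own Spectrum Scale fileset:

```
# Hypothetical: one fileset per NVMe tier on a file system named "ngstor".
mmcrfileset ngstor tier_r660
mmlinkfileset ngstor tier_r660 -J /gpfs/ngstor/tier_r660

# A file placement rule (installed with mmchpolicy) can then steer the
# fileset's data to the storage pool backed by that tier's NSDs:
#   RULE 'r660data' SET POOL 'r660pool' FOR FILESET ('tier_r660')
```

Clients that mount the file system then reach each tier through its fileset junction, keeping workloads on homogeneous sets of servers.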
The drives on any NVMe tier server pair are configured as NSD devices; using GPFS replication, each NSD has a replica on each server of the pair, with each server acting as a failure domain. This configuration provides data redundancy not only at the device level but also at the server level, though usable capacity is reduced to 50 percent of the total NVMe NSD capacity. The only restriction for these NVMe tier servers is that they must be deployed in pairs.
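A minimal sketch of how this pairing can be expressed in a GPFS NSD stanza file, assuming hypothetical device paths, NSD names, and host names (`nvme-a`, `nvme-b`): each server of the pair is assigned its own failure group, so that two-way replication places one copy of each block on each server.

```
# Hypothetical stanza file for mmcrnsd; one drive per server shown.
%nsd:
  device=/dev/nvme0n1
  nsd=nvme_a_disk0
  servers=nvme-a
  usage=dataAndMetadata
  failureGroup=1
  pool=system

%nsd:
  device=/dev/nvme0n1
  nsd=nvme_b_disk0
  servers=nvme-b
  usage=dataAndMetadata
  failureGroup=2
  pool=system
```

Running `mmcrnsd -F <stanza-file>` creates the NSDs; a file system created with a default replication factor of 2 for data and metadata then keeps one replica per failure group, which is what allows either server of the pair to fail without data loss.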
The HDMD uses an NVMe tier module based on a pair of PowerEdge R660 servers dedicated to storing metadata. Each of these PowerEdge R660 servers has 14 Dell AG 3.84 TB (PM1743) E3.S PCIe5 direct-attached NVMe devices; however, any NVMe device supported on the PowerEdge R660 is supported for these nodes. If better metadata performance is required, additional HDMD modules can be used. To increase the maximum number of files for the solution, larger NVMe devices can be used. Alternatively, more HDMD modules can be added, which also provides additional performance.
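The metadata-only role of the HDMD nodes can be sketched with the same stanza mechanism, using `usage=metadataOnly` (again with hypothetical device, NSD, and host names; metadata-only NSDs must reside in the system pool):

```
# Hypothetical stanza for one HDMD drive; repeat per drive and per server,
# with a distinct failureGroup for each server of the HDMD pair.
%nsd:
  device=/dev/nvme0n1
  nsd=hdmd_a_disk0
  servers=hdmd-a
  usage=metadataOnly
  failureGroup=10
  pool=system
```

With data NSDs on the other pairs marked `dataOnly` or `dataAndMetadata` as appropriate, GPFS directs metadata traffic to these dedicated flash devices.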
Because 16th-generation (16G) PowerEdge servers are used, a block size of 8 MiB was selected at the file system level. Such a large block size might seem to waste space when storing small files; however, GPFS uses subblock allocation to prevent waste. The file system subdivides each large block into subblocks as small as 16 KiB (512 subblocks for the 8 MiB block size), providing a more space-efficient allocation when needed.
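The subblock arithmetic above can be checked with a short calculation (a sketch of the allocation accounting only, not of the GPFS implementation; the function name is ours):

```python
# Subblock accounting for an 8 MiB GPFS block size: with 512 subblocks
# per full block, the subblock size is 16 KiB, so a small file consumes
# one 16 KiB subblock rather than a full 8 MiB block.

BLOCK_SIZE = 8 * 1024 * 1024        # 8 MiB file system block
SUBBLOCKS_PER_BLOCK = 512           # per the solution's configuration
SUBBLOCK_SIZE = BLOCK_SIZE // SUBBLOCKS_PER_BLOCK

def allocated_bytes(file_size: int) -> int:
    """Space consumed on disk: file size rounded up to whole subblocks."""
    subblocks = -(-file_size // SUBBLOCK_SIZE)  # ceiling division
    return subblocks * SUBBLOCK_SIZE

print(SUBBLOCK_SIZE)          # 16384 bytes, i.e. 16 KiB
print(allocated_bytes(4096))  # 16384: a 4 KiB file uses one subblock
```

This is why the 8 MiB block size does not penalize small-file workloads: the worst-case overhead per file is bounded by one 16 KiB subblock, not one 8 MiB block.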
These HDMD modules were used for the rest of the performance work. However, this document does not include metadata characterization; a future update to the solution will include this information.
Table 3. NVMe tier server components (components for this tier only)
| Solution component | Server | At release | Test bed |
|---|---|---|---|
| Processor | PowerEdge R660 NVMe nodes | 2x Intel Xeon Gold 6426Y 2.5G, 16C/32T, 16 GT/s, 38M Cache, Turbo, HT (185W) DDR5-4800 | |
| | High-demand metadata | | |
| | PowerEdge R760 NVMe nodes | 2x Intel Xeon Gold 6448Y 2.1G, 32C/64T, 16 GT/s, 60M Cache, Turbo, HT (225W) DDR5-4800 | |
| Memory | PowerEdge R660 NVMe nodes | 16x 16 GiB RDIMM, 4800 MT/s (256 GiB) | |
| | High-demand metadata | | |
| | PowerEdge R760 NVMe nodes | | |
| NVMe | PowerEdge R660 NVMe nodes | Supported NVMe devices; 14 NSDs replicated across server pair | 14 Dell AG 3.84 TB (PM1743) PCIe5 |
| | High-demand metadata | Supported NVMe devices; 14 NSDs replicated across server pair | 14 Dell AG 3.84 TB (PM1743) PCIe5 |
| | PowerEdge R760 NVMe nodes | Supported NVMe devices; 16 NSDs replicated across server pair | 16 Dell AG 3.84 TB (PM1743) PCIe5 |
| BIOS | | 1.5.6 | |
| Operating system | | Red Hat Enterprise Linux 8.8 | |
| Kernel version | | 4.18.0-477.15.1.el8_8.x86_64 | |
| NG-Stor software | | 6.0.3.1-1 | |
| File system software | | Spectrum Scale (GPFS) 5.1.8-1 | |
| OFED version | | MLNX_OFED_LINUX-23.04-1.1.3.0 | |
| High-performance network connectivity | | 2x Dell OEM DPN 0RYMTY ConnectX-7 InfiniBand NDR (400 Gb/s), FW 28.36.1010 | |
| High-performance switch | | Mellanox QM9700 NDR 400 Gb/s | |
| Local disks (operating system) | | BOSS-S2 with 2x M.2 480 GB in RAID 1 | |
| Systems management | | iDRAC9 Enterprise + Dell OpenManage 10.1.0.0 | |