Dell Validated Design for HPC NG-Stor Storage - Joint Solution with Kalray: Benchmarks and test beds
To characterize the different components of the NG-Stor solution, we used the hardware specified in the last column of Table 1, including the HDMD. Because the number of available clients was limited, we first characterized sequential performance and then random IOPS once more clients could be added; small random IOPS and metadata OPS require more clients than were initially available. In a future update, we will include a metadata study using MDtest with empty files and with 3 KiB and 4 KiB files.
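For reference, the planned metadata study could use MDtest invocations along the following lines. This is only a sketch assuming mdtest from the IOR suite (`-F` files only, `-n` items per task, `-w`/`-e` bytes written/read per file, `-d` target directory); the target path, file count, and process count are hypothetical placeholders, and the commands are constructed and echoed here rather than executed:

```shell
# Sketch of the planned MDtest runs: empty files, 3 KiB files, and 4 KiB files.
# TARGET and FILES_PER_TASK are hypothetical placeholders, not the study's values.
TARGET=/mnt/ngstor/mdtest
FILES_PER_TASK=1000

for size in 0 3072 4096; do
  # -w/-e of 0 gives the empty-file case; 3072 and 4096 give 3 KiB and 4 KiB files.
  echo "mpirun -np 64 mdtest -F -n ${FILES_PER_TASK} -w ${size} -e ${size} -d ${TARGET}"
done
```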
To assess the sequential performance of the solution, we selected the benchmarks discussed in the following sections. For these benchmarks, the test bed included the clients described in the following table, except for the gateway node tests:
Table 2. InfiniBand client test bed
Component | Description |
Number of client nodes | 8 |
Client node | C6520 & C6525 |
Processors per client node | 4 nodes with 2 x Intel Xeon Platinum 8352Y 32 Cores @ 2.2 GHz; 4 nodes with 2 x AMD EPYC 7702 64 Cores @ 2.0 GHz |
Memory per client node | 4 nodes (7702) with 256 GiB = 16 x 16 GiB DDR4 3200 MT/s; 4 nodes (8352Y) with 512 GiB = 16 x 32 GiB DDR4 3200 MT/s |
BIOS | (7702) 2.11.3, (8352Y) 1.11.2 |
Operating system | Red Hat Enterprise Linux 8.8 |
Operating system kernel | 4.18.0-477.10.1.el8_8.x86_64 |
NG-Stor software | 6.3.0.1-1 |
Spectrum Scale (GPFS) | 5.1.8-2 |
OFED version | |
CX7 FW | |
Because only eight compute nodes were available for testing, higher thread counts were distributed evenly across the nodes: 16 threads = 2 threads per node, 32 threads = 4 per node, 64 threads = 8 per node, 128 threads = 16 per node, 256 threads = 32 per node, 512 threads = 64 per node, and 1024 threads = 128 per node. The intention was to simulate a larger number of concurrent clients with the limited number of compute nodes. Although the benchmarks support more threads, a maximum of 512 threads was used, based on the client nodes' core count (except for the IOzone N clients to N files test). Using more threads than cores can cause excessive context switching and other performance issues.
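The even thread distribution described above can be sketched as a small helper that computes the per-node thread count and builds an IOzone-style client list (the machine file passed with IOzone's `-+m` option for distributed runs). The hostnames, mount path, and iozone location below are hypothetical placeholders, not the actual test-bed values:

```python
# Sketch: spread N benchmark threads evenly across 8 client nodes and emit
# lines in the format an IOzone -+m machine file expects:
#   <hostname> <working directory> <path to iozone executable>
# Hostnames, mount point, and iozone path are hypothetical placeholders.

def threads_per_node(total_threads, num_nodes=8):
    """Return the per-node thread count for an even distribution."""
    if total_threads % num_nodes:
        raise ValueError("thread count must divide evenly across nodes")
    return total_threads // num_nodes

def build_client_list(total_threads, num_nodes=8,
                      workdir="/mnt/ngstor/bench",
                      iozone="/usr/bin/iozone"):
    """Build the machine-file lines: one line per thread, grouped by node."""
    per_node = threads_per_node(total_threads, num_nodes)
    lines = []
    for node in range(1, num_nodes + 1):
        lines.extend(f"client{node:02d} {workdir} {iozone}"
                     for _ in range(per_node))
    return lines

# 64 threads over 8 nodes -> 8 threads per node, 64 machine-file lines.
entries = build_client_list(64)
print(threads_per_node(64), len(entries))  # 8 64
```

This mirrors the table of thread counts above: for example, 1024 total threads resolves to 128 threads on each of the eight nodes.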