PowerEdge R750 Sequential IOzone Performance N clients to N files
Sequential N clients to N files performance was measured with IOzone version 3.492. The tests ran from a single thread up to 1024 threads.
We used files large enough to minimize caching effects, with a total data size of 8 TiB, twice the total memory size of the servers and clients. Note that the GPFS tunable pagepool sets the maximum amount of memory used for caching data, regardless of how much RAM is installed and free (it was set to 32 GiB on clients and 96 GiB on servers to allow I/O optimizations). While in other Dell HPC solutions the block size for large sequential transfers is 1 MiB, GPFS was formatted with a block size of 8 MiB; therefore, that value or its multiples should be used on the benchmark for optimal performance. A block size of 8 MiB might seem too large and likely to waste space when using small files, but GPFS uses subblock allocation to prevent that situation: in the current configuration, each 8 MiB block was subdivided into 512 subblocks of 16 KiB each.
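For reference, the following is a minimal sketch of how settings like these could be applied with IBM Spectrum Scale (GPFS) administration commands. The node class names, file system name, and stanza file are hypothetical placeholders; the actual deployment values depend on the cluster configuration.

# Set the pagepool (data cache ceiling) per node class; the node class
# names "clientNodes" and "serverNodes" are hypothetical.
mmchconfig pagepool=32G -N clientNodes
mmchconfig pagepool=96G -N serverNodes
# Create the file system with an 8 MiB block size; at this block size,
# GPFS subdivides each block into 512 subblocks of 16 KiB.
# "fs1" and the NSD stanza file name are placeholders.
mmcrfs fs1 -F nsd_stanzas.txt -B 8M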
The following commands were used to run the benchmark for write and read operations, where the Threads variable is the number of threads used (1 to 1024, incremented in powers of 2), and threadlist is the file that allocated each thread to a different node, using the round-robin method to spread them homogeneously across the 16 compute nodes. The FileSize variable was set to 8192 (GiB) divided by Threads, so that the total data size was divided evenly among all threads used. A transfer size of 16 MiB was used for this performance characterization. A sketch of a driver loop that wraps these commands follows them below.
./iozone -i0 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
./iozone -i1 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
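As an illustration, the following is a minimal sketch of the kind of driver loop that could wrap these commands. The node names, mount point, and iozone path are hypothetical placeholders; each threadlist line follows IOzone's -+m format of hostname, working directory, and iozone path.

#!/bin/bash
# Hypothetical driver sketch; node names and paths are placeholders.
NODES=($(printf 'node%02d ' $(seq 1 16)))   # the 16 compute nodes
WORKDIR=/mmfs1/iozone                       # hypothetical GPFS mount point
IOZONE=/usr/local/bin/iozone                # hypothetical iozone location
for Threads in 1 2 4 8 16 32 64 128 256 512 1024; do
    # Build the -+m machines file, spreading threads round-robin across nodes.
    : > ./threadlist
    for ((i = 0; i < Threads; i++)); do
        echo "${NODES[i % 16]} $WORKDIR $IOZONE" >> ./threadlist
    done
    # Divide the 8 TiB (8192 GiB) total data size evenly among the threads.
    FileSize=$((8192 / Threads))
    $IOZONE -i0 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist   # write
    $IOZONE -i1 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist   # read
done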
Figure 35. N to N sequential performance
From the results, we see that read performance reaches a plateau of approximately 96 GB/s at 128 threads and peaks at 512 threads with 98.7 GB/s. This number might seem low considering that 16 NVMe devices are used per PowerEdge R750 server, compared to 10 devices in the PowerEdge R650 servers. The reason is that the read specifications of the 1.6 TB devices used with the PowerEdge R750 servers (P5600) are lower than those of the devices used in the PowerEdge R650 servers (PM1735), as described in the second bulleted item in the following list.
The write performance reached a plateau at approximately 26 GB/s with 32 threads, with a peak of 28.6 GB/s at 512 threads. Write performance might look low compared to read performance; however, two factors should be considered:
Both read and write results seem to be stable once the plateau is reached, which is favorable behavior: the servers do not exhibit a drop in performance as the number of simultaneous threads accessing different files increases. As a future test, and because IOzone is limited to a maximum of 1024 threads, IOR can be used to find the limit with respect to simultaneous clients/files (after adding more compute nodes, to avoid context switching within the clients from affecting performance).
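For such a follow-up, a hedged sketch of an equivalent N-to-N IOR run is shown below. The MPI launcher, hostfile, process count, and output path are placeholders; -F selects file-per-process (N-to-N) mode, and -t and -b mirror the 16 MiB transfer size and the per-thread file size used above (2048 processes x 4 GiB = 8 TiB total).

# Hypothetical IOR invocation for scaling an N-to-N test beyond 1024 threads;
# launcher, hostfile, process count, and test file path are placeholders.
mpirun -np 2048 --hostfile ./hosts ior -w -r -e -F -t 16m -b 4g -o /mmfs1/iozone/testfile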