Sequential IOzone Performance N Clients to N Files
Sequential N clients to N files performance was measured with IOzone version 3.492. The tests ran from a single thread up to 1024 threads.
Caching effects were minimized by using files big enough to avoid them, with a total data size of 8 TiB, more than twice the combined memory of the servers and clients. It is important to note that GPFS uses the pagepool tunable to set the maximum amount of memory used for caching data, regardless of the amount of RAM installed and free (set to 32 GiB on clients and 96 GiB on servers to allow I/O optimizations). It is also important to note that, while other Dell Technologies HPC solutions use a block size of 1 MiB for large sequential transfers, GPFS was formatted with a block size of 8 MiB; therefore, that value or multiples of it should be used in the benchmark for optimal performance. A block size of 8 MiB may look too large, but GPFS uses subblock allocation; in the current configuration, each block was subdivided into 512 subblocks of 16 KiB each.
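As a quick sketch of how these settings can be verified on a Spectrum Scale (GPFS) cluster, the standard administration commands below report the pagepool size and the block geometry; the file system device name gpfs0 is an assumption for this example.

# Report the pagepool size configured for the local node
# (32 GiB on clients, 96 GiB on servers in this solution)
mmlsconfig pagepool

# Report the file system block size (-B) and subblock size (-f):
# 8 MiB blocks split into 512 subblocks of 16 KiB each
mmlsfs gpfs0 -B -f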
The following commands were used to execute the benchmark for writes and reads, where Threads was the variable holding the number of threads used (1 to 1024, incremented in powers of two), and threadlist was the file that assigned each thread to a node, using sequential round robin to spread them homogeneously across the 16 compute nodes. The variable FileSize was set to 8192 GiB divided by Threads, so that the total data size was divided evenly among all threads used. A transfer size of 16 MiB was used for this performance characterization.
./iozone -i0 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
./iozone -i1 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
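As an illustrative sketch (not the exact script used for this characterization), the threadlist file and the benchmark loop could be built as follows; the node names node001 through node016, the working directory /mmfs1/iozone, and the iozone binary path are assumptions.

# Build the -+m map: 1024 lines assigning threads to the 16 compute
# nodes in round-robin order; iozone uses the first $Threads lines.
# Format per line: <hostname> <working-directory> <path-to-iozone>
for i in $(seq 0 1023); do
    printf "node%03d /mmfs1/iozone /usr/local/bin/iozone\n" $(( (i % 16) + 1 ))
done > ./threadlist

# Run writes (-i0) then reads (-i1) for each thread count,
# dividing the 8 TiB total data size evenly among the threads
for Threads in 1 2 4 8 16 32 64 128 256 512 1024; do
    FileSize=$(( 8192 / Threads ))
    ./iozone -i0 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
    ./iozone -i1 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
done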
From the results, we can observe that read performance is higher than PixStor 5 at low thread counts (by more than 10%), with a peak at 4 threads where it was almost 18% higher; after that, it is only marginally higher than what was observed with PixStor 5, with a small drop in performance at 1024 threads. Write performance was almost the same for 1 and 2 threads; at 4 threads it was 24% higher than PixStor 5, and it then reached a stable plateau that remained about 20% higher than PixStor 5 up to 1024 threads, the maximum number of threads that IOzone allows, where performance dropped slightly (since there are only 640 cores in the compute nodes, this could be due to oversubscription overhead). Note that the peak read performance was 23 GB/s at 32 threads, and the peak write performance of 20.5 GB/s was reached at 64 threads. The significant improvement in write performance compared to PixStor 5 was a pleasant surprise that may be due to block I/O improvements in Red Hat Enterprise Linux 8.x, or possibly to hardware and driver improvements for the new HBA355e (a PCIe Gen 4 adapter), since with a total data size of 8 TiB, it cannot be due to any cache effects from the servers, the clients, or both combined.
It is important to remember that GPFS's preferred mode of operation is scattered, and the solution was formatted to use it. In this mode, blocks are allocated in a pseudo-random fashion from the very beginning, spreading data across the whole surface of each HDD. The obvious disadvantage is a lower initial maximum performance, but that performance stays fairly constant regardless of how much space is used on the file system. This contrasts with other parallel file systems that initially fill the outer tracks, which hold more data (sectors) per disk revolution and therefore provide the highest performance the HDDs can deliver; as the system uses more space, inner tracks with less data per revolution are used, with the consequent reduction in performance. GPFS also supports that allocation scheme (called clustered), but on the PixStor storage solution it is used only as an exception in deployments with special conditions.
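As a brief illustration of how this choice is made, the block allocation map type is selected when the file system is created and can be inspected afterwards; the device name gpfs0 and the NSD stanza file name are assumptions in this sketch.

# Create the file system with scattered block allocation (-j scatter)
# and the 8 MiB block size used in this solution
mmcrfs gpfs0 -F nsd_stanzas.txt -B 8M -j scatter

# Confirm the block allocation type on an existing file system
mmlsfs gpfs0 -j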