PowerEdge R650 Sequential IOzone Performance N clients to N files
We measured sequential N clients to N files performance with IOzone version 3.492. The tests ranged from a single thread up to 1024 threads.
We used files large enough to minimize caching effects, with a total data size of 8 TiB, twice the total memory of the servers (four R650 nodes) and clients combined. In GPFS, the pagepool tunable sets the maximum amount of memory used for caching data, regardless of how much RAM is installed and free; it was set to 32 GiB on clients and 96 GiB on servers to allow I/O optimizations. While other Dell HPC solutions use a 1 MiB block size for large sequential transfers, this GPFS file system was formatted with a block size of 8 MiB, so the benchmark should use that value or its multiples for optimal performance. A block size of 8 MiB might seem too large and likely to waste space with small files, but GPFS uses subblock allocation to prevent that situation. In the current configuration, each block was subdivided into 512 subblocks of 16 KiB each.
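These values can be confirmed on the cluster with the mmlsfs command. The following is a minimal sketch; the device name gpfs0 is an assumption, not the name used in this study.

# List the block size and the minimum fragment (subblock) size; the device
# name gpfs0 is hypothetical.
mmlsfs gpfs0 -B    # expected: 8388608 bytes (8 MiB)
mmlsfs gpfs0 -f    # expected: 16384 bytes (16 KiB); 8 MiB / 16 KiB = 512 subblocks per block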
The following commands were used to run the benchmark for write and read operations, where the Threads variable is the number of threads used (1 to 1024, incremented in powers of 2) and threadlist is the file that allocates each thread to a different node, using round-robin placement to spread the threads evenly across the 16 compute nodes. The FileSize variable is 8192 GiB divided by Threads, so that the total data size is split evenly among all threads. A transfer size of 16 MiB was used for this performance characterization. (A sketch of the full sweep appears after the commands.)
./iozone -i0 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
./iozone -i1 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist
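For reference, the complete sweep can be scripted as shown below. This is a minimal sketch under stated assumptions: the compute nodes are named node01 through node16, the shared working directory is /mnt/pixstor, and iozone lives at /usr/bin/iozone; none of these names come from this study.

#!/bin/bash
# Illustrative sweep: 1 to 1024 threads in powers of 2, 8 TiB total per run.
# Each line of the -+m file is: hostname  working-directory  iozone-path.
for Threads in 1 2 4 8 16 32 64 128 256 512 1024; do
    FileSize=$((8192 / Threads))          # GiB per thread; total stays at 8 TiB
    > ./threadlist                        # rebuild the thread placement file
    for i in $(seq 0 $((Threads - 1))); do
        printf "node%02d /mnt/pixstor /usr/bin/iozone\n" $(( (i % 16) + 1 )) >> ./threadlist
    done
    ./iozone -i0 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist  # write
    ./iozone -i1 -c -e -w -r 16M -s ${FileSize}G -t $Threads -+n -+m ./threadlist  # read
done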
From the results, we see that read performance reaches a plateau of approximately 180 GB/s at 32 threads and peaks at 180.8 GB/s with 64 threads, a considerable increase over the previous generation of PowerEdge R640 servers with Gen 3 NVMe, which plateaued at about 80 GB/s for the same number of NVMe nodes (four).
Write performance reached a plateau of approximately 40 GB/s at 16 threads, with a peak of 40.2 GB/s at 16, 32, and 512 threads. Write performance might look low compared to read performance; however, two factors should be considered:
Both read and write results are stable once the plateau is reached, which is favorable behavior: the servers do not exhibit a drop in performance as the number of simultaneous clients accessing different files increases. As a future test, because IOzone is limited to a maximum of 1024 threads, IOR can be used to find the limit with respect to simultaneous clients/files (after adding more clients, to avoid context switching within the clients affecting performance). A sketch of such a run follows.
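As a hedged illustration only, an IOR run in file-per-process mode (one file per MPI task, matching the N clients to N files pattern) could look like the following; the task count, host file, mount point, and output path are assumptions, not values from this study.

# Illustrative IOR N-to-N run beyond IOzone's 1024-thread limit; all paths
# and counts are hypothetical.
mpirun -np 2048 --hostfile ./hosts \
    ior -a POSIX -F -e -w -r -t 16m -b 4g -o /mnt/pixstor/ior_testfile
# -F     one file per process (N clients to N files)
# -e     fsync after write, matching iozone's -e option
# -t/-b  16 MiB transfers; 4 GiB per task keeps the total at 8 TiB for 2048 tasks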