Random Small Blocks IOzone Performance N Clients to N Files
Random N clients to N files performance was measured with IOzone version 3.492. The tests ranged from a single thread up to 512 threads. This benchmark used 4 KiB blocks to emulate small-block traffic.
Caching effects were minimized by setting the GPFS page pool tunable to 16 GiB on the clients and 32 GiB on the servers, and by using files twice that size. The section titled Sequential IOzone Performance N Clients to N Files above explains in more detail why this is effective on GPFS.
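The following commands were used to execute the benchmark in random I/O mode for both writes and reads. Threads was the variable holding the number of threads used (1 to 512, in powers of two), and threadlist was the file that assigned each thread to a node, using round robin to spread the threads evenly across the 16 compute nodes. The first command writes the files sequentially with 16 MiB records and keeps them on disk (-w). The second command runs the random I/O test (-i2) on those files with 4 KiB records and reports results in operations per second (-O).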
./iozone -i0 -c -e -w -r 16M -s ${Size}G -t $Threads -+n -+m ./threadlist
./iozone -i2 -O -w -r 4K -s ${Size}G -t $Threads -+n -+m ./threadlist
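The threadlist file and the thread-count sweep described above can be sketched as follows. This is a minimal illustration, not the script used for the tests: the host names (node01 to node16), the working directory, and the iozone path are hypothetical placeholders. It relies on the format IOzone expects for the -+m file, which is one line per thread with three whitespace-separated fields: client name, working directory on that client, and path to the iozone executable on that client.

```sh
#!/bin/sh
# Hypothetical values: adjust to the actual cluster.
NODES=16                       # number of compute nodes (from the text)
WORKDIR=/mmfs1/perftest        # assumed test directory on the file system
IOZONE=/usr/local/bin/iozone   # assumed path to iozone on each client

# Print one -+m line per thread, assigning threads to nodes round robin.
make_threadlist() {
    threads=$1
    i=0
    while [ "$i" -lt "$threads" ]; do
        node=$(( i % NODES + 1 ))
        printf 'node%02d %s %s\n' "$node" "$WORKDIR" "$IOZONE"
        i=$(( i + 1 ))
    done
}

# Print the thread counts of the sweep: 1, 2, 4, ... up to the given limit.
thread_counts() {
    t=1
    while [ "$t" -le "$1" ]; do
        echo "$t"
        t=$(( t * 2 ))
    done
}
```

A driver loop could then call `make_threadlist "$Threads" > ./threadlist` for each value printed by `thread_counts 512` before running the two iozone commands. With 16 nodes, thread 17 lands back on the first node, so each node carries the same number of threads whenever the thread count is a multiple of 16.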
From the results, we can observe that write performance starts at a high value of almost 12.8K IOPS and rises slowly to its maximum of 12.8K IOPS at 256 threads. Read performance, on the other hand, starts low at 1.5K IOPS and increases almost linearly with the number of clients used (keep in mind that the number of threads doubles at each data point) until it reaches its highest measured value of 18.4K IOPS at 512 threads, with signs of approaching a maximum. However, as previously explained, using more threads on the current 16 compute nodes than the number of cores (640) incurs more context switching, which apparently limits performance. A future test with more compute nodes could check what random read performance can be achieved with 1024 threads with IOzone. FIO or IOR could also be used to investigate the behavior with more than 1024 threads.
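To put these figures in perspective, the short sketch below converts IOPS at the 4 KiB transfer size into bandwidth and compares threads per node with cores per node. It is only a back-of-the-envelope aid under stated assumptions: "K" is taken to mean 1000, and the 640 cores are assumed to be spread evenly over the 16 compute nodes (40 per node). The function names are illustrative.

```sh
#!/bin/sh
# Convert IOPS at a given transfer size (KiB) to MiB/s: IOPS * KiB / 1024.
iops_to_mibs() {
    LC_ALL=C awk -v iops="$1" -v kib="$2" \
        'BEGIN { printf "%.1f\n", iops * kib / 1024 }'
}

# Integer number of threads (or cores) per node for an even spread.
per_node() {
    echo $(( $1 / $2 ))
}

# Report whether a thread count exceeds the total core count.
oversubscribed() {
    if [ "$1" -gt "$2" ]; then echo yes; else echo no; fi
}
```

Under these assumptions, `iops_to_mibs 12800 4` gives 50.0 MiB/s for the write peak and `iops_to_mibs 18400 4` gives 71.9 MiB/s for the read peak. `per_node 512 16` yields 32 threads per node against `per_node 640 16`, that is 40 cores per node, so `oversubscribed 512 640` prints no while `oversubscribed 1024 640` prints yes.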