PowerEdge R650 Random small blocks IOzone Performance N clients to N files
Random N clients to N files performance was measured with IOzone version 3.492. The tests ranged from 1 to 1024 threads and used 4 KiB blocks to emulate small-block traffic. We minimized caching effects by setting the GPFS page pool tunable to 16 GiB on the clients and 32 GiB on the servers (four R650 nodes) and by using a total data size of 128 GiB.
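The following commands were used to run the benchmark. The first creates the files with sequential writes (-i0, 16 MiB records) and keeps them on disk (-w). The second runs the random IO test (-i2) for both read and write operations with 4 KiB records and reports results in operations per second (-O). The Threads variable is the number of threads used (from 1 to 1024, incremented in powers of 2), and threadlist is the file that allocates each thread to a node, using the round-robin method to spread the threads homogeneously across the 16 compute nodes.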
./iozone -i0 -c -e -w -r 16M -s ${Size}G -t $Threads -+n -+m ./threadlist
./iozone -i2 -O -w -r 4K -s ${Size}G -t $Threads -+n -+m ./threadlist
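The round-robin thread placement described above can be sketched as a small shell function that writes an IOzone client file, one line per thread, in the three-field format that -+m expects (client name, working directory on the client, path to the iozone executable on the client). This is a minimal sketch and not the script used for the tests: the host names node01 to node16, the working directory, and the iozone path are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build an IOzone -+m client file, assigning threads to nodes round-robin.
# Host names (node01..nodeNN), the working directory, and the iozone path are
# hypothetical placeholders, not values taken from the tested system.

make_threadlist() {
    local threads=$1 nodes=$2
    local workdir=${3:-/mmfs1/perftest}           # hypothetical working directory
    local iozone_bin=${4:-/usr/local/bin/iozone}  # hypothetical executable path
    local i
    for ((i = 0; i < threads; i++)); do
        # Thread i is placed on node (i mod nodes) + 1, so consecutive threads
        # land on consecutive nodes and wrap around after the last node.
        printf 'node%02d %s %s\n' $(( i % nodes + 1 )) "$workdir" "$iozone_bin"
    done
}

# Example: 4 threads over 2 nodes alternate between node01 and node02.
make_threadlist 4 2
```

Redirecting the output, for example `make_threadlist 1024 16 > ./threadlist`, would give 64 lines per node for the largest run.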
Note that the scale chosen for the graph was logarithmic with base 10, to allow comparing operations that differ by several orders of magnitude; on a linear scale, some of the operations would appear as a flat line close to 0. A log graph with base 2 would arguably be more appropriate, because the number of threads increases in powers of 2. Such a graph looks similar, but people tend to perceive and remember numbers based on powers of 10 better.
From the results, we see that write performance starts at approximately 5.6K IOPS, rises to a plateau of 575K IOPS at approximately 256 threads, and reaches its maximum of 591K IOPS at 1024 threads. Read performance starts at 8.6K IOPS and increases with the number of threads until it reaches its maximum of 3,716K IOPS at 1024 threads, with signs of approaching a plateau. However, as previously explained, running more threads than the number of cores available on the current 16 compute nodes (640) might incur more context switching, which could be limiting peak performance. A future test with more physical compute nodes could check the random read performance that can be achieved with 1024 threads with IOzone. Also, FIO or IOR with more nodes (cores) could be used to investigate the behavior with more than 1024 threads.
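To put these figures in perspective, the peak IOPS values can be converted to approximate bandwidth, and the thread oversubscription at 1024 threads can be computed directly. The sketch below assumes that "K" means 1000 and that every operation transfers exactly one 4 KiB record; the function names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: back-of-the-envelope figures derived from the numbers quoted above.
# Assumes "K" means 1000 and that each operation moves exactly one record.

# Convert an IOPS figure (given in thousands) at a record size in KiB to GiB/s.
kiops_to_gibs() {
    awk -v kiops="$1" -v rec_kib="$2" \
        'BEGIN { printf "%.2f\n", kiops * 1000 * rec_kib / (1024 * 1024) }'
}

# Average number of threads per core for a thread count and a total core count.
threads_per_core() {
    awk -v t="$1" -v c="$2" 'BEGIN { printf "%.2f\n", t / c }'
}

echo "peak random read : $(kiops_to_gibs 3716 4) GiB/s"
echo "peak random write: $(kiops_to_gibs 591 4) GiB/s"
echo "oversubscription : $(threads_per_core 1024 640) threads per core"
```

Under these assumptions, the read peak corresponds to roughly 14.2 GiB/s and the write peak to roughly 2.25 GiB/s, with an average of 1.6 threads per core at 1024 threads, which is consistent with the context-switching concern raised above.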