PowerEdge R750 Random small blocks IOzone Performance N clients to N files
Random N clients to N files performance was measured with IOzone version 3.492. The tests ranged from 1 to 1024 threads and used 4 KiB blocks to emulate small-block traffic. We minimized caching effects by setting the GPFS page pool tunable to 16 GiB on the clients and 32 GiB on the servers, and by using a total data size of 128 GiB.
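The following commands were used to run the benchmark in random IO mode for both read and write operations. The first command (-i0) creates the files with sequential writes using 16 MiB records and keeps them on disk (-w). The second command (-i2) runs the random read and write tests with 4 KiB records against those files and reports results in operations per second (-O). The Threads variable is the number of threads used (from 1 to 1024, incremented in powers of 2), and threadlist is the file that assigns each thread to a node, using the round robin method to spread the threads evenly across the 16 compute nodes.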
./iozone -i0 -c -e -w -r 16M -s ${Size}G -t $Threads -+n -+m ./threadlist
./iozone -i2 -O -w -r 4K -s ${Size}G -t $Threads -+n -+m ./threadlist
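The round robin placement described above can be sketched as follows. This is a minimal illustration, not the script used for the published tests: the node names, working directory, and IOzone path are hypothetical placeholders. The only thing taken from IOzone itself is the three-field line format of the -+m file (client name, working directory on the client, path to the IOzone executable on the client).

```python
from typing import List

# Hypothetical values for illustration only; none of them come from the paper.
NODES = [f"node{i:02d}" for i in range(1, 17)]  # 16 compute nodes
WORKDIR = "/mmfs1/iozone"                        # assumed working directory on each client
IOZONE_BIN = "/usr/local/bin/iozone"             # assumed path to the IOzone binary


def thread_counts(max_threads: int = 1024) -> List[int]:
    """Return 1, 2, 4, ... up to max_threads (powers of 2)."""
    counts = []
    n = 1
    while n <= max_threads:
        counts.append(n)
        n *= 2
    return counts


def build_threadlist(threads: int, nodes: List[str] = NODES) -> List[str]:
    """Assign thread i to node (i mod number of nodes), one line per thread."""
    return [f"{nodes[i % len(nodes)]} {WORKDIR} {IOZONE_BIN}" for i in range(threads)]


if __name__ == "__main__":
    # Show the placement for a small run; a real run would write these
    # lines to ./threadlist before invoking IOzone with -+m ./threadlist.
    for line in build_threadlist(4):
        print(line)
```

With 16 nodes, thread counts up to 16 place at most one thread per node, and every higher power of 2 places the same number of threads on each node (64 per node at 1024 threads).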
Note that the scale chosen was logarithmic with base 10, to allow comparing operations that differ by several orders of magnitude; otherwise, some of the operations would appear as a flat line close to 0 on a linear graph. A log graph with base 2 could be more appropriate because the number of threads is increased in powers of 2. Such a graph looks similar, but people tend to perceive and remember numbers based on powers of 10 better.
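A short check of why the base 2 and base 10 graphs look similar: changing the base of a logarithm rescales every position by the same constant, so the thread counts stay evenly spaced on either axis. The sketch below uses only the thread counts stated in this section; the variable names are illustrative.

```python
import math

# Thread counts used in the tests: 1, 2, 4, ..., 1024
counts = [2 ** k for k in range(11)]

# Position of each thread count on a base 2 and a base 10 log axis
log2_pos = [math.log2(n) for n in counts]
log10_pos = [math.log10(n) for n in counts]

# Distance between consecutive points on each axis
log2_steps = [b - a for a, b in zip(log2_pos, log2_pos[1:])]
log10_steps = [b - a for a, b in zip(log10_pos, log10_pos[1:])]

if __name__ == "__main__":
    print("base 2 step:", log2_steps[0])
    print("base 10 step:", log10_steps[0])
```

Each doubling of the thread count moves one unit on the base 2 axis and log10(2), about 0.301 units, on the base 10 axis, so the two plots differ only in how the tick marks are labeled.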
From the results, we see that write performance starts at approximately 5.6K IOPS, rises to a plateau of 560K IOPS at approximately 256 threads, and reaches its maximum of 584K IOPS at 1024 threads. Read performance starts at 7K IOPS and increases with the number of clients used until it reaches its maximum of 2,031K IOPS at 1024 threads, with signs of approaching a plateau. However, as previously explained, using more threads than the number of cores (640) on the current 16 compute nodes incurs more context switching, which can limit peak performance. A future test with more physical compute nodes can check the random read performance that can be achieved with 1024 threads with IOzone. Also, FIO or IOR with more nodes (cores) could be used to investigate the behavior with more than 1024 threads.
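To put these figures in perspective, the sketch below converts the peak IOPS into approximate bandwidth at the 4 KiB block size and computes the thread oversubscription at 1024 threads. It assumes that "K" in the reported IOPS means 1,000 and that the 640 cores are the total across the 16 compute nodes; the function and variable names are illustrative.

```python
BLOCK_BYTES = 4 * 1024   # 4 KiB block size used in the random tests
GIB = 1024 ** 3

# Peak values reported at 1024 threads (assuming K = 1,000)
PEAK_WRITE_IOPS = 584_000
PEAK_READ_IOPS = 2_031_000

NODES = 16
TOTAL_CORES = 640        # assumed to be the total across all compute nodes
MAX_THREADS = 1024


def iops_to_gib_per_s(iops: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Convert an IOPS figure to bandwidth in GiB/s for a fixed block size."""
    return iops * block_bytes / GIB


def threads_per_core(threads: int, cores: int = TOTAL_CORES) -> float:
    """Average number of benchmark threads per core."""
    return threads / cores


if __name__ == "__main__":
    print(f"write: {iops_to_gib_per_s(PEAK_WRITE_IOPS):.2f} GiB/s")
    print(f"read:  {iops_to_gib_per_s(PEAK_READ_IOPS):.2f} GiB/s")
    print(f"threads per core at {MAX_THREADS} threads: {threads_per_core(MAX_THREADS):.1f}")
    print(f"threads per node at {MAX_THREADS} threads: {MAX_THREADS // NODES}")
```

Under these assumptions, the peaks correspond to roughly 2.2 GiB/s for random writes and 7.7 GiB/s for random reads, and 1024 threads means 1.6 threads per core (64 threads on each 40-core node), which is consistent with the context switching concern noted above.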