Sequential IOR Performance N Clients to 1 File
Sequential N clients to a single shared file performance was measured with IOR version 3.3.0, using OpenMPI v4.1.2a1 to run the benchmark over the 16 compute nodes. Tests varied from a single thread up to 512 threads; 1024 threads were not tested because there are not enough cores (the 16 clients have a total of 16 x 2 x 20 = 640 cores) and oversubscription overhead had apparently affected the IOzone results at 1024 threads.
Caching effects were minimized by setting the GPFS page pool tunable to 32 GiB on the clients and 96 GiB on the servers, and by using a total data size of 8 TiB, more than twice the combined RAM of servers and clients. A transfer size of 16 MiB was used for this performance characterization. See the previous performance test section for a more complete explanation of these choices.
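For reference, the page pool can be adjusted with Spectrum Scale's mmchconfig command. The following is a minimal sketch, assuming node classes named clientNodes and serverNodes (the actual class names of the test bed are not given here):

# Set the page pool on clients and servers; -i applies the change
# immediately and persistently, without waiting for a daemon restart.
mmchconfig pagepool=32G -i -N clientNodes
mmchconfig pagepool=96G -i -N serverNodes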
The following commands were used to execute the benchmark, where Threads is the number of threads used (1 to 512, incremented in powers of two) and my_hosts.$Threads is the corresponding host file that allocates each thread to a different node, using sequential round-robin to spread them homogeneously across the 16 compute nodes. The variable FileSize holds the result of 8192 (GiB) / Threads, so that the total data size is divided evenly among all threads.
Write:
mpirun --allow-run-as-root -np $Threads --hostfile my_hosts.$Threads --mca btl_openib_allow_ib 1 --mca pml ^ucx --oversubscribe --prefix /usr/mpi/gcc/openmpi-4.1.2a1 /usr/local/bin/ior -a POSIX -v -i 1 -d 3 -e -k -o /mmfs1/perftest/ior/tst.file -w -s 1 -t 16m -b ${FileSize}G

Read:
mpirun --allow-run-as-root -np $Threads --hostfile my_hosts.$Threads --mca btl_openib_allow_ib 1 --mca pml ^ucx --oversubscribe --prefix /usr/mpi/gcc/openmpi-4.1.2a1 /usr/local/bin/ior -a POSIX -v -i 1 -d 3 -e -k -o /mmfs1/perftest/ior/tst.file -r -s 1 -t 16m -b ${FileSize}G
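Since the same pair of commands is repeated for each thread count, a driver loop such as the following sketch can generate the round-robin host files and compute FileSize automatically. The client hostnames (node01 through node16) are illustrative assumptions; substitute the actual node names of the test bed.

#!/bin/bash
# Hypothetical client hostnames; replace with the real compute node names.
NODES=(node{01..16})

for Threads in 1 2 4 8 16 32 64 128 256 512; do
    # Build the host file: sequential round-robin, so thread i lands on
    # node (i mod 16), spreading threads evenly across the 16 clients.
    : > my_hosts.$Threads
    for ((i = 0; i < Threads; i++)); do
        echo "${NODES[i % 16]}" >> my_hosts.$Threads
    done

    # Divide the 8 TiB (8192 GiB) total data size evenly among the threads.
    FileSize=$((8192 / Threads))

    # Write pass (-w), then read pass (-r), as listed above.
    mpirun --allow-run-as-root -np $Threads --hostfile my_hosts.$Threads \
        --mca btl_openib_allow_ib 1 --mca pml ^ucx --oversubscribe \
        --prefix /usr/mpi/gcc/openmpi-4.1.2a1 /usr/local/bin/ior \
        -a POSIX -v -i 1 -d 3 -e -k -o /mmfs1/perftest/ior/tst.file \
        -w -s 1 -t 16m -b ${FileSize}G
    mpirun --allow-run-as-root -np $Threads --hostfile my_hosts.$Threads \
        --mca btl_openib_allow_ib 1 --mca pml ^ucx --oversubscribe \
        --prefix /usr/mpi/gcc/openmpi-4.1.2a1 /usr/local/bin/ior \
        -a POSIX -v -i 1 -d 3 -e -k -o /mmfs1/perftest/ior/tst.file \
        -r -s 1 -t 16m -b ${FileSize}G
done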
From the results, we can observe again that PixStor 6 has similar read performance to, and better write performance than, PixStor 5. In this case, read performance rises very quickly with the number of clients and then reaches a plateau that is semi-stable for reads and very stable for writes all the way to the maximum number of threads used in this test. Therefore, large single shared file sequential performance is stable even at 512 concurrent threads. Note that the maximum read performance was 23.8 GB/s at 16 threads; from that point, read performance decreased until reaching the plateau at around 21 GB/s, with momentary dips below 21 GB/s at 64 and 128 threads. Similarly, note that the maximum write performance of 20.4 GB/s was reached at 64 threads, which is much better than the PixStor 5 results; and since 8 TiB of data were stored, that cannot be the result of cached data.