The servers have been configured following NVIDIA’s guidelines. For more details, see the NVIDIA GPUDirect Storage Installation and Troubleshooting Guide and additional documentation at GPUDirect Storage on the NVIDIA Docs Hub.
Each server has two Mellanox ConnectX-6 network interface cards running at 100 Gbps with jumbo frames enabled. Each NIC is configured on its own subnet. A third card, running at 1 Gbps, is used for management purposes.
# cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    eno1:
      addresses:
      - 192.168.1.11/24
      gateway4: 192.168.1.1
      nameservers:
        addresses:
        - 192.168.1.2
        - 192.168.1.2
        search:
        - lab.local
    ens6f0:
      addresses:
      - 10.100.10.11/24
      mtu: 9000
    ens6f1:
      addresses:
      - 10.100.20.11/24
      mtu: 9000
  version: 2
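To confirm that jumbo frames are active after the configuration is applied, the MTU can be checked on each 100 Gbps interface. The commands below are a minimal sketch assuming the interface names from the netplan file above; the exact output depends on the system:
# netplan apply
# ip link show ens6f0 | grep -o "mtu [0-9]*"
mtu 9000
# ip link show ens6f1 | grep -o "mtu [0-9]*"
mtu 9000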
During preliminary testing, we observed that each F600 node can deliver up to 8.2 GB/s of sequential read throughput to a single GPU. Because each server has two 100 Gbps NICs and two GPUs, we can map each GPU to its own NIC, while respecting the NUMA node affinity documented in the Understanding NUMA node affinity and PCIe tree section. We also need to make sure that each mount point points to a unique, dedicated PowerScale front-end IP/NIC.
During our tests, we had access to seven servers with two GPUs each, for a total of fourteen GPUs. Because only seven PowerScale nodes were available, we used a single GPU per server, which gives a one-to-one mapping between GPUs and F600 nodes.
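Before assigning a GPU to a NIC, the NUMA affinity of each device can be verified from the client. The commands below are a minimal sketch assuming the interface names used in this setup (ens6f0/ens6f1); the NUMA node values shown are examples only and depend on the server topology:
# cat /sys/class/net/ens6f0/device/numa_node
0
# cat /sys/class/net/ens6f1/device/numa_node
1
# nvidia-smi topo -m   # shows GPU/NIC PCIe paths and NUMA affinities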
Table 4. Mount point mapping
Client name | Client NIC name | Mount point | PowerScale IP address | PowerScale node name | PowerScale NIC name
worker001 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.1 | node1 | 100gige-1
worker002 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.2 | node2 | 100gige-1
worker003 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.3 | node3 | 100gige-1
worker004 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.4 | node4 | 100gige-1
worker005 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.5 | node5 | 100gige-1
worker006 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.6 | node6 | 100gige-1
worker007 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.7 | node7 | 100gige-1
The mount points have been mounted with the following script (a worked example with values from this environment follows the variable descriptions below):
ssh -n root@${MGMT} "mount -o \
  proto=rdma,port=20049,vers=3,rsize=${BLKS},wsize=${BLKS} \
  ${NODE}:/ifs/benchmark ${MNT}"
${MGMT} corresponds to the management IP of the server
${BLKS} corresponds to the block size in bytes (for example, 512*1024 = 524288)
${NODE} corresponds to the PowerScale front-end IP
${MNT} corresponds to the local directory where the NFS export is mounted
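As an illustration, expanding the variables with values from this environment (assuming 192.168.1.11 is the management IP of the client shown in the netplan file above, a 512 KiB block size, and PowerScale node1 from Table 4) gives a command of the following form:
MGMT=192.168.1.11 ; BLKS=524288 ; NODE=10.100.10.1 ; MNT=/mnt/f600_gdsio1
ssh -n root@${MGMT} "mount -o \
  proto=rdma,port=20049,vers=3,rsize=${BLKS},wsize=${BLKS} \
  ${NODE}:/ifs/benchmark ${MNT}"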
# /usr/local/cuda/gds/tools/gdscheck -p
GDS release version: 1.6.1.9
nvidia_fs version: 2.15 libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Unsupported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Supported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Enabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : false
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 1
properties.rdma_dynamic_routing_order : GPU_MEM_NVLINKS GPU_MEM SYS_MEM P2P
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 3
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 0
execution.max_io_queue_depth : 128
execution.parallel_io : false
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 0
=========
GPU INFO:
=========
GPU index 0 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 1 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
Found ACS enabled for switch 0000:80:01.1
Found ACS enabled for switch 0000:e0:03.1
IOMMU: disabled
Platform verification succeeded
Note: You can edit the file /etc/cufile.json to customize the GPUDirect Storage (cuFile) behavior used by gdsio. During our tests, only the following two parameters were modified:
"allow_compat_mode": false (default = true)
"rdma_dynamic_routing": true (default = false)
For more details, see GPUDirect Storage documentation.