The servers have been configured following NVIDIA’s guidelines. For more details, see the NVIDIA GPUDirect Storage Installation and Troubleshooting Guide and additional documentation at GPUDirect Storage on the NVIDIA Docs Hub.
Each server has two Mellanox ConnectX-6 network interface cards running at 100 Gbps with jumbo frames enabled. Each NIC is configured on its own subnet. A third card, running at 1 Gbps, is used for management purposes.
# cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    eno1:
      addresses:
      - 192.168.1.11/24
      gateway4: 192.168.1.1
      nameservers:
        addresses:
        - 192.168.1.2
        - 192.168.1.2
        search:
        - lab.local
    ens6f0:
      addresses:
      - 10.100.10.11/24
      mtu: 9000
    ens6f1:
      addresses:
      - 10.100.20.11/24
      mtu: 9000
  version: 2
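To confirm that jumbo frames are active after the configuration is applied, the MTU can be checked on each 100 Gbps interface. The commands below are a minimal sketch assuming the interface names from the netplan file above; the exact output depends on the system:
# netplan apply
# ip link show ens6f0 | grep -o "mtu [0-9]*"
mtu 9000
# ip link show ens6f1 | grep -o "mtu [0-9]*"
mtu 9000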
During preliminary testing, we observed that each F600 node can deliver up to 8.2 GB/s of sequential read throughput to a single GPU. Because each server has two 100 Gbps NICs and two GPUs, we can map each GPU to its own NIC, while respecting the NUMA node affinity documented in the Understanding NUMA node affinity and PCIe tree section. We also need to make sure that each mount point points to a unique, dedicated PowerScale front-end IP/NIC.
During our tests, we had access to seven servers with two GPUs each, for a total of fourteen GPUs. Because only seven PowerScale nodes were available, we used a single GPU per server, which gives a one-to-one mapping between GPUs and F600 nodes.
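Before assigning a GPU to a NIC, the NUMA affinity of each device can be verified from the client. The commands below are a minimal sketch assuming the interface names used in this setup (ens6f0/ens6f1); the NUMA node values shown are examples only and depend on the server topology:
# cat /sys/class/net/ens6f0/device/numa_node
0
# cat /sys/class/net/ens6f1/device/numa_node
1
# nvidia-smi topo -m   # shows GPU/NIC PCIe paths and NUMA affinities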
Table 4. Mount point mapping
Client name | Client NIC name | Mount point | PowerScale IP address | PowerScale node name | PowerScale NIC name
worker001 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.1 | node1 | 100gige-1
worker002 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.2 | node2 | 100gige-1
worker003 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.3 | node3 | 100gige-1
worker004 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.4 | node4 | 100gige-1
worker005 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.5 | node5 | 100gige-1
worker006 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.6 | node6 | 100gige-1
worker007 | mlx5_0 | /mnt/f600_gdsio1 | 10.100.10.7 | node7 | 100gige-1
The mount points have been mounted with the following script (a worked example with values from this environment follows the variable descriptions below):
ssh -n root@${MGMT} "mount -o \
  proto=rdma,port=20049,vers=3,rsize=${BLKS},wsize=${BLKS} \
  ${NODE}:/ifs/benchmark ${MNT}"
${MGMT} corresponds to the management IP of the server
${BLKS} corresponds to the block size in bytes (for example, 512*1024 = 524288)
${NODE} corresponds to the PowerScale front-end IP
${MNT} corresponds to the local directory where the NFS export is mounted
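As an illustration, expanding the variables with values from this environment (assuming 192.168.1.11 is the management IP of the client shown in the netplan file above, a 512 KiB block size, and PowerScale node1 from Table 4) gives a command of the following form:
MGMT=192.168.1.11 ; BLKS=524288 ; NODE=10.100.10.1 ; MNT=/mnt/f600_gdsio1
ssh -n root@${MGMT} "mount -o \
  proto=rdma,port=20049,vers=3,rsize=${BLKS},wsize=${BLKS} \
  ${NODE}:/ifs/benchmark ${MNT}"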
# /usr/local/cuda/gds/tools/gdscheck -p
GDS release version: 1.6.1.9
nvidia_fs version: 2.15 libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Unsupported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Supported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Enabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : false
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 1
properties.rdma_dynamic_routing_order : GPU_MEM_NVLINKS GPU_MEM SYS_MEM P2P
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 3
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 0
execution.max_io_queue_depth : 128
execution.parallel_io : false
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 0
=========
GPU INFO:
=========
GPU index 0 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 1 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
Found ACS enabled for switch 0000:80:01.1
Found ACS enabled for switch 0000:e0:03.1
IOMMU: disabled
Platform verification succeeded
Note: You can edit the file /etc/cufile.json to customize the GPUDirect Storage (cuFile) behavior used by gdsio. During our tests, only the following two parameters were modified:
"allow_compat_mode": false (default = true)
"rdma_dynamic_routing": true (default = false)
For more details, see GPUDirect Storage documentation.