Build and install Dell Multipath Client Driver
From the BCM head node, start from an image that includes DGX OS 6.1 and the NVIDIA OFED stack (5.8 in this example). For details on deploying the NVIDIA OFED stack with NVIDIA Base Command Manager, see the Installation Manual on the official NVIDIA website.
root@bcmhu:~# cmsh
[bcmhu]% softwareimage
[bcmhu->softwareimage]% list
Name (key) Path (key) Kernel version Nodes
---------------------------- ---------------------------------------- ------------------- --------
default-image /cm/images/default-image 5.19.0-45-generic 0
dgx-os-6.1-a100-image /cm/images/dgx-os-6.1-a100-image 5.15.0-1042-nvidia 0
dgx-os-6.1-h100-image /cm/images/dgx-os-6.1-h100-image 5.15.0-1042-nvidia 0
dgx-os-6.1-h100-mofed-image /cm/images/dgx-os-6.1-h100-mofed-image 5.15.0-1042-nvidia 1
[bcmhu->softwareimage]% clone dgx-os-6.1-h100-mofed-image dgx-os-6.1-h100-mofed-dell-image
[bcmhu->softwareimage*[dgx-os-6.1-h100-mofed-dell-image*]]% commit
Fri Jul 5 13:08:58 2024 [notice] bcmhu: Started to copy: /cm/images/dgx-os-6.1-h100-mofed-image -> /cm/images/dgx-os-6.1-h100-mofed-dell-image (1)
[bcmhu->softwareimage*[dgx-os-6.1-h100-mofed-dell-image*]]% commit
[bcmhu->softwareimage[dgx-os-6.1-h100-mofed-dell-image]]%
Fri Jul 5 13:12:00 2024 [notice] bcmhu: Copied: /cm/images/dgx-os-6.1-h100-mofed-image -> /cm/images/dgx-os-6.1-h100-mofed-dell-image (2)
Fri Jul 5 13:12:00 2024 [notice] bcmhu: Initial ramdisk for image dgx-os-6.1-h100-mofed-dell-image is being generated
Fri Jul 5 13:13:05 2024 [notice] bcmhu: Initial ramdisk for image dgx-os-6.1-h100-mofed-dell-image was generated successfully
[bcmhu->softwareimage[dgx-os-6.1-h100-mofed-dell-image]]% list
Name (key) Path (key) Kernel version Nodes
-------------------------------- ------------------------------------------- ------------------- --------
default-image /cm/images/default-image 5.19.0-45-generic 0
dgx-os-6.1-a100-image /cm/images/dgx-os-6.1-a100-image 5.15.0-1042-nvidia 0
dgx-os-6.1-h100-image /cm/images/dgx-os-6.1-h100-image 5.15.0-1042-nvidia 0
dgx-os-6.1-h100-mofed-dell-image /cm/images/dgx-os-6.1-h100-mofed-dell-image 5.15.0-1042-nvidia 0
dgx-os-6.1-h100-mofed-image /cm/images/dgx-os-6.1-h100-mofed-image 5.15.0-1042-nvidia 1
[bcmhu->softwareimage[dgx-os-6.1-h100-mofed-dell-image]]% category use dgx-h100
[bcmhu->category[dgx-h100]]% get softwareimage
dgx-os-6.1-h100-mofed-image
[bcmhu->category[dgx-h100]]% set softwareimage dgx-os-6.1-h100-mofed-dell-image
[bcmhu->category*[dgx-h100*]]% commit
Fri Jul 5 13:42:26 2024 [notice] bcmhu: hop-dgx-01 [ UP ], restart required (softwareImage)
[bcmhu->category[dgx-h100]]% device
[bcmhu->device]% reboot hop-dgx-01
Reboot in progress for: hop-dgx-01
Fri Jul 5 13:45:19 2024 [notice] bcmhu: hop-dgx-01 [ DOWN ], restart required (softwareImage)
Fri Jul 5 13:48:17 2024 [notice] bcmhu: hop-dgx-01 [ BOOTING ] (ldlinux.e64 from bcmhu)
Fri Jul 5 13:49:40 2024 [notice] bcmhu: hop-dgx-01 [ INSTALLING ] (node installer started)
Fri Jul 5 13:50:16 2024 [notice] bcmhu: hop-dgx-01 [ INSTALLER_CALLINGINIT ] (switching to local root)
Fri Jul 5 13:52:54 2024 [notice] bcmhu: hop-dgx-01 [ UP ]
[bcmhu->device]% quit
You must build the Dell Multipath Client Driver on a running DGX. The resulting package is strictly tied to the running kernel: any kernel version mismatch on the client, even a minor one, prevents the driver from installing correctly. For example, 5.4.0-150-generic is incompatible with 5.4.0-167-generic. Consequently, if the kernel is upgraded to a newer version, you must rebuild and reinstall the driver.
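The exact-kernel-match requirement can be checked up front with a sketch like the following. The `built_for` value is an assumption standing in for the kernel the package was actually built against (here, the one used later in this guide); in practice it would be read from the package metadata.

```shell
# Sketch: refuse to install a driver package built for a different kernel.
# Even a minor revision difference (5.4.0-150 vs 5.4.0-167) must be rejected.
kernels_match() {
    # $1 = kernel the driver was built for, $2 = running kernel
    [ "$1" = "$2" ]
}

built_for="5.15.0-1042-nvidia"   # assumption: kernel recorded at build time
running="$(uname -r)"

if kernels_match "$built_for" "$running"; then
    echo "kernel match: safe to install"
else
    echo "kernel mismatch: rebuild the driver on this kernel" >&2
fi
```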
The Dell Multipath Client Driver can be downloaded from the PowerScale OneFS InfoHub web page. In this document, we assume that the Dell Multipath Client Driver source code is located under the following path:
/home/dell/dell-nfs-client-driver-ext
Connect to a DGX system which is running with the previously created “dgx-os-6.1-h100-mofed-dell-image” BCM image:
root@bcmhu:~# ssh hop-dgx-01
root@hop-dgx-01:~# cd /home/dell/dell-nfs-client-driver-ext/
root@hop-dgx-01:/home/dell/dell-nfs-client-driver-ext# ./build.sh bin
Input kernel version key: 5.15.0-1042-nvidia.ubuntu
Closest kernel version key match: 4.15.0.ubuntu
Matched source tag: HEAD
Building for kernel: 5.15.0-1042-nvidia (to override, use KVER environment)
Preparing source for kernel version
Building DEB for Distributor dell vendor Dell-Technologies
...
Output in dist/
total 960
-rw-r--r-- 1 root root 938928 Jul 5 13:55 dellnfs-dkms_4.0.24-Dell-Technologies-MLNX.OFED.LINUX-5.8-4.1.5.0_all.deb
root@hop-dgx-01:/home/dell/dell-nfs-client-driver-ext# apt install ./dist/dellnfs-dkms_4.0.24-Dell-Technologies-MLNX.OFED.LINUX-5.8-4.1.5.0_all.deb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'dellnfs-dkms' instead of './dist/dellnfs-dkms_4.0.24-Dell-Technologies-MLNX.OFED.LINUX-5.8-4.1.5.0_all.deb'
...
root@hop-dgx-01:/home/dell/dell-nfs-client-driver-ext# mkdir /mnt/pscale
root@hop-dgx-01:/home/dell/dell-nfs-client-driver-ext# exit
From the Base Command Manager Head node:
root@bcmhu:~# cmsh
[bcmhu]% device use hop-dgx-01
[bcmhu->device[hop-dgx-01]]% grabimage -w
Fri Jul 5 14:00:57 2024 [notice] bcmhu: Provisioning started: sending hop-dgx-01:/ to bcmhu:/cm/images/dgx-os-6.1-h100-mofed-dell-image, mode GRAB, dry run = no
Fri Jul 5 14:01:04 2024 [notice] bcmhu: Provisioning completed: sent hop-dgx-01:/ to bcmhu:/cm/images/dgx-os-6.1-h100-mofed-dell-image, mode GRAB, dry run = no
grabimage -w [ COMPLETED ]
[bcmhu->device[hop-dgx-01]]% softwareimage use dgx-os-6.1-h100-mofed-dell-image
[bcmhu->softwareimage[dgx-os-6.1-h100-mofed-dell-image]]% createramdisk
Fri Jul 5 14:01:41 2024 [notice] bcmhu: Initial ramdisk for image dgx-os-6.1-h100-mofed-dell-image is being generated
Fri Jul 5 14:02:34 2024 [notice] bcmhu: Initial ramdisk for image dgx-os-6.1-h100-mofed-dell-image was generated successfully
[bcmhu->softwareimage[dgx-os-6.1-h100-mofed-dell-image]]% device
[bcmhu->device]% reboot hop-dgx-01
Reboot in progress for: hop-dgx-01
Fri Jul 5 14:12:47 2024 [notice] bcmhu: hop-dgx-01 [ DOWN ]
Fri Jul 5 14:15:42 2024 [notice] bcmhu: hop-dgx-01 [ BOOTING ] (ldlinux.e64 from bcmhu)
Fri Jul 5 14:17:07 2024 [notice] bcmhu: hop-dgx-01 [ INSTALLING ] (node installer started)
Fri Jul 5 14:17:50 2024 [notice] bcmhu: hop-dgx-01 [ INSTALLER_CALLINGINIT ] (switching to local root)
Fri Jul 5 14:20:28 2024 [notice] bcmhu: hop-dgx-01 [ UP ]
[bcmhu->device]% quit
root@bcmhu:~# ssh hop-dgx-01
root@hop-dgx-01:~# dellnfs-ctl status
version: 4.0.24-Dell-Technologies-MLNX.OFED.LINUX-5.8-4.1.5.0
kernel modules: sunrpc compat_nfs_ssc lockd nfs_acl auth_rpcgss rpcsec_gss_krb5 nfs nfsv3 nfsv4
services: rpcbind.socket rpcbind
rpc_pipefs: /run/rpc_pipefs
root@hop-dgx-01:~# mkdir /mnt/pscale
root@hop-dgx-01:~# mount -t nfs -o proto=rdma,port=20049,nconnect=32,remoteports=172.18.11.11-172.18.11.30,localports=enp41s0f0np0~enp41s0f1np1~enp170s0f0np0~enp170s0f1np1 172.18.11.11:/ifs/benchmark /mnt/pscale
root@hop-dgx-01:~# ls /mnt/pscale
data mlperf_dataset
root@hop-dgx-01:~# dpkg -l | grep dellnfs
ii dellnfs-dkms 2:4.0.24-Dell-Technologies-MLNX.OFED.LINUX-5.8-4.1.5.0 all DKMS support for NFS RDMA kernel module
root@hop-dgx-01:~# ls /usr/src/ | grep dell
dellnfs-99.2-4.0.24
root@hop-dgx-01:~# dkms status dellnfs
dellnfs/99.2-4.0.24, 5.15.0-1042-nvidia, x86_64: installed
root@hop-dgx-01:~# modinfo rpcrdma
filename: /lib/modules/5.15.0-1042-nvidia/updates/dkms/rpcrdma.ko
alias: rpcrdma6
alias: xprtrdma
alias: svcrdma
license: Dual BSD/GPL
description: RPC/RDMA Transport
author: Open Grid Computing and Network Appliance, Inc.
srcversion: DF37E7462DFE7BA69A1D4BB
depends: ib_core,sunrpc,mlx_compat,rdma_cm
retpoline: Y
name: rpcrdma
vermagic: 5.15.0-1042-nvidia SMP mod_unload modversions
sig_id: PKCS#7
signer: dgxos6-base64 Secure Boot Module Signature key
sig_key: 1D:50:80:1C:14:A3:D9:DA:28:35:A5:6F:CF:18:F6:43:52:F8:D7:3E
sig_hashalgo: sha512
signature: 0D:DD:D4:5E:AD:5A:6A:25:ED:47:21:05:A1:4C:25:51:8B:62:AC:9C:
6A:98:36:37:28:F9:77:9F:D2:66:20:C6:2E:C2:F1:83:5F:AA:D0:AA:
F2:1C:73:A6:5D:74:EE:EF:3B:C6:E7:48:52:D8:1A:E9:1F:2E:BF:62:
18:6D:DD:0A:5D:78:33:47:DA:8D:DD:56:40:63:F9:FA:CA:CA:2C:72:
FD:88:E9:12:CE:66:33:B8:C9:57:65:A3:76:CE:03:34:F5:C9:8C:51:
84:6C:A5:90:67:72:0B:CB:6A:77:FA:7E:A9:1E:08:B9:F2:2A:B6:DE:
DA:BF:7A:C9:D5:02:F5:77:F0:46:69:67:F8:49:B5:45:5C:07:B8:FD:
FC:82:29:9A:01:46:7A:3C:E1:19:16:5B:FC:46:F7:60:8D:D2:D4:E6:
5A:34:40:BF:00:61:B3:A3:92:5A:9F:82:5B:45:49:6C:D5:F3:8F:12:
6C:FD:BE:AA:17:CB:89:24:CC:AA:2E:CB:B3:62:DE:53:F6:D5:5E:1A:
FF:BE:08:42:4A:63:74:1D:81:F8:8D:9B:58:C3:41:E5:E6:AC:2F:41:
FD:E2:E5:FC:C4:A1:69:40:F8:72:34:E9:19:E7:42:CE:A3:A0:33:60:
20:7C:D4:A1:7C:FB:50:CB:8B:94:18:93:1A:8F:3C:A8
root@hop-dgx-01:~# dpkg -S /lib/modules/5.15.0-1042-nvidia/updates/dkms/rpcrdma.ko
dpkg-query: no path found matching pattern /lib/modules/5.15.0-1042-nvidia/updates/dkms/rpcrdma.ko
Note: the dpkg -S command might not report the correct package for the rpcrdma module that is supposed to come with the Dell Multipath Client Driver. This is due to the BCM MOFED package.
root@hop-dgx-01:~# ll /lib/modules/5.15.0-1042-nvidia/updates/dkms/rpcrdma.ko
-rw-r--r-- 1 root root 569260 Jul 5 13:58 /lib/modules/5.15.0-1042-nvidia/updates/dkms/rpcrdma.ko
Save a copy of the dellnfs-ctl script before removing the package; it is required in the following steps:
root@hop-dgx-01:~# cp $(which dellnfs-ctl) /tmp/
root@hop-dgx-01:~# apt remove dellnfs-dkms
...
depmod...
Deleting module dellnfs-99.2-4.0.24 completely from the DKMS tree.
root@hop-dgx-01:~# ls /usr/src/ | grep dell
root@hop-dgx-01:~# dkms status dellnfs
# Reload all NFS modules
root@hop-dgx-01:~# systemctl stop slurmd.service && systemctl stop slurmstepd.scope ; umount /home ; modprobe mlx_compat && /tmp/dellnfs-ctl reload && mount master:/home /home
# Repeat the command until it completes without errors
root@hop-dgx-01:~# cd ; systemctl stop slurmd.service && systemctl stop slurmstepd.scope ; umount /home ; modprobe mlx_compat && /tmp/dellnfs-ctl reload && mount master:/home /home
dellnfs-ctl: unloading kmod nfsv4
dellnfs-ctl: unloading kmod nfs
dellnfs-ctl: unloading kmod rpcsec_gss_krb5
dellnfs-ctl: unloading kmod auth_rpcgss
dellnfs-ctl: unloading kmod lockd
dellnfs-ctl: unloading kmod sunrpc
dellnfs-ctl: loading kmod sunrpc
dellnfs-ctl: loading kmod lockd
dellnfs-ctl: loading kmod auth_rpcgss
dellnfs-ctl: loading kmod rpcsec_gss_krb5
dellnfs-ctl: loading kmod nfs
dellnfs-ctl: loading kmod nfsv4
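The repeat-until-no-errors step above can be wrapped in a small retry helper instead of re-running the one-liner by hand. This is a sketch: the five-attempt cap is an assumption, not from the guide, and the wrapped command is the same reload sequence shown above.

```shell
# Sketch: run the given command up to 5 times, stopping at the first success.
retry_until_ok() {
    for attempt in 1 2 3 4 5; do
        if "$@"; then
            echo "succeeded on attempt ${attempt}"
            return 0
        fi
    done
    echo "still failing after 5 attempts" >&2
    return 1
}

# In this environment the reload step would be wrapped as:
# retry_until_ok sh -c 'systemctl stop slurmd.service && systemctl stop slurmstepd.scope ; umount /home ; modprobe mlx_compat && /tmp/dellnfs-ctl reload && mount master:/home /home'
```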
# At this stage the rpcrdma kernel module from the MOFED package is broken: the system
# now uses the in-box module from the NVIDIA kernel, which does not work with MOFED.
# This can be confirmed by trying to mount the NFS export with NFSoRDMA:
root@hop-dgx-01:~# mount -t nfs -o proto=rdma,port=20049,vers=3 172.18.11.11:/ifs/benchmark /mnt/pscale
mount.nfs: an incorrect mount option was specified
# The proto=rdma mount option is no longer recognized, while a traditional TCP mount still works:
root@hop-dgx-01:~# mount -t nfs -o vers=3,nconnect=8 172.18.11.11:/ifs/benchmark /mnt/pscale
root@hop-dgx-01:~# ls /mnt/pscale/
data mlperf_dataset
root@hop-dgx-01:~# umount /mnt/pscale
# To restore the original rpcrdma module from MOFED, uninstall and reinstall the mlnx-nfsrdma DKMS package.
# First, confirm that the mlnx-nfsrdma DKMS package is broken:
root@hop-dgx-01:~# dkms status mlnx-nfsrdma
mlnx-nfsrdma/5.8, 5.15.0-1042-nvidia, x86_64: installed (WARNING! Diff between built and installed module!)
# Then remove the DKMS package:
root@hop-dgx-01:~# dkms remove -m mlnx-nfsrdma -v 5.8
Module mlnx-nfsrdma-5.8 for kernel 5.15.0-1042-nvidia (x86_64).
Before uninstall, this module version was ACTIVE on this kernel.
...
depmod...
Deleting module mlnx-nfsrdma-5.8 completely from the DKMS tree.
# Finally, reinstall the DKMS mlnx-nfsrdma package:
root@hop-dgx-01:~# dkms install -m mlnx-nfsrdma -v 5.8
Creating symlink /var/lib/dkms/mlnx-nfsrdma/5.8/source -> /usr/src/mlnx-nfsrdma-5.8
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
...
depmod...
# Verify that the DKMS package is now properly installed:
root@hop-dgx-01:~# dkms status mlnx-nfsrdma
mlnx-nfsrdma/5.8, 5.15.0-1042-nvidia, x86_64: installed
# Confirm that NFSoRDMA is working properly.
root@hop-dgx-01:~# mount -t nfs -o proto=rdma,port=20049,vers=3 172.18.11.11:/ifs/benchmark /mnt/pscale
root@hop-dgx-01:~# ls /mnt/pscale/
data mlperf_dataset
The Dell Multipath Client Driver introduces new mount options to the NFS client stack. This section provides the recommended options for an NVIDIA DGX SuperPOD environment with Dell Ethernet Storage. Table 3 lists the recommended mount options newly added by the Dell Multipath Client Driver.
| Linux mount option | Comment |
| --- | --- |
| `remoteports=` | Allows a client to target multiple PowerScale IP addresses and multiplex I/O across them. Use `-` to provide a range; use `~` to combine multiple ranges and/or individual IPs. Example: `remoteports=100.127.97.11-100.127.97.42` |
| `localports=` | Client IP addresses or network interface device names. Use `-` to provide an IP range; use `~` to combine multiple ranges, individual IPs, or individual network device names. As every DGX has an identical configuration, it is recommended to use the network device names. Example: `localports=enp225s0f0np0~enp97s0f0np0` |
| `remoteports_offset=` | Adjusts the starting point in the remoteports list of IP addresses when selecting nconnect transports. If this number matches (modulo the nconnect value) for two clients, those clients will always access the same file from the same node. If it differs (modulo the nconnect value), those clients will always access the same file from different nodes. The default is a pseudo-random number based on the source IP address, which tries to distribute the mapping between nodes for higher aggregate throughput against a single file. Note: ignored when spread-read or spread-write supersedes it. Must be an integer greater than 0. Example: `remoteports_offset=1` |
| `nconnect=` | The nconnect option, which was upstreamed to the Linux kernel, has undergone significant changes and now works with the RDMA protocol. It specifies the number of connections to establish between the client and the PowerScale cluster, up to a limit of 64. The number of remoteports should be a multiple of the nconnect value. Example: `nconnect=32` |
| `localports_failover` | Enables failover mode. For this option to work, the noidlexprt option must be used. No value needed. |
| `noidlexprt` | Do not disconnect idle connections. Required by the localports_failover option. No value needed. |
Table 4 shows the other recommended options for an NVIDIA DGX SuperPOD environment with Dell Ethernet Storage.

| Linux mount option | Comment |
| --- | --- |
| `proto=` | Enables NFSoRDMA. Must be used with the port option. Example: `proto=rdma` |
| `port=` | The numeric value of the server's NFS service port. For NFSoRDMA, use port 20049. Example: `port=20049` |
| `rsize=` | The maximum number of bytes per network READ request that the NFS client can receive when reading data from a file on an NFS server. Example: `rsize=1048576` |
| `wsize=` | The maximum number of bytes per network WRITE request that the NFS client can send when writing data to a file on an NFS server. Example: `wsize=1048576` |
| `vers=` | The NFS protocol version used to contact the server's NFS service. Example: `vers=3` |
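The options from Tables 3 and 4 can be assembled into a single mount command. The sketch below builds the option string in a variable for readability; the IP range, interface names, and export path are the example values used elsewhere in this guide and should be adjusted to your environment.

```shell
# Sketch: assemble the recommended mount options into one mount command.
# remoteports range, localports interfaces, and export path are examples only.
remoteports="100.127.98.11-100.127.98.42"
localports="enp225s0f0np0~enp97s0f0np0"

opts="proto=rdma,port=20049,rsize=1048576,wsize=1048576,vers=3"
opts="${opts},nconnect=32,remoteports=${remoteports},localports=${localports}"
opts="${opts},localports_failover,noidlexprt"

echo "mount -t nfs -o ${opts} 100.127.98.11:/ifs/superpod /mnt/pscale"
```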
From the BCM head node, run the following commands to mount the PowerScale NFS export (adjust the DGX hostname naming convention as needed):
# Unmount the share /mnt/pscale on every DGX client:
pdsh -w hop-dgx-[01-32] umount /mnt/pscale
# Mount /mnt/pscale on every DGX with the recommended mount options
i=1
for x in {01..20} ; do
ssh hop-dgx-${x} "mount -o proto=rdma,port=20049,rsize=1048576,wsize=1048576,vers=3,nconnect=32,remoteports=100.127.98.11-100.127.98.42,localports=enp225s0f0np0~enp97s0f0np0,remoteports_offset=${i},localports_failover,noidlexprt 100.127.98.11:/ifs/superpod /mnt/pscale"
((i++))
done
Note: at the moment it is not possible to configure the mount point within Base Command Manager (cmsh -> category -> <category_name> -> fsmount -> /mnt/pscale), because the remoteports_offset option must have a different value on each client. An alternative approach is a startup script that mounts the export and sets the remoteports_offset value based on the hostname ID.
Example: id=`echo $HOSTNAME | grep -Eo '[0-9]+$' | sed 's/^0*//'`
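The startup-script idea above can be sketched as follows: derive the offset from the numeric suffix of the DGX hostname (for example, hop-dgx-07 yields 7), then use it in the mount command. The mount line is left commented; its values are the ones used earlier in this guide.

```shell
# Sketch: derive remoteports_offset from the hostname's trailing digits.
host_id() {
    # keep only the trailing digits, then strip leading zeros
    echo "$1" | grep -Eo '[0-9]+$' | sed 's/^0*//'
}

id="$(host_id "${HOSTNAME:-$(hostname)}")"
echo "remoteports_offset for this host: ${id}"

# mount -t nfs -o proto=rdma,port=20049,rsize=1048576,wsize=1048576,vers=3,nconnect=32,remoteports=100.127.98.11-100.127.98.42,localports=enp225s0f0np0~enp97s0f0np0,remoteports_offset=${id},localports_failover,noidlexprt \
#   100.127.98.11:/ifs/superpod /mnt/pscale
```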