We manually installed the NVIDIA GPU driver on each worker node. To install the driver and then the GPU operator:
- Before installing the NVIDIA GPU driver, install the kernel headers and the gcc compiler:
$ dnf group install "Development Tools"
$ dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
- Download and install the NVIDIA driver, version 525, by running the NVIDIA-Linux-x86_64-525.65.run file and following the prompts.
- Verify the driver installation by running the nvidia-smi command.
- Ensure that MIG (Multi-Instance GPU) mode is turned off.
- Reboot the server if required.
- Install the GPU operator by using the following helm commands:
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
$ helm repo update
$ helm install --wait --generate-name -n robinio nvidia/gpu-operator --set driver.enabled=false,toolkit.version=v1.11.0-ubi8 --version v22.9.2
- Verify the GPU operator installation by running the kubectl get pods -n robinio | grep nvidia command:

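The manual checks in the steps above can be scripted. The following is a minimal sketch, not part of the original procedure. The helper names (mig_all_disabled, preflight, unhealthy_nvidia_pods) are hypothetical. It assumes an RPM-based node, that nvidia-smi reports "[N/A]" for GPUs without MIG support, and that operator pods carry "nvidia" in their names, as the grep in the last step implies.

```shell
#!/bin/sh
# Sketch of pre-flight and post-install checks; helper names are hypothetical.

# Reads one MIG mode per line on stdin. Succeeds only if there is at least
# one non-empty line and every such line is "Disabled" or "[N/A]"
# ("[N/A]" is assumed to mean the GPU has no MIG support).
mig_all_disabled() {
    seen=0
    while IFS= read -r line; do
        line=$(printf '%s' "$line" | tr -d '[:space:]')
        [ -z "$line" ] && continue
        seen=1
        case "$line" in
            Disabled|"[N/A]") ;;
            *) return 1 ;;
        esac
    done
    [ "$seen" -eq 1 ]
}

# Run on a worker node after installing the driver, before the helm install.
preflight() {
    rpm -q "kernel-devel-$(uname -r)" >/dev/null \
        || { echo "kernel-devel for $(uname -r) is not installed" >&2; return 1; }
    command -v gcc >/dev/null \
        || { echo "gcc not found" >&2; return 1; }
    nvidia-smi >/dev/null \
        || { echo "nvidia-smi failed; driver may not be loaded" >&2; return 1; }
    nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader | mig_all_disabled \
        || { echo "MIG is enabled on at least one GPU" >&2; return 1; }
    echo "preflight OK"
}

# Reads "kubectl get pods --no-headers" output on stdin and prints the names
# of nvidia pods whose STATUS column is neither Running nor Completed.
unhealthy_nvidia_pods() {
    awk '/nvidia/ && $3 != "Running" && $3 != "Completed" { print $1 }'
}
```

After sourcing the file, run preflight on each worker node. Once the helm install has finished, kubectl get pods -n robinio --no-headers | unhealthy_nvidia_pods should print nothing when every nvidia pod is Running or Completed. If preflight reports that MIG is enabled, nvidia-smi -mig 0 (run as root) disables it; a GPU reset or reboot may be needed before the change takes effect.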