Home > Workload Solutions > Container Platforms > Red Hat OpenShift Container Platform > Archive > Running ML/DL Workloads Using Red Hat OpenShift Container Platform v3.11
The NVIDIA drivers are compiled from source when the packages are installed. To complete the build process, install the driver packages and unload the in-tree nouveau driver:
yum -y install nvidia-driver nvidia-driver-cuda nvidia-modprobe
modprobe -r nouveau
Installing the NVIDIA driver package blacklists the nouveau driver by adding nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off to the kernel command line. This ensures that the nouveau driver is not loaded on subsequent reboots.
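After the next reboot, the blacklist can be verified by inspecting the kernel command line. A minimal sketch, using a sample string in place of the real /proc/cmdline:

```shell
# Sketch: check a kernel command line for the nouveau blacklist options.
# CMDLINE is a sample string standing in for $(cat /proc/cmdline) on a real host.
CMDLINE="ro quiet nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off"
for opt in nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off; do
  case " $CMDLINE " in
    *" $opt "*) echo "found: $opt" ;;
    *)          echo "missing: $opt" ;;
  esac
done
```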
nvidia-modprobe && nvidia-modprobe -u
nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g'
This command outputs the name of the GPU on the server – in this example, Tesla-V100-SXM2-32GB. This name can be used to label the node in OpenShift.
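The hyphenated name can then be attached to the node as a label. A sketch, reproducing the sed transformation above on a sample name; the node name placeholder and the label key gpu-type are assumptions, not mandated by OpenShift:

```shell
# Reproduce the hyphenation from the nvidia-smi query above on a sample name.
GPU_NAME=$(echo "Tesla V100-SXM2-32GB" | sed -e 's/ /-/g')
echo "$GPU_NAME"
# Then label the node (placeholder node name; run against a real cluster):
#   oc label node <node-name> gpu-type="$GPU_NAME"
```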
Steps 1 to 6 of this procedure describe installation of the NVIDIA GPU driver from source. At the time of writing, NVIDIA and Red Hat have announced a technical preview of new packages for GPU drivers for select Red Hat Enterprise Linux versions. These packages eliminate the need to have compilers and a full software development toolchain installed on each system that is running NVIDIA GPUs, simplifying the management experience for the user. To get started with the new packages, follow the instructions in this README.
Add the nvidia-container-runtime-hook
The version of Docker that is shipped by Red Hat includes support for OCI runtime hooks; therefore, we need to install only the nvidia-container-runtime-hook package.
curl -s -L https://nvidia.github.io/nvidia-container-runtime/centos7/nvidia-container-runtime.repo | tee /etc/yum.repos.d/nvidia-container-runtime.repo
An OCI prestart hook makes NVIDIA libraries and binary files available in a container by bind-mounting them in from the host. The prestart hook is triggered by the presence of certain environment variables in the container, such as NVIDIA_DRIVER_CAPABILITIES=compute,utility.
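How the hook reacts to that variable can be sketched in plain shell. This is only a simulation of the decision logic, not the real hook, which runs inside the container runtime; an ordinary shell variable stands in for the container's environment:

```shell
# Simulated decision: the real prestart hook inspects the container's
# environment for NVIDIA_* variables before bind-mounting anything.
NVIDIA_DRIVER_CAPABILITIES="compute,utility"
case ",$NVIDIA_DRIVER_CAPABILITIES," in
  *,compute,*) echo "would bind-mount CUDA compute libraries" ;;
esac
case ",$NVIDIA_DRIVER_CAPABILITIES," in
  *,utility,*) echo "would bind-mount utilities such as nvidia-smi" ;;
esac
```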
yum -y install nvidia-container-runtime-hook
cat <<'EOF' > /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
{
"hasbindmounts": true,
"hook": "/usr/bin/nvidia-container-runtime-hook",
"stage": [ "prestart" ]
}
EOF
An SELinux policy tailored for running CUDA GPU workloads is required to run NVIDIA containers that are confined rather than privileged.
Install the SELinux policy module on all GPU worker nodes by running:
wget https://raw.githubusercontent.com/zvonkok/origin-ci-gpu/master/selinux/nvidia-container.pp
semodule -i nvidia-container.pp
The new SELinux policy relies on correct labeling of the host. Ensure that the files that are needed have the correct SELinux label by running the following commands:
nvidia-container-cli -k list | restorecon -v -f -
restorecon -Rv /dev
restorecon -Rv /var/lib/kubelet
The system is now set up to run a GPU-enabled container.
To verify correct operation of the driver and container enablement, run a cuda-vector-add container with Docker or Podman:
docker run --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL \
--security-opt label=type:nvidia_container_t \
docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
Trying to pull repository docker.io/mirrorgooglecontainers/cuda-vector-add ...
v0.1: Pulling from docker.io/mirrorgooglecontainers/cuda-vector-add
5d9a20cbabf3: Pull complete
84b2e9f421b6: Pull complete
6f94649104a2: Pull complete
6c16e819a84a: Pull complete
9822cda4c699: Pull complete
1bc138ea32ad: Pull complete
ade909bfe2a5: Pull complete
e70e5ba470d6: Pull complete
ab71e6b7eb90: Pull complete
925740434ebd: Pull complete
2f93605342b5: Pull complete
fe61ad4992f7: Pull complete
Digest: sha256:0705cd690bc0abf54c0f0489d82bb846796586e9d087e9a93b5794576a456aea
Status: Downloaded newer image for docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
If you see the message Test PASSED, the drivers, hooks, and container runtime are functioning correctly, and you can proceed to configuring OpenShift Container Platform.