This paper describes how to deploy Nvidia Tesla GPUs for use with Kubeflow on OpenShift Container Platform. By integrating Nvidia GPUs on dedicated application worker nodes, and in converged mode on storage nodes, we have demonstrated flexible configurations for building a high-performance compute environment that meets your ML/DL needs. This paper also documents how to configure an OpenShift Container Platform with a mix of Nvidia GPU models, extending the scope and capability of the ML/DL work profiles that can be processed in a single environment.
The development of a DL model is a computationally intensive operation. In most situations, the full learning process requires training a neural network with millions of parameters, which can severely tax a nonaccelerated compute platform. Nvidia GPUs are designed to run thousands of threads concurrently, exploiting the parallelism that is available in ML/DL workloads and enabling higher levels of productivity for organizations using Kubeflow to develop ML/DL applications. This document demonstrates how Nvidia GPUs can be added to your OpenShift Container Platform to extend its processing capacity for ML/DL workloads running under Kubeflow.
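Once GPU-enabled worker nodes are in place, containerized workloads request GPUs through the standard Kubernetes extended-resource mechanism. The sketch below is a minimal, hypothetical pod specification (the pod and image names are illustrative, not taken from this paper) that asks the scheduler for a single Nvidia GPU; the scheduler then places the pod only on a node advertising the `nvidia.com/gpu` resource:

```yaml
# Hypothetical example: request one Nvidia GPU for a training container.
# The pod name and image are placeholders; the nvidia.com/gpu resource
# name is the standard one exposed by the Nvidia device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-example
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/cuda:latest   # placeholder training image
      command: ["nvidia-smi"]             # verify the GPU is visible
      resources:
        limits:
          nvidia.com/gpu: 1               # schedule onto a GPU node
```

Because the GPU is requested as a limit rather than selected by node name, the same specification works whether the GPU sits on a dedicated application worker node or on a converged storage node.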