Summary


This paper describes how to deploy Nvidia Tesla GPUs for use with Kubeflow on OpenShift Container Platform. By integrating Nvidia GPUs both on dedicated application worker nodes and, in converged mode, on storage nodes, we have demonstrated flexible configurations that you can use to build a high-performance compute environment for your machine learning and deep learning (ML/DL) needs. This paper also describes how to configure an OpenShift Container Platform cluster with a mix of Nvidia GPUs, extending the range of ML/DL workload profiles that can be processed in a single environment.
Developing a DL model is computationally intensive. In most situations, the full learning process may require training neural networks with millions of parameters, which can severely tax a nonaccelerated compute platform. Nvidia GPUs are designed to run thousands of threads concurrently, exploiting the parallelism available in ML/DL workloads and enabling higher productivity for organizations that use Kubeflow to develop ML/DL applications. This white paper demonstrates how adding Nvidia GPUs to your OpenShift Container Platform extends its processing capacity for running ML/DL workloads with Kubeflow.