The following table describes the hardware and software of the two configurations that we used to validate this design. The software versions were the latest available at the time of validation. Newer versions might be available after the publication of this document and are fully supported.
Category | Components in VMware vSphere with Tanzu | Components in Symworld Cloud Native Platform |
Servers | 4 x PowerEdge R750xa servers, each with 2 x NVIDIA A100 80 GB | 4 x PowerEdge R7525 servers, each with 2 x NVIDIA A100 80 GB |
Virtualization and container orchestration | VMware vSphere 7.0U3 Standard Edition; VMware vSphere with Tanzu (required for container orchestration) | Symworld Cloud Native Platform (version 5.3.11-217) |
Storage for VMs and Kubernetes cluster | vSAN | Symworld Cloud Native Storage |
Storage for AI datasets | PowerScale F810 as NFS storage | PowerScale F810 as NFS storage |
Network switches | | |
Virtualized GPUs and AI software suite | NVIDIA AI Enterprise (version 1.1) | NVIDIA GPU Operator (version 1.11.1) |
MLOps platform | cnvrg.io (version 3.11) | cnvrg.io (version 4.7.40) |
Additionally, we used the following Kubernetes configuration for VMware vSphere with Tanzu:
We performed the validation with the AI Radiologist use case. The objective of this use case is to train a DL model to classify pathologies using a patient’s frontal-view chest X-rays. Our project is based on the Stanford ChexNet project, which was developed to detect pneumonia from a chest X-ray.
The dataset is ChestX-ray14, one of the largest publicly available chest X-ray datasets, released by the National Institutes of Health (NIH). The baseline model for our project is ChexNet, a 121-layer Dense Convolutional Neural Network (DenseNet).
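Because a single X-ray can show more than one pathology, the classification target is naturally a multi-label vector with one entry per pathology. The following minimal sketch shows one way to build that vector. It assumes that each image's findings arrive as a pipe-separated string (for example, `Cardiomegaly|Effusion`, or `No Finding` for a healthy image), as in the metadata of the public NIH release. The function name `encode_findings` and the label order are illustrative choices, not part of the validated design.

```python
from typing import List

# The 14 pathology classes of ChestX-ray14. The order is an arbitrary choice
# for this sketch; any fixed order works if it is used consistently.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def encode_findings(finding_labels: str) -> List[int]:
    """Convert a pipe-separated findings string into a 14-element multi-hot vector.

    Labels that are not in PATHOLOGIES (such as "No Finding") are ignored,
    so a healthy image maps to an all-zero vector.
    """
    present = {name.strip() for name in finding_labels.split("|")}
    return [1 if name in present else 0 for name in PATHOLOGIES]


# Usage: an image with two findings sets exactly two entries to 1.
target = encode_findings("Cardiomegaly|Effusion")
```

A model with 14 sigmoid outputs can be trained against vectors like `target`, which lets each pathology be predicted independently of the others.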
Consider a team with multiple data scientists collaborating in an ML project to create a solution to classify pathologies in a chest X-ray.
The MLOps workflow for this project consists of various steps such as:
These steps are iterative. Therefore, the workflow becomes more complex when multiple team members work on the same project but focus on different steps of the MLOps pipeline.
Another task is managing efficient resource allocation for various workloads. The team needs an optimized and effective tool to handle resource requests from data scientists and ML engineers.
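To make the allocation problem concrete, the following toy sketch places GPU requests on a cluster shaped like the validated one (four servers with two GPUs each). It uses a simple first-fit rule purely for illustration. The node names, job names, and the `first_fit` function are hypothetical, and the sketch does not represent how cnvrg.io or Kubernetes actually schedule workloads.

```python
from typing import Dict, Optional


def first_fit(free_gpus: Dict[str, int], requested: int) -> Optional[str]:
    """Reserve GPUs on the first node that has enough free, or return None."""
    for node, free in free_gpus.items():
        if free >= requested:
            free_gpus[node] = free - requested
            return node
    return None


# Hypothetical cluster: 4 servers, 2 GPUs each.
cluster = {f"node{i}": 2 for i in range(1, 5)}

# Hypothetical requests from team members: (job name, GPUs needed).
requests = [("train-a", 2), ("notebook-b", 1), ("train-c", 2), ("notebook-d", 1)]

# Place each request in arrival order; None would mean the job must wait.
placements = {job: first_fit(cluster, gpus) for job, gpus in requests}
```

Even in this small example, placement order determines which nodes keep both GPUs free for a later two-GPU training job. A platform that automates this bookkeeping removes that burden from the team.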