Solution introduction

Overview
Organizations today use artificial intelligence (AI) to make strategic and investigative decisions, solve problems, and accelerate achievement of business intelligence goals. Machine learning (ML) is the most critical enabler of AI, employing algorithms and learning models to parse large datasets. Data scientists tasked with ML development, however, can spend an inordinate amount of time designing, configuring, and testing ML platforms instead of working on the ML models themselves. As a result, highly trained specialists spend too little time on their primary specialty, where they add the most value.
By using MLOps (machine learning plus IT operations), an approach inspired by the popular DevOps model for application development, organizations investing in ML can automate the training and deployment of ML models at scale. They can implement continuous integration/continuous deployment (CI/CD) for quicker and more responsive ML and AI, with the goal of achieving standardized, predictable, and manageable ML environments and getting ML models into production.
Dell Technologies has worked closely with cnvrg.io and other partners to deliver MLOps through a jointly engineered and tested solution to help organizations capitalize on the benefits of ML and AI. The software portion of the joint solution includes:
- cnvrg.io as the MLOps platform
- Optional edge-computing and Dell AI enabling workloads
- Applications, frameworks, and tools that researchers, data scientists, and developers can use to build ML models and analyze data
cnvrg.io can be deployed on various Kubernetes infrastructures, including those provided by Dell Technologies. This solution is validated to run on two Dell Validated Designs:
- Dell Validated Design for AI with VMware and NVIDIA, which includes VMware vSphere with Tanzu
- Dell Validated Design for Analytics—Data Lakehouse, which includes Symworld Cloud Native Platform (formerly known as Robin Cloud Native Platform)
For compute, the hardware portion of the Dell solution can run on either Dell PowerEdge servers or, for VMware-based solutions, VxRail hyperconverged infrastructure. Dell PowerScale storage provides the performance and concurrency at scale that are critical to consistently feeding the most data-hungry ML and AI algorithms.
Document purpose
Using the information in this design guide, IT administrators and DevOps engineers can design a fully featured ML operations platform with cnvrg.io on Dell Technologies infrastructure. The platform supports advanced AI use cases that can take advantage of NVIDIA graphics processing units (GPUs) and curated AI software for AI researchers, data scientists, and developers.
This design guide is a companion document to the Optimize Machine Learning Through MLOps with Dell Technologies and cnvrg.io White Paper. See that white paper for more information about the challenges that MLOps platforms address.
Note: The contents of this document are valid for the described software and hardware versions. For information about updated configurations for newer software and hardware versions, contact your Dell Technologies sales representative.
Document scope
This design guide describes cnvrg.io and how it runs on Kubernetes infrastructure, for which Dell infrastructure offers multiple options. It covers the cnvrg.io software architecture, storage, and network configuration. The guide then details a solution architecture for cnvrg.io on the Dell Validated Design for AI with VMware and NVIDIA and on the Dell Validated Design for Analytics—Data Lakehouse.
We provide sizing guidelines that help organizations allocate compute resources to MLOps based on their use cases. We also describe how we validated the solution through a use case that follows the life cycle of an AI model from development through training to inference.
Audience
This design guide is intended for solution architects, system administrators, IT administrators, DevOps engineers, and others who are interested in MLOps platforms for developing and deploying AI applications in production.
