Organizations today use artificial intelligence (AI) to make strategic and investigative decisions, solve problems, and accelerate achievement of business intelligence goals. Machine learning (ML) is the most critical enabler of AI, employing algorithms and learning models to parse large datasets. Data scientists tasked with ML development, however, can spend an inordinate amount of time designing, configuring, and testing ML platforms instead of working on the ML models themselves. As a result, highly trained specialists spend too little time on their primary specialty, where they add the most value.
By adopting machine learning operations (MLOps), inspired by the popular DevOps model for application development, organizations investing in ML can automate the training and deployment of ML models at scale. They can implement continuous integration/continuous deployment (CI/CD) for quicker, more responsive ML and AI, with the goals of standardized, predictable, and manageable ML environments and of ML models that reliably reach production.
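The CI/CD gate that MLOps adds to model delivery can be sketched in miniature: train a candidate model, evaluate it, and promote it only if it clears a quality threshold. The sketch below is an illustration in plain Python; the function names (`train`, `evaluate`, `promote`), the synthetic data, and the accuracy threshold are all assumptions for illustration, not cnvrg.io APIs.

```python
# Minimal sketch of a CI/CD gate for an ML model (illustrative only).
# A pipeline step trains a trivial classifier, evaluates it, and
# "promotes" it only if accuracy clears a threshold.
import random
import statistics


def train(data):
    # "Model" is just the midpoint between the two class means:
    # a one-parameter decision threshold on a single feature.
    pos = [x for x, label in data if label == 1]
    neg = [x for x, label in data if label == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def evaluate(model, data):
    # Predict label 1 when the feature exceeds the learned threshold,
    # and report the fraction of correct predictions.
    correct = sum((x > model) == bool(label) for x, label in data)
    return correct / len(data)


def promote(model, accuracy, threshold=0.8):
    # In a real MLOps pipeline this step would push the model to a
    # registry or serving endpoint; here it just reports the decision.
    if accuracy >= threshold:
        return {"status": "deployed", "accuracy": accuracy}
    return {"status": "rejected", "accuracy": accuracy}


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: class 1 clusters near 1.0, class 0 near 0.0.
    data = [(random.gauss(1.0, 0.2), 1) for _ in range(50)] + \
           [(random.gauss(0.0, 0.2), 0) for _ in range(50)]
    model = train(data)
    acc = evaluate(model, data)
    print(promote(model, acc))
```

In a production pipeline, each of these steps would run as a tracked job on the Kubernetes cluster, with the promotion decision recorded alongside the model artifact.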
Dell Technologies has worked closely with cnvrg.io and other partners to deliver MLOps through a jointly engineered and tested solution that helps organizations capitalize on the benefits of ML and AI. The software portion of the joint solution centers on the cnvrg.io MLOps platform.
cnvrg.io can be deployed on various Kubernetes infrastructures, including several provided by Dell Technologies. This solution is validated to run on VMware vSphere with Tanzu, a virtualization platform that enables an enterprise to manage on-demand Kubernetes clusters alongside traditional virtual machines, with complete life cycle management of the underlying compute and storage resources.
The hardware portion of the Dell solution runs on either Dell PowerEdge servers or VxRail hyperconverged infrastructure (HCI) for compute. Dell PowerScale provides the performance and concurrency at scale that are critical to consistently feeding the most data-hungry ML and AI algorithms.
Using the information in this design guide, IT administrators and DevOps engineers can design a fully featured MLOps platform with cnvrg.io to support advanced AI use cases, taking advantage of NVIDIA GPUs (graphics processing units) and curated AI software for AI researchers, data scientists, and developers on Dell Technologies infrastructure.
Note: The contents of this document are valid for the described software and hardware versions. For information about updated configurations for newer software and hardware versions, contact your Dell Technologies sales representative.
This design guide discusses cnvrg.io and how it is designed to run on a Kubernetes infrastructure, for which there are multiple options based on Dell infrastructure. It discusses the cnvrg.io software architecture, storage, and network configuration. The design guide then details a solution architecture for cnvrg.io on the .
We provide sizing guidelines that help allocate compute resources to MLOps workloads based on their use cases. We also describe how we validated the solution through a use case that follows the life cycle of an AI model from development, through training, to inference.
This design guide is intended for solution architects, system administrators, IT administrators, DevOps engineers, and others who are interested in MLOps platforms for developing and deploying AI applications in production.