Enterprises are rapidly increasing their investments in infrastructure platforms to support data analytics and artificial intelligence (AI), including the more specific AI disciplines of machine learning (ML) and deep learning (DL). All these disciplines benefit from running in containerized environments. The benefits of running these applications on OpenShift Container Platform apply to developers, data scientists, and IT operators.
For simplicity, we use “data analytics as a service” (DAaaS) to refer to analytics and AI as operated and instantiated in a containerized environment. OpenShift Container Platform enables operators to create a DAaaS environment as an extensible analytics platform with a private cloud-based delivery model. This delivery model makes various tools available for data analytics and can be configured to efficiently process and analyze huge quantities of heterogeneous data from shared data stores.
The data analytics life cycle, and particularly the ML life cycle, is a multiphase process that integrates large volumes and varieties of data, abundant compute power, and open source languages, libraries, and tools to build intelligent applications with predictive outcomes. At a high level, the life cycle spans data ingestion and preparation, model training and evaluation, deployment, and ongoing monitoring.
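The training and evaluation portion of this life cycle can be sketched in a few lines of Python. This is a minimal illustration only, using scikit-learn and its bundled iris dataset; the dataset and model choices are assumptions for the example, not part of the platform described here.

```python
# Minimal sketch of the train/evaluate core of the ML life cycle.
# Dataset (iris) and model (logistic regression) are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data ingestion and preparation: split into training and holdout sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model training.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluation: measure prediction accuracy on held-out data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"holdout accuracy: {accuracy:.2f}")
```

In production, each of these phases becomes a separately scaled, repeatable stage, which is where containers and pipeline tooling come in.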
Data scientists and engineers are primarily responsible for developing modeling methods that ensure the selected model continues to deliver the highest prediction accuracy. Data scientists face significant challenges throughout this process.
Containers and Kubernetes are key to accelerating the data analytics life cycle because they provide data scientists and IT operators with the agility, flexibility, portability, and scalability needed to train, test, and deploy ML models.
OpenShift Container Platform provides all these benefits. Through its integrated DevOps capabilities and integration with hardware accelerators, it enables better collaboration between data scientists and software developers. It also accelerates the roll-out of analytics applications to departments as needed.
The benefits include:
On-demand access to high-performance hardware seamlessly meets heavy compute requirements, helping data scientists determine the ML model that provides the highest prediction accuracy.
Extending OpenShift DevOps automation capabilities to the ML life cycle enables collaboration between data scientists, software developers, and IT operations, so that ML models can be quickly integrated into the development of intelligent applications. This integration boosts productivity and simplifies life cycle management for ML-powered intelligent applications.
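On Kubernetes, the on-demand hardware access described above is typically expressed as an extended resource request in a pod specification. The following is a minimal sketch; the pod name, container image, and training script are illustrative, and it assumes the NVIDIA GPU device plugin (or GPU Operator) is installed on the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-gpu                             # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu   # illustrative image
    command: ["python", "train.py"]           # hypothetical training script
    resources:
      limits:
        nvidia.com/gpu: 1                     # request one GPU; the scheduler
                                              # places the pod on a GPU node
```

Because the GPU is requested declaratively, the same manifest runs unchanged on any cluster with GPU-capable nodes, which is what makes accelerator access "on demand" rather than tied to specific machines.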
One example of ML on OpenShift Container Platform is the work done by Dell EMC and Red Hat to deploy Kubeflow on OpenShift.
Kubeflow is an open-source Kubernetes-native platform for ML workloads that enables enterprises to accelerate their ML/DL projects. Based originally on Google’s use of TensorFlow on Kubernetes, Kubeflow is a composable, scalable, portable ML stack that includes components and contributions from a variety of sources and organizations. It bundles popular ML/DL frameworks and tools such as TensorFlow, MXNet, PyTorch, and Katib into a single deployment binary file. By running Kubeflow on OpenShift Container Platform, you can quickly operationalize a robust ML pipeline.
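For example, once Kubeflow is installed, a distributed TensorFlow training run can be declared with Kubeflow's TFJob custom resource. The sketch below is illustrative only: the job name, image registry, replica count, and entry point are assumptions, not values from this deployment.

```yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train                # illustrative name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2                  # two training workers
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: tensorflow       # TFJob expects the container named "tensorflow"
            image: registry.example.com/mnist:latest  # hypothetical training image
            command: ["python", "train.py"]           # hypothetical entry point
```

The TFJob controller creates the worker pods, wires up the `TF_CONFIG` environment for distributed training, and restarts failed replicas, so the data scientist describes *what* to train rather than *how* to orchestrate it.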
The software stack is only part of the solution. You also need high-performance servers, storage, and network infrastructure to deliver the stack’s full capability. Enterprises investing in custom infrastructure platforms to support the exploration of such AI technologies sometimes use ad hoc hardware implementations that sit outside mainstream data center systems infrastructure. The ability to integrate these experimental AI technologies into well-defined, production-grade platforms facilitates wider adoption. This scenario is where Dell EMC Ready Stack for OpenShift Container Platform comes in.