Both complexity and tuning greatly influence ML and AI outcomes. Every ML and AI pipeline contains multiple component steps, some highly repeatable and some highly specific to a particular problem. Even when a pipeline can be reused across workloads, there is a significant benefit to tuning and optimizing it for the dataset at hand. ML development is a full-cycle process with multiple phases, and as an organization moves from modeling to testing, deployment, management, and monitoring, it faces a host of challenges at each step.
For instance, in the modeling phase, data scientists might have to vary parameters, adjust settings, or change variables, then evaluate the outputs to see how closely the results on the initial small test dataset align with their goals. The many model versions and model hosts (containers) that result can make organizing and managing the models so time-consuming that development slows and releases are delayed.
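As a minimal illustration of why version tracking matters here, the sketch below records each parameter variation and its evaluation metrics in a simple registry keyed by a hash of the parameters, so identical configurations map to one version. The function and field names are hypothetical, not part of any specific MLOps product:

```python
import hashlib
import json
from datetime import datetime, timezone

def record_experiment(params, metrics, registry):
    """Record one experiment run (parameters + evaluation metrics)
    in a registry, keyed by a hash of the parameter set."""
    # Hashing the sorted parameters gives a stable version identifier:
    # rerunning with identical settings maps to the same version.
    version = hashlib.sha256(
        json.dumps(params, sort_keys=True).encode()
    ).hexdigest()[:12]
    registry[version] = {
        "params": params,
        "metrics": metrics,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return version

registry = {}
v1 = record_experiment({"learning_rate": 0.01, "epochs": 10},
                       {"accuracy": 0.91}, registry)
v2 = record_experiment({"learning_rate": 0.001, "epochs": 20},
                       {"accuracy": 0.94}, registry)
```

Even this toy registry shows how quickly versions multiply: every tweak to a parameter creates a new entry that someone must organize, compare, and eventually retire.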
Even when a model appears to be valid, testing does not end: larger amounts of data are fed into the model and a new cycle of testing begins. Once deployed, all the pipeline process steps must be recorded and archived, because the data output is highly dependent on the exact pipeline configuration and steps. For repeatability and audit purposes, deployment alone is not enough; the deployment parameters must be recorded in detail.
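The archiving requirement can be sketched as an append-only audit log that captures the ordered pipeline steps and the exact configuration of each run. This is an illustrative assumption about what such a record might contain, not the format used by any particular tool:

```python
def archive_pipeline_run(steps, config, audit_log):
    """Append the ordered pipeline steps and their exact configuration
    to an audit log so the run can be reproduced and audited later."""
    entry = {
        "steps": list(steps),    # ordered stages, e.g. ingest -> train
        "config": dict(config),  # exact parameters used for this run
    }
    audit_log.append(entry)
    return len(audit_log) - 1    # index serves as a run identifier

audit_log = []
run_id = archive_pipeline_run(
    ["ingest", "clean", "train", "evaluate", "deploy"],
    {"train_split": 0.8, "seed": 42, "model": "gradient_boosting"},
    audit_log,
)
```

Because outputs depend on the exact configuration, replaying the steps and parameters from such a log is what makes a deployed result reproducible and auditable.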
Constant monitoring of the solution is also required. Just as workers must continually monitor the factory robots that automate physical production, organizations cannot deploy an ML solution and move on, because changes in data over time can fundamentally alter outputs. A continuous cycle of modeling, testing, deploying, managing, and monitoring is essential to quality control, just as it is on an assembly line.
The result is that far too much time can be spent setting up the proper ML and AI environment and not enough time on developing working AI/ML systems. Clearly, optimizing the deployment process can deliver better results and more value, faster. Yet almost 85 percent of AI/ML projects fail to deliver on their goals, and a staggering 47 percent never even make it out of the lab.
Note: Source: https://www.infoworld.com/article/3639028/why-ai-investments-fail-to-deliver.html.
The disconnect between effort and result is a direct consequence of the lack of process standardization in ML and AI.
It is not simply a matter, however, of applying DevOps methodologies directly to ML. The organization must recognize that the practice of data science—where application development is a subordinate, not primary, function—needs to change. It must be brought closer to IT as the two disciplines begin to work together to effect real change through MLOps. MLOps is a powerful path forward, allowing data scientists to get out from behind the system configuration console and back to driving ever more sophisticated and useful results.