The Machine Learning Lifecycle
(Adapted from Microsoft’s – Data Science Lifecycle)
In one of our previous articles, we have discussed in detail the multiple reasons why such a high number of ML initiatives fail to make it to the production phase. The need to deal with these challenges and other smaller nuances of deploying ML models has given rise to the relatively new concept of MLOps.
What is MLOps
MLOps – a set of best practices aimed at automating the ML lifecycle – brings together the ML system development and ML system operations. An amalgamation of DevOps, machine learning, and data engineering, MLOps simplifies machine learning deployment issues in diverse business scenarios by establishing ML as an engineering discipline. It brings about several machine learning benefits to otherwise challenging scenarios of ML deployment.
Businesses can leverage MLOps to craft a definitive process for driving tangible outcomes through ML. One of the most prominent reasons behind the rising popularity of MLOps is its ability to bridge the expertise gap between business and data teams. Moreover, the wide-spread adoption of ML has had an impact on the evolution of the regulatory landscape. As this effect continues to grow, MLOps will help enterprises handle the bulk of regulatory compliance without impacting data practices.
And finally, the collaborative expertise between data and operations teams allows MLOps to bypass the bottlenecks that exist in the deployment process. And as we explore further, we will see how MLOps tightens the loop and irons out the creases in the machine learning system design and implementation framework.
MLOps framework for success
Since MLOps is a nascent field, it can be difficult to get a grasp of what it entails and its requirements. One of the foremost challenges in implementing MLOps is the difficulty in superimposing DevOps practices on ML pipelines. This is primarily due to the fundamental difference: DevOps deals with code, whereas ML is code and data. And when it comes to data, unpredictability is always a major concern.
Since code and data evolve independently and in parallel, the resulting disconnect causes ML production models to be slow and often inconsistent. Moreover, applying a simple CI/CD approach may not be possible due to a lack of reproducibility of an immense volume of data which is difficult to track and version. Therefore, for machine learning in production, it is crucial to adopt a CI/CD/CT (continuous training) approach.
Exploring the ML pipeline (CI/CD/CT)
Data teams need to look at MLOps simply as a code artefact that stands independent of individual data instances. This is why breaking it up into two distinct pipelines (training pipeline and serving pipeline) can help ensure a safe run environment for batch files as well as an effective test cycle.
The training pipeline involves the entire model preparation process which starts with collecting and preparing data. Once the data is collected, validated, and prepared, data scientists need to implement feature engineering to assign data values for training as well as in production. At the same time, an algorithm has to be chosen that will define how the model identifies data patterns. Once this is done, the model can start training based on historical offline data. The trained model can then be evaluated and validated before being deployed through a model registry to the production pipeline.
A schematic representation of the complete model preparation process