MLOps is a Machine Learning (ML) engineering culture and practice that aims to unify ML system development (Dev) and ML system operations (Ops). It is the application of DevOps practices to the machine learning field. Practicing MLOps means advocating for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management.
A few quotes to understand more about MLOps:
- “The main challenges people face when developing ML capabilities are scale, version control, model reproducibility, and aligning stakeholders”
- The term MLOps is defined as “the extension of the DevOps methodology to include Machine Learning and Data Science assets as first-class citizens within the DevOps ecology”
- “MLOps, like DevOps, emerges from the understanding that separating the ML model development from the process that delivers it — ML operations — lowers quality, transparency, and agility of the whole intelligent software.”
Building your own MLOps pipeline is not one day’s work. Before you get started, it’s important to understand the different stages, or levels, of an MLOps pipeline, and to plan accordingly so you can gradually build up a full MLOps pipeline. In this article, we will look at the different maturity levels of an MLOps pipeline and what to do to get to each level.
MLOps maturity level is a good way to measure the degree of automation in a machine learning pipeline. The exact definition of the maturity levels may differ slightly among industry MLOps leaders such as Google and Microsoft. According to Microsoft Azure’s MLOps Levels definition (strongly recommended reading) and Google’s architecture review of MLOps levels, they can be summarized as:
- Level 0: No DevOps or MLOps
- Level 1: DevOps but no MLOps
- Level 2: DevOps and Automated Model Training
- Level 3: DevOps and Automated Model Deployment
- Level 4: Full MLOps
Let’s do a deep dive into the MLOps maturity levels and see what characteristics each maturity level has.
Maturity level 0 is heavily manual. At this level, there is no DevOps for model release and no MLOps for model training and deployment. A data scientist manually extracts and experiments with the data, manually trains and creates ML models, and then manually evaluates and validates them. The ML models are then handed off to a software engineer, who manually deploys them to the ML service in production. Below is a diagram of a typical MLOps Maturity Level 0 pipeline:
At this level, there are a few obvious characteristics to call out:
- No MLOps. From data extraction and analysis to model training and evaluation, the process is entirely manual. The data scientist may have some local scripts in place to simplify the workflow, but there is no unified or managed system to automate the process. The lack of automated data processing, automated model training, and automated model evaluation often results in long release cycles and inconsistent ML model performance.
- No DevOps. This is typical when an ML project is in its initial phase, before the software engineers have had the time and investment to build a DevOps pipeline that releases the ML models automatically. However, the lack of automation, particularly the absence of automated model deployment and automated integration testing for the ML service, often results in long deployment cycles and problematic production management, such as version management and emergency rollback.
- Infrequent release cycles. With a heavily manual workflow, any ML feature change takes a long time to reach users. The common practice of releasing frequently in today’s agile software development fails miserably at this maturity level.
In summary, it’s fine to be at this level when an ML project first starts. But if the team stays here, the manual effort quickly adds up and takes a toll on the overall efficiency and progress of the project, as well as on scientist and engineer happiness.
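To make the Level 0 workflow concrete, here is a minimal, stdlib-only sketch of what it looks like in practice: every step is a separate manual action with no pipeline connecting them. The function names, the toy "model," and the hand-off path are all hypothetical stand-ins, not a prescribed implementation.

```python
import pickle
import statistics

def extract_data():
    # The scientist manually pulls data, e.g. by exporting a CSV by hand.
    # Here: (feature, label) pairs as a stand-in.
    return [(1.0, 1), (2.0, 1), (-1.0, 0), (-2.0, 0)]

def train_model(rows):
    # A toy "model": a single decision threshold derived from the
    # positive class. Stands in for a real, manually launched training run.
    threshold = statistics.mean(x for x, label in rows if label == 1) - 1.5
    return {"threshold": threshold}

def evaluate(model, rows):
    # Manual evaluation: the scientist eyeballs this number.
    preds = [1 if x > model["threshold"] else 0 for x, _ in rows]
    return sum(p == label for p, (_, label) in zip(preds, rows)) / len(rows)

# Each step is run by hand; nothing triggers the next step automatically.
data = extract_data()
model = train_model(data)
accuracy = evaluate(model, data)
print(f"accuracy: {accuracy:.2f}")

# Finally the artifact is saved and handed to an engineer out of band
# (email, shared drive) for manual deployment.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```

The point of the sketch is the absence of glue: each step only happens because a person ran it, which is exactly what makes this level slow and error-prone.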
Maturity level 1 is still manual but takes a step forward by adding automation to the system. The ML workflow performed by the data scientist is still manual, but once the ML model is handed over to the engineer (or the scientist themselves), the release, deployment, and monitoring of the ML service are fully automated. The pipeline that performs ML service release and deployment is a CI/CD pipeline, which ensures proper DevOps practices for ML service and code management.
It’s important to note that DevOps at this level refers to ML service release, not ML model release. The ML model, once created, is not automatically released and deployed to production. An engineer or scientist still needs to manually trigger the ML service pipeline to deploy the ML model to production.
At this level, there are a few characteristics to call out:
- DevOps for ML service release. The ML model, once created, is deployed through a DevOps pipeline with automated build, packaging, testing, and deployment. No more manually copying the ML model to production. The same CI/CD flow applies to ML service code changes as well. Automated integration testing is in place to verify ML-service-related code changes. This ensures consistent model performance and code changes in production on the ML service side.
- No MLOps. The ML workflow is still manual. No automated model training and no automated model deployment.
A team usually advances to this level pretty quickly. Often this is because there are more engineers than scientists on the team, so the engineers have more capacity to quickly build a DevOps pipeline for ML service deployment. However, the team and the project as a whole still feel the pain of infrequent release cycles, due to the manual effort involved in the ML model creation workflow, which often takes a long time, particularly when model training involves deep learning and large amounts of data.
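The service-side automation described above can be sketched as a CI/CD pipeline of stages, with the manual trigger that defines Level 1 left at the very end. This is a hedged illustration: the stage names, the artifact format, and the "package" structure are hypothetical, and a real pipeline would live in a CI system rather than a single script.

```python
def build(artifact_path):
    # CI: package the service together with the model artifact.
    return {"image": "ml-service:candidate", "model": artifact_path}

def integration_test(package):
    # CI: automated tests against the packaged service, e.g. a smoke
    # request to the inference endpoint. Here just a stand-in check.
    return package["model"].endswith(".pkl")

def deploy(package):
    # CD: roll the package out; versioned releases enable easy rollback.
    return {"deployed": package["image"], "version": 1}

def release_pipeline(artifact_path):
    # Once triggered, every stage runs automatically in sequence.
    package = build(artifact_path)
    if not integration_test(package):
        raise RuntimeError("integration tests failed; aborting release")
    return deploy(package)

# The manual step that defines Level 1: an engineer or scientist runs
# this by hand whenever a new model artifact is handed over.
result = release_pipeline("model_v2.pkl")
print(result)
```

Everything after the trigger is automated; the trigger itself is not, which is exactly the gap the later levels close.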
Maturity level 2 makes significant progress on ML workflow automation. At this level, the steps from data extraction through data processing to model training are fully automated. A scientist leverages this automation to collect data and train ML models quickly, shortening the overall ML workflow and release cycle. However, the trained model still needs to be manually validated and manually handed over to the engineers for deployment to the ML service in production.
A few characteristics to call out at this level:
- Partial MLOps. The ML workflow is partially automated, up to ML model training. As the team realizes the pain of long release cycles and invests effort in reducing ML workflow time, the data processing and model training steps are often the first targets.
- Automated Model Training. Model training is often the most time-consuming step in today’s ML workflows, not only because of the actual training time but also because of the infrastructure and large number of machines needed. Managing the training infrastructure, if not fully automated, often becomes a burden of manual start-up and shutdown. The scientist may come up with automated scripts to reduce the manual steps needed to manage the training infrastructure, but a more unified and fully managed system is needed to reduce the work further.
- No Automated Model Deployment. As mentioned above, the ML model created by the automated model training step is still not automatically released. This is usually caused by the manual model validation performed by the scientist, by the disconnect between the model training pipeline and the model deployment pipeline, or by both.
At this level, the team has made great progress in automating the ML workflow and clearly feels the efficiency and speed improvements it brings to the overall release cycle. At this point, it only makes sense to keep moving toward the next maturity level with more automation.
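The Level 2 training pipeline can be sketched as a single automated run in which extraction, processing, and training feed each other with no manual hand-off, and which deliberately stops at the trained artifact. All step names, the toy "model," and the source URI are hypothetical illustrations.

```python
def extract(source):
    # Automated extraction from a configured source (stand-in records).
    return [" 3 ", "1", " 2", "bad"]

def process(raw):
    # Automated cleaning: drop malformed records, parse numbers.
    cleaned = []
    for record in raw:
        try:
            cleaned.append(float(record))
        except ValueError:
            pass  # skip records that fail to parse
    return cleaned

def train(dataset):
    # Stand-in "training": fit a single parameter (the mean).
    return {"mean": sum(dataset) / len(dataset)}

def training_pipeline(source):
    # One automated run: each step feeds the next with no manual step.
    model = train(process(extract(source)))
    # The pipeline deliberately ends here: model validation and the
    # hand-off to the service pipeline are still manual at Level 2.
    return model

model = training_pipeline("s3://bucket/raw")  # hypothetical source URI
print(model)
```

Note where the automation stops: the returned model still waits for a human to validate it and carry it over to the service pipeline.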
Maturity level 3 tackles the last step of the ML workflow: model deployment. At this level, the trained model is automatically validated against a predefined metric threshold. Once the model passes validation, it is automatically sent to the ML service pipeline, which automatically triggers a deployment to push the ML model to production. As you have probably guessed, at this level all steps from data collection all the way to model deployment in production are automated. The ML training pipeline and the ML service pipeline are connected and work together to turn raw data into a served ML model in production.
It’s also important to realize that we are not at full MLOps yet. The job is not done when the ML model is deployed to production. Model monitoring and especially model retraining are essential parts of a full MLOps system, and we don’t have them yet.
A few characteristics to call out at this level:
- Close to full MLOps. The ML workflow is fully automated from data collection to model deployment. If everything goes well, no manual effort is involved once the scientist triggers the ML training pipeline with the desired data. Full MLOps is not achieved yet, as model retraining is not in place to close the loop.
- Automated Model Deployment. This is the exciting part of the ML workflow automation. As the scientists and engineers work together to connect the ML training pipeline and the ML service pipeline, automated model deployment is achieved. This often involves proper automated model validation on the ML training pipeline side and an automated deployment trigger on the ML service pipeline side.
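The hand-off that defines Level 3 can be sketched as a post-training step: automated validation against a predefined metric threshold, and a deployment trigger that fires only for passing models. The threshold value, the metric name, and the trigger function are hypothetical; in a real system the trigger might be a webhook or an artifact-registry event.

```python
ACCURACY_THRESHOLD = 0.90  # predefined by the team (hypothetical value)

def validate(metrics):
    # Automated model validation: no human eyeballs the numbers.
    return metrics["accuracy"] >= ACCURACY_THRESHOLD

def trigger_service_deployment(model_uri):
    # Stand-in for notifying the ML service CI/CD pipeline to deploy.
    return {"status": "deploying", "model": model_uri}

def post_training_step(model_uri, metrics):
    # Runs automatically at the end of the training pipeline.
    if validate(metrics):
        return trigger_service_deployment(model_uri)
    # A failing model never reaches production; the run simply stops.
    return {"status": "rejected", "model": model_uri}

result_ok = post_training_step("models/v7", {"accuracy": 0.93})
result_bad = post_training_step("models/v8", {"accuracy": 0.71})
print(result_ok)
print(result_bad)
```

The design choice worth noting is that the gate lives on the training-pipeline side, while the deployment machinery lives on the service-pipeline side; connecting the two is what this level is about.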
The team now has a working and efficient system for quickly turning raw data into a served model in production. Productivity and speed are significantly improved for the ML project. The team may be tempted to rejoice and call the job done. However, to achieve full MLOps, there is a final maturity level to go after.
Maturity level 4 is the final level: full MLOps. The noticeable difference between level 4 and level 3 is the capability of model retraining. An ML model is heavily data driven. After the ML model is deployed to production and starts running inference on real-world data it has never been trained on, its performance can degrade over time. It’s important to set up monitoring metrics to track ML model performance in production and, more importantly, to automatically trigger model retraining when model performance drops below a certain threshold. Below is a diagram of a typical MLOps Maturity Level 4 pipeline:
The Level 4 MLOps pipeline incorporates continuous integration (CI), continuous delivery (CD), and continuous training (CT) processes. Compared to the previous levels’ pipelines, a few new features are added around continuous training:
- Collecting new real world data. After being deployed to production, the ML model constantly receives inference requests with real-world data. At this level, the MLOps system collects the new real-world data it sees in production over time and saves it for future use.
- ML model performance monitoring. Getting the ML model to production is not the end. As is common practice, performance monitoring needs to be set up to track ML model performance and raise relevant alarms.
- Automated model retraining. This is the huge difference compared to the previous levels’ pipelines. Either when ML model performance drops below a predefined threshold, or on a set cadence, the MLOps system can trigger an automated retraining run. The new real-world data collected in production can then be used to retrain the ML model to better handle more real-world cases. Benefiting from the already implemented automated model testing and automated model deployment, the newly trained model can automatically be tested and released to production without any manual intervention.
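The continuous-training additions above can be sketched as a small monitoring-and-trigger loop: retraining fires either when the live metric degrades past a threshold or when the deployed model is older than a set cadence. The threshold, the cadence, and the retraining hook are all hypothetical values chosen for illustration.

```python
RETRAIN_THRESHOLD = 0.85     # hypothetical minimum live accuracy
RETRAIN_CADENCE_DAYS = 30    # hypothetical maximum model age

def should_retrain(live_accuracy, days_since_last_training):
    # Two independent triggers: performance degradation OR staleness.
    degraded = live_accuracy < RETRAIN_THRESHOLD
    stale = days_since_last_training >= RETRAIN_CADENCE_DAYS
    return degraded or stale

def trigger_retraining(new_production_data):
    # Stand-in for kicking off the automated training pipeline with the
    # freshly collected production data; the retrained model then flows
    # through automated testing and deployment with no manual step.
    return {"status": "retraining", "n_new_samples": len(new_production_data)}

# New real-world data saved from production inference requests.
collected = ["req1", "req2", "req3"]

if should_retrain(live_accuracy=0.82, days_since_last_training=10):
    print(trigger_retraining(collected))
```

Because Levels 1 through 3 already automated testing and deployment, the retrained model produced by this trigger flows to production with no manual intervention, which is what closes the CT loop.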
The Level 4 full MLOps pipeline creates a unified ML pipeline with modularized components, instead of isolated components with manual transitions. It changes how we look at ML development and deployment processes. It’s no longer a separate workflow where the scientist creates models and the engineer deploys them, like the old days when the developer team created applications and the operations team deployed and monitored them. A unified ML pipeline simplifies scientists’ workflow and empowers them to create, test, deploy, and monitor ML models with fast iterations and confidence.
At this level, the team finds itself running a very efficient and robust MLOps pipeline. The ML feature release cycle is significantly reduced, and more features are delivered to customers more quickly and safely. Moreover, as the architecture and platform of the MLOps pipeline are highly reusable, the team can now focus on scaling to more ML-based products, without having to start from scratch and go through the slow and painful process of Level 0 again.