In an MLOps project, high accuracy means little until the model is running in a live environment. A data scientist’s job does not end with developing models. As much as 80% of the model development phase is spent dealing with data, and even so, some companies expect a single person to cover data science, data analysis, MLOps, and data engineering.
After developing the model, it is time to deploy it. There are three well-known ways to do this: online, streaming, and batch.
- Streaming: Kafka + Spark Streaming
- Batch: batch predictions written to Excel, CSV, or a database
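As a rough illustration of the batch method, the sketch below scores a set of rows in one pass and writes the predictions as CSV. The `predict` function is a hypothetical stand-in for a trained model, and the column names are assumptions made for the example.

```python
import csv
import io
from typing import Callable, Iterable, Sequence, TextIO, Tuple


def predict(features: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained model: a fixed threshold rule."""
    return 1 if sum(features) > 1.0 else 0


def batch_predict_to_csv(
    rows: Iterable[Tuple[str, Sequence[float]]],
    model: Callable[[Sequence[float]], int],
    out: TextIO,
) -> int:
    """Score every (row_id, features) pair and write one CSV line per row."""
    writer = csv.writer(out)
    writer.writerow(["row_id", "prediction"])
    count = 0
    for row_id, features in rows:
        writer.writerow([row_id, model(features)])
        count += 1
    return count


# Score a small in-memory batch; a scheduled job would pass an open file instead.
buffer = io.StringIO()
batch = [("a", [0.2, 0.3]), ("b", [0.9, 0.4])]
n_scored = batch_predict_to_csv(batch, predict, buffer)
print(buffer.getvalue())
```

In practice a job like this would run on a schedule, and the same function could write to a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.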
Only 22% of developed models are used, and only 4% of those make it into the production environment. The reasons for this failure are not only technical. Organizational culture, organizational structure, low awareness and knowledge, and the assumption that everything is finished once the model is deployed all play a part.
MLOps is actually a concept derived from DevOps. MLOps does not refer to any tool or software. It brings together the best practices of Machine Learning, DevOps, and Data Engineering.
MLOps involves four continuous operations: integration, delivery, training, and testing.
- CI = Continuous Integration: when a developer commits even the slightest change, it is automatically built, tested, and integrated into the shared code base.
- CD = Continuous Delivery: it covers the path from the repository to the production environment.
- MLOps differs from DevOps in continuous training and continuous testing. DevOps has no need for continuous training: there is no ML model to retrain, and the software behaves deterministically. In MLOps, a model that worked at release can degrade as the data changes, so it must be tested continuously to confirm that it still works.
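One way to picture continuous testing is as a quality gate that runs on every candidate model. The sketch below is a minimal illustration, not a prescribed method: `candidate_model`, the holdout examples, and the accuracy thresholds are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def accuracy(model: Callable[[Sequence[float]], int], examples: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if not examples:
        raise ValueError("need at least one example to evaluate")
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def quality_gate(
    model: Callable[[Sequence[float]], int],
    examples: Sequence[Example],
    min_accuracy: float = 0.8,
) -> bool:
    """Return True only if the model meets the minimum accuracy on held-out data."""
    return accuracy(model, examples) >= min_accuracy


def candidate_model(features: Sequence[float]) -> int:
    """Hypothetical candidate: predicts 1 when the first feature exceeds 0.5."""
    return 1 if features[0] > 0.5 else 0


# Hypothetical held-out examples; the candidate gets three of the four right.
holdout = [([0.9], 1), ([0.1], 0), ([0.7], 1), ([0.4], 1)]
passed_strict = quality_gate(candidate_model, holdout, min_accuracy=0.8)
passed_loose = quality_gate(candidate_model, holdout, min_accuracy=0.7)
```

A gate like this would typically run automatically whenever the model or its data changes, blocking the release when it returns `False`.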
In summary, DevOps for Machine Learning projects is called MLOps.
Who will do the MLOps job?
ML Engineers do the DevOps work on the ML side, although the exact role varies from company to company. An ML Engineer is someone who is competent in designing, building, testing, and maintaining production software applications that integrate AI services and incorporate a data science model.
MLOps is pipeline oriented. Deploy the pipeline, not the model.
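To make "deploy the pipeline, not the model" concrete, here is a minimal sketch in which the deployable unit is a chain of named steps that can be rerun end to end. The `Pipeline` class and the three steps are hypothetical and exist only for illustration; the "model" is just the mean of the data.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


@dataclass
class Pipeline:
    """An ordered list of named steps that pass a shared context along."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, step: Step) -> "Pipeline":
        self.steps.append((name, step))
        return self

    def run(self, context: Context) -> Context:
        for name, step in self.steps:
            context = step(context)
            context.setdefault("completed", []).append(name)
        return context


def load_data(ctx: Context) -> Context:
    ctx["data"] = [1.0, 2.0, 3.0, 4.0]  # stand-in for a real data source
    return ctx


def train(ctx: Context) -> Context:
    # Toy "model": predict the mean of the training data.
    ctx["model_mean"] = sum(ctx["data"]) / len(ctx["data"])
    return ctx


def evaluate(ctx: Context) -> Context:
    # Mean absolute error of the toy model on the same data.
    errors = [abs(x - ctx["model_mean"]) for x in ctx["data"]]
    ctx["mae"] = sum(errors) / len(errors)
    return ctx


pipeline = Pipeline().add("load_data", load_data).add("train", train).add("evaluate", evaluate)
result = pipeline.run({})
```

Because the whole chain is the artifact, retraining on new data means rerunning `pipeline.run` rather than repeating a manual sequence of steps.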
In the ML context, the terms mentioned above take on the following meanings:
CI (Continuous Integration): validation of data, not just of code and models.
CD (Continuous Delivery): delivery of the pipeline itself, not just of a single model.
CT (Continuous Training): models are automatically retrained and monitored.
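Continuous training needs some rule for deciding when to retrain. The sketch below shows one simple possibility, assuming a monitored score such as accuracy is recorded after each evaluation. The function name, the window size, and the tolerance are illustrative assumptions, not a standard.

```python
from typing import Sequence


def should_retrain(
    recent_scores: Sequence[float],
    baseline: float,
    tolerance: float = 0.05,
    window: int = 3,
) -> bool:
    """Trigger retraining when the average of the last `window` monitored
    scores falls more than `tolerance` below the baseline score."""
    if len(recent_scores) < window:
        return False  # not enough monitoring data to decide yet
    recent = recent_scores[-window:]
    return sum(recent) / window < baseline - tolerance


# Hypothetical monitoring history: the score drifts down from a 0.90 baseline.
degrading = should_retrain([0.90, 0.84, 0.82, 0.80], baseline=0.90)
stable = should_retrain([0.91, 0.90, 0.89], baseline=0.90)
```

Averaging over a window rather than reacting to a single score keeps one noisy evaluation from triggering an unnecessary retraining run.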
MLOps has maturity levels, determined mainly by the level of automation.
- Level 0
- Level 1
- Level 2 (Full automation)
Level 0 (NoOps):
Hardly anything is automated at Level 0. It can be characterized as follows:
- The data scientist hands over the model as an artifact.
- Static offline datasets, manual data collection.
- Manual model deployment, sometimes with scripting.
- No frequent new releases.
- No CI/CD
- There is only a prediction service.
- No performance monitoring, no testing.
Level 1 (Automated Training):
At this level, things are a little more automated:
- There is a working pipeline for data collection.
- The data scientist does not work in isolation but as a member of a team.
- Source code is modularized. No notebooks.
Level 2 (Full Automation):
- CI and CD for the pipeline
- Ongoing data validation
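Data validation at this level can be pictured as an automated check that runs before the data reaches training. The following is a minimal sketch using a hypothetical schema of expected types and value ranges; the column names and limits are assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: column name -> (expected type, minimum, maximum).
SCHEMA: Dict[str, Tuple[type, float, float]] = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e7),
}


def validate_rows(
    rows: List[Dict[str, Any]],
    schema: Dict[str, Tuple[type, float, float]],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the data passed."""
    errors: List[str] = []
    for index, row in enumerate(rows):
        for column, (expected_type, low, high) in schema.items():
            if column not in row or row[column] is None:
                errors.append(f"row {index}: missing {column}")
                continue
            value = row[column]
            if not isinstance(value, expected_type):
                errors.append(f"row {index}: {column} has type {type(value).__name__}")
            elif not low <= value <= high:
                errors.append(f"row {index}: {column}={value} outside [{low}, {high}]")
    return errors


# One clean row, one out-of-range value, and one missing column.
sample = [
    {"age": 34, "income": 52000.0},
    {"age": -3, "income": 1000.0},
    {"age": 40},
]
problems = validate_rows(sample, SCHEMA)
```

A pipeline could stop, or skip retraining, whenever `problems` is non-empty, so that bad data never reaches the model.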
That is MLOps in brief. Happy reading.