When Data Science started getting traction, people were talking mainly about algorithms, and the ultimate goal was to have an ML model, no matter how and where it was trained. The resulting model was then like a hot potato: very few knew how to use it, even fewer knew how and where to deploy it, monitor it, and so on. Fortunately, those days are gone, and ML engineering today, equipped with MLOps practices, resembles building production pipelines where all stages are automated and ready to deliver new model versions on demand.
One of the important stages of an MLOps pipeline is the so-called Decision Gate, which decides whether a newly trained model performs better than the one currently in use. The main goal of this article is to show how to build such a Decision Gate using Vertex AI Experiments.
Just for this article, imagine that you are a new member of the Data Science team at a telco company. Your goal is to take the Data Science practice to the next level and build MLOps foundations. You spent your first week talking to other team members and learnt that the team is quite proud of their first ML model, built to predict customer churn. You met the person who did the coding and learnt that the model was trained in a Jupyter Notebook with scikit-learn. Once a month he starts that Notebook on his laptop, trains a new model, and uses it to build a list of customers that are expected to churn. He puts that list into a text file and sends it to the customer service team. The process has worked like this for the last 10 months, but model properties and performance characteristics were never logged. Moreover, it turned out that once the model is trained, it stays on the laptop.
You were also introduced to the product manager of the CRM system. He would like to tighten cooperation with the Data Science team and be able to call ML models through a REST API. His first use case is to display the most current churn probability on the customer profile page whenever a new interaction is opened with a customer service consultant.
Not the worst starting point, but there is room for improvement. You already know that there is no ML platform in place yet, so here is a brief intro to Vertex AI.
When it comes to Machine Learning, Google is perceived as a gold standard, with world-class research groups like Google Brain, Google Research and DeepMind, successful deployments of ML at scale in Google Search and Google Translate, and multiple contributions like TensorFlow, Kubeflow, Kaggle, Colab, TPUs, BERT, T5, ImageNet, Parti, LaMDA and PaLM. And that’s just to name a few.
Given all the above, Google’s Machine Learning platform, Vertex AI, available as a service on Google Cloud, seems at least worth checking out.
One thing that really sets Vertex AI apart from other similar platforms is that it is fully serverless: you get access to Google’s ML infrastructure with all kinds of accelerators like GPUs and TPUs, with NO NEED to manage servers, virtual machines or Kubernetes clusters, and NO NEED to install or upgrade any software.
The second thing worth mentioning is that Vertex AI is an end-to-end MLOps platform. What does that mean? It means that it is a single place where you can manage features, label training samples, run training jobs using your favorite ML frameworks, execute automated hyperparameter tuning, implement and execute MLOps pipelines, register your models in the Model Registry, run serverless batch prediction jobs, deploy your models as REST endpoints, and get support for model monitoring and explainability of predictions.
Why should you care about all this? Well, it will help you a lot if you are really planning to scale your Data Science practice by standardising the way your teams build solutions based on Machine Learning.
The last thing is that Vertex AI allows you to build fully custom models but also gives you access to many state-of-the-art architectures designed by Google, available as AutoML training. AutoML means that all you need to do to get a model is prepare and label training data and then run a Vertex AI AutoML training job, which handles feature engineering, evaluates many architectures and does hyperparameter tuning for you. A quite powerful mechanism for building at least reference models.
Every ML model training happens in some context, where context is literally anything that can give more details about the training and the resulting model: algorithm name, details about input data, hyperparameters applied for this particular training, model performance metrics, and so on. In fact, the key part of applied machine learning is experimentation: for each business problem you try to solve with machine learning, you will need to experiment with many different features, ML architectures and hyperparameters before concluding that you have a model that can address the problem. Each such experiment has its own context, and if you can snapshot that context and persist it, it will help you compare distinct experiments, track model evolution and more. The Vertex AI Experiments module is designed just for this use case: it works like a serverless database for snapshots of experiment contexts, with a convenient UI and SDK to collect and manage context details.
Now, getting back to your new role. The first thing you decided to do was to baseline the current state. You asked your colleague to take the most recent version of the customer churn model from his laptop and register it in Vertex AI Model Registry.
Sounds like a small change, but it already opens up quite a number of possibilities:
- you have a single place holding all your ML models, accessible at any time regardless of your team’s holiday calendar.
- you can run a batch prediction job on Vertex AI serverless compute infrastructure (possibly with accelerators like GPUs or TPUs) to build a list of customers that, according to the model, are about to leave. This means you are no longer dependent on your laptop’s computing capacity.
- you can deploy your model as a REST API endpoint on serverless Vertex AI Endpoints and handle ad-hoc HTTPS prediction requests from other components.
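As a rough illustration, registering such an existing scikit-learn model with the Vertex AI SDK could look like the sketch below. The project, region, bucket path and prebuilt serving image are illustrative placeholders, not values from the article.

```python
from google.cloud import aiplatform

# Connect to Vertex AI in your project and region (illustrative values).
aiplatform.init(project="my-gcp-project", location="europe-west4")

# Register the serialized scikit-learn model stored on Cloud Storage.
# artifact_uri must point to the folder containing the model artifact (e.g. model.joblib).
churn_model = aiplatform.Model.upload(
    display_name="telco-churn",
    artifact_uri="gs://my-bucket/models/churn/",
    serving_container_image_uri=(
        "europe-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)
print(churn_model.resource_name)
```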
Then you want to automate all the steps needed to prepare the necessary data, do feature engineering, run model training and deploy the new model. In other words, instead of ending up with a model, you end up with a scripted recipe: a pipeline that can train and deploy a new model on demand.
And then you hear this question: but which algorithm should we use for our use case? Churn prediction is a binary classification problem, which means that given the input data the model has just two options: Yes (1) or No (0). Many existing algorithms can be used here, from Random Forests and Logistic Regression to more complex Deep Neural Networks. Some people say that competition is always a good thing, as it forces us to do our best. This applies to ML as well. Instead of focusing on a single algorithm, why not choose a few of them and let those models compete? We will call these models challengers. The best challenger will then compete with the current champion: the model that is currently deployed and serves requests in production. Deciding which challenger is best, and then whether the best challenger is better than the current champion, will be handled by a task called the Decision Gate.
Our Decision Gate needs to take two decisions:
- select the best challenger
- decide whether the best challenger is better than the current champion
Both decisions need a deterministic definition of what “better” means. It is quite intuitive to assume that the better model is simply the one with better performance metrics, and for binary classifiers we typically use AUC ROC, the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC).
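As a quick refresher, here is a minimal, self-contained scikit-learn example of computing AUC ROC on toy data standing in for the churn dataset (all names and values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy binary-classification data standing in for the churn dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC ROC is computed from the predicted probability of the positive class (churn = 1).
auc_roc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC ROC: {auc_roc:.3f}")
```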
The goal then is to collect the value of that metric for every challenger and make it available to the Decision Gate. And this is where we will use Vertex AI Experiments. Every model training registers model parameters and metrics using the Vertex AI Experiments SDK in the context of a so-called experiment run. Every experiment run has a unique name (id) and a creation timestamp. We can therefore expect as many experiment runs as the number of challengers we want to train. All those experiment runs belong to the same parent experiment, identified by its name; here, for example, we have three experiment runs that belong to the same experiment named telco-churn-mlops.
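As a hedged sketch, a single training task could record its context roughly like this with the Vertex AI SDK; the project, region, run name and logged values are illustrative:

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-gcp-project",
    location="europe-west4",
    experiment="telco-churn-mlops",  # created automatically if it does not exist yet
)

# One experiment run per trained challenger; the run name must be unique.
aiplatform.start_run("svm-20230101")

# Parameters describe how the model was trained ...
aiplatform.log_params({
    "model_type": "svm",
    "training_set": "20230101",
})

# ... while metrics describe how it performs.
aiplatform.log_metrics({
    "model_auc_roc": 0.87,
    "model_accuracy": 0.91,
})

aiplatform.end_run()
```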
An experiment resembles a database table where every record represents a different experiment run (with the exception that, in general, every experiment run can have a different list of parameters and metrics). This “table” will therefore help us track the evolution of all the models trained by all executions of our pipeline. How does it help? If a new member of your Data Science team would like to learn how the model performed historically, you will just ask that person to check the corresponding experiment in Vertex AI.
However, you may ask the following question: the experiment contains all experiment runs that took place so far, while in our Decision Gate we want to compare only the metrics collected for challengers trained within the same pipeline execution. How can we identify the experiment runs corresponding to the same pipeline execution? This is a valid question. One option is to group experiment runs corresponding to the same pipeline execution by logging an additional parameter representing that pipeline execution. In our case this parameter will be named training_set (we train multiple challengers that belong to the same training set).
The last question then is: how do we identify the experiment run corresponding to the current champion? The current champion (if it exists) represents the best challenger trained during one of the previous pipeline executions, and therefore its experiment run will have a different value of the training_set parameter.
This means that to identify the experiment run of the current champion we need another trick: labelling our champion model registered in Vertex AI Model Registry with the corresponding experiment run name.
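One possible way to attach that label with the SDK is sketched below; the model id and the label keys (experiment, experiment_run) are assumptions made for illustration, and Vertex AI label keys and values must be lowercase:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="europe-west4")

# Label the registered champion with its experiment and experiment run,
# so the Decision Gate can find the matching run later.
champion = aiplatform.Model("1234567890123456789")  # illustrative model id
champion.update(labels={
    "experiment": "telco-churn-mlops",
    "experiment_run": "svm-20230101",
})
```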
Both tricks will help us collect the experiment runs corresponding to our challengers and the current champion, which is all we need to select the best challenger and then decide if the best challenger is better than the current champion. If the latter is true, our Decision Gate should enable the downstream tasks in our pipeline, which first register the best challenger in Vertex AI Model Registry as a new version and then deploy it to a Vertex AI Endpoint.
Vertex AI comes with the Pipelines module, a serverless execution environment for MLOps pipelines built using Kubeflow Pipelines SDK v1.8.9 or higher, or TensorFlow Extended v0.30.0 or higher. What does this mean for you? You don’t need to manage any ML infrastructure; you just delegate execution to Vertex AI. If you have your own Kubeflow clusters, you should be able to take your existing Kubeflow pipelines and run them on Vertex AI.
In this article we will code the MLOps pipeline using the Kubeflow Pipelines SDK.
Our pipeline consists of the following steps:
- Read data from BigQuery table and stage it on Google Cloud Storage.
- Read staged data from Google Cloud Storage and do feature engineering. Materialise results back to Google Cloud Storage.
- Train challengers in parallel.
- Run the Decision Gate.
- Deploy the best challenger if it is better than the current champion.
We will discuss some coding details below, but once the pipeline is coded and executed on Vertex AI, you should be able to dive into the pipeline execution details and visualise the corresponding runtime graph as follows:
Every task is executed as a container, and when you double-click on it you will be able to browse the corresponding execution logs. Our pipeline trains three challengers (tasks: train, train-2, train-3). Once all are trained, the pipeline runs the gate task, which represents our Decision Gate.
When it comes to the code, a Kubeflow pipeline is a plain Python function annotated with @pipeline. You create dependencies between tasks by injecting an output variable from a previous task as an input variable into a downstream task:
staging_task = stage(in_bigquery_projectid, ...)
feature_eng_task = preprocess(staging_task.output, ...)
The pipeline function has quite a number of input parameters, so it effectively works as a template.
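To make this concrete, here is a condensed, illustrative sketch of such a pipeline function in the Kubeflow Pipelines v2 DSL. The component functions stage, preprocess, train and gate are assumed to be the @component functions discussed in this article, and the parameter names are abbreviated:

```python
from kfp.v2 import dsl

# Assumes stage, preprocess, train and gate are @component functions defined elsewhere.
@dsl.pipeline(name="telco-churn-mlops")
def churn_pipeline(
    in_bigquery_projectid: str,
    in_bigquery_dataset: str,
    in_staging_bucket: str,
    in_experiment_name: str,
    in_training_set: str,
):
    # Stage raw data from BigQuery to Cloud Storage.
    staging_task = stage(in_bigquery_projectid, in_bigquery_dataset, in_staging_bucket)

    # Feature engineering on the staged data; the dependency comes from using its output.
    feature_eng_task = preprocess(staging_task.output)

    # Train three challengers in parallel, all logging to the same experiment
    # and the same training_set.
    train_tasks = [
        train(feature_eng_task.output, model_type, in_experiment_name, in_training_set)
        for model_type in ["svm", "random_forest", "decision_tree"]
    ]

    # The Decision Gate runs only after all challengers are trained.
    gate_task = gate(in_experiment_name, in_training_set).after(*train_tasks)
```

Such a function is then compiled into a pipeline specification with the KFP compiler and submitted for execution as an aiplatform.PipelineJob, with the pipeline parameters supplied at submission time.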
Every pipeline task is coded as a Python function annotated with @component. Thanks to this annotation, the train function will be executed as a container built from a base image (Python 3.7) with additional packages installed, as specified in the packages_to_install list.
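The decorator part could look roughly like this; the package list and parameter names are illustrative, and the body is left as a stub:

```python
from kfp.v2.dsl import component

# Illustrative package list; pin versions as needed.
@component(
    base_image="python:3.7",
    packages_to_install=["google-cloud-aiplatform", "scikit-learn", "pandas", "gcsfs"],
)
def train(
    in_features_uri: str,
    in_model_type: str,
    in_experiment_name: str,
    in_training_set: str,
):
    # Imports live inside the body: only this function is serialized and
    # executed in the container built from the base image.
    from google.cloud import aiplatform
    # ... load features from in_features_uri, fit the chosen model type, then log
    # parameters and metrics to Vertex AI Experiments as in the earlier snippet.
```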
Lines 35–38 are where we use the Vertex AI SDK to instantiate the connection to the Vertex AI service. Please note that we need to specify the GCP project, GCP region and experiment name. If the experiment does not exist yet, it will be created.
Line 42 is where we start a new experiment run.
Lines 77–82 and 84–90 are where we build two maps of key-value pairs, for parameters and metrics respectively, which we want to register within that experiment run. We have parameters for training_set, model type (svm, random forest, decision tree), the location of the input file on Google Cloud Storage and the location on Google Cloud Storage where we save our trained model (Lines 59–60). Metrics, on the other hand, describe our model performance, and we decided to log accuracy, precision, recall, log loss and AUC ROC. The last one will be used by our Decision Gate.
The Vertex AI SDK comes with two rather straightforward methods to persist parameters and metrics: log_params and log_metrics, respectively (Lines 92–93).
Our Decision Gate task is coded as the gate function.
Lines 25–28 are where we use the Vertex AI SDK to initiate communication with Vertex AI.
Lines 31–46 are where we check Vertex AI Model Registry to see if a champion model exists. You can imagine that the first execution of this pipeline will not find a champion model. Vertex AI Model Registry works as a registry for all ML models, not just the one for customer churn. Therefore, when we scan the registry, we look for models that are labelled with the name of our experiment. On Line 35 we build the filter string to account for this label, and then on Lines 38–40 we ask Vertex AI Model Registry for models that satisfy this filter.
If a champion model does exist, then on Line 46 we read its label representing the experiment run id. This is the unique key that will help us read the parameters and metrics registered in Vertex AI Experiments for this champion model.
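A sketch of that lookup with the SDK might look as follows; the label keys mirror the illustrative ones used earlier, and the filter string is an assumption about how the champion was labelled at registration time:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="europe-west4")

# Look for registered models labelled with our experiment name.
models = aiplatform.Model.list(filter='labels.experiment="telco-churn-mlops"')

champion_run_name = None
if models:
    # The experiment run id was stored as a label when the champion was registered.
    champion_run_name = models[0].labels.get("experiment_run")
```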
Line 50 is all that is needed to read all experiment runs for all experiments in your Vertex AI ‘instance’. In other words, if an experiment is like a table and experiment runs are like table rows, then Line 50 fetches the union of all those tables created within the given Vertex AI instance. This is way too much, and that is why on Line 51 we apply the first filter to the resulting data frame, which leaves only experiment runs from the experiment (table) corresponding to our model. This filter is based on the experiment_name column of the experiment_df data frame. Of course, this is still more than we need, so on Line 54 we apply yet another filter, which removes all experiment runs that do not represent challengers trained within this pipeline execution. As explained in the Decision Gate design section, we use the training_set parameter for this (hence param.training_set in the filter expression). All challengers trained within the same pipeline execution have the same value of the training_set parameter.
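Mirroring that approach, the filtering could be sketched like this (continuing the Model Registry lookup above; the training_set value is illustrative):

```python
# One pandas DataFrame with a row per experiment run.
experiment_df = aiplatform.get_experiment_df()

# Keep only runs that belong to our experiment ...
experiment_df = experiment_df[experiment_df.experiment_name == "telco-churn-mlops"]

# ... and, for challenger selection, only runs logged by this pipeline execution.
challengers_df = experiment_df[experiment_df["param.training_set"] == "20230101"]
```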
So at this stage we have collected everything we need to identify the best challenger.
Lines 59–60 get the experiment run, with its parameters and metrics, corresponding to the current champion model (if it exists).
Line 62 is where we instantiate an auxiliary variable representing the name of the metric we want to use when deciding which model is better. In our demo we use metric.model_auc_roc.
OK, we are ready for semi-finals — let our challengers compete!
Lines 64–67 are used to find the experiment run corresponding to the best challenger (the one with the maximum AUC ROC).
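Continuing the same sketch, picking the champion’s run and the best challenger boils down to a couple of pandas operations; champion_run_name comes from the model label read earlier, and the metric column name is illustrative:

```python
# The metric column logged by every training task.
metric_column = "metric.model_auc_roc"

# Experiment run of the current champion, if one exists.
champion_df = experiment_df[experiment_df.run_name == champion_run_name]

# Best challenger: the run from this pipeline execution with the highest AUC ROC.
best_challenger = challengers_df.loc[challengers_df[metric_column].idxmax()]
```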
When a champion model does not exist in the Model Registry, the best challenger becomes the champion (Lines 84–86); there is no need for a grand finale.
If a champion does exist, then on Line 77 we put it in a privileged position, assuming that the new champion must be strictly better than the current one; otherwise we leave our production system untouched.
Line 80 represents our grand finale, where we compare the AUC ROC metrics of the current champion and the best challenger. This decision gate expression can be as simple as in this article, but you can code multiple conditions here and make it far more complex if you need to (e.g. use data collected since the last pipeline execution as a validation set to measure the performance of the current champion on new data).
Our gate function returns a NamedTuple object which will be used by downstream tasks. If the is_current_champion field in this tuple is set to True, the Condition task will stop the pipeline execution without executing downstream tasks. Otherwise it will enable the last step, which registers the new champion in Vertex AI Model Registry and deploys it as a REST endpoint using the serverless hosting infrastructure behind Vertex AI Endpoints.
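In the pipeline function this kind of gating is typically expressed with dsl.Condition. Below is a sketch; it assumes the gate component exposes is_current_champion as a string output and the best challenger’s run name as best_challenger_run, and that register_and_deploy is another @component task (all of these names are illustrative):

```python
from kfp.v2 import dsl

# Inside the pipeline function, after gate_task is defined:
with dsl.Condition(
    gate_task.outputs["is_current_champion"] == "false",
    name="deploy-new-champion",
):
    # Register the best challenger as a new model version and deploy it
    # to a Vertex AI Endpoint; skipped entirely if the champion stays.
    deploy_task = register_and_deploy(gate_task.outputs["best_challenger_run"])
```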
With Vertex AI your team is very close to establishing a factory of smart microservices, where new models are trained and delivered by reliable pipelines for downstream consumption. All your models are versioned and registered in the central Vertex AI Model Registry. Each model has its own history that is fully auditable with Vertex AI Experiments. As we showed in this article, Vertex AI Experiments can also be used as a source of metrics for the Decision Gate of automated MLOps pipelines. I think your team is ready to meet with the CRM Product Owner again and plan new innovations to be built in your ML Factory.
Please clap for this article if you enjoyed reading it. For more details on Google Cloud-based data science, data engineering, and AI/ML topics, follow me on LinkedIn.