Models that once did well may start to deteriorate. Here’s how you can stay on course.
There’s only one constant in the universe — CHANGE!
Everything you see and sense was a different version of itself a moment ago, and the data you used to fit your models is no exception.
Even a once-perfect model eventually stops predicting as well as it used to. That’s not a bug; it’s the way ML models work in a changing world. Expecting it is part of the ML engineer’s job.
Running an ML model in production is very much like whitewater rafting. Worse yet, it’s a river no one has traveled before. You need a sense of the current to anticipate what’s ahead; ignore the dangers, and your raft may go over a cliff.
Translated to MLOps: you must actively track changes in the underlying data before your model goes over that cliff, that is, before its predictions become inaccurate.
This phenomenon is popularly known as “concept drift.”
Not all changes are alike, so handling them requires different strategies. Social behavior, such as divorce rates, changes sluggishly. Stock markets, on the other hand, often suffer sudden shocks that create and destroy millionaires overnight.
And there are also cyclical changes. The change you experience now will likely reverse in the coming years.
Your domain may show one of these types of change. And you should be prepared before they occur.
Here are the strategies ML engineers often use to fight concept drift.
Retraining the model

Retraining is a straightforward concept: in every cycle, you get a new set of data, and you can use it to train the model once again from scratch.
Given the simplicity, you may be tempted to retrain often. But retraining is a costly operation; it pays off only when your models are cheap and fast to train.
There are three approaches to optimizing cost and training time. You can retrain the model with…
- your new data only;
- all your past data; or
- the full dataset, with more weight on the latest data points.
Retrain on recent data only
Though abandoning old data and retraining entirely on new data may sound counterintuitive, it’s helpful in many situations, especially when you’re sure the old data is obsolete and only adds noise to the model.
Short-term financial market prediction is a good example. With numerous trading bots in the market, any exploitable pattern is soon learned by all of them, and the pattern quickly stops working. Your model may have to forget the outdated patterns and find new ones in the latest data.
This way of retraining is cheap because it involves little data, but the model loses the context of its prior learning.
Go for it only when your old data is undoubtedly irrelevant.
Also watch out for overfitting when retraining on a smaller dataset. Using only the most recent data may seem a cheap and fast way to retrain, but if your model has many parameters, it may overlearn the few data points available.
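As a minimal sketch of this strategy, assuming scikit-learn (the data, window size, and names below are illustrative, not from the original post): the model is refit from scratch on a fixed-size window of the most recent points, so the obsolete concept no longer pollutes it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated stream: the decision rule flips halfway through,
# so the old data actively misleads any model trained on all of it.
X_old = rng.normal(size=(500, 2))
y_old = (X_old[:, 0] > 0).astype(int)   # old concept
X_new = rng.normal(size=(500, 2))
y_new = (X_new[:, 0] < 0).astype(int)   # new, reversed concept

X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])

WINDOW = 500  # how much recent data to keep: a tuning choice

# Retrain from scratch on the most recent window only.
model = LogisticRegression().fit(X_all[-WINDOW:], y_all[-WINDOW:])
acc_new = model.score(X_new, y_new)  # tracks the new concept
```

Because the window here covers exactly the post-drift data, the retrained model follows the new concept and, by design, performs poorly on the obsolete one.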
Retrain on all your past data
You can’t discard what you already have when you need the knowledge to persist. In that case, you may have to retrain the model with all your past data.
Weather forecasting, for instance, certainly faces change; it’s why we talk about climate change so seriously. But the transition is gradual, so your model can benefit from past data while learning the changing patterns from new data.
The downside of this approach is cost. Unsurprisingly, more data points mean higher bills for both storage and training.
But it may be inevitable if your model has numerous parameters. To avoid overfitting, your dataset has to be sufficiently large.
Retrain with more weight on recent data points
Use this approach when your dataset must stay large yet you want to account for its aging. Retrain the model on the entire dataset, but assign higher weights to newer data points and lower weights to older ones.
For this to work, your algorithm must support per-sample weights. If it doesn’t, you can instead resample the dataset with a bias toward the latest points.
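Here’s a hedged sketch of both variants, assuming scikit-learn, whose estimators commonly accept a `sample_weight` argument in `fit`. The half-life of 200 points and the synthetic drifting data are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# 1,000 points in arrival order; the concept drifts halfway through.
X = rng.normal(size=(1000, 2))
y = np.where(np.arange(1000) < 500,
             X[:, 0] > 0,              # old concept
             X[:, 1] > 0).astype(int)  # new concept

# Exponentially decaying weights: the newest point gets weight 1.0,
# and the weight halves every 200 points (an arbitrary half-life).
age = np.arange(len(X))[::-1]   # 0 = newest point
weights = 0.5 ** (age / 200)

# Many scikit-learn estimators accept per-sample weights in fit().
model = LogisticRegression().fit(X, y, sample_weight=weights)

# Fallback when the algorithm has no weighting support:
# resample the dataset with probability proportional to the weights.
idx = rng.choice(len(X), size=len(X), p=weights / weights.sum())
model_resampled = LogisticRegression().fit(X[idx], y[idx])
```

Both models end up dominated by the recent concept while still seeing a diluted signal from the old data.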
Updating the model

This may sound similar to the previous strategy, but there’s a subtle difference: here we don’t retrain the model from scratch. Instead, we update the already-trained model with the new dataset.
Your algorithm should support initializing from existing weights and training in batches. Deep neural networks are a natural fit.
Updating a model can be more cost-efficient than retraining from scratch, though that depends on the size of the model and of the incoming data. It may not be a great option if each update carries plenty of data points and your model has millions of parameters.
On the flip side, this method is suitable when you must retain past knowledge while adapting to new changes.
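A minimal sketch of updating rather than retraining, assuming scikit-learn’s `SGDClassifier`: its `partial_fit` method updates the already-trained model one batch at a time. The batch sizes and simulated drift below are illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)

def make_batch(n, flip=False):
    """One batch from the data stream; flip=True simulates drift."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, (1 - y) if flip else y

# Initial training on the first batch of data.
model = SGDClassifier(random_state=0)
X0, y0 = make_batch(500)
model.partial_fit(X0, y0, classes=np.array([0, 1]))

# Later cycles: update the already-trained model with each new batch
# instead of refitting from scratch on everything seen so far.
for _ in range(5):
    Xb, yb = make_batch(200, flip=True)
    model.partial_fit(Xb, yb)
```

Each `partial_fit` call starts from the current weights, so the model keeps its prior knowledge while gradually tracking the flipped concept.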
Extending the model

Building on the two previous strategies, it may be beneficial to retrain only the variable component of a model.
More advanced techniques such as transfer learning and model ensemble make this possible.
In transfer learning, you take a pre-trained neural network and retrain it only partially. Instead of retraining the entire model, you unlock only the network’s last layer (this is often called unfreezing) and train that layer on your new data. The model doesn’t forget everything it learned before, yet the most recent data plays a critical role in the final decision.
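To make the freeze/unfreeze idea concrete without committing to a particular deep learning framework, here is a toy NumPy sketch: a two-layer network whose first layer stays frozen while only the last layer is refit on new data by gradient descent. All weights, shapes, and data here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

# A toy "pretrained" two-layer network. In reality W1 and W2 would
# come from earlier training; random values stand in for them here.
W1 = rng.normal(size=(2, 16))        # first layer: FROZEN
W2 = rng.normal(size=(17, 1)) * 0.1  # last layer: to be retrained

def features(X):
    """Frozen feature extractor: first layer + ReLU, plus a bias column."""
    H = np.maximum(X @ W1, 0.0)
    return np.hstack([H, np.ones((len(X), 1))])

# New data following the new concept.
X_new = rng.normal(size=(500, 2))
y_new = ((X_new[:, 0] + X_new[:, 1]) > 0).astype(float).reshape(-1, 1)

# "Unfreeze" and retrain ONLY the last layer with gradient descent
# on squared error; W1 is never touched.
H = features(X_new)
for _ in range(1000):
    grad = H.T @ (H @ W2 - y_new) / len(X_new)
    W2 -= 0.05 * grad
```

In a real framework you would use the equivalent mechanism instead of hand-rolled gradients, e.g. marking earlier layers as non-trainable in Keras (`layer.trainable = False`) or disabling gradients on frozen parameters in PyTorch.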
You can also add a new model to refine your predictions; we call such a group an ensemble of models. Your initial model stays untouched, while a new model sits between it and the final output. The new model learns from the new data and adjusts the original model’s behavior.
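Here’s an illustrative sketch of such an ensemble, assuming scikit-learn: the base model stays untouched, and a small corrector model learns from the new data, taking the base model’s predicted probability as an extra input feature (a simple form of stacking). The data and decision rules are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Base model trained on old data; it stays untouched in production.
X_old = rng.normal(size=(500, 2))
y_old = (X_old[:, 0] > 0).astype(int)
base = LogisticRegression().fit(X_old, y_old)

# The concept has drifted; new data follows a different rule.
X_new = rng.normal(size=(500, 2))
y_new = ((X_new[:, 0] + X_new[:, 1]) > 0).astype(int)

def with_base_output(X):
    """Append the frozen base model's probability as an extra feature."""
    return np.column_stack([X, base.predict_proba(X)[:, 1]])

# The corrector learns from new data how to adjust the base model.
corrector = LogisticRegression().fit(with_base_output(X_new), y_new)

def predict(X):
    """Ensemble prediction: the base model feeds into the corrector."""
    return corrector.predict(with_base_output(X))
```

The original model never changes; only the lightweight corrector is retrained as new data arrives.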
Doing nothing

You’ve read that correctly. Doing nothing means you train a model once and use it for a prolonged period. Such models are known as static models.
Static models are attractive for a couple of reasons.
You need a yardstick to measure anything, and a static model lets you identify drift as it happens: for data drift, the static model is your yardstick.
Also, as you make changes to your existing model, you need a benchmark to evaluate them against. Static models are a perfect baseline for that, too.
Yet you must be careful with static models. If you’re using one in production, make sure that data and concept drift don’t affect the model too much.
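A minimal monitoring sketch in plain Python: record the static model’s accuracy at deployment as the yardstick, then flag drift when the rolling accuracy of recent predictions falls well below it. The baseline, margin, and window values below are illustrative tuning choices.

```python
import random

random.seed(5)

# Accuracy of the static model measured at deployment: the yardstick.
BASELINE_ACCURACY = 0.92
DRIFT_MARGIN = 0.05  # how far below baseline we tolerate (a tuning choice)
WINDOW = 200         # rolling evaluation window, also a tuning choice

def check_drift(outcomes):
    """Flag drift when rolling accuracy falls well below the baseline."""
    window = outcomes[-WINDOW:]
    rolling_acc = sum(window) / len(window)
    return rolling_acc < BASELINE_ACCURACY - DRIFT_MARGIN, rolling_acc

# Simulated per-prediction outcomes (1 = correct, 0 = wrong):
# the model starts healthy, then its accuracy degrades sharply.
healthy = [1 if random.random() < 0.95 else 0 for _ in range(200)]
degraded = [1 if random.random() < 0.60 else 0 for _ in range(200)]

drifted_before, _ = check_drift(healthy)
drifted_after, _ = check_drift(healthy + degraded)
```

In practice the outcomes would come from comparing production predictions against delayed ground-truth labels, but the yardstick logic is the same.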
If you think the job of a data scientist or ML engineer is over when the model is deployed, you’re making a risky bet.
It’s just the beginning.
Models in production deteriorate in performance. It’s known as concept drift, a widely researched challenge in machine learning.
Depending on the situation, you can use various techniques to keep your models performing at their best.
In this post, we’ve discussed five strategies often used by ML engineers to fight concept drift.
You can retrain, update, or extend the model, or deliberately do nothing! Each option has its benefits and drawbacks.
What would you do when your models fail to predict accurately?