Recommendation systems are critical to the success of today’s online commercial platforms.
On some of the largest commercial platforms, recommendations drive as much as 30% of revenue, and a 1% improvement in recommendation quality can translate into billions of dollars.
I’ve tried the more traditional content-based and collaborative-filtering recommendation systems, and there are also deep neural network models built on RNNs, CNNs, or attention mechanisms. Large-scale recommendation systems are not easy to build, and they are often constrained by memory capacity. Most of the online resources about recommendation systems focus on the models, so I was curious about the system design of a large-scale production system when I discovered a talk by Dr. Even Oldridge, who leads Nvidia’s recommendation engine team (Merlin).
Let’s take a look together:
The following diagram shows a holistic view of the components of a recommendation system.
Recommender systems are trained using data gathered about the users, items, and their interactions, which include impressions, clicks, likes, mentions, and so on.
The front end displays the recommended results and is where user interactions occur. These interactions are then logged and fed into the data lake for the regular cycle of model exploration and training.
One of the biggest challenges in building a recommendation system is the volume of data. Commercial applications often involve millions of users and millions of items, so it’s far too expensive to score every item for every user. The typical solution is to narrow down the candidate pool before a scoring model ranks the results, which is why the first component in the system is candidate generation and retrieval.
Models used for candidate generation can be approximate nearest neighbor models, where embeddings representing user interest are matched against the embedding space of the items.
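To make the embedding-matching idea concrete, here is a minimal sketch with toy, randomly generated embeddings (all data hypothetical). A production system would swap the brute-force dot-product scan for an approximate nearest neighbor index such as FAISS or ScaNN, but the interface is the same:

```python
import numpy as np

def retrieve_candidates(user_emb, item_embs, k=3):
    """Score every item against the user embedding and keep the top-k.

    A real system replaces this exact scan with an approximate nearest
    neighbor index over millions of items; the contract stays the same.
    """
    scores = item_embs @ user_emb          # similarity of user to each item
    top_k = np.argsort(-scores)[:k]        # indices of the k highest-scoring items
    return top_k, scores[top_k]

# Toy data: 5 items in a 4-dimensional embedding space.
rng = np.random.default_rng(0)
item_embs = rng.normal(size=(5, 4))
item_embs /= np.linalg.norm(item_embs, axis=1, keepdims=True)
user_emb = item_embs[2] + 0.05 * rng.normal(size=4)   # a user whose taste is near item 2

candidates, top_scores = retrieve_candidates(user_emb, item_embs, k=3)
print(candidates)
```

These candidates would then be passed to the heavier scoring model for final ranking.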
Other methods include graph-based recommendation, where you traverse the interaction graph with random walks to surface patterns of interaction.
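A rough sketch of the random-walk idea, in the spirit of systems like Pinterest’s Pixie (the tiny bipartite graph and all identifiers here are hypothetical): start from a user, bounce between users and items, and rank unseen items by how often the walk visits them.

```python
import random
from collections import Counter

# Hypothetical bipartite interaction graph: users link to items and vice versa.
graph = {
    "u1": ["i1", "i2"], "u2": ["i2", "i3"], "u3": ["i3", "i4"],
    "i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2", "u3"], "i4": ["u3"],
}

def random_walk_recommend(start_user, steps=1000, seed=42):
    """Random-walk sketch: count item visits, rank unseen items by count."""
    rng = random.Random(seed)
    visits = Counter()
    node = start_user
    for _ in range(steps):
        node = rng.choice(graph[node])     # hop to a random neighbor
        if node.startswith("i"):           # item nodes start with "i" in this toy graph
            visits[node] += 1
    seen = set(graph[start_user])          # drop items the user already interacted with
    return [item for item, _ in visits.most_common() if item not in seen]

print(random_walk_recommend("u1"))
```

Items reached through shared neighbors (here, i3 and i4) surface as recommendations, while the user’s own history is filtered out.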
Once you build the model, the next question is how to evaluate the system so that you have confidence it is ready to be deployed. The model should perform as expected, and so should the model’s input data, both before and after deployment. Model evaluation is an essential step in the system design to ensure model robustness; we covered more on this topic in our previous post about Core Trust Principles in Machine Learning, where we introduced the concept of “ML competence”. In addition, consistent feature transformation across environments is critical to ensure the “dual view” of the data matches at training and serving time.
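One simple way to keep that “dual view” consistent is to define each feature transformation exactly once and call the same code path from both the training pipeline and the serving path. A minimal sketch (feature names and bucketing choices are hypothetical):

```python
import math

def transform(raw: dict) -> dict:
    """One canonical feature transform, shared by training and serving."""
    return {
        "age_bucket": min(raw["age"] // 10, 9),   # age -> one of 10 buckets
        "log_clicks": math.log1p(raw["clicks"]),  # compress heavy-tailed counts
    }

# Offline: applied row by row while building the training set ...
train_row = transform({"age": 34, "clicks": 120})

# Online: the serving path calls the *same* function, so the two views
# of the data agree by construction.
serve_row = transform({"age": 34, "clicks": 120})

assert train_row == serve_row   # no training/serving skew for these features
```

Feature stores formalize this idea by versioning the transformations alongside the features themselves.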
Recommender systems also differ from other ML systems in that new users and items keep arriving, and you need to decide what to do with them, since the system is not retrained constantly on new data. Techniques such as popularity-based recommendations for new users can address this cold-start problem, but eventually the new data must be processed and passed to the data lake and ML feature stores for model retraining.
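A popularity-based fallback is easy to sketch: count interactions per item and serve the most popular items to any user the model has never seen (the interaction log below is made up for illustration).

```python
from collections import Counter

# Hypothetical interaction log: (user_id, item_id) pairs.
interactions = [("u1", "i2"), ("u2", "i2"), ("u2", "i3"), ("u3", "i1"), ("u4", "i2")]

popularity = Counter(item for _, item in interactions)

def recommend(user_id, history, k=2):
    """Fall back to globally popular items for users with no history."""
    if not history.get(user_id):
        return [item for item, _ in popularity.most_common(k)]
    # ... the personalized model would run here for known users ...
    return history[user_id][:k]

print(recommend("new_user", history={}))   # globally popular items first
```

Once the new user accumulates interactions, they flow back through the logging path and the personalized model takes over.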
Things get more complicated when the use case requires more frequent retraining. Keeping the model and features up to date then calls for caching recent interactions rather than relying on the normal data lake path alone. In addition, the model analysis and evaluation step becomes even more critical to the success of frequent online retraining and serving.
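One way to picture the interaction-caching idea is a bounded in-memory buffer of the most recent events that feeds frequent incremental updates, while the full stream still flows to the data lake for periodic full retraining. A sketch under those assumptions (names and window size are hypothetical):

```python
from collections import deque

RECENT_WINDOW = 1000
recent_interactions = deque(maxlen=RECENT_WINDOW)   # oldest events fall off automatically

def log_interaction(event):
    recent_interactions.append(event)   # fast path for frequent incremental updates
    # ... in parallel, the same event would also be appended to the data lake ...

# Simulate a stream of 1500 events; only the latest 1000 stay cached.
for i in range(1500):
    log_interaction({"user": f"u{i % 7}", "item": f"i{i % 5}"})

print(len(recent_interactions))
```

The bounded window keeps memory use predictable, which matters given how memory-constrained large-scale recommenders already are.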
That’s just the lifecycle of one such model. In a real-life commercial use case there may be tens or even hundreds of these models working together, so productionizing a large-scale recommendation system is no easy undertaking.
Let’s check out Dr. Even Oldridge’s explanation of this (tip: click “watch full video” to hear a bit more):
And we worked hard on our data science!