Explanation of Recommendations through Matrix Factorization
Netflix is a popular online streaming platform that offers its subscribers a wide range of movies, documentaries, and TV shows. To improve users’ experience, Netflix has developed a sophisticated recommendation system that suggests movies based on your past viewing history, ratings, and preferences.
The recommender system uses complex algorithms that analyze vast amounts of data to predict what users are most likely to enjoy. With over 200 million subscribers worldwide, Netflix’s recommendation system is a key factor in its success and sets the standard for the streaming industry. The linked source describes how Netflix attributes 80% of its stream time to personalization.
A recommender system is a type of information filtering system that suggests products or content to users based on their preferences, interests, and behavior. These systems are widely used in e-commerce, online streaming, and other applications to help users discover new products and content that may interest them.
Recommender systems are trained to understand user and product preferences, past decisions, and characteristics using data collected about user-product interactions.
There are two main types of recommender systems:
Content-based Filtering
Recommendations are based on user or item attributes, which serve as the input to the algorithm. The contents of this shared attribute space are then used to create user and item profiles.
For instance, Spider-Man: No Way Home and Ant-Man and the Wasp: Quantumania have similar attributes: both movies belong to the Action/Adventure genre, and both are part of the Marvel franchise. Therefore, if Alice watched Spider-Man: No Way Home, a content-based recommender system may recommend movies with similar attributes, such as other action or Marvel movies.
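To make the idea concrete, here is a minimal sketch of content-based matching. It assumes each movie is described by a hypothetical binary attribute vector (action/adventure, Marvel, romance, animation) and that similarity is measured with cosine similarity; the attribute values and the third title are made up for illustration.

```python
import numpy as np

# Hypothetical attribute vectors: [action/adventure, Marvel, romance, animation]
catalog = {
    "Spider-Man: No Way Home": np.array([1.0, 1.0, 0.0, 0.0]),
    "Ant-Man and the Wasp: Quantumania": np.array([1.0, 1.0, 0.0, 0.0]),
    "Hypothetical Romantic Comedy": np.array([0.0, 0.0, 1.0, 0.0]),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend_similar(watched, catalog, top_k=1):
    """Rank the other titles by attribute similarity to the watched title."""
    target = catalog[watched]
    scores = {
        title: cosine_similarity(target, vec)
        for title, vec in catalog.items()
        if title != watched
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(recommend_similar("Spider-Man: No Way Home", catalog))
# ['Ant-Man and the Wasp: Quantumania']
```

Real systems use many more attributes (genres, cast, keywords, text embeddings), but the ranking step works the same way.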
Collaborative Filtering
Recommendations are based on users who have similar past interactions. The key idea of this approach is to leverage collaboration, that is, the collective behavior of many users, to produce new recommendations.
For instance, Alice and Bob have similar interests in a particular movie genre. A collaborative filtering recommender system may recommend to Alice movies that Bob has watched but Alice has not, since both of them have fairly similar preferences. The reverse is true for Bob as well.
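A minimal user-based sketch of this idea, assuming a tiny hypothetical rating table (0 means "not watched") and cosine similarity between users' rating vectors. Real systems work with far larger data and usually aggregate several neighbors rather than copying from a single one.

```python
import numpy as np

# Hypothetical ratings for five movies; 0 means "not watched"
titles = ["m_1", "m_2", "m_3", "m_4", "m_5"]
user_ratings = {
    "Alice": np.array([5, 4, 0, 0, 1], dtype=float),
    "Bob":   np.array([5, 5, 4, 0, 1], dtype=float),
    "Carol": np.array([1, 0, 0, 5, 5], dtype=float),
}


def cosine(a, b):
    # Simplification: unwatched movies count as 0 in the similarity
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend_from_neighbor(user, user_ratings, titles):
    """Find the most similar other user and suggest what they rated that `user` has not seen."""
    others = [name for name in user_ratings if name != user]
    neighbor = max(others, key=lambda name: cosine(user_ratings[user], user_ratings[name]))
    unseen = (user_ratings[user] == 0) & (user_ratings[neighbor] > 0)
    return neighbor, [t for t, flag in zip(titles, unseen) if flag]


print(recommend_from_neighbor("Alice", user_ratings, titles))  # ('Bob', ['m_3'])
```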
There is a wide range of recommender system model types, as shown in the figure below, but this article focuses on collaborative filtering (CF) with Matrix Factorization.
Put simply, Matrix Factorization is a mathematical process that decomposes a large matrix (here, the user-item rating matrix) into the product of smaller matrices that represent users and items in a lower-dimensional space. Some of the most popular matrix factorization techniques used in recommender systems are Singular Value Decomposition (SVD), Non-negative Matrix Factorization (NMF), and Probabilistic Matrix Factorization (PMF).
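As a quick illustration of mapping a matrix into a lower-dimensional space, the sketch below takes a small, fully observed toy matrix and keeps only the top k = 2 singular values with NumPy's `np.linalg.svd`. Note that this is classical SVD, which needs a complete matrix; real rating matrices are mostly empty, which is why recommender-style matrix factorization instead learns the two factor matrices from the observed ratings only, as in the stages that follow.

```python
import numpy as np

# Small, fully observed toy matrix (rows = users, columns = movies)
R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [2, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
P = U[:, :k] * s[:k]    # users in a k-dimensional latent space (4 x 2)
Q = Vt[:k, :].T         # movies in the same latent space (4 x 2)
R_approx = P @ Q.T      # best rank-k approximation of R in the Frobenius norm
print(np.round(R_approx, 2))
```

Two latent dimensions are enough here because the toy users split cleanly into two taste groups.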
The following illustration shows how matrix factorization predicts user-movie ratings.
Stage 1: Set the number of latent factors (K) and randomly initialize the values of the User Matrix and the Item Matrix. In this example, we set K = 5.
- User Matrix (green box) represents the association between each user and the features
- Item Matrix (orange box) represents the association between each item and the features
Here, for instance, we create 5 features (K = 5) to represent the character of movie m_1: comedy as 2.10, horror as 0.88, action as 0.04, parental guidance as 0.02, and family-friendly as 0.04. The same applies to the User Matrix, which represents the character of each user, such as preferred actors or directors, favorite production companies, and many more. (These feature names are only illustrative: the factors learned in practice are latent and usually have no predefined meaning.)
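Stage 1 in code, as a minimal NumPy sketch with made-up toy sizes: the User Matrix P gets one row of K = 5 random factors per user, and the Item Matrix Q gets one row per movie. Drawing from a uniform [0, 1) distribution is an assumption; small Gaussian values are another common choice.

```python
import numpy as np

n_users, n_movies, K = 4, 6, 5    # toy sizes; K = 5 latent factors as in the text
rng = np.random.default_rng(0)
P = rng.random((n_users, K))      # User Matrix: one row of K factors per user
Q = rng.random((n_movies, K))     # Item Matrix: one row of K factors per movie
print(P.shape, Q.shape)           # (4, 5) (6, 5)
```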
Stage 2: The predicted rating is calculated from the dot product of the User Matrix and the Item Matrix.
Here, R is the true rating matrix, P is the User Matrix, Q is the Item Matrix, and R’ is the resulting predicted rating matrix.
In more formal mathematical notation, the predicted rating R’ can be written as follows:
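Assuming P stores one row $\mathbf{p}_i$ of K factors per user $i$ and Q stores one row $\mathbf{q}_j$ per item $j$ (consistent with computing predictions as nP times the transpose of nQ in the implementation), the predicted rating $r'_{ij}$ of user $i$ for item $j$ is:

$$
R' = P\,Q^{\top}, \qquad r'_{ij} = \mathbf{p}_i \cdot \mathbf{q}_j = \sum_{k=1}^{K} p_{ik}\, q_{jk}
$$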
Stage 3: The squared error measures the difference between the true rating and the predicted rating.
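Written out for a single rating, assuming P is users × K and Q is items × K so that $r'_{ij} = \sum_{k} p_{ik} q_{jk}$, the squared error and the total loss summed over the observed ratings only are:

$$
e_{ij}^{2} = \left(r_{ij} - r'_{ij}\right)^{2} = \left(r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{jk}\right)^{2},
\qquad
\mathcal{L} = \sum_{(i,j)\ \text{observed}} e_{ij}^{2}
$$

With $e_{ij} = r_{ij} - r'_{ij}$, the partial derivatives are $\partial e_{ij}^{2} / \partial p_{ik} = -2\, e_{ij}\, q_{jk}$ and $\partial e_{ij}^{2} / \partial q_{jk} = -2\, e_{ij}\, p_{ik}$, which are what gradient descent uses to update the factors.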
Once these steps are in place, we can optimize the parameters (the entries of the User and Item matrices) using stochastic gradient descent (SGD).
At each iteration, the optimizer computes the match between each movie and each user by taking the dot product of their factor vectors, then compares it to the actual rating that the user gave the movie. It then computes the derivative of the error and updates the weights by stepping against the gradient, scaled by the learning rate α. As we repeat this process many times, the loss decreases, leading to better recommendations.
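A minimal sketch of one such update for a single observed rating, in plain NumPy. The gradient of the squared error $e^2 = (r_{ij} - \mathbf{p}_i \cdot \mathbf{q}_j)^2$ with respect to $\mathbf{p}_i$ is $-2e\,\mathbf{q}_j$ (and symmetrically for $\mathbf{q}_j$), so each step moves each vector by α · 2e times the other vector. The function name `sgd_step`, the learning rate 0.01, and the toy rating are illustrative assumptions; regularization is left out to match the plain squared error.

```python
import numpy as np


def sgd_step(p_i, q_j, r_ij, alpha=0.01):
    """One SGD update on a single observed rating r_ij.

    Minimizes e^2 = (r_ij - p_i . q_j)^2 and returns updated copies of
    the user vector p_i and the item vector q_j.
    """
    e = r_ij - p_i @ q_j                 # prediction error for this rating
    new_p = p_i + alpha * 2 * e * q_j    # step against d(e^2)/dp_i = -2 e q_j
    new_q = q_j + alpha * 2 * e * p_i    # uses the old p_i on purpose
    return new_p, new_q


rng = np.random.default_rng(0)
K = 5
p, q = rng.random(K), rng.random(K)     # random initialization (Stage 1)
r = 4.0                                 # one observed rating

for step in range(5):
    print(f"step {step}: squared error = {(r - p @ q) ** 2:.4f}")
    p, q = sgd_step(p, q, r)
```

In a full training loop, this update is applied to every observed rating, usually in shuffled order, for several epochs.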
One matrix factorization model that has been widely used in recommender systems is Singular Value Decomposition (SVD). SVD itself has broad applications, including image compression and noise reduction in signal processing. In recommender systems, SVD-style models are well suited to the sparsity of large user-item matrices. Note that the "SVD" popularized during the Netflix Prize (and implemented in Surprise) is learned with stochastic gradient descent on the observed ratings only, much like the procedure above, rather than computed as a classical SVD of the full matrix.
This article will also provide an overview of SVD implementation using the Surprise Package.
So let’s get our hands dirty with the implementation!!
Implementation Contents
- Data Import
- Data Pre-Processing
- Implementation #1: Matrix Factorization in Python from Scratch
- Implementation #2: Matrix Factorization with Surprise Package
The complete notebook on Matrix Factorization implementation is available here.
Since we are building a recommender system like Netflix's but do not have access to Netflix's data, we will use a MovieLens dataset [1] for this exercise (used with permission). You can read its README file for the usage license and other details. The dataset contains millions of ratings that users have given to movies.
After extracting the zip file, you will find the following 4 CSV files:
By the way, collaborative filtering suffers from the cold-start problem. The cold-start problem refers to a situation in which a system or algorithm cannot make accurate predictions or recommendations for new users, items, or entities that have no prior information. This happens when there is little or no historical data available for the new users or items, which makes it difficult for the system to understand their preferences or characteristics.
The cold-start problem is a common challenge in recommendation systems, where the system needs to provide personalized recommendations for users with limited or no interaction history.
In this stage, we select users who have interacted with at least 2,000 movies and movies that have been rated by at least 1,000 users. This is a good way to reduce the size of the data and, of course, to end up with fewer null (missing) entries. Besides, my RAM could never handle the massive full table.
Alternatively, you can use the smaller 100k-rating subset that MovieLens also provides. I simply wanted to make the most of my computer's resources while keeping the number of null entries low.
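One way to express this filter in pandas, assuming the ratings are loaded into a DataFrame with MovieLens's `userId` and `movieId` columns. `filter_active` is a hypothetical helper, not the author's code. Both counts are computed on the unfiltered data, so a few users may fall below the threshold once unpopular movies are dropped; repeating the filter until nothing changes is a stricter alternative.

```python
import pandas as pd


def filter_active(ratings, min_user_ratings=2000, min_movie_ratings=1000):
    """Keep ratings by active users on frequently rated movies.

    Counts are computed once on the unfiltered data (a simplifying assumption).
    """
    # Number of ratings per user / per movie, mapped back onto every row
    user_counts = ratings["userId"].map(ratings["userId"].value_counts())
    movie_counts = ratings["movieId"].map(ratings["movieId"].value_counts())
    keep = (user_counts >= min_user_ratings) & (movie_counts >= min_movie_ratings)
    return ratings[keep]


# Toy demo with tiny thresholds (the article uses 2000 and 1000)
toy = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 20, 10],
    "rating":  [4.0, 3.5, 2.0, 5.0, 4.0, 3.0],
})
print(filter_active(toy, min_user_ratings=2, min_movie_ratings=2))
```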
As is customary, we split the data into two groups, a training set and a test set, using the train_test_split function.
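A minimal sketch of the split, assuming scikit-learn's `train_test_split` (from `sklearn.model_selection`) applied to a ratings DataFrame. The 80/20 ratio, the `random_state`, and the toy values are assumptions, not the author's settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy ratings in the MovieLens column layout (values are made up)
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 40, 10, 30, 40, 50],
    "rating":  [4.0, 3.5, 2.0, 5.0, 4.0, 3.0, 2.5, 4.5, 3.5, 1.0],
})

# Hold out 20% of the individual ratings; fix the seed for repeatability
train_df, test_df = train_test_split(ratings, test_size=0.2, random_state=42)
print(len(train_df), len(test_df))  # 8 2
```

Because rows are split at random, a user or movie can end up only in the test set. Such pairs have no learned factors, which is the cold-start issue described above.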
The data contains all the information we need, but its format is not easy for humans to read. So I created a table that presents the same data in a more readable format.
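One common way to get a readable view (and the matrix input that matrix factorization expects) is to pivot the long ratings table into a user × movie matrix. This is a sketch assuming pandas' `pivot_table` and made-up toy ratings; filling missing entries with 0 is an assumption that is safe for MovieLens, whose ratings run from 0.5 to 5.0.

```python
import pandas as pd

# Toy long-format ratings in the MovieLens ratings.csv layout (values are made up)
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# One row per user, one column per movie; 0 marks "not rated"
user_movie = ratings.pivot_table(index="userId", columns="movieId",
                                 values="rating", fill_value=0)
print(user_movie)
```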
Here is the Python snippet that implements Matrix Factorization with gradient descent. The matrix_factorization function returns two matrices: nP (user matrix) and nQ (item matrix).
Then, fit the model to the training dataset; here I set the number of factors K = 5. Predictions can then be computed as the dot product of nP and the transpose of nQ, as illustrated in the code snippet below.
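For reference, here is a minimal sketch of what a `matrix_factorization` function returning nP and nQ can look like; it is not the author's exact code. Assumptions: the input is a users × movies NumPy array with 0 marking a missing rating, updates run by SGD over the observed ratings only, and a small L2 regularization term `beta` is added to keep the factors from growing too large. The learning rate, number of epochs, and toy matrix are illustrative.

```python
import numpy as np


def matrix_factorization(R, K=5, steps=200, alpha=0.01, beta=0.02, seed=42):
    """Factorize R (users x movies, 0 = missing) so that R ~ nP @ nQ.T.

    Returns nP (users x K) and nQ (movies x K), learned by SGD on the
    observed ratings only, with L2 regularization strength `beta`.
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = R.shape
    P = rng.random((n_users, K))       # Stage 1: random initialization
    Q = rng.random((n_movies, K))
    users, movies = np.nonzero(R)      # positions of the known ratings

    for _ in range(steps):
        for idx in rng.permutation(len(users)):    # shuffle ratings each epoch
            i, j = users[idx], movies[idx]
            e = R[i, j] - P[i] @ Q[j]              # Stages 2-3: prediction error
            p_old = P[i].copy()
            # Step against the gradient of e^2 + (beta/2) * (|p_i|^2 + |q_j|^2)
            P[i] += alpha * (2 * e * Q[j] - beta * P[i])
            Q[j] += alpha * (2 * e * p_old - beta * Q[j])
    return P, Q


# Toy ratings matrix: rows are users, columns are movies, 0 = not rated
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)

nP, nQ = matrix_factorization(R, K=5)
nR = np.dot(nP, nQ.T)   # predicted rating for every user-movie pair
print(np.round(nR, 2))
```

The pure-Python inner loop keeps the update rule easy to read; on the filtered MovieLens data it is slow, so vectorized or library implementations are preferable there.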
As a result, here is the final prediction produced by matrix_factorization:
Prediction on the Test Set
The following snippet uses the learned nP (user matrix) and nQ (movie matrix) to make predictions on the test set.
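A minimal sketch of that lookup, assuming learned factor matrices `nP` and `nQ` plus two hypothetical dictionaries that map raw userId/movieId values to row positions. The `fallback` value for unseen users or movies (for example, the mean training rating) is an assumption added here, since a pure matrix factorization model cannot score them.

```python
import numpy as np


def predict_pairs(nP, nQ, user_index, movie_index, pairs, fallback):
    """Predict ratings for (userId, movieId) pairs from learned factors.

    user_index / movie_index map raw IDs to row positions in nP / nQ
    (hypothetical dicts, e.g. built from the training pivot table's index
    and columns). Pairs whose user or movie was never seen in training get
    `fallback`, because no factors were learned for them (cold start).
    """
    preds = []
    for user_id, movie_id in pairs:
        u = user_index.get(user_id)
        m = movie_index.get(movie_id)
        if u is None or m is None:
            preds.append(fallback)
        else:
            preds.append(float(nP[u] @ nQ[m]))   # dot product of the two factor rows
    return np.array(preds)


# Toy factors (K = 2) for two users and two movies
nP = np.array([[1.0, 2.0], [0.5, 1.0]])
nQ = np.array([[2.0, 1.0], [1.0, 1.0]])
user_index = {101: 0, 102: 1}
movie_index = {10: 0, 20: 1}
test_pairs = [(101, 10), (102, 20), (999, 10)]   # user 999 is unseen
print(predict_pairs(nP, nQ, user_index, movie_index, test_pairs, fallback=3.5))
```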
Evaluating The Prediction Performance
There are various evaluation metrics for recommender systems, such as Precision@K, Recall@K, MAP@K, and many more. For this exercise, I will use a basic accuracy metric, RMSE (root mean squared error). I may cover the other evaluation metrics in more detail in a subsequent article.
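RMSE is the square root of the mean squared difference between true and predicted ratings, so it is expressed in rating units (stars) and penalizes large errors more than small ones. A minimal NumPy version:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((true - predicted)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors of 0.5, 0.0 and 1.0 stars -> sqrt((0.25 + 0 + 1) / 3)
print(rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
```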
As a result, the RMSE on the test set is 0.829, which is pretty decent even before any hyperparameter tuning. We can certainly tune several parameters, such as the learning rate, the number of factors (n_factor), and the number of epochs (steps), for better results.
In this section, we use a Python library called Surprise. Surprise is a Python library for building and evaluating recommender systems. It provides a simple, easy-to-use interface for loading and processing datasets, as well as for implementing and evaluating different recommendation algorithms.
Data Import and Model Training
Top-N recommendation generator
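A minimal top-N helper following the pattern in Surprise's FAQ. Surprise's `algo.test` returns predictions that unpack as `(uid, iid, r_ui, est, details)`, so we can group them by user and keep the highest `est` values. The candidate pairs would typically be all movies a user has not rated, which Surprise's `trainset.build_anti_testset()` produces; the demo below uses hand-made tuples instead.

```python
from collections import defaultdict


def get_top_n(predictions, n=10):
    """Group (uid, iid, true_r, est, details) predictions by user and keep
    the n items with the highest estimated rating for each user."""
    top_n = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        top_n[uid].append((iid, est))
    for uid, user_preds in top_n.items():
        user_preds.sort(key=lambda pair: pair[1], reverse=True)
        top_n[uid] = user_preds[:n]
    return top_n


# With a fitted Surprise model `algo` and its `trainset`, candidates could come from:
#   predictions = algo.test(trainset.build_anti_testset())
demo = [
    ("u1", "m_1", 0.0, 4.2, {}),
    ("u1", "m_2", 0.0, 3.1, {}),
    ("u1", "m_3", 0.0, 4.8, {}),
    ("u2", "m_1", 0.0, 2.5, {}),
]
top = get_top_n(demo, n=2)
print(top["u1"])  # [('m_3', 4.8), ('m_1', 4.2)]
```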
for UserId: 231832
following is the top 10 movie recommendation list:
m_912, m_260, m_1198, m_110, m_60069, m_1172, m_919, m_2324, m_1204, m_3095
Matrix Factorization, as used by modern entertainment platforms like Netflix, helps capture user preferences. This information is then used to recommend the most relevant items (products, movies, and so on) to each end user.
Here is a summary of the Matrix Factorization illustration that I created, in case I need to explain it to my grandkids one day…
[1] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872