A summary of the main recommendation system algorithms, with YouTube's deep learning recommender as an example

Collaborative filtering

Collaborative filtering (CF) and its variants are among the most commonly used recommendation algorithms. Even a beginner in data science can use it to build a personalized movie recommendation system, for example as a resume project.

When we want to recommend something to a user, the most logical thing to do is to find people with similar interests, analyze their behavior, and recommend the same items to our user. Alternatively, we can look at the items the user has bought before and recommend similar products.

There are two basic approaches to collaborative filtering: user-based and item-based.

In either case, the recommendation algorithm consists of two steps:

1. Find which users/items in the database are similar to the given user/item.

2. Predict the rating the user would give to a product by aggregating the ratings of those similar users/items, giving more weight to the ones that are more similar.

What does "most similar" mean in this algorithm?

What we have is a preference vector for each user (columns of the matrix R), and a vector of user ratings for each product (rows of the matrix R).

First, keep only the elements whose values are known in both vectors.

For example, if we want to compare Bill and Jane, and we know that Bill has not seen Titanic and Jane has not seen Batman, then we can only measure their similarity by Star Wars. How can someone not watch Star Wars, right? :)

The most popular ways to measure similarity are the cosine similarity or the correlation between the user/item vectors. The final step is to fill the empty cells of the table with a weighted arithmetic mean of the known ratings, using the similarities as weights.
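The two steps above can be sketched in a few lines of plain Python. This is only an illustration of user-based filtering with cosine similarity: the user "Tom", all rating values, and the helper names `cosine_similarity` and `predict` are assumptions made up for the example, not taken from any library.

```python
from math import sqrt

# Toy ratings table; None marks an unknown rating. All values are invented.
ratings = {
    "Bill": {"Star Wars": 5, "Batman": 4, "Titanic": None},
    "Jane": {"Star Wars": 4, "Batman": None, "Titanic": 2},
    "Tom": {"Star Wars": 5, "Batman": 5, "Titanic": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity computed only over items rated by both users."""
    shared = [i for i in a if a[i] is not None and b.get(i) is not None]
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    weighted_sum = total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or other_ratings.get(item) is None:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        total_weight += abs(sim)
    return weighted_sum / total_weight if total_weight else None


# Bill and Jane share only Star Wars, so their similarity is exactly 1.0.
bill_jane = cosine_similarity(ratings["Bill"], ratings["Jane"])
# Fill Bill's empty Titanic cell from Jane's and Tom's ratings.
bill_titanic = predict("Bill", "Titanic")
```

Note that a single shared item always yields a cosine similarity of 1.0, which is one reason real systems usually require a minimum number of co-rated items or use mean-centered correlations instead.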

Matrix Factorization for Recommendations

Another interesting approach uses matrix factorization (also called matrix decomposition). This is an elegant recommendation algorithm because usually, when we decompose a matrix, we do not think much about what the rows and columns of the resulting matrices mean. With this recommender, however, the meaning is clear: u_i is a vector describing the interests of the i-th user, and v_j is a vector describing the parameters of the j-th movie.

We can then estimate x_ij (the rating given to the j-th movie by the i-th user) as the dot product of u_i and v_j. We build these vectors from the known ratings and use them to predict the unknown ones.

For example, suppose that after matrix decomposition we obtain Ted's vector (1.4; 0.9) and movie A's vector (1.4; 0.8). We can now restore Ted's rating for movie A simply by calculating the dot product of the two vectors: 1.4 × 1.4 + 0.9 × 0.8 = 2.68.
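As a rough sketch of how such vectors might be built from known ratings, the snippet below first reproduces the Ted example and then fits factors with plain stochastic gradient descent. The source does not say which fitting method is used, so SGD, the hyperparameters (`k`, `lr`, `reg`, `epochs`), the toy ratings, and the function names are all assumptions chosen for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# The worked example from the text: 1.4*1.4 + 0.9*0.8 = 2.68.
ted, movie_a = [1.4, 0.9], [1.4, 0.8]
ted_rating = dot(ted, movie_a)


def factorize(known, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Fit user factors U and item factors V to (user, item, rating) triples."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, x in known:
            err = x - dot(U[i], V[j])
            for f in range(k):
                u_f, v_f = U[i][f], V[j][f]
                # Gradient step on squared error with L2 regularization.
                U[i][f] += lr * (err * v_f - reg * u_f)
                V[j][f] += lr * (err * u_f - reg * v_f)
    return U, V


def rmse(known, U, V):
    """Root mean squared error on the known ratings."""
    total = sum((x - dot(U[i], V[j])) ** 2 for i, j, x in known)
    return (total / len(known)) ** 0.5


# Invented ratings: (user index, movie index, rating).
known = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
U0, V0 = factorize(known, 3, 3, epochs=0)  # untrained starting point
U, V = factorize(known, 3, 3)
error_before, error_after = rmse(known, U0, V0), rmse(known, U, V)
# Predict a cell that was not in the training data: user 0, movie 2.
unknown_rating = dot(U[0], V[2])
```

Comparing `error_before` with `error_after` shows whether the factors have learned the known ratings; the prediction for the unseen cell should be judged on held-out data rather than on this training error.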

Clustering

The previous recommendation algorithms are rather simple and well suited to small systems. Up to this point, we have treated the recommendation problem as a supervised machine learning task. It is time to apply unsupervised methods to it.

Imagine that we are building a large-scale recommendation system, where collaborative filtering and matrix decomposition would take too long. The first idea that comes to mind is clustering.

In the early stages of a business, there is often little prior information about users, and clustering can be a practical starting point.

Used alone, however, clustering is a bit weak, because all we are really doing is identifying user groups and recommending the same items to every user in a group. Once we have enough data, it is better to use clustering as a first step that narrows down the set of relevant neighbors for a collaborative filtering algorithm. It can also improve the performance of complex recommendation systems.

Each cluster is assigned a representative preference based on the preferences of the users belonging to that cluster. Every user in a cluster then receives recommendations calculated at the cluster level.
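A minimal sketch of cluster-level recommendation follows, assuming k-means as the clustering method (the text does not name one). The user names, ratings, and the choice to seed the centroids with the first k users are all made up for the example.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means; the initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


movies = ["Star Wars", "Titanic", "Batman"]
# Invented rating vectors, one per user, in the order of `movies`.
users = {
    "Ann": [5, 1, 4],
    "Bob": [1, 5, 2],
    "Cid": [5, 2, 5],
    "Dee": [2, 5, 1],
}

labels, centroids = kmeans(list(users.values()), k=2)


def top_movie(centroid):
    """The cluster's representative preference: its highest-rated movie."""
    best = max(range(len(movies)), key=lambda i: centroid[i])
    return movies[best]


# Every user in a cluster receives the same cluster-level recommendation.
recommendations = {
    name: top_movie(centroids[label]) for name, label in zip(users, labels)
}
```

In a fuller pipeline the cluster label would instead restrict the neighbor search of a collaborative filtering step, so that similarities are only computed within a cluster.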

Deep Learning Methods for Recommender Systems

In the past decade, neural networks have made great leaps. They are now used in a wide variety of applications and are gradually replacing traditional machine learning methods. Below I will show how deep learning is used at YouTube.

Needless to say, building a recommender system for such a service is a very challenging task due to its large scale, the ever-changing corpus, and various unobservable external factors.

According to the paper "Deep Neural Networks for YouTube Recommendations", the YouTube recommendation system consists of two neural networks: one for candidate generation and one for ranking. If you are short on time, here is a brief summary.

Taking events from the user's history as input, the candidate generation network drastically reduces the number of videos by selecting a set of the most relevant ones from a large corpus. The generated candidates are the videos most relevant to the user, and this network only aims to provide broad personalization via collaborative filtering.

At this step, we have a smaller number of candidates that are closer to the user's needs. Our goal now is to analyze each candidate carefully so that we can make the best decision. This task is accomplished by the ranking network, which assigns a score to each video according to a desired objective function, using features that describe the video and the user's behavior.

With this two-stage approach, we can make recommendations from a very large video corpus while still being sure that the small number of videos finally shown are personalized and engaging for the user. This design also allows us to blend candidates generated by other sources with these candidates.

The recommendation task is posed as an extreme multi-class classification problem: the prediction problem becomes accurately classifying a specific video watch (w_t) at time t among millions of videos (classes i) from a corpus (V), based on the user (U) and the context (C).
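In the paper, this classification is modeled with a softmax over the dot products of a user embedding u and the video embeddings v_j, that is, P(w_t = i | U, C) = exp(v_i · u) / Σ_j exp(v_j · u) for j in V. The sketch below only illustrates that formula and the top-N selection of candidates. The embeddings are hand-written, hypothetical values (in the real system they are learned by the network), and the video ids and function names are assumptions.

```python
from math import exp


def softmax_scores(user_vec, video_vecs):
    """P(watch = i | user) as a softmax over user-video dot products."""
    logits = {
        vid: sum(a * b for a, b in zip(user_vec, vec))
        for vid, vec in video_vecs.items()
    }
    # Subtract the maximum logit for numerical stability.
    top = max(logits.values())
    exps = {vid: exp(value - top) for vid, value in logits.items()}
    total = sum(exps.values())
    return {vid: e / total for vid, e in exps.items()}


def top_n(probs, n):
    """Candidate generation: keep the n most probable videos."""
    return sorted(probs, key=probs.get, reverse=True)[:n]


# Hypothetical 2-dimensional embeddings.
user_embedding = [1.0, 0.2]
video_embeddings = {
    "video_a": [0.9, 0.1],
    "video_b": [0.1, 0.9],
    "video_c": [0.8, 0.3],
    "video_d": [-0.5, 0.4],
}

probs = softmax_scores(user_embedding, video_embeddings)
candidates = top_n(probs, 2)
```

With millions of classes, computing the full softmax at serving time is impractical, so the shortlist is typically obtained with an approximate nearest-neighbor search in the embedding space; the exhaustive sort here is only viable because the example is tiny.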

Important points to note before creating your own recommendation system:

  • If you have a large database and want to make recommendations from it online, a good approach is to split the problem into two sub-problems: 1) select the top N candidates, 2) rank them.
  • How do you measure the quality of your model? In addition to standard quality metrics, there are metrics specific to recommendation problems: Recall@k and Precision@k. It is also worth looking at other metrics designed for recommendation systems.
  • If you are solving the recommendation problem with classification algorithms, you should think about how to generate negative samples. If a user buys a recommended item, you should not simply label it as a positive sample and every other item as a negative one.
  • Consider both online and offline evaluation of your algorithm's quality. A model trained only on historical data can produce primitive recommendations, because the algorithm will not know about new trends and preferences.
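For reference, Precision@k and Recall@k mentioned in the list above can be computed as in the following sketch. The function names and the toy data are assumptions for illustration, and `recommended` is assumed to be ordered from best to worst.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Invented example: a ranked list and the items the user actually liked.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f", "g"}

p_at_3 = precision_at_k(recommended, relevant, 3)  # 2 hits out of 3 shown
r_at_3 = recall_at_k(recommended, relevant, 3)     # 2 hits out of 4 relevant
```

Precision@k penalizes irrelevant items in the shortlist, while Recall@k penalizes relevant items that were left out, so the two are usually reported together.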
