In the first two articles in this series on personalized recommendations, we covered foundational models (user-user, item-item) and the system architecture for serving personalized recommendations.
So where are we now? We have built a baseline personalized recommendation system ("recsys"), and we are trying to figure out how to improve it. We will look at the sort of feedback we get from live usage and the sort of feedback we can get from surveys. That will explain why the industry separates search and recommendations into two stages: retrieval and ranking. (Ref: the paper on YouTube's two-stage recommender.)
Implicit feedback from users
Let's say a user comes to the app, scrolls the entire home feed, does not find the content they are looking for, and closes the app. That is called a recall loss: the items retrieved for the user simply did not contain what they were looking for.
Another type of loss occurs when the user comes to the app, scrolls, and does find the content they were looking for, but a little lower down in the feed, say at the 8th spot. That means the recsys could have made the user's experience better by ranking this result at the top. This is called a ranking loss. A popular metric for this is nDCG.
The image above is from this research paper (Fig 2) and shows the now-standard two-stage recommender system.
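To make both losses measurable, here is a minimal Python sketch of recall@k and nDCG@k. The function names and the toy feed are mine, purely for illustration:

```python
import math

def recall_at_k(recommended, relevant, k=10):
    """Fraction of the user's relevant items that appear in the top-k feed.
    A low value corresponds to the recall loss described above."""
    top_k = set(recommended[:k])
    return len(top_k & set(relevant)) / len(relevant)

def ndcg_at_k(recommended, relevance, k=10):
    """nDCG@k with graded relevance. `relevance` maps item -> gain.
    A relevant item buried at the 8th spot instead of the 1st lowers
    this score: that is the ranking loss."""
    dcg = sum(
        relevance.get(item, 0.0) / math.log2(rank + 2)  # ranks are 0-indexed
        for rank, item in enumerate(recommended[:k])
    )
    # Ideal DCG: the same gains sorted into the best possible order.
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0

# The item the user wanted was shown at the 8th spot:
feed = ["a", "b", "c", "d", "e", "f", "g", "wanted"]
print(recall_at_k(feed, ["wanted"], k=10))     # 1.0  -> no recall loss
print(ndcg_at_k(feed, {"wanted": 1.0}, k=10))  # ~0.32 -> a ranking loss
```

Note how the same feed scores perfect recall but a low nDCG: retrieval did its job, ranking did not, which is exactly the distinction between the two losses.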
The reason this is called implicit feedback is that we are trying to infer what the user has in mind from their actions. We will talk about explicit feedback from actual user surveys later in this article.
How to get data to train retrieval and ranking from implicit feedback
To train retrieval: Imagine a user who comes to the app, doesn't find anything on the feed, then searches for an item and finds what they were looking for. This is a great way to figure out what was missing from the set of retrieved items. Even if you have not implemented a recommender system yet and only have a search interface in your app, you could already be collecting data to train your retrieval model.
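As a sketch of what that data collection could look like, here is one way to mine retrieval training pairs from session logs. The event schema and names are hypothetical, not from any particular system:

```python
from dataclasses import dataclass

@dataclass
class Event:
    user_id: str
    kind: str      # "feed_impression" or "search_click" (hypothetical schema)
    item_id: str

def retrieval_positives(session: list[Event]) -> list[tuple[str, str]]:
    """From one session's event log, emit (user, item) pairs where the user
    found an item via search that the feed never showed them. These are
    exactly the items the retrieval model should have surfaced."""
    shown_in_feed = {e.item_id for e in session if e.kind == "feed_impression"}
    return [
        (e.user_id, e.item_id)
        for e in session
        if e.kind == "search_click" and e.item_id not in shown_in_feed
    ]

session = [
    Event("u1", "feed_impression", "i1"),
    Event("u1", "feed_impression", "i2"),
    Event("u1", "search_click", "i9"),  # found via search, missing from the feed
]
print(retrieval_positives(session))  # [("u1", "i9")] -> a retrieval training pair
```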
To train the ranker: Imagine the user sees a feed and selects the fifth item in the list. This tells us that the first four items should have been ranked below the one the user selected. We can train our ranker with this sort of data. (This is a super cool blog on ranking by Chris Burges where he talks about training the ranking module.)
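Concretely, one common way to turn such a click into training data is the "skip-above" heuristic: treat the clicked item as preferred over every item shown above it. A minimal sketch, with names of my choosing:

```python
def pairwise_preferences(feed: list[str], clicked: str) -> list[tuple[str, str]]:
    """Skip-above heuristic: the clicked item was preferred over every item
    the user scrolled past to reach it. Each (preferred, skipped) pair is
    one training example for a pairwise ranker such as RankNet/LambdaMART."""
    click_pos = feed.index(clicked)
    return [(clicked, skipped) for skipped in feed[:click_pos]]

feed = ["i1", "i2", "i3", "i4", "i5"]
print(pairwise_preferences(feed, "i5"))
# [('i5', 'i1'), ('i5', 'i2'), ('i5', 'i3'), ('i5', 'i4')]
```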
Explicit feedback from users
Wouldn't it be great if you could show a set of items to a user and ask them their opinion of each? Perhaps you could get a 1-5 rating from the user about how much they would like that item. Please see the figure below for an example (source).
This 5-point rating is what many recommender system developers use to train their initial rankers. The benefit of using rating data to train the ranker of a personalized recommender system is that, since retrieval is already personalized, chances are we already have decently good candidates for the user. Hence the information content of the user's ratings is likely to be very high: we are not just picking random items and showing them to the user. In my experience, a good personalized ranker based on an ensemble of boosted trees can be trained from such explicit feedback with as few as 10,000 data points, and it can generalize well to millions of users. This sort of sample efficiency is hard to achieve with implicit feedback.
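As a sketch of what such a boosted-tree ranker could look like, here is one trained on per-user rating data, assuming the LightGBM library. The features and ratings below are synthetic placeholders, not real data:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

# Hypothetical training set: ~10k (user, item) rows of features, with the
# user's 1-5 star rating as the label, grouped per user for the ranker.
n_users, items_per_user = 2_000, 5
X = rng.normal(size=(n_users * items_per_user, 16))    # user+item features
y = rng.integers(1, 6, size=n_users * items_per_user)  # 1-5 star ratings
group = [items_per_user] * n_users                     # rows per user "query"

ranker = lgb.LGBMRanker(
    objective="lambdarank",  # optimizes an nDCG-like pairwise objective
    n_estimators=200,
    learning_rate=0.05,
)
ranker.fit(X, y, group=group)

# Score one user's retrieved candidates and sort them best-first.
candidates = rng.normal(size=(20, 16))
order = np.argsort(-ranker.predict(candidates))
```

With real rating data in place of the placeholders, the same grouping-by-user setup lets the tree ensemble learn to order each user's candidates by expected rating.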
Explicit feedback is really useful in early stages
In the early stages of building a recommendation system, explicit feedback can be very useful. In fact, even before you have the UX nailed down and before you have launched the service, if you have a survey system that can deliver personalized recommendations to the user and ask for their feedback, you can train a model from it. This model would rank higher the videos that are likely to receive 5-star ratings from the user.
Once the system is mature, though, implicit feedback from actual usage carries more information and makes explicit feedback largely redundant. You might still find value in using explicit feedback to measure aspects that the UX does not easily convey, for instance which content should be blocked for trust and safety reasons.
Recap
In this article, we talked about the two types of losses in a recommendation/search system: recall loss and ranking loss. We talked about the types of feedback we can collect to train both modules. We hinted at the sample efficiency of explicit feedback, especially once a good baseline retrieval system has been built.
PS: This course by Google might be a good source of information on the terminology used in these articles.
This was originally posted on LinkedIn.
Disclaimer: These are my personal opinions only. Any assumptions, opinions stated here are mine and not representative of my current or any prior employer(s).