Using online learning in short-video recommendations
Applied ML #15 | Part 1 ML Design of a short-video recommendation platform
In this note, we will discuss potential design considerations of building recommendations for short-videos (eg: Tiktok, Instagram Reels, Facebook Reels, YouTube Shorts). This is part-1 of the three part series of articles on the topic. This part focuses on learning which of the videos being created are likely to be hits.
Due to the unique characteristics of short-video platforms, the platform needs to be very good at finding lots of fresh good videos. Hence it encourages creation of millions of videos every day. Let’s forget for a second that we have a basic categorization of all recommendable videos and we will come back to it later.
Let’s say the goodness of a video is just the probability of it being watched by the user (and not swiped up before 10 seconds).
We could estimate P-watch(video v) = NumWatches(v) / NumRecs(v) i.e. the number of times the video was watched as a fraction of the number of times it was recommended.
However this estimate of P-watch might have a lot of variance early on, when the platform has shown the video to very few users. There are three popular approaches to this quandary, also described as a multi-armed bandit problem:
Almost Greedy (a.k.a. epsilon-greedy): Choose the video with the highest expected P-watch most of the time but sometimes choose a random video to show the user.
Upper confidence bounds: Choose videos with the highest upper confidence bound of P-watch.
Thompson Sampling: Choose videos with probability proportional to the current estimate of the expected P-watch
This article is a good exposition of these approaches.
For our specific problem all of these methods if applied verbatim might be suboptimal.
They may result in too many low quality videos since the estimate of P-watch for some videos with very few watches might be too optimistic.
The methods that require some randomness to be added per call, are hard to be implemented with a static index.
Divide the set of videos into k buckets with decreasing variance of P-watch, i.e.
All new videos go into the first bucket. They will all have the highest variance by definition.
As variance decreases, videos go to the next bucket and so on. Variance decreases mostly due to number of times a video is recommended.
For each user, we select a set of videos from each bucket. To further enhance user experience, we could allocate more to videos already known to be of high quality. For instance, we could select 80% of the videos from the last bucket, which has the videos with most recommendations / views and the remaining 20% from other buckets.
Model P-watch as a time-series so that we can predict lower P-watch for older videos (if that is indeed the trend).
The rate of picking from each bucket for a user could be personalized.
For users who have consumed a lot in a session, we can increase the rate of picking from non-best buckets.
For new users we are looking to activate and bring back again, we should skew more towards videos from the best bucket.
Probability of picking from different buckets can be regulated based on the inventory condition. If inventory of high quality fresh videos is running low, or if a larger than average number of videos are being created, then the probability of selection of non-best buckets should be higher.
References and further reading
Interfaces that help machine learning (short-video platforms have optimized the interface to help machine learning)
We will discuss a detailed ML design of recommending short-videos (optimization criteria in a multi sided ecosystem of creators, users and platform) in subsequent articles. Stay tuned!
Disclaimer: These are my personal opinions only. Any assumptions, opinions stated here are mine and not representative of my current or any prior employer(s).