Using online learning in short-video recommendations

Applied ML #15 | Part 1 ML Design of a short-video recommendation platform

In this note, we will discuss potential design considerations of building recommendations for short-videos (eg: Tiktok, Instagram Reels, Facebook Reels, YouTube Shorts). This is part-1 of the three part series of articles on the topic. This part focuses on learning which of the videos being created are likely to be hits.

Due to the unique characteristics of short-video platforms, the platform needs to be very good at finding lots of fresh good videos. Hence it encourages creation of millions of videos every day. Let’s forget for a second that we have a basic categorization of all recommendable videos and we will come back to it later.

Let’s say the goodness of a video is just the probability of it being watched by the user (and not swiped up before 10 seconds).

We could estimate P-watch(video v) = NumWatches(v) / NumRecs(v) i.e. the number of times the video was watched as a fraction of the number of times it was recommended.

However this estimate of P-watch might have a lot of variance early on, when the platform has shown the video to very few users. There are three popular approaches to this quandary, also described as a multi-armed bandit problem:

  1. Almost Greedy (a.k.a. epsilon-greedy): Choose the video with the highest expected P-watch most of the time but sometimes choose a random video to show the user.

  2. Upper confidence bounds: Choose videos with the highest upper confidence bound of P-watch.

  3. Thompson Sampling: Choose videos with probability proportional to the current estimate of the expected P-watch

This article is a good exposition of these approaches.

For our specific problem all of these methods if applied verbatim might be suboptimal.

  • They may result in too many low quality videos since the estimate of P-watch for some videos with very few watches might be too optimistic.

  • The methods that require some randomness to be added per call, are hard to be implemented with a static index.

Proposed solution

Divide the set of videos into k buckets with decreasing variance of P-watch, i.e.

  1. All new videos go into the first bucket. They will all have the highest variance by definition.

  2. As variance decreases, videos go to the next bucket and so on. Variance decreases mostly due to number of times a video is recommended.

  3. For each user, we select a set of videos from each bucket. To further enhance user experience, we could allocate more to videos already known to be of high quality. For instance, we could select 80% of the videos from the last bucket, which has the videos with most recommendations / views and the remaining 20% from other buckets.

  4. Model P-watch as a time-series so that we can predict lower P-watch for older videos (if that is indeed the trend).

Fig 2: Online learning of quality videos - Videos are added to the first (top most) bucket. As a video crosses a certain number of complete watches it is queued to be added to the next bucket. on being added to a bucket a video is added to a random slice of videos of that bucket. At serving time, for every video being served to a user, a bucket is selected with a probability equal to “probability of selection” (hyperparam) of that bucket. To reduce computational cost, from each bucket, videos could be served from a small set of say 3 slices that their userid hashes to.

Further Improvements:

  1. The rate of picking from each bucket for a user could be personalized.

    • For users who have consumed a lot in a session, we can increase the rate of picking from non-best buckets.

    • For new users we are looking to activate and bring back again, we should skew more towards videos from the best bucket.

  2. Probability of picking from different buckets can be regulated based on the inventory condition. If inventory of high quality fresh videos is running low, or if a larger than average number of videos are being created, then the probability of selection of non-best buckets should be higher.

References and further reading

  1. Recommender system algorithms that give new creators a fair shot

  2. Introduction to multi-armed bandits

  3. Multi armed bandits and its solutions

  4. A tutorial on Thompson Sampling

  5. Personalized approach to bandits in recommendations

  6. Interfaces that help machine learning (short-video platforms have optimized the interface to help machine learning)

  7. TDS: A previous article on this topic that talks about online learning

Coming next

We will discuss a detailed ML design of recommending short-videos (optimization criteria in a multi sided ecosystem of creators, users and platform) in subsequent articles. Stay tuned!

Disclaimer: These are my personal opinions only. Any assumptions, opinions stated here are mine and not representative of my current or any prior employer(s).