
ML Design Interview | Design a short-video platform Part 2
Applied ML #16 | Part 2 ML Design of a short-video recommendation platform
In this note, we will discuss potential design considerations for building recommendations for short videos, e.g., TikTok, Instagram Reels, Facebook Reels, and YouTube Shorts.
This is Part 2 of the series.
In the first article of the short-video series, “Using online learning in short-video recommendation”, we showed an MVP implementation that:
learns to give every video a fair shot at popularity
maximizes the inventory of fresh good videos so that users can find high quality videos every time they open the app
is a scalable system, with serving latency under 50 milliseconds (P99) that is independent of the number of videos.
However, we identified various avenues of improvement, personalization and creator-side value to name a few. Before we dive into those, let’s take a step back and approach short-video recommender systems the way we should start every ML design problem, i.e., from the objectives.

Objective
Entertain / Inform / Educate users
Help creators find an audience for their videos, and optionally help creators monetize their work. However, monetization is out of scope for this ML design exercise.
Optimizing criteria
Users find content they were looking for:
Engagement:
Number of video-watches (videos watched either completely or for more than 10 seconds).
Total watch-time: Historically, video platforms have set watch-time as the goal (YouTube since 2012). However, on short-video platforms watch-time is not very different from the number of videos watched (>10 s), since videos on the platform are roughly 10 - 20 seconds long.
Session-success-rate: Percentage of watch sessions with more than 100 seconds of watches. This could be considered the equivalent of click-through rate (CTR) in a user experience that immediately starts playing videos. If the UX is similar to YouTube, where the user is recommended videos and has to explicitly start watching one, then page-level long-click CTR can be a better measure of session success. Here, a long-click is a click followed by a sustained watch rather than an immediate bounce.
Satisfaction:
Number of videos liked last month
Number of users who have liked videos last month
Like-rate on the platform: Number of thumbs-up / Number of videos watched
Dislike-rate: Number of thumbs-down / Number of videos watched
At-risk-rate: Number of users with a dislike-rate > 10% in the last week. This could be a leading indicator of churn.
Responsibility:
Percentage of videos marked inappropriate / offensive should be less than 0.1%
Creators find an audience for great content
Creator engagement:
Number of creators who have created videos last month
Number of videos created last month
Creator quality:
Number of creators who have created videos last month, weighted by their like-rate
Platform
Retention:
Monthly active users
Daily active users
L30, i.e., the number of days (out of the last 30) that users are active
Churn-rate: perhaps defining 20 video-watches a month as the threshold for an active user. If using revenue-based metrics of churn, one could use watch-time as a proxy for revenue.
Inventory:
Number of highly liked / watched videos. This is especially useful for new users. Perhaps the like-rate or watch-rate should be measured over the past month, so that older videos with declining like-rates are excluded.
Number of topics/categories for which the platform has highly liked/watched videos. This can help identify gaps in the inventory and incentivize creation of new content.
The categorization into user, creator, and platform value should be considered loose; some of these criteria measure more than one type of value. A sketch of how a few of these criteria could be computed from logged events follows.
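As a minimal illustration, the Python sketch below computes session-success-rate, like-rate, dislike-rate, and the at-risk user count from watch events. The event tuple schema (user_id, session_id, watch_seconds, liked, disliked) is hypothetical; the 10 s, 100 s, and 10% thresholds come from the definitions above.

from collections import defaultdict

# Thresholds taken from the metric definitions above.
WATCH_MIN_S = 10         # a "watch" is a play of at least 10 seconds
SESSION_SUCCESS_S = 100  # a successful session has > 100 s of watches
AT_RISK_DISLIKE_RATE = 0.10

def compute_metrics(events):
    """events: iterable of (user_id, session_id, watch_seconds, liked, disliked)."""
    session_watch = defaultdict(float)         # session_id -> total watch seconds
    user_counts = defaultdict(lambda: [0, 0])  # user_id -> [watches, dislikes]
    watches = likes = dislikes = 0

    for user_id, session_id, watch_s, liked, disliked in events:
        session_watch[session_id] += watch_s
        if watch_s >= WATCH_MIN_S:
            watches += 1
            user_counts[user_id][0] += 1
        likes += int(liked)
        dislikes += int(disliked)
        user_counts[user_id][1] += int(disliked)

    sessions = len(session_watch)
    return {
        "video_watches": watches,
        "session_success_rate": sum(
            s > SESSION_SUCCESS_S for s in session_watch.values()) / max(sessions, 1),
        "like_rate": likes / max(watches, 1),
        "dislike_rate": dislikes / max(watches, 1),
        "at_risk_users": sum(
            d / max(w, 1) > AT_RISK_DISLIKE_RATE for w, d in user_counts.values()),
    }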
Guardrails
Latency:
The first video should start playing in under 500 milliseconds (P99). That implies a budget of roughly 250 ms (P99) for the recommendation call, leaving the remainder for fetching and starting playback of the video itself.
Subsequent video plays probably should have slightly tighter latency requirements.
Availability: The service should be available virtually all the time, degrading gracefully if needed.
Recency: How soon should videos start getting recommended after they have been uploaded.
Consistency:
If a video is deleted or marked inappropriate, how soon is it taken down from recommendations?
If a video has been shown to a user, it should not be shown to them in future sessions (a per-user seen-set can enforce this; see the sketch after this list).
Coverage: Healthy inventory across:
languages
countries
topics / genre
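One way to enforce the never-re-show consistency guardrail is a per-user seen-set consulted while filtering candidates. A Bloom filter keeps the memory footprint small; its false positives only suppress a video the user has not actually seen, which is an acceptable failure mode here. A minimal sketch, with illustrative sizing parameters:

import hashlib

class SeenVideosFilter:
    """Per-user Bloom filter over shown video ids."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, video_id: str):
        # Derive num_hashes independent bit positions from salted hashes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{video_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def mark_shown(self, video_id: str) -> None:
        for pos in self._positions(video_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def was_shown(self, video_id: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(video_id))

# Usage: drop already-shown candidates before ranking.
seen = SeenVideosFilter()
seen.mark_shown("video_123")
candidates = ["video_123", "video_456"]
fresh = [v for v in candidates if not seen.was_shown(v)]  # ["video_456"]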
Estimation
In this section we outline some hypothetical assumptions that can help us make tradeoffs between different model types.
Number of monthly users on the platform: 100 million
Number of active creators: 5 million
Number of videos created per day: 10 million
Ideally, there should also be a discussion of the monthly budget available to run the platform. This should inform decisions like freshness and caching of recommendations. A back-of-envelope estimate under the assumptions above follows.
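A quick back-of-envelope calculation; the stickiness, sessions-per-user, and peak-to-average numbers below are illustrative guesses, not stated requirements:

MONTHLY_USERS = 100_000_000
VIDEOS_PER_DAY = 10_000_000

# Illustrative assumptions, not stated above:
DAU_OVER_MAU = 0.5             # stickiness
SESSIONS_PER_DAU_PER_DAY = 3
PEAK_TO_AVERAGE = 3

daily_users = MONTHLY_USERS * DAU_OVER_MAU                 # 50M DAU
requests_per_day = daily_users * SESSIONS_PER_DAU_PER_DAY  # 150M rec requests/day
avg_qps = requests_per_day / 86_400                        # ~1.7K QPS
peak_qps = avg_qps * PEAK_TO_AVERAGE                       # ~5.2K QPS
videos_per_month = VIDEOS_PER_DAY * 30                     # 300M new videos/month

print(f"avg {avg_qps:,.0f} QPS, peak {peak_qps:,.0f} QPS, "
      f"{videos_per_month:,} new videos/month")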
API
The service provided:
syntax = "proto3";

message RecsRequest {
  string userid = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  int64 request_id = 4;
}

message RecsResponse {
  message Result {
    string video_url = 1;    // to show on app
    string video_title = 2;  // to show on app
    string video_stats = 3;  // to show on app
  }
  int64 request_id = 1;        // needed to join with FE logs
  repeated Result results = 2; // ranked recommendations
}

service RecsService {
  rpc GetVideos(RecsRequest) returns (RecsResponse);
}
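For illustration, a hypothetical Python client for this service, assuming stubs were generated from a file named recs.proto with grpcio-tools (the recs_pb2 / recs_pb2_grpc module names and the endpoint are assumptions):

import grpc

# Generated with:
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. recs.proto
import recs_pb2
import recs_pb2_grpc

def fetch_recommendations(userid: str, page: int = 0, page_size: int = 10):
    # Placeholder endpoint; production would use TLS and load balancing.
    with grpc.insecure_channel("recs.example.com:443") as channel:
        stub = recs_pb2_grpc.RecsServiceStub(channel)
        request = recs_pb2.RecsRequest(
            userid=userid,
            page_number=page,
            result_per_page=page_size,
            request_id=12345,  # placeholder; would be a unique id for log joins
        )
        # The deadline reflects the latency guardrail discussed above.
        response = stub.GetVideos(request, timeout=0.25)
        return [(r.video_title, r.video_url) for r in response.results]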
Feature engineering / User expectations
To motivate the features we might think of using in the scoring model, we should look at expectations of different stakeholders in the system.
New users expect to see interesting videos without entering an interest profile.
Returning users expect to see high quality new videos every time they open the app.
Creators expect their videos to be shown to users who are interested in such content.
New amazing creations should be shown to users even if the creator has not created popular videos in the past.
Showing users videos they could mimic could encourage video creation.
Available data
Logged data of user events like
syntax = "proto3";

import "google/protobuf/timestamp.proto";

message UserVideoEvent {
  enum EventType {
    UNKNOWN = 0; // proto3 requires the first enum value to be zero
    Watch = 1;
    Like = 2;
    Share = 3;
    Comment = 4;
  }
  EventType evt_type = 1;
  string userid = 2;
  string video_id = 3; // needed to join events with videos
  google.protobuf.Timestamp evt_time = 4;
}
Static video features
syntax = "proto3";

message Video {
  string video_id = 1;
  string creator_id = 2;
  string title = 3;
  string thumbnail_url = 4;
  string content_url = 5;
}
Dynamic video features
syntax = "proto3";

// Named VideoStats to avoid clashing with the static Video message above.
message VideoStats {
  string video_id = 1;
  int32 watch_count = 2; // number of times watched >= 10s
  int32 rec_count = 3;   // number of times recommended
}
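These counters can be turned into a quality signal. The raw watch rate (watch_count / rec_count) is noisy for fresh videos, so one option is to smooth it with a Beta prior, so that new videos start near the prior mean and converge to their empirical rate as impressions accumulate. A minimal sketch with illustrative prior parameters:

def smoothed_watch_rate(watch_count: int, rec_count: int,
                        prior_watches: float = 1.0,
                        prior_recs: float = 20.0) -> float:
    """Beta-smoothed watch-through rate.

    New videos start near the prior mean (1/20 = 5%) and converge
    to their empirical rate as rec_count grows.
    """
    return (watch_count + prior_watches) / (rec_count + prior_recs)

# A video recommended 5 times with 3 watches is not yet trusted at 60%:
assert abs(smoothed_watch_rate(3, 5) - 4 / 25) < 1e-9  # smoothed to 16%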
Partner teams, infrastructure and collaborations
For this design, we might assume the availability of:
product engineering support
design
UX research
a taxonomy of categories of videos
a set of labelers who could categorize new videos based on this taxonomy. However, the bandwidth of this team might only scale to 1% of the videos.
a video transcription service
a team that collaborates with the labeling team to build an ML driven video categorization service, such that virtually all videos that are eligible to be recommended come with categories auto-labeled.
ML infrastructure for training two-tower models, like TensorFlow Recommenders (a minimal sketch of the two-tower idea follows this list).
Embedding-based retrieval, like Vertex Matching Engine (based on ScaNN, an algorithm that serves multiple millions of queries per second at Google).
ML infrastructure to get predictions of custom-trained models (like Vertex AI or Sagemaker)
Logging of user interactions on app.
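To make the two-tower assumption concrete, here is a minimal sketch of two-tower training with in-batch softmax negatives in plain TensorFlow; TensorFlow Recommenders packages this same pattern, and the vocabulary sizes, embedding dimension, and learning rate below are illustrative:

import tensorflow as tf

# Illustrative sizes; production would use hashed or learned vocabularies.
NUM_USERS, NUM_VIDEOS, DIM = 100_000, 1_000_000, 64

user_tower = tf.keras.layers.Embedding(NUM_USERS, DIM)
video_tower = tf.keras.layers.Embedding(NUM_VIDEOS, DIM)

# Build the embedding variables eagerly so they exist before tracing.
_ = user_tower(tf.zeros([1], tf.int32))
_ = video_tower(tf.zeros([1], tf.int32))

optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(user_ids, video_ids):
    """One step on a batch of positive (user, watched-video) pairs.

    The other videos in the batch serve as negatives (in-batch softmax).
    """
    with tf.GradientTape() as tape:
        u = tf.nn.l2_normalize(user_tower(user_ids), axis=-1)    # [B, DIM]
        v = tf.nn.l2_normalize(video_tower(video_ids), axis=-1)  # [B, DIM]
        logits = tf.matmul(u, v, transpose_b=True)               # [B, B] similarities
        labels = tf.range(tf.shape(logits)[0])                   # diagonal = positives
        loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
    variables = user_tower.trainable_variables + video_tower.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss

# At serving time, the video embeddings are indexed in an ANN service
# (e.g., Vertex Matching Engine) and queried with the user embedding.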

Coming next - Detailed ML design
Now that we have a framework for measuring quality, in subsequent articles we will build on improvements like:
Separating videos into a bucket-flow per category and learning which categories a user is interested in.
What if categories overlap? What if videos were wrongly categorized? How can we use our manual categorizing team efficiently? … i.e. how to maximize user relevance via embeddings.
Filtering, diversity, novelty, serendipity.
Encoding creator-side value in the ranking function.
References and further reading
Using online learning in short-video recommendations (Platform Value)
Giving new creators a fair shot through recommender systems (Creator Value)
Two-tower models for recommendations (Efficient self-learning retrieval)
Vertex Matching Engine for fast retrieval of vectors given a query vector.
For other ML Design interview discussions …
This article was written in an ML design interview format. This format can also be helpful when writing an ML design doc for an internal project.
If you want to discuss more ML design questions, please join our Discord community here!
Disclaimer: These are the personal opinions of the author. Any assumptions or opinions stated here are theirs and not representative of their current or any prior employer(s).