System design of an Early Ranker
Towards greater user satisfaction, higher recommendation quality, lower latency, and compute capacity savings + sample code
In the previous post, we discussed what an early ranker is: it receives candidates from generators, selects a subset to pass to the late ranker, and along the way tries to remove duplicates. In this post, we will discuss alternative designs for an early ranking service, compare them, and provide a recommendation.
1. Sequential
In this design, the pipeline runs as strict stages: every candidate generator completes first, the early ranker then filters and deduplicates the combined pool, and only that shortlist is sent to the final (late) ranker.
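To make the flow concrete, here is a minimal sketch of the sequential design. This is not the post's actual code; the run_generator, early_rank, and final_rank stubs are placeholders for real services.

```python
import asyncio

async def run_generator(name: str) -> list[str]:
    """Placeholder candidate generator; returns candidate ids."""
    await asyncio.sleep(0.1)  # simulate retrieval latency
    return [f"{name}:{i}" for i in range(100)]

async def early_rank(candidates: list[str], k: int = 200) -> list[str]:
    """Placeholder early ranker: dedupe cheaply and keep the top-k."""
    unique = list(dict.fromkeys(candidates))
    return unique[:k]

async def final_rank(candidates: list[str]) -> list[str]:
    """Placeholder final (late) ranker."""
    return candidates

async def sequential_pipeline(generator_names: list[str]) -> list[str]:
    # Barrier: nothing downstream starts until ALL generators finish.
    batches = await asyncio.gather(*(run_generator(n) for n in generator_names))
    merged = [c for batch in batches for c in batch]
    shortlist = await early_rank(merged)
    # The final ranker sees its first candidate only after the slowest generator.
    return await final_rank(shortlist)

asyncio.run(sequential_pipeline(["recent", "trending", "social"]))
```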
2. Partial bypass
Compared to the sequential implementation, this design:
- saves latency, since candidates start getting scored by the final ranker earlier; in the sequential design, no candidate reaches the final ranker until all generators have completed.
- should incur the same compute capacity, since the final ranker still scores the same total number of candidates (unless the final ranking module has a flat capacity-to-batch-size curve, i.e. fixed per-call overhead, in which case many small batches cost more than one large one).
- should yield higher user satisfaction, since the experience is likely to be snappier.
Please find illustrative code here for this implementation.
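In case it helps, below is a minimal sketch of one plausible reading of the partial-bypass flow, consistent with the latency claim above: each generator's batch is early-ranked and handed to the final ranker as soon as it arrives, instead of waiting for the slowest generator. It reuses the stubs from the sequential sketch and adds a hypothetical final_rank_scored that returns (candidate, score) pairs.

```python
import asyncio
import random

async def final_rank_scored(candidates: list[str]) -> list[tuple[str, float]]:
    """Hypothetical final ranker returning (candidate, score) pairs."""
    return [(c, random.random()) for c in candidates]

async def partial_bypass_pipeline(generator_names: list[str]) -> list[str]:
    async def rank_one(name: str) -> list[tuple[str, float]]:
        # No barrier: this chain runs end-to-end per generator, so the final
        # ranker starts scoring as soon as the first generator completes.
        candidates = await run_generator(name)
        shortlist = await early_rank(candidates)
        return await final_rank_scored(shortlist)

    batches = await asyncio.gather(*(rank_one(n) for n in generator_names))
    # Merge per-generator scores; exact duplicates keep their best score.
    best: dict[str, float] = {}
    for batch in batches:
        for candidate, score in batch:
            best[candidate] = max(score, best.get(candidate, float("-inf")))
    return sorted(best, key=best.get, reverse=True)

asyncio.run(partial_bypass_pipeline(["recent", "trending", "social"]))
```

One trade-off this makes visible: since each early_rank call now sees only a single generator's output, cross-generator deduplication can no longer happen before the final ranker; the merge step above only collapses exact duplicates after scoring.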
3. Whole bypass
Compared to the “Partial bypass” alternative, this implementation saves even more latency, since it allows some candidate generators to bypass the early ranker completely and send their candidates straight to the final ranker. As described in this seminal paper, candidate generators should be measured and optimized not for plain recall but for a sort of ranking-aligned recall. Hence, good candidates for bypassing the early ranker are generators with high precision or high ranking consistency with the final ranker.
Another way to look at this alternative, without the time axis, is as a routing decision: high-precision generators feed the final ranker directly, while the remaining generators still go through the early ranker.
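Here is a minimal sketch of that routing view, again reusing the stubs above; the BYPASS_EARLY_RANKER table and which generators qualify are assumptions for illustration.

```python
import asyncio

# Hypothetical routing table: generators with high precision or high ranking
# consistency with the final ranker skip the early ranker entirely.
BYPASS_EARLY_RANKER = {"social": True, "recent": False, "trending": False}

async def whole_bypass_pipeline(generator_names: list[str]) -> list[str]:
    async def route(name: str) -> list[str]:
        candidates = await run_generator(name)
        if BYPASS_EARLY_RANKER.get(name, False):
            return candidates                 # straight to the final ranker
        return await early_rank(candidates)   # filtered and deduped first

    batches = await asyncio.gather(*(route(n) for n in generator_names))
    merged = list(dict.fromkeys(c for batch in batches for c in batch))
    return await final_rank(merged)

asyncio.run(whole_bypass_pipeline(["recent", "trending", "social"]))
```

In practice you would likely combine this routing with the streaming handoff from the partial-bypass sketch to get the full latency benefit; the two choices are orthogonal.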
Recommendation
You could start with “Partial bypass” since it might be less complex and might have lower compute capacity usage. You could then experiment with “Whole bypass” once you have a better understanding of your candidate generators.
Disclaimer: These are the personal opinions of the author(s). Any assumptions or opinions stated here are theirs and are not representative of their current or any prior employer(s). Apart from publicly available information, any other information here is not claimed to refer to any company, including ones the author(s) may have worked at or been associated with.