System design of an Early Ranker
Towards greater user satisfaction, higher recommendation quality, lower latency, and compute capacity savings + sample code
In the previous post, we discussed what an early ranker is: it receives candidates from generators, selects a subset to pass to the late ranker, and along the way tries to remove duplicates. In this post, we will discuss alternative designs for an early ranking service, compare them, and provide a recommendation.
1. Sequential
In this design, the pipeline runs as strict stages: every candidate generator completes first, the early ranker then filters and deduplicates the combined pool, and only that shortlist is sent to the final (late) ranker.
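To make the flow concrete, here is a minimal sketch of the sequential design. This is not the post's actual code; the run_generator, early_rank, and final_rank stubs are placeholders for real services.

```python
import asyncio

async def run_generator(name: str) -> list[str]:
    """Placeholder candidate generator; returns candidate ids."""
    await asyncio.sleep(0.1)  # simulate retrieval latency
    return [f"{name}:{i}" for i in range(100)]

async def early_rank(candidates: list[str], k: int = 200) -> list[str]:
    """Placeholder early ranker: dedupe cheaply and keep the top-k."""
    unique = list(dict.fromkeys(candidates))
    return unique[:k]

async def final_rank(candidates: list[str]) -> list[str]:
    """Placeholder final (late) ranker."""
    return candidates

async def sequential_pipeline(generator_names: list[str]) -> list[str]:
    # Barrier: nothing downstream starts until ALL generators finish.
    batches = await asyncio.gather(*(run_generator(n) for n in generator_names))
    merged = [c for batch in batches for c in batch]
    shortlist = await early_rank(merged)
    # The final ranker sees its first candidate only after the slowest generator.
    return await final_rank(shortlist)

asyncio.run(sequential_pipeline(["recent", "trending", "social"]))
```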
2. Partial bypass
Compared to the sequential implementation, this design:
- saves latency, since candidates start getting scored by the final ranker earlier; in the sequential design, no candidate reaches the final ranker until all generators have completed.
- should incur the same compute capacity, since the final ranker still scores the same total number of candidates (unless the final ranking module has a flat capacity-to-batch-size curve, i.e. fixed per-call overhead, in which case many small batches cost more than one large one).
- should yield higher user satisfaction, since the experience is likely to be snappier.
Please find illustrative code here for this implementation.
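In case it helps, below is a minimal sketch of one plausible reading of the partial-bypass flow, consistent with the latency claim above: each generator's batch is early-ranked and handed to the final ranker as soon as it arrives, instead of waiting for the slowest generator. It reuses the stubs from the sequential sketch and adds a hypothetical final_rank_scored that returns (candidate, score) pairs.

```python
import asyncio
import random

async def final_rank_scored(candidates: list[str]) -> list[tuple[str, float]]:
    """Hypothetical final ranker returning (candidate, score) pairs."""
    return [(c, random.random()) for c in candidates]

async def partial_bypass_pipeline(generator_names: list[str]) -> list[str]:
    async def rank_one(name: str) -> list[tuple[str, float]]:
        # No barrier: this chain runs end-to-end per generator, so the final
        # ranker starts scoring as soon as the first generator completes.
        candidates = await run_generator(name)
        shortlist = await early_rank(candidates)
        return await final_rank_scored(shortlist)

    batches = await asyncio.gather(*(rank_one(n) for n in generator_names))
    # Merge per-generator scores; exact duplicates keep their best score.
    best: dict[str, float] = {}
    for batch in batches:
        for candidate, score in batch:
            best[candidate] = max(score, best.get(candidate, float("-inf")))
    return sorted(best, key=best.get, reverse=True)

asyncio.run(partial_bypass_pipeline(["recent", "trending", "social"]))
```

One trade-off this makes visible: since each early_rank call now sees only a single generator's output, cross-generator deduplication can no longer happen before the final ranker; the merge step above only collapses exact duplicates after scoring.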
3. Whole bypass
Compared to the “Partial bypass” alternative, this implementation saves even more latency, since it allows some candidate generators to bypass the early ranker completely and send their candidates straight to the final ranker. As described in this seminal paper, candidate generators should be measured and optimized not for plain recall but for a sort of ranking-aligned recall. Hence, good candidates for bypassing the early ranker are generators with high precision or high ranking consistency with the final ranker.
Another way to look at this alternative, without the time axis, is as a routing decision: high-precision generators feed the final ranker directly, while the remaining generators still go through the early ranker.
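Here is a minimal sketch of that routing view, again reusing the stubs above; the BYPASS_EARLY_RANKER table and which generators qualify are assumptions for illustration.

```python
import asyncio

# Hypothetical routing table: generators with high precision or high ranking
# consistency with the final ranker skip the early ranker entirely.
BYPASS_EARLY_RANKER = {"social": True, "recent": False, "trending": False}

async def whole_bypass_pipeline(generator_names: list[str]) -> list[str]:
    async def route(name: str) -> list[str]:
        candidates = await run_generator(name)
        if BYPASS_EARLY_RANKER.get(name, False):
            return candidates                 # straight to the final ranker
        return await early_rank(candidates)   # filtered and deduped first

    batches = await asyncio.gather(*(route(n) for n in generator_names))
    merged = list(dict.fromkeys(c for batch in batches for c in batch))
    return await final_rank(merged)

asyncio.run(whole_bypass_pipeline(["recent", "trending", "social"]))
```

In practice you would likely combine this routing with the streaming handoff from the partial-bypass sketch to get the full latency benefit; the two choices are orthogonal.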
Recommendation
You could start with “Partial bypass” since it might be less complex and might have lower compute capacity usage. You could then experiment with “Whole bypass” once you have a better understanding of your candidate generators.
Disclaimer: These are the personal opinions of the author(s). Any assumptions or opinions stated here are theirs and are not representative of their current or any prior employer(s). Apart from publicly available information, any other information here is not claimed to refer to any company, including ones the author(s) may have worked at or been associated with.