Serving of personalized recommendations (system design)

ML in recommender systems #2

In the first article of this series we introduced foundational ideas in personalized recommendations, such as how to recommend items to a user based on similarity to their favorite items, and based on other users whose behavior is similar to theirs. In this article we will cover how to serve a request for a personalized list of recommended items.

We will primarily cover how to serve the request from a centralized service. A privacy-preserving implementation of these models (a.k.a. federated learning), where no user-identifiable data leaves the user's device, is out of scope for this note.

Serving a personalized recommendations request

To recap, when a user visits the home page of your app/website, you may want to show them a "feed" of content relevant to them. Showing a home page that maximizes user engagement has been a silver bullet in growth for a lot of consumer tech companies.

Before one develops a personalized home feed, one usually starts with an unpersonalized implementation, i.e. the recommended content is the same for every user. However, personalizing the content has often led to greater user engagement.

Serving in general

Serving any search or recommendation request requires the service to:

  1. Retrieve a set of candidate items using an item-to-item (I2I) or user-to-user (U2U) model or a more advanced model.

  2. Merge candidates from different retrievals if needed.

  3. Retrieve metadata and features for candidates. These features would be needed for ranking and fulfillment (i.e. populating the text / image of the final page).

  4. Rank the retrieved candidates, picking the items that score highest on relevance. To maximize the net utility of the page, demote items that are similar to items already picked higher up the page.

  5. (optional) pack the search result page / group results by theme (like the packs of results you see in a mobile search)

  6. (optional) add an explanation / reason per result of why the result might be relevant to the user
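The ranking step (#4) above can be sketched as a greedy pick with a diversity demotion. This is a toy, self-contained sketch; the candidate scores, metadata store, similarity function, and penalty weight are made-up stand-ins, not any particular system's API.

```python
# Toy candidate scores a retrieval model might return for a user (step 1).
CANDIDATES = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}

# Toy item metadata store (step 3); here just a "topic" used for diversity.
METADATA = {"a": "sports", "b": "sports", "c": "news", "d": "music"}

def similarity(i, j):
    # Crude stand-in similarity: 1.0 if two items share a topic, else 0.0.
    return 1.0 if METADATA[i] == METADATA[j] else 0.0

def rank(candidates, top_k=3, diversity_penalty=0.3):
    # Step 4: greedily pick the highest-scoring item, demoting items
    # similar to those already placed above on the page.
    page, remaining = [], dict(candidates)
    while remaining and len(page) < top_k:
        best = max(
            remaining,
            key=lambda i: remaining[i]
            - diversity_penalty
            * max((similarity(i, p) for p in page), default=0.0),
        )
        page.append(best)
        del remaining[best]
    return page

print(rank(CANDIDATES))  # ['a', 'c', 'd']: "b" is demoted below "c" and "d"
```

Note that "b" has the second-highest raw score but shares a topic with "a", so after the demotion it loses to both "c" and "d".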

Serving of I2I models

Serving an I2I model [Ref previous article for I2I] is one way to do step 1 above. In particular, to serve a user's request one would need to:

  1. Look up the items the user has liked in the past. Use these as seeds in the next step. Note that each of these seeds has a seed_score perhaps reflecting how relevant the seed is to the user right now.

  2. Retrieve the results of the I2I model based on these seeds. Note that each of these results will probably have a score based on how relevant that item is for that particular seed. Also note that the item lists retrieved for different seeds may overlap.

  3. Merge candidates from different seeds and remove duplicates, since a good interface for each retrieval's return value is a deduplicated list of candidates and scores. This makes step #4 in "Serving in general" easy.

and then steps 3 to 5 from "Serving in general".
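The three I2I steps can be sketched as follows. The dicts here are toy stand-ins for the user-likes and I2I stores, and summing seed-weighted scores is just one of several reasonable merge strategies, not the method prescribed above.

```python
# Items the user liked, each with a seed_score (step 1).
USER_SEEDS = {"u1": {"x": 1.0, "y": 0.5}}

# Precomputed I2I results per seed item (step 2); the lists may overlap.
I2I = {
    "x": {"a": 0.9, "b": 0.4},
    "y": {"b": 0.8, "c": 0.6},
}

def retrieve_i2i(user_id):
    merged = {}
    for seed, seed_score in USER_SEEDS[user_id].items():
        for item, item_score in I2I.get(seed, {}).items():
            # Step 3: merge and dedupe, combining scores by summing
            # seed-weighted contributions (an assumed choice).
            merged[item] = merged.get(item, 0.0) + seed_score * item_score
    return merged
```

Item "b" appears under both seeds, so it ends up with a single combined score (1.0 * 0.4 + 0.5 * 0.8) rather than two duplicate entries.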

Data accesses required

If you scan all the steps above (not including the optional ones), there are three places where we access a data store:

  1. In step #1 of Serving of I2I models, we look up the items the user has liked in the past. Since this is specific to the user, this is probably served from a key-value store with the user id as the key and a proto / JSON as the value. Let's call this key-value store "U2D-KV". Note that for your recommendations not to be stale you will probably need a datastore that is fairly real time, in the sense that the delay between a user liking an item and U2D-KV returning it when queried should be low.

  2. In Step #2 of Serving of I2I models, using items as seeds we are looking up candidate items computed by the I2I retrieval model. A fast way to do this would be a key value store with item ids as keys. Let's call this I2I-KV. I2I-KV would also need to be updated periodically with every new run of the I2I model inference. I2I-KV is probably updated less frequently than U2D-KV.

  3. In step #3 of Serving in general we retrieve metadata and features for candidates. This is again a non-personalized key-value store with item id as the key. Let's call it I2D-KV.

The overall latency of your recommendation system will probably be dominated by the latency of these three data accesses.
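One reason the latency adds up is that the three accesses form a sequential dependency chain: you need U2D-KV's answer before you can query I2I-KV, and I2I-KV's answer before I2D-KV. A toy sketch of that chain (the dicts stand in for the real stores; within each stage, per-key lookups could be batched or parallelized):

```python
U2D_KV = {"u1": ["x", "y"]}                  # user id -> liked items
I2I_KV = {"x": ["a", "b"], "y": ["b", "c"]}  # item id -> similar items
I2D_KV = {"a": {"title": "A"}, "b": {"title": "B"}, "c": {"title": "C"}}

def serve(user_id):
    seeds = U2D_KV[user_id]                              # access 1: U2D-KV
    candidates = {c for s in seeds for c in I2I_KV.get(s, [])}  # access 2: I2I-KV
    return {c: I2D_KV[c] for c in candidates}            # access 3: I2D-KV
```

Because each access depends on the previous one, the end-to-end latency is roughly the sum of the three stores' lookup latencies.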

Serving of U2U models

Serving a recommendation request based on a U2U model [Ref previous article for U2U] is another way to do step 1 above. One would need to:

  1. Look up the users this user is most similar to. Let's call this a U2U-KV. This key-value store is keyed by the user id of the user who initiated the request. The returned list of users will be used as seeds in step 2. U2U-KV will need to be updated periodically by an inference of your U2U model.

  2. Using these user-id seeds in U2D-KV, retrieve the items these similar users have liked, along with a relevance score. Again these lists of items might overlap, i.e. the same item could be liked by two users this user is similar to.

  3. Merge these item lists to produce a list of items and scores without duplicates.
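The U2U path mirrors the I2I sketch. Again the dicts are toy stand-ins for the U2U-KV and U2D-KV stores, and weighting a liked item's score by the user-similarity score is an assumed merge strategy:

```python
U2U_KV = {"u1": {"u2": 0.9, "u3": 0.4}}  # user id -> similar users + scores

# user id -> liked items + like scores (same store as in the I2I path).
U2D_KV = {"u2": {"a": 1.0, "b": 0.5}, "u3": {"b": 1.0}}

def retrieve_u2u(user_id):
    merged = {}
    for peer, sim in U2U_KV[user_id].items():                  # step 1
        for item, like_score in U2D_KV.get(peer, {}).items():  # step 2
            # Step 3: merge overlapping lists into one deduplicated
            # list, weighting each like by the user-similarity score.
            merged[item] = merged.get(item, 0.0) + sim * like_score
    return merged
```

Item "b" is liked by both similar users, so its contributions (0.9 * 0.5 and 0.4 * 1.0) merge into a single score.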

This was originally posted on Linkedin.

Disclaimer: These are my personal opinions only. Any assumptions, opinions stated here are mine and not representative of my current or any prior employer(s).