Personalization at Bluesky
The past, present, and future of personalization of the Discover feed
At Bluesky, we are building an open foundation for the social internet, where anyone can create a feed, such as the Science feed, For You feed by spacecowboy, or GLAMS feed. We also aim to provide a great default Discover feed. This post discusses personalization of the Discover feed, from historical attempts to current deployment, and a path forward inspired by Pinterest’s work. If interested, come work with me at Bluesky!
As the first MLE at Bluesky, I initially attempted a two-tower model, but it failed to converge, possibly due to insufficient data or being a poor fit for Bluesky’s short-lived items and skewed interaction distributions. Bluesky was (and still is) a small team, so I couldn’t spend forever debugging the issue. Instead, I switched to building a system that generates post embeddings from a post’s content, with the idea that I could build a personalization system on top of that.
Currently, posts are embedded using BLIP2, a variant of CLIP, which powers our topic models (27 topics users select during onboarding). While this topic model is accurate, it is also quite broad, which hurts the user experience. I’ve also run HDBSCAN over a sample of the post embedding space to generate ~600 clusters, which provide a finer-grained grouping of content. By measuring a user’s interactions with content from these clusters or topics, we have a rudimentary personalization system that can help users find content they might be interested in.
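The rudimentary cluster-based profile described above can be sketched as normalized interaction counts per cluster. This is a minimal illustration with made-up event data, not our production pipeline:

```python
from collections import Counter

# Hypothetical event log: (user, cluster_id) pairs, where cluster_id is
# one of the ~600 HDBSCAN clusters the engaged-with post was assigned to
events = [("alice", 42), ("alice", 42), ("alice", 7), ("alice", 42)]

# A rudimentary preference profile: normalized per-cluster interaction counts
counts = Counter(cluster for _, cluster in events)
total = sum(counts.values())
profile = {cluster: n / total for cluster, n in counts.items()}
print(profile)  # {42: 0.75, 7: 0.25}
```

A real system would also decay old events and smooth low-count clusters, but the core signal is just “which clusters does this user keep engaging with.”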
My goal is to substantially improve personalization of the Discover feed. After reviewing papers, I chose to investigate techniques from Pinterest, specifically their PinnerSage paper. This choice was based on budget fit, simplicity, avoiding extensive fine-tuning, and the requirement to treat user and post representations separately. There are a lot of similarities between the papers published by Pinterest and Twitter, but I chose the Pinterest papers because they’ve continued publishing, providing a path to more advanced models as the ML team at Bluesky grows.
Bluesky is hiring!
Speaking of growing the team, are you a mid-senior MLE with experience in recommender systems? Do you want to join a team laying the groundwork for how ML will operate at a fast growing social media platform? Do you want to increase your scope of work? Want to experiment with new, unconventional ideas? Think distributed social media is the way of the future? Then come work with me at Bluesky!
PinnerSage
Published in 2020, PinnerSage addresses the issue of a single user preference embedding failing to capture a user’s full range of interests, especially short and long-term interests. It does this by generating several (10-100) user preference embeddings via an offline path (last 90 days) and an online path (today’s interactions). This resulted in a 2% increase in user engagement propensity and a 4% increase in engagement volume in online A/B tests.
How it works
PinnerSage is a rather simple approach to the problem, with intentional design choices that match my own constraints. The authors specifically note that item embeddings should be fixed, which is a requirement for me.
Step 1: Cluster User Interactions
First, for a given user, they take the last 90 days of item interactions (i.e. action pins) and gather the corresponding item embeddings. Next, they cluster these embeddings using Ward clustering, generating a ‘small number’ (10-100) of clusters per user. Their specific Ward implementation is based on the Lance-Williams algorithm and has a complexity of O(n^2), where n is the number of items being clustered.
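As a concrete sketch, SciPy’s hierarchical clustering exposes Ward linkage (computed via the Lance-Williams update rule). The embedding shape, random data, and fixed cluster cap below are placeholders; the paper stops merging adaptively rather than cutting at a preset count:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Stand-in for 90 days of a user's interacted-item embeddings
item_embeddings = rng.normal(size=(500, 64))

# Ward agglomerative clustering (Lance-Williams update rule)
Z = linkage(item_embeddings, method="ward")

# Cut the dendrogram into at most 20 clusters; the paper instead stops
# merging adaptively to land in its 'small number' (10-100) range
labels = fcluster(Z, t=20, criterion="maxclust")
```

Note the O(n^2) cost shows up here too: `linkage` materializes pairwise distances over all of a user’s interacted items.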
Step 2: Calculate the Medoid
Second, for each cluster, a medoid—an actual member of the cluster that minimizes the sum of squared distances to the other members—is calculated. Because the medoid is a real pin rather than a synthetic centroid, this simplifies deployment by allowing Pinterest to reuse existing pin infrastructure.
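A minimal medoid computation over one cluster’s embeddings might look like this (random placeholder data; a production version would use a more memory-frugal pairwise-distance routine):

```python
import numpy as np

def medoid_index(embs: np.ndarray) -> int:
    """Index of the member minimizing the sum of squared
    distances to all other members of the cluster."""
    diffs = embs[:, None, :] - embs[None, :, :]  # (n, n, d)
    sq_dists = (diffs ** 2).sum(axis=-1)         # (n, n)
    return int(sq_dists.sum(axis=1).argmin())

rng = np.random.default_rng(1)
cluster = rng.normal(size=(50, 16))
idx = medoid_index(cluster)
medoid = cluster[idx]  # an actual item, so existing item infra applies
```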
Step 3: Importance Scoring
Finally, they calculate a user-cluster importance score. Since a user can have 10-100 clusters, we need a way to choose which clusters to use during retrieval. They use a simple time-decayed model, summing an exponentially decayed weight over the cluster’s interactions: s(u, c) = Σ_{j in c} e^(−λ·Δt_j), where Δt_j is how long ago interaction j occurred.
Here λ (lambda) is a hyper-parameter that controls recency: 0 ignores time effects entirely, while 0.1 strongly emphasizes recent interactions. Pinterest found 0.01 to be a good balance.
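Under my reading of the decay model (an exponentially decayed sum over a cluster’s interactions, with λ applied per day), the score can be sketched as:

```python
import numpy as np

def cluster_importance(ages_days: np.ndarray, lam: float = 0.01) -> float:
    """Time-decayed interaction count for one cluster.
    ages_days[j] = days since interaction j; lam controls recency."""
    return float(np.exp(-lam * ages_days).sum())

# lam = 0 weights every interaction equally; 0.1 heavily favors recency
recent = cluster_importance(np.array([0.0, 1.0, 2.0]))
stale = cluster_importance(np.array([80.0, 85.0, 90.0]))
assert recent > stale  # recently-hit clusters score higher
```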
With these three steps we now have a set of per-user interest medoids (i.e. pins) and weights for how much a user interacts with those pins.
Integrating with your Recommender System
Applying this to retrieval is fairly straightforward. The medoids can be sampled, weighted by importance, and used as candidate sources for an ANN-based candidate generator. Pinterest sampled up to 3 medoids at a time, and applied additional (though unspecified) filtering to remove near duplicates and poor quality candidates.
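Importance-weighted sampling of medoids for retrieval can be sketched like so (the IDs and weights are made up; the real system would feed the sampled medoids’ embeddings to the ANN index and then filter the results):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-user output of the three steps above
medoid_ids = np.array([101, 202, 303, 404, 505])
importance = np.array([5.0, 3.0, 1.0, 0.5, 0.5])

# Sample up to 3 medoids, weighted by importance, as ANN query points
p = importance / importance.sum()
chosen = rng.choice(medoid_ids, size=3, replace=False, p=p)
```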
One weakness of PinnerSage is the difficulty of using these user preference embeddings during ranking. Traditionally, you create a feature for each item that is the similarity between that item’s embedding and the user preference embedding. With PinnerSage there are anywhere from 10-100 preference embeddings per user, so it is unclear which of them you should choose. You could use all of them and take the max similarity between a given item and each of the user preference embeddings, but this is expensive at runtime (i.e. 100 embeddings x 1,000 items = 100,000 ops). Another option is to take a weighted average of the user preference embeddings to collapse them into a single embedding, but this naive approach will likely lose accuracy by smearing the user’s distinct preferences together.
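The two ranking-feature options above can be compared directly (random placeholder embeddings; dot product stands in for whatever similarity the ranker actually uses):

```python
import numpy as np

rng = np.random.default_rng(3)
user_embs = rng.normal(size=(100, 32))  # up to 100 preference embeddings
weights = rng.random(100)               # cluster importance weights
items = rng.normal(size=(1000, 32))     # candidate items to rank

# Option 1: max similarity over every preference embedding
# (materializes a (100, 1000) score matrix per request)
scores_max = (user_embs @ items.T).max(axis=0)

# Option 2: collapse to one weighted-average embedding first
# (cheap, but smears distinct interests together)
avg_emb = (weights[:, None] * user_embs).sum(axis=0) / weights.sum()
scores_avg = items @ avg_emb
```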
The difficulty of integrating multiple user preference embeddings into ranking was a key motivator for PinnerFormer (Pancha et al., “PinnerFormer”), which Pinterest developed to generate a single user preference embedding using Transformers to better capture user interests. We will discuss PinnerFormer in a future blog post.
Short Term Interests & Item Embeddings
Earlier we alluded to an online system that captures short term interests. An event-based streaming system captures short-term interests by performing the same clustering and importance estimation steps on the twenty most recent actions since the last batch job. These results are combined with the batch results.
One thing not discussed in the paper is how the item embeddings are generated. At the time of publication (2020), Pinterest used a sophisticated graph-based embedding model called PinSage (Hamilton et al., “Inductive Representation Learning on Large Graphs”). At Bluesky we are using BLIP2 to generate post embeddings. If you don’t already have an item embedding model, you can’t deploy PinnerSage.
Conclusion
This blog post presented an overview of PinnerSage, a clustering based approach to generating user preference embeddings while keeping item embeddings fixed. I also discussed a brief history of personalization at Bluesky, and provided my motivation for investigating PinnerSage. My current plans are to implement PinnerSage as a candidate generator, then move to PinnerFormer to generate a single user preference embedding for ranking. As we make progress on various parts of the stack we will share our work.
Bibliography
Pal, Aditya, Chantat Eksombatchai, Yitong Zhou, Bo Zhao, Charles Rosenberg, and Jure Leskovec. “PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest.” Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 23, 2020, 2311–20. https://doi.org/10.1145/3394486.3403280.
Hamilton, William L., Rex Ying, and Jure Leskovec. “Inductive Representation Learning on Large Graphs.” arXiv:1706.02216 [Cs, Stat], June 7, 2017. http://arxiv.org/abs/1706.02216.
Pancha, Nikil, Andrew Zhai, Jure Leskovec, and Charles Rosenberg. “PinnerFormer: Sequence Modeling for User Representation at Pinterest.” arXiv:2205.04507. Preprint, arXiv, May 9, 2022. http://arxiv.org/abs/2205.04507.