Optimal whole page ranking = reward / risk
We show how recommender systems can borrow risk models from finance to construct better feeds.
Are we looking at risk enough in recommender systems?
While it is tempting to equate “risk” with the mere absence of “reward”, we think portfolio construction in finance shows how explicitly modeling and accounting for risk in action selection increases long-term user value in a multi-iteration setting.
Illustration of the idea in finance
There are many decades of beautiful mathematics on optimal portfolio construction that factors in both risk and reward. In this Colab, we have shown with a simplistic example how allocating inversely proportional to risk, instead of holding a full stocks allocation, leads to:
higher returns, as evidenced by the final portfolio being 1.81 times that of the normal full-stocks allocation.
lower risk, as evidenced by smaller drawdowns during times of crisis and hence a lower risk of the investor having to liquidate.
The above is just an illustration, and all the disclaimers that usually accompany anything related to financial advice apply here. Based on the authors’ personal experience, there are ways to mess this up, and there are also ways to deliver 100+ times more value than the above chart. Let’s take the basic idea and expand on it in the next section.
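The Colab itself is not reproduced here, but the core mechanic can be sketched in a few lines. This is a minimal, hypothetical simulation (synthetic returns, illustrative parameter choices, not the authors’ actual notebook): exposure is scaled inversely to trailing volatility, which cuts drawdowns during a simulated crisis compared to staying fully invested.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily stock returns: mostly calm, with a volatile "crisis" window.
n_days = 2000
returns = rng.normal(0.0005, 0.01, n_days)
returns[800:900] = rng.normal(-0.005, 0.04, 100)  # crisis period

window = 60        # trailing window for estimating volatility
target_vol = 0.01  # daily risk budget

# Benchmark: fully invested in stocks throughout.
full_equity = np.cumprod(1 + returns)

# Risk-targeted: scale exposure inversely to trailing volatility, capped at 1x.
weights = np.ones(n_days)
for t in range(window, n_days):
    trailing_vol = returns[t - window:t].std()
    weights[t] = min(1.0, target_vol / trailing_vol)
risk_targeted = np.cumprod(1 + weights * returns)

def max_drawdown(curve):
    # Worst peak-to-trough decline; closer to 0 is better.
    peaks = np.maximum.accumulate(curve)
    return ((curve - peaks) / peaks).min()

print("max drawdown, full stocks:  ", round(max_drawdown(full_equity), 3))
print("max drawdown, risk targeted:", round(max_drawdown(risk_targeted), 3))
```

The point of the sketch is the second claim above: the risk-targeted curve suffers a much shallower drawdown through the crisis window, which is exactly the failure mode that drives investors to liquidate.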
High level idea
Take the resources that are limited and make sure to use them optimally.
In financial portfolios, the truly limited resource is risk, not just money.
In fact, with leverage, which is to a large extent accessible and cheap, the invested value of a portfolio is not strictly limited by the money put in.
Risk, however, is limited: an investor can only stomach a certain amount of it. Hence a portfolio that maximizes reward while containing risk is optimal.
In recommender systems, you have a limited amount of attention, or interest, from the user. You are constantly balancing the risk of depleting that resource against the value you are trying to deliver.
Reward, Risk and Regret in financial portfolios
Reward could be the returns, or the increase in value, of the portfolio
Risk
short term risk of negative returns
long term risk of stopping investing altogether. This is all too common: in our experience, if you speak to investors who have been investing personally for around 25 years, more than half of them no longer invest in the stock market at all, having gone through some period of extreme risk that left them disillusioned with the outcome.
medium term risk of underallocation
Regret
people often want some exposure to asset classes or exciting stocks / investments that their friends or peers hold. This comes from the fear of missing out. So if you are a portfolio manager and you allocate nothing to something like, say, cryptocurrency, you could be incurring the risk of regret from clients whose friends are allocated.
Reward, Risk and Regret in recommended feed construction
Reward: This is often related to your definition of business value. For instance, it could be the number of daily active users for your platform, or the total time spent or user activity on your platform.
Risk:
users retaining less (high severity)
users exiting the current session on your app (medium severity)
users exiting the feed or skipping the recommendation (low severity)
Regret
recommendations not capturing some category/topic/creator/job-to-be-done that you consider par for the course. We think calibrated recommendations are an inspiring way to reduce this regret.
Final algorithm for feed recommendations
Instead of ranking items by expected value / reward alone, borrowing from the formula in finance, we recommend ranking items for your recommended feed by reward divided by risk.
A simplistic formulation of this in, say, a video recommendation system could be p(reward | user, item) / p(exit_app | user, item).
This is similar to what Guanfeng et al. show in Improving feeds by modelling scrolling behavior: the optimal solution is to rank by the probability of reward divided by the probability of ending the session.
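As a concrete sketch of the ranking rule, suppose upstream models already produce a reward probability and an exit probability per candidate. The item names, probability values, and the `eps` guard below are all illustrative assumptions; the only thing taken from the text is the score itself, reward over risk:

```python
# Hypothetical per-candidate scores from upstream models:
# p_reward ~ p(reward | user, item), p_exit ~ p(exit_app | user, item).
candidates = [
    {"item": "video_a", "p_reward": 0.30, "p_exit": 0.10},
    {"item": "video_b", "p_reward": 0.40, "p_exit": 0.25},
    {"item": "video_c", "p_reward": 0.15, "p_exit": 0.03},
]

def risk_adjusted_score(c, eps=1e-6):
    # Rank by reward per unit of risk rather than reward alone;
    # eps guards against a zero exit probability.
    return c["p_reward"] / (c["p_exit"] + eps)

ranked = sorted(candidates, key=risk_adjusted_score, reverse=True)
print([c["item"] for c in ranked])  # ['video_c', 'video_a', 'video_b']
```

Note the contrast: ranking by reward alone would put video_b first, but its high exit probability pushes it to the bottom once risk is priced in, while the low-risk video_c rises to the top.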
Disclaimer: These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.
Great write-up. In the above construction, when we model p(exit_app | user, item), how do we decouple favorable exits (the user’s attention need was satisfied) from unfavorable exits (the user’s attention need wasn’t satisfied, probably because of a bad recommendation)?
Interesting write-up. How should we define session drop-off? Should it be the "app closed" signal or a "product switch" (where the user goes back to search/explore and finds something relevant to consume)? I think it will depend a lot on the product and consumption behaviour, but do you have any empirical findings?