Optimal whole page ranking = reward / risk

May 11, 2024

We show how tech can learn from finance in using risk models for better feed construction of recommender systems.

7 Comments

Jun 4, 2024

Great write-up. In the construction above, when we model p(exit_app | user, item), how do we separate favorable exits (the user's attention need was satisfied) from unfavorable exits (the need was not satisfied, probably because of a bad recommendation)?

Gaurav Chakravorty

Jun 8, 2024

You are spot on. In addition to a formulation like E[Reward] / E[Risk], we also have to improve our estimates of risk and reward. I agree that an app exit can follow a satisfying experience; I have often closed a video app right after watching a video. I think the direction you are pointing to is that a better bad_exit event could be something like exit && video_watch_time < 5 seconds? I am on board with that, and it can be validated empirically.
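A minimal sketch of that bad_exit label, assuming each logged impression records whether the user exited the app and how long they watched. The `Impression` type, its field names, and the 5-second default are hypothetical placeholders, not a validated definition.

```python
from dataclasses import dataclass

# Hypothetical threshold taken from the "5 seconds or something" guess above.
# The right value would have to be found empirically.
BAD_EXIT_WATCH_TIME_SECONDS = 5.0


@dataclass
class Impression:
    """One logged item view (field names are assumptions for illustration)."""
    exited_app: bool         # user closed the app right after this item
    video_watch_time: float  # seconds watched on this item


def is_bad_exit(imp: Impression,
                threshold: float = BAD_EXIT_WATCH_TIME_SECONDS) -> bool:
    """Label an exit as 'bad' only if it followed a very short watch."""
    return imp.exited_app and imp.video_watch_time < threshold
```

A risk model could then be trained on `is_bad_exit` as its label instead of on every exit, so that exits after a long, presumably satisfying watch do not count as risk.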

Avnish Kumar

May 19, 2024

Interesting write-up. How should we define session drop-off? Should it be an "app-closed" signal or a "product-switch" signal (where the user goes back to search/explore and finds something relevant to consume)? I think it will depend a lot on the product and on consumption behaviour, but do you have any empirical findings?

Gaurav Chakravorty

May 19, 2024

Good question.

We can't share empirical findings of course :) but at a high level what I can say is that each of these signals:

- user skips item

- user moves to other tab

- user closes app

have different implications for different metrics, and carry different "risk" levels with respect to different rewards.

app-closed is clearly the biggest risk here, but you will find that it occurs more often at the top position. I have shown an example of this in some of the illustrative configs in https://github.com/gauravchak/risk_aware_feed_construction/tree/main/value_model_configs. For instance, in https://github.com/gauravchak/risk_aware_feed_construction/blob/main/value_model_configs/separate_top_position_inverse_exit_conditional.json I recommend using app-closed as the risk signal at the top position and skip at the other positions.

In general, I think risk = function(app-closed, product-switch, skip, position, expected time for the user to come back to the app, a.k.a. session frequency)
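As a rough illustration of a position-dependent risk, here is a sketch that uses the predicted app-closed probability at the top position and the predicted skip probability elsewhere, then scores an item as reward / risk. The function names, the `floor` parameter, and the convention that position 0 is the top are assumptions for this example; it does not reproduce the format of the configs linked above, and it leaves out product-switch and session frequency.

```python
def risk_score(p_app_closed: float, p_skip: float, position: int,
               floor: float = 1e-3) -> float:
    """Pick the risk signal by position (position 0 is the top slot).

    `floor` is a hypothetical guard that keeps the ratio below from
    blowing up when the predicted risk is close to zero.
    """
    risk = p_app_closed if position == 0 else p_skip
    return max(risk, floor)


def value(expected_reward: float, p_app_closed: float, p_skip: float,
          position: int) -> float:
    """Score an item for a given slot as reward / risk."""
    return expected_reward / risk_score(p_app_closed, p_skip, position)
```

With this shape, the same item can score differently depending on the slot it is considered for, because the risk term in the denominator changes with position.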

What do you think? Would love to learn your intuition as well.

Avnish Kumar

May 20, 2024

Makes sense. In my work I usually convert the different risks and rewards into sample weights that are a function of item metadata and contextual features (e.g. the position of the content). But this idea of predicting both point-wise risk and reward for all the candidates is pretty interesting. Is there any good reference I can follow on this topic? Thanks

Gaurav Chakravorty

May 20, 2024

Is what you are saying the following?

```
composite value = w1 * risk + w2 * reward
(w1, w2) = f(item features, context/user features)
```

That makes sense. While our formulation has a different form (reward / risk), if you also have access to the batch mean risk and reward, it might be possible to make the two formulations similar.
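One possible way to read this, offered as an assumption rather than as the method used in the post: a first-order Taylor expansion of reward / risk around the batch means gives a linear form w1 * risk + w2 * reward plus a constant, with w1 = -mean_reward / mean_risk^2 and w2 = 1 / mean_risk. The function names below are hypothetical.

```python
def linearized_weights(mean_reward: float, mean_risk: float):
    """First-order expansion of reward / risk around the batch means.

    Returns (w_risk, w_reward, offset) such that, near the means,
    reward / risk ~= w_risk * risk + w_reward * reward + offset.
    """
    w_risk = -mean_reward / mean_risk ** 2  # d(reward/risk)/d(risk) at the means
    w_reward = 1.0 / mean_risk              # d(reward/risk)/d(reward) at the means
    offset = mean_reward / mean_risk        # makes the two forms agree at the means
    return w_risk, w_reward, offset


def linear_value(reward: float, risk: float,
                 mean_reward: float, mean_risk: float) -> float:
    """Weighted-sum score that approximates reward / risk near the batch means."""
    w_risk, w_reward, offset = linearized_weights(mean_reward, mean_risk)
    return w_risk * risk + w_reward * reward + offset
```

The approximation is only close for items whose risk and reward are near the batch means, and the weights here depend on batch statistics rather than on per-item features, so this is a narrower construction than making (w1, w2) a learned function of the inputs.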

I did not go into this, but your approach of making (w1, w2) a function of the inputs is pretty powerful. The multi-task fusion paper goes into this as well.

> Is there any good reference I can follow on this topic?

Not that I know of. These are empirical learnings.

Avnish Kumar

May 20, 2024

Yes, along the same lines. It's just that if we are considering both risk and reward, both tasks will have their own output heads. Learning the sample weights might be a strong idea; I have to look at the math.

Applied ML | Recommender systems