Jun 4 · Liked by Gaurav Chakravorty

Great write-up. In the above construction, when we model p(exit_app | user, item), how do we separate favorable exits (the user's attention need was satisfied) from unfavorable exits (the user's attention need wasn't satisfied, probably because of a bad recommendation)?

author

You are spot on. In addition to a formulation like E[Reward] / E[Risk], we also have to improve our estimates of risk and reward. I agree that an exit_app event could follow a satisfying experience; I have often found myself closing a video app right after watching a video. I think the direction you are getting at is that a better bad_exit event could be exit && video_watch_time < 5 seconds, or something similar? I am down with that. This can be validated empirically.
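A minimal sketch of that relabeling, assuming each logged impression carries an exit flag and a watch time (the `Impression` schema and field names are hypothetical) and using the 5-second cutoff from the reply as an illustrative threshold, not a validated one:

```python
from dataclasses import dataclass

# Illustrative cutoff from the discussion; it should be validated empirically.
BAD_EXIT_WATCH_TIME_S = 5.0


@dataclass
class Impression:
    """One logged impression (hypothetical schema)."""
    exited_app: bool     # user closed the app right after this item
    watch_time_s: float  # seconds the user spent on this item


def is_bad_exit(imp: Impression, threshold_s: float = BAD_EXIT_WATCH_TIME_S) -> bool:
    """Count an exit as unfavorable only if it followed a short watch."""
    return imp.exited_app and imp.watch_time_s < threshold_s


logs = [
    Impression(exited_app=True, watch_time_s=2.0),    # quick bail-out
    Impression(exited_app=True, watch_time_s=120.0),  # exit after a satisfying watch
    Impression(exited_app=False, watch_time_s=1.0),   # skip, not an exit
]
labels = [is_bad_exit(x) for x in logs]
print(labels)  # [True, False, False]
```

Under this sketch, the risk model would be trained to predict p(bad_exit | user, item) instead of p(exit_app | user, item), so exits after a long watch no longer count against the item.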

May 19 · Liked by Gaurav Chakravorty

Interesting write-up. How should we define session drop-off? Should it be an "app-closed" signal or a "product-switch" signal (where the user goes back to search/explore and finds something relevant to consume)? I think it will depend a lot on the product and on consumption behavior, but do you have any empirical findings?

author

Good question.

We can't share empirical findings, of course :) but at a high level, what I can say is that each of these signals:

- user skips item

- user moves to other tab

- user closes app

has different implications for different metrics and a different "risk" level with respect to different rewards.

App-closed is clearly the biggest risk here, but you will find that it occurs more often at the top position. I have shown an example of this in some of the illustrative configs in https://github.com/gauravchak/risk_aware_feed_construction/tree/main/value_model_configs. For instance, in https://github.com/gauravchak/risk_aware_feed_construction/blob/main/value_model_configs/separate_top_position_inverse_exit_conditional.json I recommend using app-closed at the top position and skip at other positions.
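A minimal sketch of the position-conditional idea: use the predicted app-close probability as the risk term at the top slot and the predicted skip probability everywhere else. The function name, prediction keys, and 0-indexed slot convention are hypothetical; this does not reproduce the schema of the linked JSON config.

```python
from typing import Dict

TOP_POSITION = 0  # hypothetical: 0-indexed slot at the top of the feed


def risk_for_position(position: int, preds: Dict[str, float]) -> float:
    """Pick the risk signal by slot: app-close at the top, skip elsewhere.

    `preds` holds point-wise model predictions, e.g.
    {"p_app_close": ..., "p_skip": ...} (hypothetical key names).
    """
    if position == TOP_POSITION:
        return preds["p_app_close"]
    return preds["p_skip"]


preds = {"p_app_close": 0.08, "p_skip": 0.35}
print(risk_for_position(0, preds))  # 0.08
print(risk_for_position(3, preds))  # 0.35
```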

In general, I think risk = function(app-closed, product-switch, skip, position, expected time for the user to come back to the app, a.k.a. session frequency).
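One hypothetical way to instantiate such a function is a weighted sum of the event probabilities, discounted by position and scaled up for users whose next session is further away. The weights, the log-position discount, and the direction and shape of the return-time scaling are all illustrative assumptions, not tuned or recommended values:

```python
import math
from dataclasses import dataclass


@dataclass
class RiskInputs:
    p_app_close: float       # predicted probability the user closes the app
    p_product_switch: float  # predicted probability the user switches tab/product
    p_skip: float            # predicted probability the user skips the item
    position: int            # 0-indexed slot in the feed
    hours_to_return: float   # expected time until the user's next session


# Illustrative weights only: app-close is treated as the most severe outcome.
W_APP_CLOSE, W_SWITCH, W_SKIP = 1.0, 0.4, 0.1


def risk(x: RiskInputs) -> float:
    """Hypothetical risk: weighted event probabilities, scaled by a position
    discount (1.0 at the top slot) and by expected time until the user returns."""
    event_risk = (W_APP_CLOSE * x.p_app_close
                  + W_SWITCH * x.p_product_switch
                  + W_SKIP * x.p_skip)
    position_factor = 1.0 / math.log2(x.position + 2)        # 1.0 at position 0
    return_factor = 1.0 + math.log1p(x.hours_to_return / 24)  # grows with days away
    return event_risk * position_factor * return_factor


frequent = RiskInputs(0.1, 0.2, 0.5, position=0, hours_to_return=2.0)
rare = RiskInputs(0.1, 0.2, 0.5, position=0, hours_to_return=72.0)
print(risk(frequent) < risk(rare))  # True
```

A multiplicative form keeps each factor interpretable on its own; an additive or learned combination would be equally plausible and is something to validate empirically.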

What do you think? Would love to learn your intuition as well.

May 20 · Liked by Gaurav Chakravorty

Makes sense. In my work, I usually transform all the different risks and rewards into sample weights that are a function of item metadata and contextual features (e.g., the position of the content). But this idea of predicting both point-wise risk and reward for every candidate is pretty interesting. Is there any good reference I can follow on this topic? Thanks!
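A minimal sketch of the sample-weighting approach as described: a single binary task trained with a per-example weight computed from item metadata and position. The weighting rule (down-weight lower slots, up-weight a hypothetical `is_fresh` item flag) is purely illustrative:

```python
import math
from typing import List, Tuple


def sample_weight(position: int, is_fresh: bool) -> float:
    """Hypothetical weight: lower slots count less, fresh items count more."""
    w = 1.0 / (1.0 + position)
    return w * (1.5 if is_fresh else 1.0)


def weighted_log_loss(examples: List[Tuple[float, int, int, bool]]) -> float:
    """Weighted binary cross-entropy, normalized by the total weight.

    Each example is (predicted_prob, label, position, is_fresh).
    """
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for p, y, pos, fresh in examples:
        w = sample_weight(pos, fresh)
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


batch = [(0.9, 1, 0, True), (0.2, 0, 3, False), (0.6, 0, 1, True)]
print(round(weighted_log_loss(batch), 4))
```

The contrast with the point-wise formulation in the post is that here risk and reward only shape the training signal of one model, rather than being predicted separately for every candidate and combined at ranking time.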

author

Is this what you are saying?

```
composite_value = w1 * risk + w2 * reward
(w1, w2) = f(item_features, context_user_features)
```

That makes sense. While our formulation has a different form (reward / risk), if you also have access to the batch mean risk and reward, it might be possible to make the two formulations similar.
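One way to read this, offered as an interpretation rather than the author's stated method: take a first-order Taylor expansion of reward / risk around the batch means r̄ (reward) and ρ̄ (risk). That gives reward / risk ≈ r̄/ρ̄ + (reward − r̄)/ρ̄ − r̄·(risk − ρ̄)/ρ̄², i.e. the linear form with w2 = 1/ρ̄ and w1 = −r̄/ρ̄², plus a constant that does not change the ranking. A small sketch with made-up candidate scores:

```python
from statistics import mean
from typing import List, Tuple


def linearized_weights(rewards: List[float], risks: List[float]) -> Tuple[float, float]:
    """First-order Taylor weights so that w1 * risk + w2 * reward tracks
    reward / risk (up to a constant) near the batch means."""
    r_bar, rho_bar = mean(rewards), mean(risks)
    w1 = -r_bar / rho_bar ** 2  # weight on risk (negative: more risk, less value)
    w2 = 1.0 / rho_bar          # weight on reward
    return w1, w2


# Made-up point-wise predictions for four candidates in one batch.
rewards = [0.30, 0.25, 0.40, 0.20]
risks = [0.10, 0.08, 0.16, 0.09]

w1, w2 = linearized_weights(rewards, risks)
ratio_rank = sorted(range(4), key=lambda i: rewards[i] / risks[i], reverse=True)
linear_rank = sorted(range(4), key=lambda i: w1 * risks[i] + w2 * rewards[i], reverse=True)
print(ratio_rank, linear_rank)  # both [1, 0, 2, 3] for this batch
```

The approximation only holds near the batch means; candidates far from them can be ranked differently by the two forms.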

I did not go into this, but your approach of making (w1, w2) a function of the inputs is pretty powerful. The multi-task fusion paper goes into this as well.
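A minimal sketch of (w1, w2) = f(features), assuming f is a small linear model over a hypothetical feature vector. A softplus keeps each weight's magnitude positive, and the risk weight enters with a negative sign so that risk always lowers the composite value. The parameters below are placeholders; in practice they would be learned or searched:

```python
import math
from typing import Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z); always positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def fusion_weights(features: Sequence[float],
                   theta_risk: Sequence[float],
                   theta_reward: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical f: map item/user/context features to (w1, w2)."""
    z_risk = sum(t * x for t, x in zip(theta_risk, features))
    z_reward = sum(t * x for t, x in zip(theta_reward, features))
    w1 = -softplus(z_risk)   # risk always lowers the composite value
    w2 = softplus(z_reward)  # reward always raises it
    return w1, w2


def composite_value(risk: float, reward: float, features: Sequence[float],
                    theta_risk: Sequence[float], theta_reward: Sequence[float]) -> float:
    w1, w2 = fusion_weights(features, theta_risk, theta_reward)
    return w1 * risk + w2 * reward


features = [1.0, 1.0, 0.0]    # hypothetical: [bias, is_top_position, is_new_user]
theta_risk = [0.0, 1.0, 0.5]  # placeholder parameters
theta_reward = [0.5, 0.0, 0.0]
print(composite_value(risk=0.1, reward=0.3, features=features,
                      theta_risk=theta_risk, theta_reward=theta_reward))
```

With `is_top_position` set, this placeholder `theta_risk` puts a larger weight on risk, which is one way to express "be more conservative at the top slot" without hard-coding it.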

> Is there any good reference I can follow on this topic?

Not that I know of. These are empirical learnings.

May 20 · Liked by Gaurav Chakravorty

Yes, along the same lines. It's just that if we are considering both risk and reward, the two tasks will have their own output heads. Learning the sample weights might be a strong idea; I have to look at the math.
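A minimal sketch of the two-head setup, assuming a shared ReLU layer feeding separate sigmoid heads for p(reward) and p(risk). Layer sizes and weights are placeholders and training is omitted; it only shows the forward pass:

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu_layer(x: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Shared bottom: one ReLU layer; weights[j] holds hidden unit j's input weights."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def two_head_forward(x: Sequence[float],
                     shared: List[List[float]],
                     reward_head: Sequence[float],
                     risk_head: Sequence[float]) -> Tuple[float, float]:
    """Return (p_reward, p_risk) from separate heads over a shared representation."""
    h = relu_layer(x, shared)
    p_reward = sigmoid(sum(w * hi for w, hi in zip(reward_head, h)))
    p_risk = sigmoid(sum(w * hi for w, hi in zip(risk_head, h)))
    return p_reward, p_risk


shared = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 hidden units, 3 input features
p_reward, p_risk = two_head_forward([1.0, 0.5, 2.0], shared, [1.2, -0.4], [-0.7, 0.9])
print(p_reward, p_risk)
```

Each head would get its own loss against its own label, and per-sample weights like those described earlier in the thread could still be applied to either loss.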
