Declarative Value-Model Tuning

Sep 10, 2024

Code showing a couple of approaches to achieving the desired task importance in a value model

2 Comments

Could you also talk a bit about the convergence of the approach? What if the desired NDCG is not achievable, i.e. reducing NDCG regret for one component increases NDCG regret for another? How do we ensure the solution will converge?

The components are highly correlated as well. Does this method assume independence among the different components going into the value model?


random developer

Sep 11

A few questions:

1) Curious: when using SLSQP, does the objective function need to be differentiable? How would that work when using NDCG? IIRC SLSQP assumes f(x) is differentiable and approximates it with linearizations, so I am not sure how we are able to use NDCG here.

2) Also, it seems the problem changed from finding weights to finding the desired NDCG regrets, or importance, to begin with? How does one come up with those desired target weights? It seems the desired weights also need to be personalized based on users and videos.

3) What if we want weights to be negative as well, to demote clickbait or low-quality content?

4) The regret-based approach: does it have any relation to scalarization? Essentially, it seems to me we are taking the multi-objective function and scalarizing it: sum(lambda * MSE(1 - ndcg(task_rank, current_rank))). The part I don't understand is how this works or converges when NDCG itself is non-differentiable.
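To make the differentiability concern in questions 1 and 4 concrete, here is a minimal sketch of the scalarized objective as written above. It is not the post's code: the function names, the toy predictions, and the choice to use each task's own predictions as NDCG gains and a plain squared regret in place of the MSE are all assumptions made for illustration.

```python
import math

def dcg(gains):
    # Discounted cumulative gain of a list already in ranked order.
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

def ndcg(scores, gains):
    # NDCG of the ranking obtained by sorting items by descending score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(gains, reverse=True))
    return dcg([gains[i] for i in order]) / ideal if ideal > 0 else 0.0

def scalarized_regret(weights, task_preds, lambdas):
    # Hypothetical objective: sum_t lambda_t * (1 - NDCG_t)^2, where the
    # ranking comes from the weighted sum of task predictions and each
    # task's own predictions serve as its gains (an assumption).
    n_items = len(task_preds[0])
    value = [sum(w * p[i] for w, p in zip(weights, task_preds))
             for i in range(n_items)]
    return sum(lam * (1.0 - ndcg(value, preds)) ** 2
               for lam, preds in zip(lambdas, task_preds))

# Toy data (made up): two tasks, four items.
task_preds = [[0.9, 0.2, 0.5, 0.1],
              [0.1, 0.8, 0.3, 0.6]]
lambdas = [1.0, 1.0]

base = scalarized_regret([0.6, 0.4], task_preds, lambdas)
nudged = scalarized_regret([0.6 + 1e-6, 0.4], task_preds, lambdas)
far = scalarized_regret([0.4, 0.6], task_preds, lambdas)

# A tiny step leaves the item order, and so the objective, unchanged,
# which makes the finite-difference slope zero at this point.
print("finite-difference slope:", (nudged - base) / 1e-6)
print("objective changes after a larger move:", far != base)
```

In this toy case the objective is piecewise constant in the weights: it only moves when two items swap places. A solver that estimates gradients by finite differences, as SciPy's SLSQP does when no `jac` is supplied, would therefore see a zero gradient almost everywhere, which is the crux of the question. Common workarounds include a smooth surrogate for the ranking metric or a derivative-free search; whether the post's code uses either is not stated here.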


Applied ML | Recommender systems
