Logging | How to build self-learning search and recommender systems

Server side and client side logging needed to build recommender systems, ads and search systems

Let’s look at why logging is important in recommender / search / ads systems and what to log, how to log, and what to do after logging.

Fig1: The figure above shows that when serving the query the backend logs the query_id along with the results sent by the backend, in this case ads, and all the features associated with the results. On a click the client logs a unique click_id, the query_id and id of the result/ad that was clicked. Later these two logs are joined to generate data for modeling and analytics (Sourced from Fig 1 of the Google Photon paper, which talks about the complexities of joining query and click events to generate correct training data and analytics for ads.)

To whet your appetite, and to get a gist of the article, see the image above from Google’s Photon paper. It shows why logging is absolutely critical for Google ads.

Why is logging important?

  1. Accounting / Attribution : The Photon paper above, for instance, is motivated by a core business problem of showing advertisers what queries their ads were shown in and in which ones they were clicked. In search systems, such logging enables us to find the queries we are not doing well on, the ones where the top results need improvement.

  2. Ranking improvements : The operative word of machine learning is “learning”. For the (result) ranking model to improve it needs to improve upon usage. See section below for details.

  3. Launch decisions : To be able to see what is working and which experiments are launch worthy.

In this article we will focus on ranking without loss of generality to other use cases.

What’s a ranking model and why do we need logging to train it?

Modern ML systems are dynamical systems, they are constantly learning and changing. Scalable feature and feedback logging is a means to support that learning.

Most successful recommender systems today have a two-stage process (see Fig 2 above). Retrieval refers to the pink boxes that are in parallel trying to get the results that could be relevant. Ranking refers to the light green boxes in the bottom right that are trying to order the 100 or so retrieved results at serving time.

Fig 3: Figure above shows an example ranking model that uses video features to produce a number of estimates of goodness of the this video item for the user/query. These estimates are then used to create an ordered list of results. Ranking is a like pure function that takes in all the features and comes up with these scores.

Schematic of what to log server side and client-side

message Query {
  string query_id = 1;
  string query_text = 2;
}

The search / recommendation query should provide a query id.

import "result_features.proto";
// ^ the features used by the ranking model

message QueryResultInternal {
  // a unique identifier for the item being
  // returned. result_id does not depend on
  // the query.
  string result_id = 1;

  // url below is being used generally to denote
  // what the user is looking for.
  string url = 2;

  // ranking features
  ResultFeatures features = 3;

  // ranking outputs, i.e. estimates of objectives
  // being optimized.
  ResultPropensities propensities = 4;
}

message QueryResponseInternal {
  // same query id as was provided in the request
  string query_id = 1;
  repeated QueryResultInternal results = 2;
}

The ranking module should send results with the features that were used by the ranking model to compute the ranking scores. These are logged by the server-side logging module.

message QueryResult {
  // a unique identifier for the item being
  // returned. result_id does not depend on
  // the query.
  string result_id = 1;

  // url below is being used generally to denote
  // what the user is looking for.
  string url = 2;

}

message QueryResponse {
  // same query id as was provided in the request
  string query_id = 1;
  repeated QueryResult results = 2;
}

The server-side logging module, would then strip out the features since they are not needed by the client and also for privacy and security concerns. The results sent to the client would only contain ids and what is needed to show them to the client.

message Click {
  // same query id as was provided in the response
  string query_id = 1;

  // a unique identifier for the item being clicked
  string result_id = 2;
}

The client-side logger would only need to log a click event with the original query id and the result that was clicked.

Joining the three above we can produce the required training data.

Let’s assume the user clicked on the third result. A positive example can thus be made from the query_text and the ranking features of the result_id corresponding to the Click event.

Results above what was clicked can be assumed to have been considered but not clicked. From these we could generate negative examples. These would similarly be the query_text and the ranking features of the result_ids corresponding to the results above the result_id in the Click event. The only difference from positive examples is that the label for these data points is 0, i.e. a non-click.

Training data

Doing what we did above we are able to generate in a scalable, and in exactly-once semantics (read Photon for reference) if needed, training data to update the ranking model.

Deepak Agarwal describes the above as “loose coupling between frontend and backend data tracking” in his tutorial on building real-life recommender systems.

Joining is also covered in Real-time data processing at Facebook.

Training serving skew

Is this necessary? Can there be another way? What if we don’t log the features? Could we have generated them later when we are modeling? Yes, but that introduces a risk knows as training-serving skew. That means we could be training models with very different features than they would have been if recorded at the time.

Quiz

  1. Why is it important to log query_id in the click event?

  2. Which process generates the query_id event and how would you make sure it is globally unique?

  3. How does the Photon paper deliver exactly once semantics with this paradigm?

  4. How would you want to relax exactly once, in an effort to make the ranking model learn real time? (an example of online learning is the ads ranking setup described here)

Hope this helps and come join us in learning together

If you want to discuss/teach/learn about applied ML, please join our Discord community here!

References

  1. Photon: Fault-tolerant and scalable joining of continuous data streams

  2. Tutorial: Lessons learned from building real-life recommender systems

  3. Real-time data processing at Facebook

  4. Rules of ML : training-serving skew

Disclaimer: These are the personal opinions of the author. Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s).