Improving the Quality of Recommended Pins with Lightweight Ranking

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Poorvi Bhargava | Software Engineer, Homefeed Recommendations, Sen Wang | Software Engineer, Homefeed Recommendations, Andrew Liu | Tech Lead, Homefeed Recommendations, Duo Zhang | Engineering Manager, Homefeed Recommendations


The Pinterest corpus is composed of billions of Pins. However, each Pinner sees only a small subset, chosen based on their interests, when browsing their home feed or other recommendation surfaces. How do we provide these recommendations to each person?

Introduction to Pixie: a key recommendation system at Pinterest

Pixie is one of Pinterest’s major recommendation systems used for fetching relevant Pins. Pixie is composed of a bipartite graph of all Pins and boards on Pinterest. Starting at a Pin that the Pinner has recently interacted with, we perform multiple random walks along the graph to generate thousands of similar Pins for that Pinner. These Pins are sorted by “visit count”, the number of times each Pin was “visited” during the random walks. After Pins are fetched from Pixie, they’re sent to Pixie’s clients for further personalized ranking.
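To make the idea of a “visit count” concrete, here is a minimal sketch of random walks over a toy Pin-board bipartite graph. It is an illustration only, not Pixie’s implementation: the function name, the graph contents, and the parameters (`num_walks`, `walk_length`, `seed`) are all assumptions made for the example.

```python
import random
from collections import Counter


def random_walk_visit_counts(pin_to_boards, board_to_pins, start_pin,
                             num_walks=1000, walk_length=5, seed=0):
    """Count how often each Pin is visited by short random walks from start_pin."""
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    visits = Counter()
    for _ in range(num_walks):
        pin = start_pin
        for _ in range(walk_length):
            boards = pin_to_boards.get(pin)
            if not boards:
                break  # dead end: this Pin is not saved to any board
            # One step = Pin -> random board containing it -> random Pin on that board.
            board = rng.choice(boards)
            pin = rng.choice(board_to_pins[board])
            if pin != start_pin:
                visits[pin] += 1
    return visits


# Hypothetical toy graph: four Pins saved across two boards.
pin_to_boards = {
    "hiking_tips": ["peru_trip", "outdoors"],
    "rainbow_mountain": ["peru_trip"],
    "machu_picchu": ["peru_trip"],
    "tent_review": ["outdoors"],
}
board_to_pins = {
    "peru_trip": ["hiking_tips", "rainbow_mountain", "machu_picchu"],
    "outdoors": ["hiking_tips", "tent_review"],
}

visit_counts = random_walk_visit_counts(pin_to_boards, board_to_pins, "hiking_tips")
# Candidates sorted by visit count, highest first.
ranked = [pin for pin, _ in visit_counts.most_common()]
```

Note that the score depends only on graph structure: nothing about the Pinner, beyond the starting Pin, influences the ordering.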

As a major recommendation source, Pixie generates over 75 million Pins per second and powers multiple downstream clients, including important surfaces like the home feed, recommendations feeds, email notifications, search, and ads.

How does Pixie fit into the Pinterest recommendation pipeline?

Like many other recommendation systems, ours uses a two-step approach to narrow billions of Pins down to the best few for the Pinner.

Figure 1. Number of Pins at each stage of the current Recommendation Funnel

The first step, “Candidate Generation”, is a recall-driven step that aims to efficiently fetch a set of broadly relevant Pins. Pixie is one such candidate generator. This step uses recent user engagement to formulate an input representative of the Pinner’s interests. The input Pin is used to fetch similar Pins, which are quickly scored based on simple heuristics (in Pixie’s case, this is the visit count score), with additional boosting logic applied for specific business needs. The Pins with the highest score are then passed to the next step of the funnel.

The second step, or the “Full Ranking” layer, aggregates and ranks all of the recommendations fetched from multiple candidate generators. It consists of a precise, yet complex, neural network that uses user, Pin, and context features to accurately predict how likely a Pinner is to engage with the candidates. Largely due to model complexity, this step is usually quite costly and time-consuming, so we rank only a limited number of Pins at this step. The highest ranked Pins are then shown to the user.
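The two steps can be sketched as a pair of functions: a cheap heuristic pass that trims the candidate set, followed by a more expensive scoring pass over the survivors. This is a simplified illustration under assumed names; `candidate_generation`, `full_ranking`, the boost values, and the stand-in engagement scores are hypothetical and do not reflect the production system.

```python
def candidate_generation(visit_counts, boosts, top_k):
    """Recall-driven step: score by visit count times an optional boost, keep top_k."""
    scored = {pin: count * boosts.get(pin, 1.0) for pin, count in visit_counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


def full_ranking(candidates, predict_engagement, top_n):
    """Precision-driven step: rank the surviving candidates with a costlier scorer."""
    return sorted(candidates, key=predict_engagement, reverse=True)[:top_n]


# Hypothetical inputs: visit counts from the graph and one business boost.
visit_counts = {"a": 40, "b": 30, "c": 20, "d": 10}
boosts = {"d": 5.0}  # e.g. a boost applied for a specific business need

candidates = candidate_generation(visit_counts, boosts, top_k=3)

# Stand-in for the full ranker's model: a lookup of predicted engagement.
engagement = {"a": 0.1, "b": 0.7, "c": 0.9, "d": 0.3}
shown = full_ranking(candidates, engagement.get, top_n=2)
```

In this toy run, Pin "c" has the highest predicted engagement but never reaches the full ranker, because the heuristic step dropped it. That is the kind of loss a better early-stage scorer aims to reduce.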

Making Pixie recommendations even more personalized

Despite its great success across multiple major product surfaces at Pinterest, Pixie currently faces several challenges that keep it from generating even more relevant content:

  1. Pixie’s “visit count” score sorts the generated Pins purely on graph structure; it does not take any user preferences into account. At Pinterest, we’ve built many features representing user interests and Pin information that may help improve personalization.
  2. Ad-hoc business needs (such as promotion of local content) are not incorporated easily and require adding to the existing web of boosting logic.

Both of these challenges can be elegantly solved by replacing the existing visit count scoring and boosting layer within Pixie with a machine learning model.

Machine learning models, however, can be expensive and time-consuming to run in production. Since Pixie recommends more than 75 million Pins per second and lies on the critical path of multiple recommendation surfaces, it is crucial that adding a model would not significantly increase Pixie’s latency. Additionally, since Pixie candidate generation is followed by a more thorough full ranking step on the client side, we can afford to use a simpler model and trade off some precision for efficiency. This type of “lightweight” ranking (in contrast to the “heavyweight” full ranker) is often instituted in industry to improve personalization earlier on in the recommendation funnel. Lastly, since Pixie powers a wide range of major surfaces at Pinterest, it is important to keep in mind that the design should be flexible, scalable and extensible to support multiple client needs across the company. Thus, the main goals for this lightweight ranking model for Pixie are to:

  1. Recommend Pins with better relevance and personalization to the user, especially compared to those from the existing “visit count” solution.
  2. Build an efficient machine learning model to support scoring 75 million Pins per second.
  3. Build a scalable and extensible multi-tenant framework that can be easily applied to support different Pixie clients.
  4. Add the flexibility to emphasize certain types of Pins based on the client surface’s ever-changing business needs.

Figure 2. Recommendation Funnel after adding a Lightweight Ranking Step

Building a multi-tenant lightweight ranking system

In practice, machine learning-based recommendation systems consist of multiple key components, such as a training data pipeline and model training strategy. Here, we will focus on the components that required major design decisions which allowed us to build a multi-tenant lightweight ranking system for Pixie.

Creating a training dataset extensible to multiple clients

The success of a machine learning model heavily relies on the training data. To generate this data, we joined the labels from Pinners’ explicit engagement with Pin and user context features.

Figure 3. Logging Pipeline Design: logging at the front end stage versus at the serving stage.

One common practice for logging training data is to log at the “front end stage”. This means that we pass feature data to the front end and only store data for pins seen by users. The main benefit of logging at the front end stage is savings on storage space, since we do not need to store any data unseen (and therefore, unlabeled) by the user. The vast majority of data, especially at the lightweight scoring level, is unseen by the user.

However, since one of the project’s major design goals was to build an extensible pipeline for multiple clients, we decided to use another approach in which we log directly at the “serving stage”. This involves storing feature data for all candidate Pins as soon as Pixie generates them, including Pins the user never sees. This avoids each client having to pass features to the front end and set up their own logging infrastructure. Additionally, the unseen data helps us define training optimization strategies unique to lightweight ranking, as described below.
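A minimal sketch of what serving-stage logging makes possible: every candidate’s features are recorded when it is generated, and engagement events are joined in afterwards, so unseen candidates remain in the dataset. The record layout, the `TrainingExample` fields, and the join key `(request_id, pin_id)` are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    request_id: str
    pin_id: str
    features: dict
    impressed: bool  # did the Pin pass full ranking and get shown?
    actions: tuple   # engagement actions taken on the Pin, if any


def join_serving_log_with_engagement(serving_log, impressions, actions):
    """Attach engagement outcomes to every candidate logged at serving time."""
    examples = []
    for row in serving_log:
        key = (row["request_id"], row["pin_id"])
        examples.append(TrainingExample(
            request_id=row["request_id"],
            pin_id=row["pin_id"],
            features=row["features"],
            impressed=key in impressions,
            actions=tuple(sorted(actions.get(key, ()))),
        ))
    return examples


# Hypothetical logs for one request that produced three candidates.
serving_log = [
    {"request_id": "r1", "pin_id": "p1", "features": {"visit_count": 12}},
    {"request_id": "r1", "pin_id": "p2", "features": {"visit_count": 7}},
    {"request_id": "r1", "pin_id": "p3", "features": {"visit_count": 3}},
]
impressions = {("r1", "p1"), ("r1", "p2")}  # p3 was never shown
actions = {("r1", "p1"): {"save"}}

examples = join_serving_log_with_engagement(serving_log, impressions, actions)
```

With front-end logging, the row for `p3` would not exist at all; here it survives as an unimpressed example.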

How to train and optimize a lightweight ranking model

Since the goal of lightweight ranking is to efficiently score a large number of Pins, we started by training a low-complexity XGBoost GBDT model. We included the most important user and Pin features from each of the client full rankers, along with features unique to Pixie, such as graph and input Pin features. The feature set is shared across all Pixie client surfaces.

Since our lightweight model is followed by a more precise full ranking step downstream, the major design choice during model training was the model optimization strategy. A full ranker usually optimizes for predicting user engagement. However, because another ranking layer follows lightweight ranking and unseen data is available to us, we could also choose to optimize our model to (a) explicitly mimic the results of the full ranker through a technique known as Model Distillation or (b) maximize the number of candidates that pass through the full ranker, that is, improve the “funnel efficiency”.

Model distillation refers to the process of training a smaller “student” model to reproduce the behavior of a larger “teacher” model as accurately as possible, but using fewer parameters. This involves “distilling” knowledge from the teacher model to the student model. Funnel efficiency optimization is a less explored, but related concept. It aims to maximize the number of candidates that pass through downstream ranking layers.

We’ll explore model distillation in a future blog post, and for now focus on comparing engagement and funnel efficiency optimization strategies. How did we implement these strategies? Because our logging pipeline captures data at the serving stage, we have access to examples that did pass (impressed Pins) and did not pass (unimpressed Pins) the full ranking layer. We designed our training pipeline to allow us to easily define the positive and negative labels depending on our optimization strategy.

In practice, we saw that both types of funnel efficiency approaches did indeed increase the number of Pixie Pins being passed through the recommendation funnel. However, the “pure” funnel efficiency approach was less effective at predicting which Pins a user was going to take action on. This is because, even with significantly higher training weights for action labels, the large number of impression labels and low complexity of the model made it hard for the model to distinguish between the two types of positive labels. We saw that the “blended” funnel efficiency approach showed the most favorable results, even compared with those of the models optimized for engagement.
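As a rough illustration of strategy-dependent labels, the sketch below maps one logged example to a `(label, weight)` pair. It encodes only what the text states for the “pure” funnel efficiency approach (impressions and actions are both positives, with actions weighted higher) plus a common convention for engagement optimization (actions versus impressions without action). Both label schemes are assumptions, the `action_weight` value is made up, and the “blended” approach is deliberately not reconstructed because its definition is not given here.

```python
def label_example(impressed, acted, strategy, action_weight=10.0):
    """Return (label, weight) for one example, or None if the strategy excludes it."""
    if strategy == "engagement":
        # Assumed convention: only Pins the user saw carry an engagement label.
        if not impressed:
            return None
        return (1, action_weight) if acted else (0, 1.0)
    if strategy == "pure_funnel_efficiency":
        # Anything that passed the full ranker is a positive; actions weigh more.
        if acted:
            return (1, action_weight)
        if impressed:
            return (1, 1.0)
        return (0, 1.0)  # unimpressed Pins are the negatives
    raise ValueError(f"unknown strategy: {strategy}")


# The same unimpressed Pin is dropped under one strategy and a negative under the other.
dropped = label_example(impressed=False, acted=False, strategy="engagement")
negative = label_example(impressed=False, acted=False, strategy="pure_funnel_efficiency")
```

The pure scheme has two kinds of positives separated only by weight, which is consistent with the difficulty described above: a low-complexity model sees far more impression positives than action positives.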

Additionally, each client has different business needs. For example, the home feed may want to maximize saved Pins for a given person whereas notifications may want to maximize clicks within an email. To address these concerns, we trained a different model for each client and emphasized the training weights relevant to the particular surface. In practice, we saw this led to much finer control and major metric gains on important labels as compared with the original, rigid heuristic-based sorting.
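One way to picture per-client emphasis is a weight table keyed by surface, consulted when building each model’s training set. The surface names, action names, and weight values below are hypothetical; the sketch only shows the mechanism of giving the same action a different training weight for different clients.

```python
# Hypothetical per-client training weights for each action type.
CLIENT_LABEL_WEIGHTS = {
    "home_feed": {"save": 10.0, "click": 2.0},            # emphasize saves
    "email_notifications": {"click": 10.0, "save": 2.0},  # emphasize clicks
}


def example_weight(client, actions, default_weight=1.0):
    """Weight an example by the most important action taken, per client."""
    weights = CLIENT_LABEL_WEIGHTS[client]
    return max((weights.get(action, default_weight) for action in actions),
               default=default_weight)


# The same saved Pin counts for more in the home feed model than in the email model.
home_feed_weight = example_weight("home_feed", ["save"])
email_weight = example_weight("email_notifications", ["save"])
```

A weight computed this way could then be passed as a per-example sample weight when training each client’s model.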

Lastly, ad-hoc boosting, such as “localness boosting”, was previously applied to the final rank for each Pin. To match the performance of this type of boosting, we not only added locale features to the model, but also assigned “local labels” for those examples in our training set where the locale of the pin matched that of the user. In practice, we saw that these labels significantly boosted local recommendations, allowing us to replace the previous boosting layer.

Wins

Impact to Pixie and its clients

We saw great wins for the models instituted for each of Pixie’s major clients. On the home feed we saw a 1–2% increase in saves, and on Related Pins we saw a 1% increase in both time spent on the site and CTR. For email notifications powered by Pixie, we saw a 6% increase in CTR with significant gains in weekly active users as a result. Having a scalable pipeline with the flexibility to define new labels and optimization strategies allows us to cleanly cater to each client’s dynamic business needs.

Impact to the user: improved quality of recommendations

On the user side, lightweight ranking allows us to fetch more relevant and personalized recommendations earlier on in the funnel. Below is one such example of this.

Figure 4. Comparing recommendations from old heuristic-based sorting versus those with lightweight ranking.

On the left side of Figure 4, you can see that for a user looking at tips on hiking Rainbow Mountain in Peru, the recommendations without lightweight ranking are dominated by wallpaper images of general travel scenes, unrelated to tips, hiking, or Peru. On the right side, you see that all of the recommended pins are about either Peru, hiking tips, or other mountainous travel destinations, and are, therefore, much more relevant.

Future directions

In the development of this model, we faced a few recurring challenges:

  1. Infrastructure limitations: Due to the scale of Pixie, the high workload limited our capacity to add new features and to use more complex machine learning models to improve performance.
  2. Model fragmentation: On some client surfaces, such as the Home Feed, we have several different lightweight ranking models. Having multiple models leads to a waste of computational resources and redundant development efforts across a single surface.

To address these challenges, we plan to unify the serving framework and model design used across all candidate generators (not just limited to Pixie) on a particular surface. By moving the lightweight ranking away from an individual candidate generator, we are able to share features more broadly, unify the logging pipeline, and easily iterate on all models in unison.

We plan to further investigate the difference in performance between different optimization strategies, such as model distillation, and to experiment with more powerful, but less costly, model architectures, such as a Two Tower Embedding-Based DNN (inspired by YouTube).

Acknowledgements

Tao Cheng, Angela Sheu, Chen Chen, Se Won Jang, Jay Adams, Nadia Fawaz
