How Pinterest Fights Spam Using Machine Learning

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Vishwakarma Singh | Trust and Safety Machine Learning Lead


Hundreds of millions of people regularly visit Pinterest to visually discover inspiring ideas among billions of Pins. Inspiration is a high bar and we must be vigilant in ensuring that Pinners don’t see spam, harmful content or misinformation. To enforce our community policies and maintain an inspiring environment, we use the latest in machine learning technology to build automated systems that swiftly detect and act against both spammy content and spammers.

Our anti-spam system combines reactive and proactive components to counter adversarial abusers, that is, users who intentionally try to evade the system. The proactive system consists of sophisticated machine learning models, whereas the reactive system includes both rules executed in a real-time rules engine and lightweight machine learning models. We use the latest modeling techniques and iterate on these models at regular intervals, adding new data and exploring new technical breakthroughs, to maintain or improve their performance against spam over time.

One tactic malicious actors use is to misuse a Pin’s image by linking it to a malicious external website. Our models detect spam vectors, like Pin links, as well as users engaging in spammy behaviors. We quickly limit distribution of Pins with spam links and take direct action against users identified with high confidence as engaging in spammy behavior. To limit false positives, we manually review users identified with low confidence. We notify users of our actions to maintain transparency and give them the option to appeal our decision.
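The confidence-based handling described above can be sketched as a simple routing function. This is a minimal illustration, not Pinterest's implementation: the threshold values, the action names, and the function `route_user` are all hypothetical, since the post does not state how confidence is represented or which cutoffs are used.

```python
# Hypothetical thresholds; the post does not give actual values.
HIGH_CONFIDENCE = 0.95
REVIEW_THRESHOLD = 0.60


def route_user(spam_score: float) -> str:
    """Map a model's spam score in [0, 1] to an (assumed) enforcement path."""
    if spam_score >= HIGH_CONFIDENCE:
        # High confidence: act directly and notify the user (appeal possible).
        return "enforce_and_notify"
    if spam_score >= REVIEW_THRESHOLD:
        # Low confidence: send to manual review to limit false positives.
        return "manual_review"
    return "no_action"


if __name__ == "__main__":
    for score in (0.99, 0.70, 0.10):
        print(score, route_user(score))
```

Keeping a review band between the two thresholds trades reviewer workload against the risk of acting on a false positive.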

Machine Learning Models

Spam Domain Model

We proactively identify spam Pin links using a Deep Neural Network classifier (shown in Figure 1). To maximize impact, our model learns to classify a domain as spam rather than a link. We apply the same enforcement to all Pins with links belonging to the same domain. This model is trained interactively on manually labeled domains to achieve a higher recall and lower false positive rate. We use features created from links, web page text and media, user-domain interactions, and user behavior as inputs. For each domain, we sample links and web pages to create features. We split links into semantic tokens and use only frequent tokens as features. We analyze outlying patterns in user actions over time to create behavioral features. Batch inference runs periodically at scale in a PySpark job that uses TensorFlow, Spark SQL, and a UDF.

Figure 1. Deep Neural Network for domain classification
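To make the link-token features concrete, here is a minimal sketch of splitting sampled links into tokens and keeping only the frequent ones. It rests on assumptions: a plain regular-expression split stands in for the semantic tokenization the post mentions, and the function names, the `min_count` cutoff, and the example URLs are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def tokenize_link(url):
    """Split a link's path and query into lowercase alphanumeric tokens.

    Assumption: a regex split is a simple stand-in for semantic tokenization.
    """
    parts = urlsplit(url)
    text = f"{parts.path} {parts.query}".lower()
    return [tok for tok in re.split(r"[^a-z0-9]+", text) if tok]


def frequent_tokens(urls, min_count=2):
    """Return tokens that occur in at least `min_count` of the sampled links."""
    counts = Counter(tok for url in urls for tok in set(tokenize_link(url)))
    return {tok for tok, count in counts.items() if count >= min_count}


def link_features(url, vocab):
    """Binary bag-of-tokens vector over the sorted frequent-token vocabulary."""
    tokens = set(tokenize_link(url))
    return [1 if tok in tokens else 0 for tok in sorted(vocab)]


if __name__ == "__main__":
    sampled = [
        "http://a.example/free-gift/win?id=1",
        "http://a.example/free-gift/claim?id=2",
        "http://a.example/about",
    ]
    vocab = frequent_tokens(sampled)
    print(sorted(vocab), link_features(sampled[0], vocab))
```

Restricting the vocabulary to frequent tokens keeps the feature vector small and avoids fitting to one-off strings such as random identifiers.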

Spam User Model

Identifying users engaging in spam activities is the ultimate solution for fighting spam, but it is extremely hard to achieve. We leverage both supervised and unsupervised models to build an effective spam user identification system.

Classification Model

Our spam user classification model is a Deep Neural Network (shown in Figure 2) and is part of our proactive system. It is trained using synthetically labeled data generated with minimal human supervision to ensure quality. We use features created from user attributes and their past behaviors as inputs. We also use user-domain interactions as an input, summarized for each user as a distribution of domain scores reused from the spam domain model. Batch inference runs periodically in a PySpark job that uses TensorFlow, Spark SQL, and a UDF to score millions of Pinners.

Figure 2. Deep Neural Network for user classification
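One way to picture the per-user domain score distribution is a normalized histogram over the spam scores of the domains a user has interacted with. The sketch below assumes scores lie in [0, 1] and uses equal-width bins; the function name and the bin count are hypothetical, and the post does not say how the distribution is actually encoded.

```python
def domain_score_histogram(scores, num_bins=5):
    """Summarize a user's domain spam scores as a normalized histogram.

    Assumption: scores are in [0, 1]; out-of-range values are clamped.
    """
    hist = [0.0] * num_bins
    if not scores:
        return hist
    for score in scores:
        clamped = min(max(score, 0.0), 1.0)
        # A score of exactly 1.0 falls into the last bin.
        idx = min(int(clamped * num_bins), num_bins - 1)
        hist[idx] += 1
    total = len(scores)
    return [count / total for count in hist]


if __name__ == "__main__":
    # A user whose Pins link to two low-score and two high-score domains.
    print(domain_score_histogram([0.05, 0.1, 0.95, 1.0]))
```

A fixed-length histogram gives the classifier the same input shape for every user, regardless of how many domains that user has touched.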

Clustering

We have developed lightweight clustering models for early detection of suspicious users and bots. This technique also addresses gaps in our classification models, which are unaware of emerging patterns unless retrained with fresh labeled data. We cluster users on attributes that can isolate suspicious groups with high accuracy. Experts identify these attributes by exploring the behavior of suspicious users and the resources they use to create spammy content. This model is implemented using PySpark and Spark SQL and runs daily.
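A lightweight version of attribute-based clustering can be as simple as grouping users who share identical values for a few expert-chosen attributes and flagging unusually large groups. The sketch below uses plain Python rather than PySpark to stay self-contained. The attribute names (`signup_ip_block`, `user_agent`), the `min_size` cutoff, and the function name are hypothetical; the post does not name the attributes it clusters on.

```python
from collections import defaultdict


def suspicious_clusters(users, attributes, min_size=3):
    """Group users by shared attribute values and keep groups of min_size or more.

    users: list of dicts, each with a "user_id" plus attribute fields.
    attributes: names of the fields to cluster on (hypothetical examples below).
    """
    groups = defaultdict(list)
    for user in users:
        key = tuple(user.get(attr) for attr in attributes)
        groups[key].append(user["user_id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}


if __name__ == "__main__":
    users = [
        {"user_id": 1, "signup_ip_block": "10.0.0", "user_agent": "bot/1.0"},
        {"user_id": 2, "signup_ip_block": "10.0.0", "user_agent": "bot/1.0"},
        {"user_id": 3, "signup_ip_block": "10.0.0", "user_agent": "bot/1.0"},
        {"user_id": 4, "signup_ip_block": "192.168.1", "user_agent": "Mozilla"},
    ]
    print(suspicious_clusters(users, ["signup_ip_block", "user_agent"]))
```

In Spark SQL the same idea corresponds to a `GROUP BY` on the chosen attributes with a `HAVING COUNT(*) >= min_size` filter.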

Spam User-Domain Model

Interactions of users with domains are explicitly captured by a heterogeneous bipartite graph, as shown in Figure 3. We represent users and domains as nodes in the graph and create an edge between a user and a domain if the user has created or saved a Pin with the domain’s link. This graph facilitates simultaneous identification of spam users and domains using semi-supervised learning. We use a small set of labeled users and domains to run a label propagation algorithm and learn scores for the unlabeled users and domains. We implement this iterative algorithm in Spark and run it periodically.

Figure 3. Bipartite graph of users and domains for label propagation
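The label propagation step can be illustrated with a small in-memory sketch. It assumes one common variant: labeled (seed) nodes stay clamped to their known scores, and each unlabeled node repeatedly takes the mean score of its neighbors. The function name, the neutral starting score of 0.5, the fixed iteration count, and the node-naming scheme are hypothetical, and the post does not specify which propagation rule is used in production.

```python
from collections import defaultdict


def propagate_labels(edges, seed_scores, iterations=20):
    """Propagate spam scores over a user-domain bipartite graph.

    edges: iterable of (user_id, domain) pairs.
    seed_scores: {("user", id) or ("domain", name): score in [0, 1]}.
    Returns a score for every node that appears in `edges`.
    """
    neighbors = defaultdict(set)
    for user, domain in edges:
        neighbors[("user", user)].add(("domain", domain))
        neighbors[("domain", domain)].add(("user", user))

    # Unlabeled nodes start at a neutral 0.5 (an assumption).
    scores = {node: seed_scores.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_scores:
                updated[node] = seed_scores[node]  # keep labeled nodes fixed
            else:
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


if __name__ == "__main__":
    edges = [("u1", "d1"), ("u2", "d1"), ("u2", "d2"), ("u3", "d2")]
    seeds = {("user", "u1"): 1.0, ("user", "u3"): 0.0}  # known spammer, known good
    for node, score in sorted(propagate_labels(edges, seeds).items()):
        print(node, round(score, 3))
```

Because users only connect to domains and domains only to users, each pass pushes evidence one hop across the graph, which is why users and domains end up scored together.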

Measurement

We measure spam prevalence on Pinterest by computing the number of impressions of Pins that either have spam links or were created by users engaging in spammy activities. We periodically sample and manually review both impressed Pins and users. We scaled our measurement by first sampling and reviewing highly impressed head domains and then extending coverage to tail domains over time. These samples are used both for measuring overall spam prevalence and for training our machine learning models.
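As a rough illustration of the measurement, the sketch below computes an impression-weighted spam share from a manually reviewed sample. Expressing prevalence as a share of sampled impressions is an assumption, as are the record fields (`impressions`, `spam_link`, `spam_creator`) and the function name; the post describes counting spam impressions but does not give a formula.

```python
def spam_prevalence(reviewed_pins):
    """Share of sampled impressions that landed on spam Pins.

    reviewed_pins: list of dicts with hypothetical fields
      "impressions"  - impressions of the Pin in the sample window
      "spam_link"    - reviewer marked the Pin's link as spam
      "spam_creator" - reviewer marked the Pin's creator as a spammer
    """
    total = sum(pin["impressions"] for pin in reviewed_pins)
    if total == 0:
        return 0.0
    spam = sum(
        pin["impressions"]
        for pin in reviewed_pins
        if pin["spam_link"] or pin["spam_creator"]
    )
    return spam / total


if __name__ == "__main__":
    sample = [
        {"impressions": 100, "spam_link": True, "spam_creator": False},
        {"impressions": 300, "spam_link": False, "spam_creator": False},
        {"impressions": 100, "spam_link": False, "spam_creator": True},
    ]
    print(spam_prevalence(sample))
```

Weighting by impressions rather than counting Pins reflects what Pinners actually see, so one widely distributed spam Pin matters more than many unseen ones.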

Conclusion

Pinterest’s mission is to bring everyone the inspiration to create a life they love. We strive to protect our Pinners’ experiences by swiftly and appropriately acting against malicious users and spam content identified by our array of machine learning models. We plan to keep investing in our community guidelines and technology to address the challenges that will inevitably emerge and to bring the best experience to our millions of valued users.

Acknowledgements

Thanks to Yuanfang Song, Omkar Panhalkar, Rundong Liu, Qinglong Zeng, Attila Dobi, Abhijit Mahabal, Alok Singhal, Maisy Samuelson, and the rest of the Trust and Safety team for their contributions in developing machine learning models for spam! Thanks to Harry Shamansky for helping with the publication of the blog post!
