Powering Pinterest Ads Analytics with Apache Druid

1,206
Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

The Change

When we launched Promoted Pins in 2014, we chose Apache HBase as our database to store and serve all of our reporting metrics. At the beginning of our ads business, this was an appropriate choice because the number of reporting features needed and overall traffic was low. Additionally, HBase already had a proven record in the industry at this time, and we knew how to successfully operate an HBase cluster.

Five years later, our business has matured. As our ads scale has increased dramatically, so have the complexities of the metrics we report to our partners, which has rendered HBase insufficient for our fine-grained analytical needs. As a result, we surveyed the available options and settled on Druid to be the core component of our next iteration.

Why Druid?

HBase works very well when it comes to accessing random data points, but it’s not built for fast grouping and aggregation. In the past, we’ve solved this by pre-building these data views, but as the features needed for our reporting expanded, it was no longer possible to store so many different cuts. Druid allowed us to bypass all of this complicated data slicing ingestion logic, and also supports:

  • Real-time ingestion via Kafka
  • Automatic versioning of ingested data
  • Data pre-aggregation based on user-set granularity
  • Approximate algorithms for count-distinct questions
  • A SQL interface
  • Easy to understand code and a very supportive community

Data Ingestion

Druid supports two modes of ingestion: Native and Hadoop. Both are launched via a REST call to the Overlord node. In the native ingestion case, a thread is spawned directly on the MiddleManager node to read input data, while in the Hadoop case, Druid launches a MapReduce job to read the input in parallel. In both cases, the ingested data is automatically versioned based on its output datasource (table) and time interval. Druid will automatically start serving the newest version of the data as soon as it is available and keep the older segments in a disabled state, should we ever need to revert to a previous version. Since we have several different data pipelines producing different sets of metrics with the same dimensions into a single datasource, this was a problem for us. How do we keep the data versioned but not have each independent pipeline overwrite the previous one’s output?

Namespacing shard specs proved to be the answer. Druid’s standard approach to versioning segments is by their datasource name, time interval and time written. We expanded on this system by also including a namespace identifier. We then built a separate versioned interval timeline per namespace in a datasource, rather than just one timeline per datasource:

This also meant that we needed to either change the existing ingestion mechanisms to create segments with namespaces or invent a new ingestion mechanism. Since we ingest billions of events per day, native ingestion is too slow for us, and we were not keen on setting up a new Hadoop cluster and changing the Hadoop indexing code to adhere to namespaces.

Instead, we chose to adapt the metamx/druid-spark-batch project to write our own data ingestion using Spark. The original druid-spark-batch project works in a similar fashion to the Hadoop indexer, but instead of launching a Hadoop job, it launches a Spark job. Our project runs inside of a stand-alone job without the need to use any resources of the Druid cluster at all. It works as follows:

  1. Filter out events not belonging to the output interval
  2. Partition data into intervals based on the configured granularity and number of rows per segment file
  3. Use a pool of Druid’s IncrementalIndex classes to persist intermediate index files on disk in parallel
  4. Use a final merge pass to collect all index files into a segment file
  5. Push to deep storage
  6. Construct and write metadata to MySQL

Once the metadata is written, the Druid coordinator will find new segments on its next pull of the metadata table and assign the new segments to be served by historical nodes.

Cluster Setup

In general, the date ranges for querying advertising data fall into three categories:

  1. Most recent time period to display
  2. Year-over-year performance reporting
  3. Random ad-hoc queries of old, historical data.

The number of queries for the most recent day vastly outnumber all other reporting types. With this understanding, we bucketed our Druid cluster into three historical tiers:

  • A “hot” tier serving the most recent data on expensive compute-optimized nodes to handle large QPS.
  • A “cold” tier on mid compute, lots of disk space-optimized nodes. Serves the last year of data sans data in the Hot tier.
  • An “icy” tier on low compute nodes having even more disk space. Serves all other historical data.

Each historical in the hot tier has very low maximum data capacity to guarantee that all segments the node is serving are loaded in memory without needing to page swap. This ensures low latency for most of our user-driven queries. Queries for older data are generally made by automated systems or report exports which allow for higher latency in preference to high operating cost.

While this works very well for the average query patterns, there are cases of unexpected high load which require higher QPS tolerance from the cluster. The obvious solution here would be to scale up the number of historical nodes for these specific cases, but Druid’s data rebalancing algorithm is very slow at scale. It can take many hours or even days for a multi-terabyte cluster to rebalance data evenly once a new set of servers joins the fleet. To build an efficient auto-scaling solution, we could not afford to wait so long.

Since optimizing the rebalancing algorithm would be very risky to deploy on a huge production system, we decided instead to implement a solution for mirroring tiers. This system uses maximum bipartite matching to link each node in the mirror tier to exactly one node in the primary tier. Once the link is established, the mirroring historical doesn’t need to wait to be assigned segments by the rebalancing algorithm. Instead, it will pull the list of segments served by the linked node from the primary tier and download those from deep storage for serving. It doesn’t need to worry about replication since we expect these mirror tiers to be turned on and off very frequently, operating only during periods of heavy traffic. See below for more information:

During testing we were able to achieve significant auto-scaling improvement given a mirroring tier solution. The most significant portion of time taken now from server launch to query serving is limited I/O bandwidth from deep storage.

Time taken to load 31 TB of data. 2 hours for natural rebalancing. 5 minutes for mirroring tier.

Query Construction

Our Druid deployment is external facing, powering queries made interactively from our ads management system as well as programmatically through our external APIs. Often these query patterns will look very different per use case, but in all cases, we needed a service to construct Druid queries quickly and efficiently as well as to reject any invalid queries. Programmatic access to our API means that we receive a fair number of queries which request invalid dates or repetitive queries asking for entities which have no metrics.

Percent of queries returning empty results per API client. Some clients request non-existent metrics up to 90% of the time.

Constructing and asking Druid to execute these queries is possible but accrues overhead which is unaffordable in a low-latency system. To short-circuit queries for non-existent entities, we developed a metadata store listing entities and their metric-containing time intervals. If a query’s requested entities have no metrics for the specified time intervals, we can return immediately and relieve Druid from additional network and CPU workload.

Druid supports two APIs to query data: native and SQL. SQL support is a newer feature backed by Apache Calcite. In the backend, it takes a Druid SQL query, parses it, analyzes it, and turns it into a Druid native query which is then executed. SQL support has numerous advantages — it’s much more user friendly and certainly better at constructing more efficient ad-hoc queries than if the user was to come up with some unfamiliar JSON.

SQL was our first choice when implementing our query constructor and execution service namely due to our familiarity with SQL. It worked, but we quickly identified certain query patterns which Druid could not complete and traced the issue to performance bottlenecks in the SQL parser for queries with thousands of filters or many complicated projections. In the end, we settled on using native queries as our primary access path to Druid, keeping SQL support for internal use cases that are not latency sensitive.

System Tuning

Coming from a key-value world, the individual queries originating from our API layer were tailored to be low in complexity to allow an optimal number of point lookups. This also meant querying each entity individually, resulting in high QPS in the backend. To minimize the disruption to our entire infrastructure, we wanted to keep our changes simple and get as close as possible to simply exchanging HBase for Druid. In practice, that proved to be completely impossible.

Druid holds network connections between servers in a greedy manner, using a set of new connections per query. It also opens object handles per query, which is the primary bottleneck in a high QPS system. To lessen the network load, we ramped up the complexity of each query by batching the number of requested entities. We observed our system to perform at its best with between 1,000 to 2,000 requested entities in IN filter type queries, although every deployment will differ.

QPS after implementing query batching. 15,000 request / second peaks lowered by 10x

On the server side, we found the basic cluster tuning guidance suggested by the Druid documentation very helpful. One non-obvious caveat is being mindful of how many GroupBy queries can be executed at any time given the number of merge buffers configured. GroupBy queries should be avoided whenever possible in preference to Timeseries and TopN queries. These types of queries do not require merge buffers and therefore need fewer resources to execute. In our stack, we have the option to impose rate limiting based on query type to avoid too many GroupBy queries at once given the number of configured merge buffers.

The Future

We’re excited to have finished the long journey to bring Druid into production, but of course our work continues. As Pinterest’s business grows, our work on the core Druid platform for analytics has to evolve alongside it. It might be difficult to seamlessly contribute all our effort into the main Druid repository, but we hope to share our effort with the community. Namely on features such as a Spark writer and reader of Druid segments, mirroring tiers for auto scaling, and developing a new multiplexing IPC protocol instead of HTTP. While ads analytics matures, we are also onboarding other teams’ use cases, helping them discover how best to use Druid at scale for their needs.

Acknowledgments

This project was a joint effort across multiple teams: Ads Data, Ads API, and Storage & Caching. Contributors and advisors include Lucilla Chalmer, Tian-Ying Chang, Julian Jaffe, Eric Nguyen, Jian Wang, Weihong Wang, Caijie Zhang, and Wayne Zhao.

Credit also goes to Imply.io leaders Gian Merlino and Fangjin Yang for introducing us to and helping us bootstrap Druid.

We’re building the world’s first visual discovery engine. More than 320 million people around the world use Pinterest to dream about, plan and prepare for things they want to do in life. Come join us!

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
Senior Staff Machine Learning Enginee...
San Francisco, CA

About Pinterest:

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

Homefeed is a discovery platform at Pinterest that helps users find and explore their personal interests. We work with some of the largest datasets in the world, tailoring over billions of unique content to 330M+ users. Our content ranges across all categories like home decor, fashion, food, DIY, technology, travel, automotive, and much more. Our dataset is rich with textual and visual content and has nice graph properties — harnessing these signals at scale is a significant challenge. The Homefeed ranking team focuses on the machine learning model that predicts how likely a user will interact with a certain piece of content, as well as leveraging those individual prediction scores for holistic optimization to present users with a feed of diverse content.

What you’ll do:

  • Work on state-of-the-art large-scale applied machine learning projects
  • Improve relevance and the user experience on Homefeed
  • Re-architect our deep learning models to improve their capacity and enable more use cases
  • Collaborate with other teams to build/incorporate various signals to machine learning models
  • Collaborate with other teams to extend our machine learning based solutions to other use cases

What we’re looking for:

  • Passionate about applied machine learning and deep learning
  • 8+ years experience applying machine learning methods in settings like recommender systems, search, user modeling, image recognition, graph representation learning, natural language processing

#L1-EA2

EPM Lead Developer, Adaptive Planning...
San Francisco, CA

About Pinterest: 

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

The EPM technology team at Pinterest is looking for a senior EPM architect who has at least four years of technical experience in Workday Adaptive Planning. You will be the solutions architect who oversees technical design of the complete EPM ecosystem with emphasis on Adaptive Financial and Workforce planning. The right candidate will also need to have hands-on development experience with Adaptive Planning and related technologies. The role is in IT but will work very closely with FP&A and the greater Finance/Accounting teams. Experience with Tableau suite of tools is a plus.

What you'll do: 

  • Together with the EPM Technology team, you will own Adaptive Planning and all related services
  • Oversee architecture of existing Adaptive Planning solution and make suggestions for improvements
  • Solution and lead Adaptive Planning enhancement projects from beginning to end
  • Help EPM Technology team gain deeper understanding of Adaptive Planning and train the team on Adaptive Planning best practices
  • Establish strong relationship with Finance users and leadership to drive EPM roadmap for Adaptive Planning and related technologies
  • Help establish EPM Center of Excellence at Pinterest

What we're looking for: 

  • Hands-on design and build experience with all Adaptive Planning technologies: standard sheets, cube sheets, all dimensions, reporting, integration framework, security, dashboarding and OfficeConnect
  • Strong in application design, data integration and application project lifecycle
  • Comfortable working side-by-side with business
  • Ability to translate business requirements to technical requirements
  • Strong understanding in all three financial statements and the different enterprise planning cycles
  • Familiar with Tableau suite of tools

 

Software Engineer, Shopping Discovery
San Francisco, CA

About Pinterest:

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

Shopping is at the core of Pinterest’s mission to help people create a life they love. The shopping discovery team at Pinterest is inventing a brand new, more visual and personalized shopping experience for 350M+ users worldwide. The team is responsible for delivering mid-funnel shopping experience on shopping surfaces like Product Detail Page, Shopping Search, Shopping on Board etc. You'll be responsible for optimizing the whole page layout by appropriately selecting and slotting the UI templates and recommendation modules optimizing towards a shopping metric. As an engineer of the team you will be working on the most cutting edge recommendation algorithms to develop diverse types of shopping recommendations that will be displayed across different shopping surfaces on Pinterest.

What you'll do: 

  • Develop large scale shopping recommendation algorithms
  • Build data pipelines to do data analysis and collect training data
  • Train deep learning models to improve quality and engagement of shopping recommenders
  • Work on backend and infrastructure to build, deploy and serve machine learning models
  • Develop algorithms to optimize the whole page layout of the shopping surfaces
  • Drive the roadmap for next generation of shopping recommenders

What we're looking for: 

  • 5+ years working experience in the area of applied Machine Learning
  • Interest or experience working on a large-scale search, recommendation and ranking problems
  • Interest and experience in doing full stack ML, including backend and ML infrastructure
  • Experience with big data technologies MapReduce/Hadoop/Hive/Presto/Spark
  • Expert in Java, C++ or Python

#LI-LP2

Backend Engineer, Ads Indexing Platform
San Francisco, CA

About Pinterest:

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

Pinterest is one of the fastest growing online advertising platforms and our continued success depends on the reliability, performance, and scalability of the Ads indexing  systems. You’ll help us build the world class Ads indexing in-house solutions that will deliver world class performance and scale to 100x our current size. You'll join a small, early-stage team, working on multiple critical functional areas and lay the foundation for Pinterest’s business success. 

What you'll do:

  • Own and innovate the core functional areas of the Ads indexing systems
  • Re-architect core Ads indexing services/components to achieve greater scalability, freshness, performance, efficiency, and reliability
  • Apply distributed systems principles to build next-generation Ads indexing services/components
  • Develop a low latency incremental indexing pipeline to empower various of Pinterest products with the fresh Shopping Catalog data

What we're looking for:

  • 3+ years of industry experience with distributed systems, data infrastructure, and systems programming
  • Experience in solving complex scaling, latency, or performance problems in high-volume distributed systems
  • Proficiency in at least one of the systems languages (Java, C++)
  • Experience in building and owning critical user-facing backend serving systems

#LI-GK1

Verified by
Security Engineer
Tech Lead, Big Data Platform
Software Engineer
Talent Brand Manager
Sourcer
Software Engineer
You may also like