Open Sourcing Querybook, Pinterest’s Collaborative Big Data Hub

9,072
Pinterest
Pinterest's profile on StackShare is not actively maintained, so the information here may be out of date.

An efficient big data solution for an increasingly remote-working world.

By Charlie Gu | Tech Lead, Analytics Platform, Lena Ryoo | Software Engineer, Analytics Platform, and Justin Mejorada-Pier | Engineering Manager, Analytics Platform

With more than 300 billion Pins, Pinterest is powering an ever-growing and unique dataset that maps interests, ideas, and intent. As a data-driven company, Pinterest uses data insights and analysis to make product decisions and evaluations to improve the Pinner experience for more than 450 million monthly users. To continuously make these improvements, especially in an increasingly remote environment, it’s more important than ever for teams to be able to compose queries, create analyses, and collaborate with one another. Today we’re taking Querybook, our solution for more efficient and collaborative big data access, and open sourcing it for the community.

The common starting point for any analysis at Pinterest is an ad-hoc query that gets executed on a SparkSQL, Hive, Presto cluster, or any Sqlalchemy compatible engine. We built Querybook to provide a responsive and simple web UI for such analysis so data scientists, product managers, and engineers can discover the right data, compose their queries, and share their findings. In this post, we’ll discuss the motivation to build Querybook, its features, architecture, and our work to open source the project.

The Journey

The proposal to build Querybook started in 2017 as an intern project. During that time, we used a vendor-supplied web application as the query UI. There were often user complaints about that tool regarding its UI, speed & stability, lack of visualizations, as well as difficulty in sharing. Before long, we realized there was a great opportunity to build a better querying interface.

We started to interview data scientists and engineers about their workflows while scoping out technical details. Shortly, we realized most were organizing their queries outside of the official tool, and many used apps like Evernote. Although Jupyter had a notebook user experience, its requirement to use Python/R and the lack of table metadata integration deterred many users. Based on this finding, our team decided Querybook’s query interface would be a document where users can compose queries and write analyses all in one place with the power of collocated metadata and the simplicity of a note-taking app.

Released internally in March 2018, Querybook became the official solution to query big data at Pinterest. Nowadays, Querybook on average has 500 DAUs and 7k daily query runs. With an internal user rating of 8.1/10, it’s one of the highest-rated internal tools at Pinterest.

Feature Highlights

Figure 1. Querybook’s Doc UI

When a user first visits, they‘ll quickly notice its distinctive DataDoc interface. This is the primary place for users to query and analyze. Each DataDoc is composed of a list of cells which can be one of three types: text, query, or chart.

  • The text cell comes with built-in rich-text support for users to jot down their ideas or insights.
  • The query cell is used to compose and execute queries.
  • The chart cell is used to create visualizations based on execution results. Similar to Google Docs, when users are granted access to a DataDoc, they can collaborate with each other in real-time.

With the intuitive charting UI, users can easily turn a DataDoc into an illustrative dashboard. You can choose from different visualization options, such as time-series, pie-charts, scatter plots, and more. You can then connect your visualization to the results of any query on your DataDoc and post-process them with sorting and aggregation as needed. To automatically update these charts, you can use the scheduling options and select your desired cadence. The scheduler can notify users of success or failure. Combined with the templating option powered by Jinja, creating a live updating DataDoc is very quick.

Scheduling and visualization features aren’t intended to replace tools such as Airflow or Superset. Rather, these features provide users a simple and fast way to experiment with their queries and iterate on them. Often, Pinterest engineers use Querybook as the first step to compose queries before creating production-level workflows and dashboards.

Last but not least, Querybook comes with an automated query analytics system. Every query executed gets analyzed to extract metadata such as referenced tables and query runners. Querybook uses this information to automatically update its data schema and search ranking, as well as to show a table’s frequent users and query examples. The more queries, the more documented the tables become.

Architecture

Figure 2. Overview of Querybook’s architecture

To understand how Querybook works, we’ll walk through the process of composing and executing a query.

  1. The first step is to create a DataDoc and write the query in a cell. While the user types, the user’s query gets streamed to the server via Socket.IO.
  2. The server then pushes the delta to all users reading that DataDoc via Redis. At the same time, the server would save the updated DataDoc in the database and create an async job for the worker to update the DataDoc content in ElasticSearch. This allows the DataDoc to be searched later.
  3. Once the query is written, the user can execute the query by clicking the run button. The server would then create a record in the database and insert a query job into the Redis task queue. The worker receives the task and sends the query to the query engine (Presto, Hive, SparkSQL, or any Sqlalchemy compatible engine). While the query is running, the worker pushes live updates to the UI via Socket.IO.
  4. When the execution is completed, the worker loads the query result and uploads it in batches to a configurable storage service (e.g. S3). Finally, the browser gets notified of the query completion and makes a request to the server to load the query result and display it to the user.

For brevity, this section only focused on one user flow of Querybook. However, all the infrastructure used is covered. Querybook allows some of it to be customized. For example, you can choose to upload execution results to either S3, Google Cloud Storage, or a local file. In addition, MySQL can also be swapped with any Sqlalchemy-compatible database such as Postgres.

The Path To Open Source

After noticing the success that Querybook had internally, we decided to open source it. One challenge we bumped into was how to make it generic while preserving some of the Pinterest-specific integrations. For this, we decided to have a two-layer organization through a plugin system and to add an Admin UI.

The Admin UI lets companies configure Querybook’s query engines, table metadata ingestion, and access permissions from a single friendly interface. Previously, these configurations were done inside configuration files and required a code change as well as a deployment to be reflected. With this new UI, admins can make live Querybook changes without going through code or config files.

Figure 3. The Admin UI

The plugin system integrates Querybook with the internal systems at Pinterest by utilizing Python’s importlib. With the plugin system, developers can configure auth, customize query engines, and implement exporters to internal sites. Customized behaviors provided by the plugin system allow Querybook to be optimized for the user’s workflow at Pinterest while ensuring the open-source is generic for the public.

You can check out more of Querybook’s features and its documentation on Querybook.org, and you can reach us at querybook@pinterest.com.

Acknowledgments: We want to thank the following engineers that have made contributions to Querybook: Lauren Mitchell, Langston Dziko, Mohak Nahta, and Franklin Shiao. And to Chunyan Wang, Dave Burgess, and David Chaiken for their critical advice and support.

Pinterest
Pinterest's profile on StackShare is not actively maintained, so the information here may be out of date.
Tools mentioned in article
Open jobs at Pinterest
Backend Engineer, Core & Monetization
San Francisco, CA, US; , CA, US
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p><span style="font-weight: 400;">We are looking for inquisitive, well-rounded Backend engineers to join our Core and Monetization engineering teams. Working closely with product managers, designers, and backend engineers, you’ll play an important role in enabling the newest technologies and experiences. You will build robust frameworks &amp; features. You will empower both developers and Pinners alike. You’ll have the opportunity to find creative solutions to thought-provoking problems. Even better, because we covet the kind of courageous thinking that’s required in order for big bets and smart risks to pay off, you’ll be invited to create and drive new initiatives, seeing them from inception through to technical design, implementation, and release.</span></p> <p><strong>What you’ll do:</strong></p> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">Build out the backend for Pinner-facing features to power the future of inspiration on Pinterest</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Contribute to and lead each step of the product development process, from ideation to implementation to release; from rapidly prototyping, running A/B tests, to architecting and building solutions that can scale to support millions of users</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Partner with design, product, and backend teams to build end-to-end functionality</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Put on your Pinner hat to suggest new product ideas and features</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Employ automated testing to build features with a high degree of technical quality, taking responsibility for the components and features you develop</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Grow as an engineer by working with world-class peers on varied and high impact projects</span></li> </ul> <p><strong>What we’re looking for:</strong></p> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">2+ years of industry backend development experience, building consumer or business facing products</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Proficiency in common backend tech stacks for RESTful API, storage, caching and data processing</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Experience in following best practices in writing reliable and maintainable code that may be used by many other engineers</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Ability to keep up-to-date with new technologies to understand what should be incorporated</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Strong collaboration and communication skills</span></li> </ul> <p><strong>Backend Core Engineering teams:</strong></p> <ul> <li><span style="font-weight: 400;">Community Engagement</span></li> <li><span style="font-weight: 400;">Content Acquisition &amp; Media Platform</span></li> <li><span style="font-weight: 400;">Core Product Indexing Infrastructure</span></li> <li><span style="font-weight: 400;">Shopping Catalog&nbsp;</span></li> <li><span style="font-weight: 400;">Trust &amp; Safety Platform</span></li> <li><span style="font-weight: 400;">Trust &amp; Safety Signals</span></li> <li><span style="font-weight: 400;">User Understanding</span></li> </ul> <p><strong>Backend Monetization Engineering teams:&nbsp;</strong></p> <ul> <li><span style="font-weight: 400;">Ads API Platform</span></li> <li><span style="font-weight: 400;">Ads Indexing Platform</span></li> <li><span style="font-weight: 400;">Ads Reporting Infrastructure</span></li> <li><span style="font-weight: 400;">Ads Retrieval Infra</span></li> <li><span style="font-weight: 400;">Ads Serving and ML Infra</span></li> <li><span style="font-weight: 400;">Measurement Ingestion</span></li> <li><span style="font-weight: 400;">Merchant Infra&nbsp;</span></li> </ul> <p>&nbsp;</p> <p><span style="font-weight: 400;">At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. This position will pay a base salary of $145,700 to $258,700. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.</span></p> <p><span style="font-weight: 400;">Information regarding the culture at Pinterest and benefits available for this position can be found at <a href="https://www.pinterestcareers.com/pinterest-life/">https://www.pinterestcareers.com/pinterest-life/</a>.</span></p> <p><span style="font-weight: 400;">This position is not eligible for relocation assistance.</span></p> <p>#LI-CL5&nbsp;</p> <p>#LI-REMOTE</p> <p>&nbsp;</p><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.</p></div>
Engineering Manager, Advertiser Autom...
San Francisco, CA, US; , CA, US
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p><span style="font-weight: 400;">As the Engineering Manager of the Advertiser Automation team, you’ll be leading a large team that’s responsible for key systems that are instrumental to the performance of ad campaigns, tying machine learning models and other automation techniques to campaign creation and management. The ideal candidate should have experience leading teams that work across the web technology stack, be driven about partnering with Product and other cross-functional leaders to create a compelling vision and roadmap for the team, and be passionate about helping each member of their team grow.</span></p> <p><strong>What you’ll do:</strong></p> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">Managing a team of full-stack engineers</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Work closely with Product and Design on planning roadmap, setting technical direction and delivering value</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Coordinate closely with XFN partners on multiple partner teams that the team interfaces with</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Lead a team that’s responsible for key systems that utilize machine learning models to help advertisers create more performant campaigns on Pinterest</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Partner with Product Management to provide a compelling vision and roadmap for the team.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Work with PM and tech leads to estimate scope of work, define release schedules, and track progress.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Mentor and develop engineers at various levels of seniority.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Keep the team accountable for hitting business goals and driving meaningful impact</span></li> </ul> <p><strong>What we’re looking for:</strong></p> <ul> <li style="font-weight: 400;"><em><span style="font-weight: 400;">Our PinFlex future of work philosophy requires this role to visit a Pinterest office for collaboration approximately 1x per quarter. For employees not located within a commutable distance from this in-office touchpoint, Pinterest will cover T&amp;E. Learn more about PinFlex <a href="https://www.pinterestcareers.com/pinflex/" target="_blank">here</a>.</span></em></li> <li style="font-weight: 400;"><span style="font-weight: 400;">1+ years of experience as an engineering manager (perf cycles, managing up/out, 10 ppl)</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">5+ years of software engineering experience as a hands on engineer</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Experience leading a team of engineers through a significant feature or product launch in collaboration with Product and Design</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Track record of developing high quality software in an automated build and deployment environment</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Experience working with both frontend and backend technologies</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Well versed in agile development methodologies</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Ability to operate in a fast changing environment / comfortable with ambiguity</span></li> </ul> <p>&nbsp;</p> <p><span style="font-weight: 400;">At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. This position will pay a base salary of $172,500 to $258,700. The position is also eligible for equity and incentive compensation. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.</span></p> <p><span style="font-weight: 400;">Information regarding the culture at Pinterest and benefits available for this position can be found at </span><a href="https://www.pinterestcareers.com/pinterest-life/"><span style="font-weight: 400;">https://www.pinterestcareers.com/pinterest-life/</span></a><span style="font-weight: 400;">.</span></p> <p>#LI-REMOTE</p> <p>#LI-NB1</p><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.</p></div>
Engineering Manager, Conversion Data
Seattle, WA, US; , WA, US
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p><span style="font-weight: 400;">Pinterest is one of the fastest growing online advertising platforms, and our continued success depends on our ability to enable advertisers to understand the value and return on their advertising investments. Conversion Data, a team within the Measurement org, is a Seattle engineering product team. </span><span style="font-weight: 400;">The Conversion Data team is functioning as custodian of conversion data inside Pinterest. We build tools to make conversion data accessible and usable for consumers with valid business justifications. We are aiming to have conversion data consumed in a privacy-safe and secured way. By providing toolings and support, we reduce friction for consumers to stay compliant with upcoming privacy headwinds.&nbsp;</span></p> <p><strong>What you’ll do</strong></p> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">Manager for the Conversion Data team (5 FTE ICs and 3 contractors) which sits within the Measurement Data Foundations organization in Seattle.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Help to reinvent how conversion data can be utilized for downstream teams in the world while maintaining a high bar for Pinner privacy.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Work closely with cross functional partners in Seattle as measurement is a cross-company cutting initiative.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Drive both short term execution and long term engineering strategy for Pinterest’s conversion data products.</span></li> </ul> <p><strong>What we’re looking for:</strong></p> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">Experience managing product development teams, including working closely with PM and Product Design to identify, shape and grow successful products</span></li> <li style="font-weight: 400;">The ideal candidate will have experience with processing high volumes of data at a scale.</li> <li style="font-weight: 400;">Grit, desire to work in a team, for the betterment of all - correlates to the Pinterest value of “acts like an owner”</li> <li style="font-weight: 400;">2+ years EM experience</li> </ul> <p><span style="font-weight: 400;">At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. This position will pay a base salary of $172,500 to $258,700. The position is also eligible for equity and incentive compensation. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.</span></p> <p><span style="font-weight: 400;">Information regarding the culture at Pinterest and benefits available for this position can be found at </span><a href="https://www.pinterestcareers.com/pinterest-life/"><span style="font-weight: 400;">https://www.pinterestcareers.com/pinterest-life/</span></a><span style="font-weight: 400;">.</span></p> <p>#LI-REMOTE</p> <p>#LI-NB1</p><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.</p></div>
UX Engineer
Warsaw, POL
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p><strong>What you’ll do:</strong></p> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">Work directly with the Motion design team in Warsaw to help bring their dynamic work to life.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Partner with the Design system team to align motion guidelines and build out a motion library.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Help build UI components, guidelines and interactions for the open source design system.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Partner with other teams across the Pinterest product to implement motion assets and promo pages within Pinterest.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Scope and prioritize your work; serve as the technical subject matter expert to build an end to end service culture for the motion team; building its independence and raising its visibility.&nbsp;</span></li> </ul> <p><strong>What we’re looking for:</strong></p> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">3+ years of experience building on the web platform.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Strong background in current web app development practices as well as a strong familiarity with Lottie, Javascript, Typescript and Webpack.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Solid experience with HTML and CSS fundamentals, and CSS Animation.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Experience with React.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Familiarity with accessibility best practices; ideally in the context of motion and animation.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Background and familiarity with modern design processes and tools like Figma and/or Adobe After Effects; working with designers and product managers.</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Curiosity, strong communication and collaboration skills, self-awareness, humility, a drive for personal growth, and knowledge sharing.</span></li> </ul> <p><span style="font-weight: 400;">#LI-HYBRID</span></p> <p><span style="font-weight: 400;">#LI-DL2</span></p> <p>&nbsp;</p><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.</p></div>
You may also like