How We Designed Our Continuous Integration System to be More Than 50% Faster

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Urvashi Reddy | Software Engineer, Engineering Productivity Team


Earlier this year, the Engineering Productivity team at Pinterest published a blog post called How a one line change decreased our clone times by 99%. In that post, we described how a simple Git configuration change sped up clone times in one of our largest repositories at Pinterest. In this post, we’ll talk about how we significantly decreased our build times in a CI pipeline that serves another major repository at Pinterest. Spoiler alert: it was not a one line change!

The content covered in this blog was presented at BazelCon 2020. Check out the presentation Designing a Language Agnostic CI with Bazel Queries.

The Engineering Productivity team’s vision is to “build a developer platform that inspires developers to do their best work.” One of the integral pieces of this platform is our set of Continuous Integration (CI) pipelines. The CI pipelines are responsible for validating code changes and producing release artifacts that can be deployed to one of our supported Continuous Delivery platforms. With over 1,000 engineers at the company, our team faces the interesting challenge of providing reliable and fast CI pipelines that serve the major repositories at scale.

In order to meet those outcomes, our team made a few key design choices:

  • Adopt Bazel as our build tool
  • Use language based monorepos
  • Test and release only the changed code
  • Leverage Bazel’s BUILD file as a contract between CI and developers
  • Create release abstractions with custom Bazel rules
  • Parallelize work as much as possible

Adopting Bazel

Choosing a single build tool that is multi-language allows our team to create CI workflows that can be applied to any repository using Bazel at Pinterest. Since Bazel is hermetic by design, we can run Bazel targets in CI without needing to configure or manage dependencies on the host machines.

Language based monorepos

Our CI pipelines are triggered when code is committed to a repository, which means having a CI pipeline for every repository. In order to limit the number of CI pipelines and repositories we have to manage, we group our services into one repository per language.

If you’re interested in learning more about the above two choices, check out another BazelCon 2020 presentation from our team called Pinterest’s Journey to a Monorepo.

Test and release only changed code

At scale, running every target within a repository is expensive. Even with Bazel’s cache, running all test targets with bazel test… still means spending time fetching and loading dependencies for targets. Those network calls to the cache are time consuming and can be avoided entirely if only the minimal set of targets is run.

Additionally, the services within our monorepos vary in commit frequency. Some of them are contributed to daily while others are more sporadic. By running in CI only what’s affected by a change, we can significantly speed up our build times.

So how do we get the minimal set of targets to run? We created a Golang CLI called build-collector to report the targets to run in CI. The CLI takes in a set of commits, looks at the files that were changed, and runs the appropriate Bazel query to output the list of affected targets. For example, if a couple of source code files were changed, build-collector would run a query over those files using the rdeps query function.

The rdeps query function finds the reverse dependencies of the source files. The output is a list of targets we can run in CI. In order to get test targets specifically, build-collector wraps that query with a filter using the kind query function.

Note: Alternatively, the tests query function can be used to filter for test targets

This is just one type of query that build-collector runs. The full list of queries is explained in the Designing a Language Agnostic CI with Bazel Queries presentation. At this point, you might be wondering how we get release targets. We’ll cover that further below when we talk about our custom release rule implementation.

Our CI script calls build-collector with the commits under test, and build-collector writes the targets it finds to output files.

There is a JSON output file for test targets. The same JSON schema is used for release targets as well.

The CI script parses the JSON files outputted by build-collector and runs the targets in parallel across multiple workers.
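
A minimal sketch of that parsing step might look like the following. The {"targets": [...]} layout is a hypothetical schema invented for this example, not build-collector's actual output format, and the round-robin split is just one simple way to spread targets across workers.

```python
import json


def load_targets(json_text):
    """Parse a build-collector style output document and return its targets.

    The {"targets": [...]} layout is a hypothetical schema used for this sketch.
    """
    doc = json.loads(json_text)
    targets = doc.get("targets", [])
    # De-duplicate while preserving the reported order.
    return list(dict.fromkeys(targets))


def shard(targets, worker_count):
    """Distribute targets round-robin into `worker_count` batches."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    return [targets[i::worker_count] for i in range(worker_count)]


if __name__ == "__main__":
    sample = '{"targets": ["//a:a_test", "//b:b_test", "//a:a_test", "//c:c_test"]}'
    for index, batch in enumerate(shard(load_targets(sample), 2)):
        print(index, batch)
```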

Leverage Bazel’s BUILD file as a contract between CI and developers

We want Pinterest engineers to focus on feature work and not have to learn too many things about how our CI is set up. Given that we also want to run the minimal set of targets in CI, we need a way for developers to communicate which targets are for local development and which parts of their service should be tested and released in CI. This is where the Bazel BUILD file comes in handy, since it is already the place where developers define the tests and release artifacts pertaining to their service. Developers follow a few simple conventions in the BUILD file so that our CI can figure out exactly what to build.

Those conventions are:

  • Use test rule types for running tests (standard Bazel practice)
  • Use Pinterest custom release rules for generating release artifacts
  • Use supported tags like no-ci to indicate what should not run in the pipelines
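
One way a tag convention like no-ci could be honored at query time is to subtract tagged targets from a query result. The sketch below builds such a query string; the helper name is hypothetical and the source does not say whether build-collector filters tags this way. The attr(tags, ...) pattern follows the form shown in Bazel's query documentation for matching a single tag value.

```python
def exclude_tagged(query, tag="no-ci"):
    """Return a query for the targets of `query` that do not carry `tag`."""
    # attr(tags, ...) is matched against the list rendered as "[a, b]", so the
    # character classes keep "no-ci" from matching inside a longer tag.
    tagged = f'attr(tags, "[\\[ ]{tag}[,\\]]", {query})'
    return f"({query}) except {tagged}"


if __name__ == "__main__":
    print(exclude_tagged("//service/foo/..."))
```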

Create release abstractions with custom Bazel rules

At Pinterest, we support a number of different artifact types that are released to various stores. For example, docker images are sent to an internal registry, jars are published to Artifactory, etc. To support these workflows, we implemented custom Bazel rules for the common use cases at Pinterest. The custom rules help us to create an abstraction over the infrastructure. All developers need to do is indicate what they want to publish within the BUILD file using our custom rules.

A common workflow is creating and publishing docker images that are then referenced when deploying to EC2 or Kubernetes. Engineers can use a custom Bazel rule called container_release to get CI to make their release artifacts available for deployment.

In a typical BUILD file, a service creates a docker image using the open source container_image rule. Using the custom container_release rule, the service author can then publish the docker image to a Pinterest registry as well as specify what deployment artifacts should be made available to our Continuous Delivery platforms (Teletraan for EC2 deployments and Hermez for Kubernetes workloads).

There are a few benefits we get from implementing custom release rules:

  • It abstracts the infrastructure layers for our developers. They don’t have to be aware of where the deployment artifacts end up and how they are consumed by our CD platforms.
  • Developers control what parts of their service are released via version controlled code
  • We can support dev versions of their release artifacts by controlling where the artifacts are released.

That last benefit is made possible with another release abstraction within the custom release rule implementation. Each Pinterest custom release rule is actually a Bazel macro that generates two custom release rules: artifact_ci_release and artifact_dev_release. Our developers don’t see or interact with these rules directly, but they are used by our CI and local development workflows to ensure that they are run in the right context. For instance, to obtain release targets for source code changes, build-collector runs a Bazel query that filters for the CI release rule rather than the dev one.

A further optimization we made here was to control the dependency order that the release artifacts are run in within the artifact_*_release implementation. For instance, the docker images are published to the registry before we publish the YAML files that reference them. Doing it this way made the querying logic in build-collector fast and straightforward.
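
The ordering constraint described above (publish what is referenced before what references it) is the kind of problem a topological sort solves. The sketch below uses Python's standard graphlib module to show the idea; the artifact names are hypothetical, and the source does not say how the artifact_*_release implementation actually computes its order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def publish_order(depends_on):
    """Return artifacts ordered so each one follows everything it references.

    `depends_on` maps an artifact to the set of artifacts it references.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical artifact names: the deployment YAML references the image.
    graph = {"deploy.yaml": {"service_image"}, "service_image": set()}
    print(publish_order(graph))
```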

Parallelize as much as possible

In order to make sure our CI is running as fast as possible, we want to parallelize running targets wherever possible. We currently achieve this with what we call the Dispatcher Model. The Dispatcher Model is pretty simple: we figure out what targets need to run in CI and dispatch the execution of those targets to workers that run in parallel.

This has a significant benefit when running release targets. If a developer only cares about releasing artifacts from a few services they contribute to, they shouldn’t have to wait for all the other services in the CI build to be finished. Running release targets independently and in parallel provides developers with their release artifacts as soon as they are ready.
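
The Dispatcher Model can be sketched with a thread pool standing in for the CI workers. This is an illustration under assumptions rather than Pinterest's implementation: the dispatch function, the worker callback, and the target labels are all hypothetical, and real workers would be separate machines or jobs rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def dispatch(targets, run_target, max_workers=4):
    """Run `run_target` for every target in parallel; return {target: result}.

    Results are collected as each target finishes, so a fast target never
    waits on a slow one.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_target, t): t for t in targets}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # Stand-in for a worker; a real one would invoke Bazel for the target.
    def fake_worker(target):
        return f"done {target}"

    print(dispatch(["//a:release", "//b:release"], fake_worker))
```

Collecting results with as_completed mirrors the benefit described above: each release artifact is reported as soon as its own target finishes.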

What about Pull Request (PR) builds?

Our PR pipeline kicks off a CI run every time a new pull request is created. We patch the code changes and use a temporary commit to pass to build-collector. This allows us to easily reuse the same CI setup for PR builds as well. The only difference is that we don’t create any release artifacts and instead check that the release targets can compile by running bazel build.

The Results

At the beginning of this year, we invested in the above design choices in a repo called Optimus. Optimus is a monorepo that houses 120+ Java services and holds some of our most critical data platforms at Pinterest. Optimus and its CI pipeline serve 300 monthly active contributors.

At the beginning of this year, we didn’t use build-collector and weren’t using the release rule abstractions in Optimus. Instead, we ran all the targets within a service when code changes were made, and we had granular release rules for publishing release artifacts. At that time, the P50 time for the CI build was 52 minutes and the P90 time was 69 minutes. After migrating to our new CI design, we saw the P50 time drop to 19 minutes and the P90 time drop to 49 minutes within a week. That’s 2.7x faster for P50 and 1.4x faster for P90!

Chart comparing build times the week before and after the CI migration

One month distribution of build times with the old CI

One month distribution of build times with new CI

Learnings

Bazel is a powerful tool we can leverage to create a Continuous Integration Platform that works for a variety of use cases at Pinterest. Optimizations like build-collector and the custom release rules build the foundational layer of the platform from which we can create more enhancements that will further improve the velocity and health of our CI builds. Some of the things we’d like to look into next with Bazel are: remote execution, profiling, and a system for automatically detecting and excluding flaky tests.
