How We Designed Our Continuous Integration System to be More Than 50% Faster


By Urvashi Reddy | Software Engineer, Engineering Productivity Team


Earlier this year, the Engineering Productivity team at Pinterest published a blog post titled “How a one line change decreased our clone times by 99%.” In that post, we described how a simple Git configuration sped up clone times in one of our largest repositories at Pinterest. In this post, we’ll talk about how we significantly decreased our build times in a CI that serves another major repository at Pinterest. Spoiler alert: it was not a one line change!

The content covered in this blog was presented at BazelCon 2020. Check out the presentation “Designing a Language Agnostic CI with Bazel Queries.”

The Engineering Productivity team’s vision is to “build a developer platform that inspires developers to do their best work.” An integral piece of this platform is our set of Continuous Integration (CI) pipelines. The CI pipelines are responsible for validating code changes and producing release artifacts that can be deployed to one of our supported Continuous Delivery platforms. With over 1,000 engineers at the company, our team faces the interesting challenge of providing reliable and fast CI pipelines that serve the major repositories at scale.

In order to meet those outcomes, our team made a few key design choices:

  • Adopt Bazel as our build tool
  • Use language-based monorepos
  • Test and release only the changed code
  • Leverage Bazel’s BUILD file as a contract between CI and developers
  • Create release abstractions with custom Bazel rules
  • Parallelize work as much as possible

Adopting Bazel

Choosing a single build tool that is multi-language allows our team to create CI workflows that can be applied to any repository using Bazel at Pinterest. Since Bazel is hermetic by design, we can run Bazel targets in CI without needing to configure or manage dependencies on the host machines.

Language-based monorepos

Our CI pipelines are triggered when code is committed to a repository, which means having a CI pipeline for every repository. In order to limit the number of CI pipelines and repositories we have to manage, we group our services into one repository per language.

If you’re interested in learning more about the above two choices, check out another BazelCon 2020 presentation from our team called Pinterest’s Journey to a Monorepo.

Test and release only changed code

At scale, running every target within a repository is expensive. Even with Bazel’s cache, running all test targets with bazel test… still means spending time fetching and loading dependencies for targets. Those network calls to the cache are time consuming and can be avoided entirely if only the minimal set of targets is run.

Additionally, the services within our monorepos vary in commit frequency. Some of them are contributed to daily while others are more sporadic. By running in CI only what’s affected by a change, we can significantly speed up our build times.

So how do we get the minimal set of targets to run? We created a Golang CLI called build-collector to report the targets to run in CI. The CLI takes in a set of commits, looks at the files that were changed, and runs the appropriate Bazel query to find the affected targets. For example, if a couple of source code files were changed, build-collector would run the following query:

The above command uses the rdeps query function to find the reverse dependencies of the source files. The output is a list of targets we can run in CI. In order to get test targets specifically, build-collector wraps the above with a filter using the kind query function:

Note: Alternatively, the tests query function can be used to filter for test targets.

This is just one type of query that build-collector runs. The full list of queries is explained in the “Designing a Language Agnostic CI with Bazel Queries” presentation. At this point, you might be wondering how we get release targets. We’ll cover that further below when we talk about our custom release rule implementation.

In our CI script, we call build-collector in the following way:

Here’s an example of build-collector’s output file for test targets. The same JSON schema is used for release targets as well.

The CI script parses the JSON files produced by build-collector and runs the targets in parallel across multiple workers.
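
To make the parsing and fan-out step more concrete, here is a minimal Python sketch. It is not our actual CI script: the `"targets"` key is an assumed schema used only for illustration, and the round-robin split is just one simple way to spread targets across workers.

```python
import json
from typing import List


def load_targets(report_json: str) -> List[str]:
    """Read target labels from a report. The "targets" key is a hypothetical schema."""
    return list(json.loads(report_json)["targets"])


def shard(targets: List[str], workers: int) -> List[List[str]]:
    """Deal targets round-robin into at most `workers` non-empty buckets."""
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for index, target in enumerate(targets):
        buckets[index % workers].append(target)
    # Drop empty buckets so idle workers are never scheduled.
    return [bucket for bucket in buckets if bucket]


if __name__ == "__main__":
    # Hypothetical report contents, for illustration only.
    report = '{"targets": ["//svc/a:a_test", "//svc/b:b_test", "//svc/c:c_test"]}'
    for worker_id, bucket in enumerate(shard(load_targets(report), workers=2)):
        print(worker_id, bucket)
```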

Leverage Bazel’s BUILD file as a contract between CI and developers

We want Pinterest engineers to focus on feature work and not have to learn too many things about how our CI is set up. Given that we also want to run the minimal set of targets in CI, we need a way for developers to communicate which targets are for local development and which parts of their service should be tested and released in CI. This is where the Bazel BUILD file comes in handy, since it is already where developers define the tests and release artifacts pertaining to their service. Developers follow a few simple conventions in the BUILD file so that our CI can figure out exactly what to build.

Those conventions are:

  • Use test rule types for running tests (standard Bazel practice)
  • Use Pinterest custom release rules for generating release artifacts
  • Use supported tags like no-ci to indicate what should not run in the pipelines
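
As an illustration of how a tag convention like no-ci could be honored at query time, here is a minimal Python sketch that subtracts tagged targets from an existing query. This is an assumption about one possible approach, not a description of build-collector’s internals. `attr` and `except` are standard Bazel query features, and the bracketed pattern follows the Bazel documentation’s guidance for matching a single value inside a list attribute such as `tags`.

```python
def exclude_tag(query: str, tag: str) -> str:
    """Return a query that removes targets carrying `tag` from `query`."""
    # List attributes are matched as "[a, b, c]", so anchor the tag between
    # "[" or " " on the left and "," or "]" on the right.
    pattern = rf"[\[ ]{tag}[,\]]"
    return f'({query}) except attr("tags", "{pattern}", {query})'


if __name__ == "__main__":
    # Hypothetical package pattern, for illustration only.
    print(exclude_tag("tests(//services/foo/...)", "no-ci"))
```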

Create release abstractions with custom Bazel rules

At Pinterest, we support a number of different artifact types that are released to various stores. For example, docker images are sent to an internal registry, jars are published to Artifactory, etc. To support these workflows, we implemented custom Bazel rules for the common use cases at Pinterest. The custom rules help us to create an abstraction over the infrastructure. All developers need to do is indicate what they want to publish within the BUILD file using our custom rules.

A common workflow is creating and publishing docker images that are then referenced when deploying to EC2 or Kubernetes. Below is an example of how engineers can use a custom Bazel rule called container_release to get CI to make their release artifacts available for deployment.

In this example BUILD file, this service has created a docker image using the open source container_image rule. Using the custom container_release rule, the service author can publish the docker image to a Pinterest registry as well as specify what deployment artifacts should be made available to our Continuous Delivery platforms (Teletraan for EC2 deployments and Hermez for Kubernetes workloads).

There are a few benefits we get from implementing custom release rules:

  • It abstracts the infrastructure layers for our developers. They don’t have to be aware of where the deployment artifacts end up and how they are consumed by our CD platforms.
  • Developers control what parts of their service are released via version controlled code
  • We can support dev versions of their release artifacts by controlling where the artifacts are released.

That last benefit is made possible with another release abstraction within the custom release rule implementation. Each Pinterest custom release rule is actually a Bazel macro that generates two custom release rules: artifact_ci_release and artifact_dev_release. Our developers don’t see or interact with these rules directly, but they are used by our CI and local development workflows to ensure that they are run in the right context. For instance, below is the Bazel query build-collector runs to obtain release targets for source code changes:
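
As a sketch of what such a release query might look like (the exact query build-collector runs is not reproduced here), the Python below filters the reverse dependencies of the changed files down to one of the two generated rule kinds. The rule names come from the description above; the `context` parameter, the file labels, and the `//...` universe are illustrative assumptions.

```python
from typing import Iterable

# Rule kinds generated by the release macro, keyed by where the query runs.
RELEASE_RULES = {
    "ci": "artifact_ci_release",
    "dev": "artifact_dev_release",
}


def release_targets_query(changed_files: Iterable[str], context: str = "ci") -> str:
    """Build a query for release targets affected by the changed files."""
    rule = RELEASE_RULES[context]  # raises KeyError for an unknown context
    files = " ".join(sorted(changed_files))
    return f'kind("{rule} rule", rdeps(//..., set({files})))'


if __name__ == "__main__":
    # Hypothetical source file label, for illustration only.
    print(release_targets_query(["//services/foo:src/Main.java"]))
```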

A further optimization we made here was to control, within the artifact_*_release implementation, the order in which release artifacts are published. For instance, the docker images are published to the registry before we publish the YAML files that reference them. Handling the ordering inside the rules kept the querying logic in build-collector fast and straightforward.

Parallelize as much as possible

In order to make sure our CI is running as fast as possible, we want to parallelize running targets wherever possible. We currently achieve this with what we call the Dispatcher Model. The Dispatcher Model is pretty simple: we figure out what targets need to run in CI and dispatch the execution of those targets to workers that run in parallel.

This has a significant benefit when running release targets. If a developer only cares about releasing artifacts from a few services they contribute to, they shouldn’t have to wait for all the other services in the CI build to be finished. Running release targets independently and in parallel provides developers with their release artifacts as soon as they are ready.
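
The Dispatcher Model can be sketched in a few lines of Python. This is a simplified, single-machine illustration using a thread pool, not our production setup, which dispatches to separate CI workers. The `run` callable is a hypothetical stand-in for whatever executes a target (for example, a function that shells out to Bazel).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def dispatch(
    targets: Iterable[str],
    run: Callable[[str], bool],
    workers: int = 4,
) -> Dict[str, bool]:
    """Run every target in parallel and record each result as soon as it finishes."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run, target): target for target in targets}
        # as_completed yields in finish order, so fast targets are not
        # held up by slow ones.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # Stub runner: pretend every target except one succeeds.
    def fake_run(target: str) -> bool:
        return target != "//svc/b:release"

    print(dispatch(["//svc/a:release", "//svc/b:release"], fake_run))
```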

What about Pull Request (PR) builds?

Our PR pipeline kicks off a CI run every time a new pull request is created. We patch the code changes and use a temporary commit to pass to build-collector. This allows us to easily reuse the same CI setup for PR builds as well. The only difference is that we don’t create any release artifacts and instead check that the release targets can compile by running bazel build.

The Results

At the beginning of this year, we invested in the above design choices in a repo called Optimus. Optimus is a monorepo that houses 120+ Java services and holds some of our most critical data platforms at Pinterest. Optimus and its CI pipeline serve 300 monthly active contributors.

When the year began, Optimus did not yet use build-collector or the release rule abstractions. Instead, we ran all the targets within a service whenever its code changed, and we relied on granular release rules to publish artifacts. At that time, the P50 time for the CI build was 52 minutes and the P90 time was 69 minutes. After migrating to our new CI design, we saw the P50 time drop to 19 minutes and the P90 time drop to 49 minutes within a week. That’s 2.7x faster for P50 and 1.4x faster for P90!
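
For readers who want to check the arithmetic, the short Python snippet below derives the speedup factors from the build times quoted above. It also expresses the same change as a percentage reduction in build time, a figure we computed here from those numbers.

```python
def speedup(before_min: float, after_min: float) -> float:
    """How many times faster the new build is."""
    return before_min / after_min


def reduction_pct(before_min: float, after_min: float) -> float:
    """Percentage of build time removed."""
    return 100 * (before_min - after_min) / before_min


if __name__ == "__main__":
    for label, before, after in [("P50", 52, 19), ("P90", 69, 49)]:
        print(
            f"{label}: {speedup(before, after):.1f}x faster, "
            f"{reduction_pct(before, after):.1f}% less time"
        )
```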

Chart comparing build times the week before and after the CI migration

One month distribution of build times with the old CI

One month distribution of build times with new CI

Learnings

Bazel is a powerful tool we can leverage to create a Continuous Integration Platform that works for a variety of use cases at Pinterest. Optimizations like build-collector and the custom release rules build the foundational layer of the platform from which we can create more enhancements that will further improve the velocity and health of our CI builds. Some of the things we’d like to look into next with Bazel are: remote execution, profiling, and a system for automatically detecting and excluding flaky tests.
