How a One Line Change Decreased Our Clone Times by 99%

1,685
Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Urvashi Reddy | Software Engineer, Engineering Productivity Team Adam Berry | Tech Lead, Engineering Productivity Team Rui Li | Software Engineer, Engineering Productivity Team


The Engineering Productivity team at Pinterest came across a small change that had a large impact in reducing build times across pipelines. We found that setting the refspec option during git fetch reduced our cloning step by 99%.

The Engineering Productivity team at Pinterest is responsible for supporting the engineers who build and deploy software at the company. Our team maintains a number of infrastructure services and often works on large scale efforts — migrating all software at Pinterest to Bazel, creating a Continuous Delivery platform called Hermez, and maintaining the monorepos that get committed to a few hundred times a day, to name a few.

All our larger efforts are designed to progress on the goal of making software development and delivery at Pinterest a fast and painless experience. Recently, we were reminded of how small details can also have a large impact on that goal. We discovered an overlooked Git option that significantly reduced the build times in our continuous integration pipelines. In order to understand how this small change had such a large impact, we need to share a little information on our monorepos and our pipelines.

Monorepos and Pipelines

We have six main repositories at Pinterest: Pinboard, Optimus, Cosmos, Magnus, iOS, and Android. Each one is a monorepo and houses a large collection of language-specific services. Pinboard is as old as the company and is the largest monorepo. Pinboard has more than 350K commits and is 20GB in size when cloned fully.

Cloning monorepos that have a lot of code and history is time consuming, and we need to do it frequently throughout the day in our continuous integration pipelines. For Pinboard alone, we do more than 60K git pulls on business days. Most of our Jenkins pipeline configuration scripts (written in Groovy) start with a “Checkout” stage where we clone the repository that the later stages will build and test. Here’s what a typical “Checkout” stage looks like:

If we were using the Git CLI directly, it would translate to:

Even though we’re telling Git to do a shallow clone, to not fetch any tags, and to fetch the last 50 commits, we’re still not running this operation as fast as we could be. That’s because we aren’t setting the refspec option. Notice that by not setting it in the pipeline configuration script, we’re telling Git to fetch all refspecs: +refs/heads/*:refs/remotes/origin/*. In the case of Pinboard, that operation would be fetching more than 2,500 branches.

By simply adding the refspec option and specifying which refs we care about (master, in our case), we can limit the refs to the branch we care about and save a lot of time. Here’s what that looks like in our pipeline code:

This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result. Cloning our largest repo, Pinboard went from 40 minutes to 30 seconds. It goes to show that sometimes our small efforts can have a big impact as well.

Learnings

Like most developer productivity teams, we work on large services that have a big impact on our day-to-day experience. However, it’s sometimes the one line fixes that can make a huge difference. Our work is about understanding that.

If you have a passion for improving developer productivity, join our team! We’re hiring for engineering roles on our Engineering Productivity team.

To learn more about Engineering at Pinterest, check out our Engineering Blog, and visit our StackShare page. To view and apply to open opportunities, visit our Careers page.

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
Backend Engineer, Measurement User Match
Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our mission is to help advertisers gain a deep understanding of their ad performance and generate helpful insights so they can make good decisions about their ad campaigns. You’d design and build systems and services to help advertisers learn more about conversions, viewability, brand lift, sales lift, offline conversions, etc. We’re building end-to-end Big Data distributed systems using a board mix of leading open source and Cloud technologies and integrating with 3rd party tools that Advertisers already trust.

What you’ll do:

  • Increase visibility and scale of conversion capture to power our measurement, targeting, and auction products
  • Create cutting edge technical solutions to match conversion events to Pinners
  • Design and build conversion tags, APIs, and data processing algorithms around tracking and reporting against conversions

What we’re looking for:

  • 3+ years of software engineering experience
  • Experiences in developing backend large scale distributed services and data processing workflows in Java and Python

#LI-GK1

Engineering Manager, Shopping Content...
Toronto, ON, CA

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Pinterest is aiming to build a world-class shopping experience for our users, and has a unique advantage to succeed due to the high shopping intent of Pinners. The new Shopping Content Mining team being founded in Toronto plays a critical role in this journey. This team is responsible for building a brand new platform for mining and understanding product data, including extracting high quality product attributes from web pages and free texts that come from all major retailers across the world, mining product reviews and product relationships, product classification, etc. The rich product data generated by this platform is the foundation of the unified product catalog, which powers all shopping experiences at Pinterest (e.g., product search & recommendations, product detail page, shop the look, shopping ads).

There are unique technical challenges for this team: building large scale systems that can process billions of products, Machine Learning models that require few training examples to generate wrappers for web pages, NLP models that can extract information from free-texts, easy-to-use human labelling tools that generate high quality labeled data.Your work will have a huge impact on improving the shopping experience of 400M+ Pinners and driving revenue growth for Pinterest.

What you’ll do:

  • As the Engineering Manager, you’ll be responsible for:
    • Growing this team further in Toronto
    • Driving execution and deliver impact
    • Setting long term technical visions for this area
  • Work with tech leads to provide technical guidance on:
    • Large scale systems that can process billions of products
    • ML models for wrapper induction that require few training examples, NLP models for understanding free-texts
  • Drive cross functional collaborations with partner teams working on shopping

What we’re looking for:

  • 7+ years of industry experience, including 2+ years of management experience
  • Experience on large scale machine learning systems (full ML stack from modelling to deployment at scale.)
  • Experience with big data technologies (e.g., Hadoop/Spark) and scalable realtime systems that process stream data

Nice to have:

  • PhD in Machine Learning or related areas, publication on top ML conferences
  • Familiarity with information extraction techniques for web-pages and free-texts.
  • Experience working with shopping data is a plus.
  • Experience building internal tools for labeling / diagnosing.

#LI-EA1

Staff Machine Learning Software Engin...
Toronto, ON, CA

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Shopping is at the core of Pinterest’s mission to help people create a life they love. The shopping discovery team at Pinterest is inventing a brand new, more visual and personalized shopping experience for 350M+ users worldwide. The team is responsible for delivering mid-funnel shopping experience on shopping surfaces like Product Detail Page, Shopping Search, Shopping on Board etc. As an engineer of the team you will be working on the most cutting edge recommendation algorithms to develop diverse types of shopping recommendations that will be displayed across different shopping surfaces on Pinterest. 

You’ll also be responsible for optimizing the whole page layout by appropriately selecting and slotting the UI templates and recommendation modules optimizing towards a shopping metric. As an engineer of the team you’ll be running experiments and directly improving the shopping metrics contributing to the bottom line of the company.

If you are excited about large scale machine learning problems in the area of recommendation, search and whole page optimization then you must consider this role

What you'll do: 

  • Develop large scale shopping recommendation algorithms
  • Build data pipelines to do data analysis and collect training data
  • Train deep learning models to improve quality and engagement of shopping recommenders
  • Work on backend and infrastructure to build, deploy and serve machine learning models
  • Develop algorithms to optimize the whole page layout of the shopping surfaces
  • Drive the roadmap for next generation of shopping recommenders

What we're looking for: 

  • 6+ years working experience in the area of applied Machine Learning
  • Interest or experience working on a large-scale search, recommendation and ranking problems
  • Interest and experience in doing full stack ML, including backend and ML infrastructure
  • Experience is any of the following areas
    • Developing large scale recommender systems
    • Contextual bandit algorithms
    • Reinforcement learning

#LI-JY1

Senior Machine Learning Engineer, Sho...
Toronto, ON, CA

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Pinterest is aiming to build a world-class shopping experience for our users, and has a unique advantage to succeed due to the high shopping intent of Pinners. The new Shopping Content Mining team being founded in Toronto plays a critical role in this journey. This team is responsible for building a brand new platform for mining and understanding product data, including extracting high quality product attributes from web pages and free texts that come from all major retailers across the world, mining product reviews and product relationships, product classification, etc. The rich product data generated by this platform is the foundation of the unified product catalog, which powers all shopping experiences at Pinterest (e.g., product search & recommendations, product detail page, shop the look, shopping ads).

There are unique technical challenges for this team: building large scale systems that can process billions of products, Machine Learning models that require few training examples to generate wrappers for web pages, NLP models that can extract information from free-texts, easy-to-use human labelling tools that generate high quality labeled data. Your work will have a huge impact on improving the shopping experience of 400M+ Pinners and driving revenue growth for Pinterest.

What you’ll do:

  • As a ML engineer, you will design and build large scale ML systems that can process billions of products
  • ML models for wrapper induction that require few training examples, NLP models for understanding free-texts
  • Drive cross functional collaborations with partner teams working on shopping

What we’re looking for:

  • 3+ years of industry experience
  • Hands-on experience on large scale machine learning systems (full ML stack from modelling to deployment at scale.)
  • Hands-on experience with big data technologies (e.g., Hadoop/Spark) and scalable realtime systems that process stream data
  • Nice to have: PhD in Machine Learning or related areas, publication on top ML conferences, Familiarity with information extraction techniques for web-pages and free-texts, Experience working with shopping data is a plus

#LI-EA1

Verified by
Security Software Engineer
Tech Lead, Big Data Platform
Software Engineer
Talent Brand Manager
Sourcer
Software Engineer
You may also like