Pinterest Flink Deployment Framework

1,843
Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Rainie Li | Software Engineer, Stream Processing Platform Team


Background

At Pinterest, stream processing allows us to unlock value from real time data for pinners and partners. The Stream Processing Platform team is working on building a reliable and scalable platform to support many critical streaming applications including real-time experiment analytics and real time machine learning signals.

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It provides features including exactly-once guarantees, low latency, high throughput, and powerful computation model. At Pinterest, we adopt Flink as the unified streaming processing engine.

Requirements

Standardize Flink Build

At Pinterest, we use Bazel as a build system. We need a standardized Bazel rule to build all Flink jobs without changing Makefiles. Once build is done, instead of asking users to copy Flink jars to YARN clusters, jars should be automatically uploaded to remote storage.

Deployment and Operations History

Users used to copy Flink jars to YARN clusters and manually run commands. It was hard to track previous execution histories if we needed to recover failed jobs. We need to provide standard Flink operations such as launching, killing, triggering savepoint, and resuming jobs from the most recent savepoint.

Job Deduplication

Flink applications are deployed as services, therefore one instance should be running at a time for each Flink application. We need to prevent cases when users accidentally deploy twice for the same job, meaning both instances might write to the same Kafka topic. This would mean double writes to Kafka and could affect downstream jobs.

Deployment Framework

We built our Flink deployment framework on top of Bazel, Hermez (internal continuous deployment platform), Job Submission Service (internal service), and YARN clusters.

Figure 1. Deployment high level architecture

Create Bazel BUILD file

The BUILD file needs to contain load(“flink_release”). Users also need to insert a Bazel rule like this:

Define Hermez Deployment File

Hermez is the Pinterest Continuous Deployment System. In order to launch a Flink job with Hermez, users need to create a Hermez.yml file. This file contains information including which YARN cluster Flink jobs to run in, what YARN parameters to use, what resources to use, etc. For each instance of Flink job, users should set up a separate YAML file. For example, if users run their jobs in dev, staging, and prod environments, they will need to have three different YAML files (one for each environment).

Here’s an example of yml file:

Automatically Flink Job Building

The following numbers are referring to steps in Figure 1: Deployment high level architecture

Whenever a user lands a change to Git repo, Jenkins job will be triggered to build Flink job JARs (1). Jenkins job will follow flink_relase rules that are described in the BUILD file to build Flink JAR and upload it to the S3 bucket (3). Meanwhile, it will upload deployment related Hermez YAML files to Artifactory (2). Hermez monitors Artifactory; when it sees a new yml file, it will display it on UI to allow users to launch a job using that yml (5).

Flink Job Launching

When users launch a Flink job, Hermez converts the yml file into a JSON and submits it to Job Submission Service (JSS) (6). JSS is a service maintained by Pinterest that has the ability to schedule and launch Flink jobs to YARN clusters.

JSS examines the request and ensures that Flink JARs and Flink job state exist in S3 (7). If everything is alright, JSS will first launch a shell-runner job which will execute a command on a YARN cluster cluster (8). The shell-runner job downloads the Flink job’s JAR from S3 and then kicks off the actual Flink job using the configuration provided by JSS (9). The reason we add a shell-runner job is to keep JSS as a thin layer without dealing with different compute engine clients (Flink, Spark, MapReduce, etc.) and different configurations for each cluster.

JSS Deduplication

When resuming a Flink job, we provide several options including resume from most recent savepoint or checkpoint, fresh state, and specify a savepoint or checkpoint path. Job deduplication features ensure that there is only one instance of your Flink job running at a time.

The way job deduplication works is that each job has a unique name when a job is submitted. If there is already an instance of the job running, JSS will trigger a safepoint and stop it first, then submit the new job. If the stop request fails because savepoint fails, then the submitted request will fail and the running instance remains running. If there is one deployment in progress, the new job submission would be rejected

Flink Job Configuration Hotfix

Due to Flink configuration being packaged together with Flink job binary, users used to check in config changes to Repo and rebuild the package. This whole process could take more than 10 minutes. This can be a problem if we would like to quickly adjust parameters during incidents. For example, when Flink jobs failed in production due to lack of resources, we used to go through the entire build process to rollout resource config changes. After the incidents got resolved, we needed to check in another change to roll back these configs. To speed up this process, we provide a hotfix feature on Hermez to overwrite Flink job configuration without code change. Users can adjust Flink configuration values during deployment. Behind the scenes, Hermez will directly overwrite these values in ymls which Hermez read from Artifactory.

What’s Next

Reducing Deployment Latency

The current approach launches shell-runner first. Then, shell-runner launches Flink jobs to YARN clusters which could increase latency. We plan to improve this process to reduce end-to-end Flink job launch time.

Automatically Job Failover

To further improve platform and Flink application availability, we built YARN clusters in multiple AWS Availability Zones (AZ) to provide backup when one cluster or one AZ become unavailable. We are also building a service that could automatically detect any cluster failure and failover failed jobs to backup clusters in different AZs or detect application failures and restart the application automatically.

Stay tuned!

Acknowledgments

Thanks to Steven Bairos-Novak and Yu Yang for their countless contributions. Thanks Ang Zhang for updating this blog. This project is a joint effort across multiple teams at Pinterest. Thanks to the Engineering Productivity Team for Hermez support.

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
Machine Learning Engineer
San Francisco, CA, US; Palo Alto, CA, US; Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

With more than 400 million users around the world and 300 billion ideas saved, Pinterest Machine Learning engineers build personalized experiences to help Pinners create a life they love. With just over 3,000 global employees, our teams are small, mighty, and still growing. At Pinterest, you’ll experience hands-on access to an incredible vault of data and contribute large-scale recommendation systems in ways you won’t find anywhere else.

What you’ll do:

  • Build cutting edge technology using the latest advances in deep learning and machine learning to personalize Pinterest
  • Partner closely with teams across Pinterest to experiment and improve ML models for various product surfaces (Homefeed, Ads, Growth, Shopping, and Search), while gaining knowledge of how ML works in different areas
  • Use data driven methods and leverage the unique properties of our data to improve candidates retrieval
  • Work in a high-impact environment with quick experimentation and product launches
  • Keeping up with industry trends in recommendation systems 

 

What we’re looking for:

  • 2+ years of industry experience applying machine learning methods (e.g., user modeling, personalization, recommender systems, search, ranking, natural language processing, reinforcement learning, and graph representation learning)
  • End-to-end hands-on experience with building data processing pipelines, large scale machine learning systems, and big data technologies (e.g., Hadoop/Spark)
  • Nice to have:
    • M.S. or PhD in Machine Learning or related areas
    • Publications at top ML conferences
    • Expertise in scalable realtime systems that process stream data
    • Passion for applied ML and the Pinterest product

 

#LI-HYBRID
#LI-LA1

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

Android Engineer, Platform
San Francisco, CA, US; New York City, NY, US; Portland, OR, US; Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

We are looking for inquisitive, well-rounded Android engineers to join our Platform engineering teams. Working closely with product managers, designers, and backend engineers, you’ll play an important role in enabling the newest technologies and experiences. You will build robust frameworks & features. You will empower both developers and Pinners alike. You’ll have the opportunity to find creative solutions to thought-provoking problems. Even better, because we covet the kind of courageous thinking that’s required in order for big bets and smart risks to pay off, you’ll be invited to create and drive new initiatives, seeing them from inception through to technical design, implementation, and release.

What you’ll do:

  • Support millions of users and enable colleagues by ensuring excellence in core pieces that are shared throughout the application
  • Identify app-wide challenges; propose, test, and ship solutions
  • Drive changes that improve the entire app such as modularization, implementing image/video loading, RTL text, dependency injection, and reusable UI components
  • Enable developers to work more effectively by improving app architecture, testing capabilities and release cycles
  • Solve hard-to-see user pain points that often affect the entire app such as performance, monitoring crash rates and solving user metric anomalies
  • Grow as a developer by working with world-class peers on varied and high impact projects

What we’re looking for:

  • Deep understanding of Android development and best practices in Kotlin and/or Java, e.g. Activity Lifecycle, memory management, etc.
  • 2+ years of industry Android application development experience, building consumer or business facing products
  • Experience in following best practices in writing reliable and maintainable code that may be used by many other engineers
  • Ability to keep up-to-date with new technologies to understand what should be incorporated
  • Strong collaboration and communication skills

Platform Android Engineering teams:

Android Platform - Frameworks & Architecture

Android Platform - Mobile Networking

Android Platform - UI Components / Gestalt

Metrics Quality

Video Foundation

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

Android Engineer, Product
San Francisco, CA, US; New York City, NY, US; Portland, OR, US; Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

We are looking for inquisitive, well-rounded Android engineers to join our product Engineering teams. Working closely with product managers, designers, and backend engineers, you’ll play an important role in enabling the newest technologies and experiences. You will build robust frameworks & features. You will empower both developers and Pinners alike. You’ll have the opportunity to find creative solutions to thought-provoking problems. Even better, because we covet the kind of courageous thinking that’s required in order for big bets and smart risks to pay off, you’ll be invited to create and drive new initiatives, seeing them from inception through to technical design, implementation, and release.

What you’ll do:

  • Build out Pinner-facing frontend features in Android to power the future of inspiration on Pinterest
  • Contribute to and lead each step of the product development process, from ideation to implementation to release; from rapidly prototyping, running A/B tests, to architecting and building solutions that can scale to support millions of users
  • Partner with design, product, and backend teams to build end to end functionality
  • Put on your Pinner hat to suggest new product ideas and features
  • Employ automated testing to build features with a high degree of technical quality, taking responsibility for the components and features you develop
  • Grow as an engineer by working with world-class peers on varied and high impact projects

What we’re looking for:

  • Deep understanding of Android development and best practices in Kotlin and/or Java, e.g. Activity Lifecycle, memory management, etc.
  • 2+ years of industry Android application development experience, building consumer or business facing products
  • Experience in following best practices in writing reliable and maintainable code that may be used by many other engineers
  • Ability to keep up-to-date with new technologies to understand what should be incorporated
  • Strong collaboration and communication skills

Product Android Engineering teams:

Activation - New User Growth

Closeup Product

Community Engagement 

Creator Incentives 

Creator Monetization

Home Product

Pinterest TV

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

Backend Engineer, Monetization
Palo Alto, CA, US; San Francisco, CA, US; New York City, NY, US; Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

We are looking for inquisitive, well-rounded Backend engineers to join our Monetization engineering teams. Working closely with product managers, designers, and backend engineers, you’ll play an important role in enabling the newest technologies and experiences. You will build robust frameworks & features. You will empower both developers and Pinners alike. You’ll have the opportunity to find creative solutions to thought-provoking problems. Even better, because we covet the kind of courageous thinking that’s required in order for big bets and smart risks to pay off, you’ll be invited to create and drive new initiatives, seeing them from inception through to technical design, implementation, and release.

What you’ll do:

  • Build out the backend for Pinner-facing features to power the future of inspiration on Pinterest
  • Contribute to and lead each step of the product development process, from ideation to implementation to release; from rapidly prototyping, running A/B tests, to architecting and building solutions that can scale to support millions of users
  • Partner with design, product, and backend teams to build end-to-end functionality
  • Put on your Pinner hat to suggest new product ideas and features
  • Employ automated testing to build features with a high degree of technical quality, taking responsibility for the components and features you develop
  • Grow as an engineer by working with world-class peers on varied and high impact projects

What we’re looking for:

  • 2+ years of industry backend development experience, building consumer or business facing products
  • Proficiency in common backend tech stacks for RESTful API, storage, caching and data processing
  • Experience in following best practices in writing reliable and maintainable code that may be used by many other engineers
  • Ability to keep up-to-date with new technologies to understand what should be incorporated
  • Strong collaboration and communication skills

Backend Monetization Engineering teams: 

Ads API Platform

Ads Indexing Platform

Ads Retrieval Infra

Ads Serving and ML Infra

Measurement Ingestion

Merchant Infra 

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

Verified by
Software Engineer
Sourcer
Software Engineer
Talent Brand Manager
Tech Lead, Big Data Platform
Security Software Engineer
You may also like