Improving Efficiency and Reducing Runtime Using S3 Read Optimization

858
Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Bhalchandra Pandit | Software Engineer


Overview

We describe a novel approach we took to improving S3 read throughput and how we used it to improve the efficiency of our production jobs. The results have been very encouraging. A standalone benchmark showed a 12x improvement in S3 read throughput (from 21 MB/s to 269 MB/s). Increased throughput allowed our production jobs to finish sooner. As a result, we saw 22% reduction in vcore-hours, 23% reduction in memory-hours, and similar reduction in run time of a typical production job. Although we are happy with the results, we are exploring additional enhancements in the future. They are briefly described at the end of this blog.

Motivation

We process petabytes of data stored on Amazon S3 every day. If we inspect the relevant metrics of our MapReduce/Cascading/Scalding jobs, one thing stands out: slower than expected mapper speed. In most cases, the observed mapper speed is around 5–7 MB/sec. That speed is orders of magnitude slower compared to the observed throughput of commands such as aws s3 cp, where speeds of around 200+ MB/sec are common (observed on a c5.4xlarge instance in EC2). If we can increase the speed at which our jobs read data, our jobs will finish sooner and save us considerable time and money in the process. Given that processing is costly, these savings can add up quickly to a substantial amount.

S3 read optimization

The Problem: Throughput bottleneck in S3A

If we inspect implementation of the S3AInputStream, it is easy to notice the following potential areas of improvement:

  1. Single threaded reads: Data is read synchronously on a single thread which results in jobs spending most of the time waiting for data to be read over the network.
  2. Multiple unnecessary reopens: The S3 input stream is not seekable. A split has to be closed and reopened repeatedly each time one performs a seek or encounters a read error. The larger the split, the greater the chance of it happening. Each such reopening further slows down the overall throughput.

The Solution: Improving read throughput

Architecture

Figure 1: Components of a prefetching+caching S3 reader

Our approach to addressing the above-mentioned drawbacks includes the following:

  1. We treat a split to be made up of fixed sized blocks. The size defaults to 8 MB but is configurable.
  2. Each block is read asynchronously into memory before it can be accessed by a caller. The size of the prefetch cache (in terms of number of blocks) is configurable.
  3. A caller can only access a block that has already been prefetched into memory. That delinks a client from network flakiness and allows us to have an additional retry layer to increase the overall resiliency.
  4. Each time we encounter a seek outside of the current block, we cache the prefetched blocks in the local file system.

We further enhanced the implementation to make it a mostly lock-free producer-consumer interaction. This enhancement improves read throughput from 20 MB/sec to 269 MB/sec as measured by a standalone benchmark (see details below in Figure 2).

Sequential reads

Any data consumer that processes data sequentially (for example, a mapper) greatly benefits from this approach. While a mapper is processing currently retrieved data, data next in sequence is being prefetched asynchronously. Most of the time, data has already been pre-fetched by the time the mapper is ready for the next block. That results in a mapper spending more time doing useful work and less time waiting for data, thereby effectively increasing CPU utilization.

More efficient Parquet reads

Parquet files require non-sequential access as dictated by their on-disk format. Our initial implementation did not use a local cache. Each time there was a seek outside of the current block, we had to discard any prefetched data. That resulted in worse performance compared to the stock reader when it came to reading from Parquet files.

We observed significant improvement in the read throughput for Parquet files once we introduced the local caching of prefetched data. Currently, our implementation increases Parquet file reading throughput by 5x compared to the stock reader.

Improvement in production jobs

Improved read throughput leads to a number of efficiency improvements in production jobs.

Reduced job runtime

The overall runtime of a job is reduced because mappers spend less time waiting for data and finish sooner.

Potentially reduced number of mappers

If mappers take sufficiently less time to finish, we are able to reduce the number of mappers by increasing the split size. Such reduction in the number of mappers leads to reduced CPU wastage associated with fixed overhead of each mapper. More importantly, it can be done without increasing the run time of a job.

Improved CPU utilization

The overall CPU utilization increases because the mappers are doing the same work in less time.

Results

For now, our implementation (S3E) is in a separate git repository to allow faster iterations over enhancements. We will eventually contribute it back to the community by merging it back into S3A.

Standalone benchmark

Figure 2: Throughput of S3A vs S3E

In each case, we read a 3.5 GB S3 file sequentially and wrote it locally to a temp file. The latter part is used to simulate IO overlap that takes place during a mapper operation. The benchmark was run on a c5.9xlarge instance in EC2. We measured the total time taken to read the file and compute the effective throughput of each method.

Production run

We tested many large production jobs with the S3E implementation. Those jobs typically use tens of thousands of vcores per run. In Figure 3, we present a summary of comparison between metrics obtained with and without S3E enabled.

Measuring resource savings

We use the following method to compute resource savings resulting from this optimization.

Observed results

Figure 3: Comparison of MapReduce job resource consumption

Given the variation in the workload characteristics across production jobs, we saw vcore reduction anywhere between 6% and 45% across 30 of our most expensive jobs. The average saving was a 16% reduction in vcore days.

One thing that is attractive about our approach is that it can be enabled for a job without requiring any change to a job’s code.

Future direction

At present, we have added the enhanced implementation to a separate git repository. In the future, we would likely update the existing S3A implementation and contribute back to the community.

We are in the process of rolling out this optimization across a number of our clusters. We will publish the results in a future blog.

Given that the core implementation of S3E input stream does not depend on any Hadoop code, we can use it in any other system where large amounts of S3 data is accessed. Currently we are using this optimization to target MapReduce, Cascading, and Scalding jobs. However, we have also seen very encouraging results with Spark and Spark SQL in our preliminary evaluation.

The current implementation can use further tuning to improve its efficiency. It is also worth exploring if we can use past execution data to automatically tune the block size and the prefetch cache size used for each job.

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog, and visit our Pinterest Labs site. To view and apply to open opportunities, visit our Careers page.

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
Video Platform Engineer
San Francisco, CA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Video is becoming the most important content format on Pinterest ecosystem. This role will act as an architect for Pinterest video platform, which responsible for the whole lifecycle of a video from uploading, transcoding, delivery and playback. The video architect will oversee Pinterest video platform strategy, owns the direction of what will be our next strategic investment to strengthen our video platform, and land the strategy into major initiatives towards the directions.

What you'll do: 

  • Lead the optimization and improvement in video codec efficiency, encoder rate control, transcode speed, video pre/post-processing and error resilience.
  • Improve end-to-end video experiences on lossy networks in various user scenarios.
  • Identify various opportunities to optimize in video codec, pipeline, error resilience.
  • Define the video optimization roadmap for both low-end and high-end network and devices.
  • Lead the definition and implementation of media processing pipeline.

What we're looking for: 

  • Experience with AWS Elemental
  • Solid knowledge in modern video codecs such as H.264, H.265, VP8/VP9 and AV1. 
  • Deep understanding of adaptive streaming technology especially HLS and MPEG-DASH.
  • Experience in architecting end to end video streaming infrastructure.
  • Experience in building media upload and transcoding pipelines.
  • Proficient in FFmpeg command line tools and libraries.
  • Familiar with popular client side media frameworks such as AVFoundation, Exoplayer, HLS.js, and etc.
  • Experience with streaming quality optimization on mobile devices.
  • Experience collaborating cross-functionally between groups with different video technologies and pipelines.

#LI-EA1

Senior Software Engineer, Data Privacy
Dublin, IE

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

The Data Privacy Engineering team builds platforms and works with engineers across Pinterest to help ensure our handling of customer and partner data meets or exceeds their expectations of privacy and security.  We’re a small, and growing, team based in Dublin.  We own three major engineering projects with company-wide impact: expanding and onboarding teams doing big data processing to a new fine-grained data access platform, tracking how data moves and evolves through our systems, and ensuring data is always handled appropriately.  As a Senior Engineer, you’ll take a driving role on one of these projects and responsibility for working with internal teams to understand their needs, designing solutions, and collaborating with teams in Dublin and the US to successfully execute on your plans.  Your work will help ensure the safety of our users’ and partners’ data and help Pinterest be a source of inspiration for millions of users.

What you’ll do:

  • Consult with engineers, product designers, and security experts to design data-handling solutions
  • Review code and designs from across the company to guide teams to secure and private solutions
  • Onboard customers onto platforms and refine our tools to streamline these processes
  • Mentor and coach engineers and grow your technical leadership skills, with engineers in Dublin and other offices.
  • Grow your engineering skills as you work with a range of open-source technologies and engineers across the company, and code across Pinterest’s stack in a variety of languages

What we’re looking for:

  • 5+ years of experience building enterprise-scale backend services in an object-oriented programing language (Java preferred)
  • Experience mentoring junior engineers and driving an engineering culture
  • The ability to drive ambiguous projects to successful outcomes independently
  • Understanding of big-data processing concepts
  • Experience with data querying and analytics techniques
  • Strong advocacy for the customer and their privacy

#LI-KL1

Software Engineer, Key Value Systems
San Francisco, CA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Pinterest brings millions of Pinners the inspiration to create a life they love for everything; whether that be tonight’s dinner, next summer’s vacation, or a dream house down the road. Our Key Value Systems team is responsible for building and owning the systems that store and serve data that powers Pinterest's business-critical applications. These applications range from user-facing features all the way to being integral components of our machine learning processing systems. The mission of the team is to provide storage and serving systems that are not only highly scalable, performant, and reliable, but also a delight to use. Our systems enable our product engineers to move fast and build awesome features rapidly on top of them.

What you’ll do

  • Build, own, and improve Pinterest's next generation key-value platform that will store petabytes of data, handle tens of millions of QPS, and serve hundreds of use cases powering almost all of Pinterest's business-critical applications
  • Contribute to open-source databases like RocksDB and Rocksplicator
  • Own, improve, and contribute to the main key-value storage platform, streaming write architectures using Kafka, and additional derivative
  • RocksDB-based distributed systems
  • Continually improve operability, scalability, efficiency, performance, and reliability of our storage solutions

What we’re looking for:

  • Deep expertise on online distributed storage and key-value stores at consumer Internet scale
  • Strong ability to work cross-functionally with product teams and with the storage SRE/DBA team
  • Fluent in C/C++ and Java
  • Good communication skills and an excellent team player

#LI-KL1

Head of Ads Delivery Engineering
San Francisco, CA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Pinterest is on a mission to help millions of people across the globe to find the inspiration to create a life they love. Within the Ads Quality team, we try to connect the dots between the aspirations of pinners and the products offered by our partners. 

You will lead an ML centric organization that is responsible for the optimization of the ads delivery funnel and Ads marketplace at Pinterest. Using your strong analytical skill sets, thorough understanding of machine learning, online auctions and experience in managing an engineering team you’ll advance the state of the art in ML and auction theory while at the same time unlock Pinterest’s monetization potential.  In short, this is a unique position, where you’ll get the freedom to work across the organization to bring together pinners and partners in this unique marketplace.

What you’ll do: 

  • Manage the ads delivery engineering organization, consisting of managers and engineers with a background in ML, backend development, economics and data science
  • Develop and execute a vision for ads marketplace and ads delivery funnel
  • Build strong XFN relationships with peers in Ads Quality, Monetization and the larger engineering organization, as well as with XFN partners in Product, Data Science, Finance and Sales

What we’re looking for:

  • MSc. or Ph.D. degree in Economics, Statistics, Computer Science or related field
  • 10+ years of relevant industry experience
  • 5+ years of management experience
  • XFN collaborator and a strong communicator
  • Hands-on experience building large-scale ML systems and/or Ads domain knowledge
  • Strong mathematical skills with knowledge of statistical models (RL, DNN)

#LI-TG1

Verified by
Security Software Engineer
Tech Lead, Big Data Platform
Software Engineer
Talent Brand Manager
Sourcer
Software Engineer
You may also like