Cost Reduction in Goku

688
Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Monil Mukesh Sanghavi | Software Engineer, Real Time Analytics Team; Rui Zhang | Software Engineer, Real Time Analytics Team; Hao Jiang | Software Engineer, Real Time Analytics Team; Miao Wang | Software Engineer, Real Time Analytics Team;


In 2018, we launched Goku, a scalable and high performant time series database system, which served as the storage and query serving engine for short term metrics (less than one day old). In early 2020, we launched GokuL (Goku long term), which extended Goku’s capability by supporting long term metrics data (i.e. data older than a day and up to a year). Both of these completely replaced OpenTSDB. For GokuL, we used 3 clusters of i3.4xlarge SSD backed EC2 instances which, over time, we realized are very costly. Reducing this cost was one of our primary aims going into 2021. This blog post will cover the approach we took to achieve our ambition.

Background

We use a tiered approach to segregate the long term data and store it in the form of buckets.

Table 1: table of a tiered approach

Tiers 1–5 contain the data stored on the GokuL (long term) clusters. GokuL uses RocksDB to store its long term data, and the data is ingested in the form of SST files.

Query Analysis

We analyzed the queries going to the long term cluster and observed the following:

  1. There are very few metrics (approximately ~6K) out of a total of 10B for which data points older than three months were queried from GokuL.
  2. More than half of the GokuL queries had specified rollup intervals of one day or more.

Tier 5 Data Analysis

We randomly selected a few shards in GokuL and analyzed the data. We observed the memory consumption of tier 5 data was much more than all the other tiers (1–4) combined. This was despite the fact that tier 5 contains only one hour of rolled up data, whereas the other tiers contained a mix of raw and 15 minute rolled up data.

Table 2: SST File size for each bucket in MiB

Solutions

It was inferred from the query and tier 5 analysis that tier 5 data (which holds six buckets of 64 days of data each) was the least queried as well as the most disk consuming. We planned our solutions to target this tier as it would give us the most benefits. Mentioned below are some of the solutions which were discussed.

Namespace

Implementation of a functionality called namespace would store configurations like ttl, rollup interval, and tier configurations for a set of metrics following that namespace. Uber’s M3 also has a similar solution. This would help us set appropriate configurations for the select sete.g. set a lower ttl for metrics that do not require longer retention, etc). The time to production for this project was longer, and hence we decided to make this a separate project in the future. This is a project being actively worked upon.

Rollup Interval Adjust for Tier 5 Data

We experimented with changing the rollup interval of tier 5 data from one hour to one day and observed the change in the final SST file(s) size for the tier 5 bucket.

Table 3

The savings that came out of this solution were not strong enough to support putting this into production.

On Demand Loading of Tier 5 Data

GokuL clusters would only store data from tiers 1–4 on startup and would load the tier 5 buckets as necessary (based on queries). The cons of this solution were:

  • Users would have to wait and retry the query once the corresponding tier 5 bucket from s3 had been ingested by the GokuL host.
  • Once ingested, the bucket would remain in GokuL unless thrown away by an eviction algorithm.

We decided not to go with this solution because it was not user friendly.

Tiered Storage

We decided to move tier 5 data into a separate HDD based cluster. While there was some notable difference observed in the query latency, it could be ignored because the number of queries hitting this tier was much less. We calculated that tier 5 was consuming approximately 1 TB of each of the 650 hosts in the GokuL cluster. We decided to use the d2.2xlarge instance to store and serve the tier 5 data in GokuL.

Table 4

The cost savings that came out of this solution were huge. We replaced around 325 i3.4xlarge instances with 111 d2.2xlarge instances, and the cost reduction was huge. We reduced nearly 30–35% of our costs with this change.

To support this, we had to design and implement tier-based routing in the goku root cluster, which routes the queries to short term and long term leaf clusters. This was one of the solutions that gave us a huge cost savings.

In the future, we can evaluate if we can reduce the number of replicas and compromise on availability in opposition to the low number of queries.

RocksDB Tuning

As mentioned above, GokuL uses RocksDB to store the long term data. We observed that the RocksDB options we were using were not optimal for Goku’s data that has high volume and low QPS.

We experimented with using a stronger compression algorithm (ZSTD with level 5), and this reduced the disk usage by 40%. In addition to this, we enabled the partitioned index filter wherein only the top level index is loaded into memory. On top of this, we enabled caching with higher priority for filter and index blocks so that they use the same cache as the data blocks and also minimize the performance impact.

With both the above changes, we noticed that the latency difference was not large and the reduction in data space usage was approximately 50%. We immediately put this into production and shrunk the size and cost of our GokuL clusters by another half.

What’s Next

Namespace

As mentioned, we are actively working on the implementation of the namespace feature, which will help us reduce the long term cluster costs even further by reducing the ttl for most of the current metrics that do not need the high retention anyways.

Acknowledgments

Huge thanks to Brian Overstreet, Wei Zhu, and the observability team for providing and supporting solutions on the table.

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
iOS Engineer, Product
San Francisco, CA, US; New York City, NY, US; Portland, OR, US; Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

We are looking for inquisitive, well-rounded iOS engineers to join our Product engineering teams. Working closely with product managers, designers, and backend engineers, you’ll play an important role in enabling the newest technologies and experiences. You will build robust frameworks & features. You will empower both developers and Pinners alike. You’ll have the opportunity to find creative solutions to thought-provoking problems. Even better, because we covet the kind of courageous thinking that’s required in order for big bets and smart risks to pay off, you’ll be invited to create and drive new initiatives, seeing them from inception through to technical design, implementation, and release.

What you’ll do:

  • Build out Pinner-facing frontend features in iOS to power the future of inspiration on Pinterest
  • Contribute to and lead each step of the product development process, from ideation to implementation to release; from rapidly prototyping, running A/B tests, to architecting and building solutions that can scale to support millions of users
  • Partner with design, product, and backend teams to build end to end functionality
  • Put on your Pinner hat to suggest new product ideas and features
  • Employ automated testing to build features with a high degree of technical quality, taking responsibility for the components and features you develop
  • Grow as an engineer by working with world-class peers on varied and high impact projects

What we’re looking for:

  • Deep understanding of iOS development and best practices in Objective C and/or Swift, e.g. xCode, app states, memory management, etc
  • 2+ years of industry iOS application development experience, building consumer or business facing products
  • Experience in following best practices in writing reliable and maintainable code that may be used by many other engineers
  • Ability to keep up-to-date with new technologies to understand what should be incorporated
  • Strong collaboration and communication skills

Product iOS Engineering teams: 

Creator Incentives 

Home Product

Native Publishing

Search Product

Social Growth

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

Software Engineer, Infrastructure
San Francisco, CA, US; Palo Alto, CA, US; Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

The Pinterest Infrastructure Engineering organization builds, scales, and evolves the systems which the rest of Pinterest Engineering uses to deliver inspiration to the world.  This includes source code management, continuous integration, artifact packaging, continuous deployment, service traffic management, service registration and discovery, as well as holistic observability and the underlying compute runtime and container orchestration.  A collection of platforms and capabilities which accelerate development velocity while protecting Pinterest’s production availability for one of the world’s largest public cloud workloads. 

What you’ll do:

  • Design, develop, and operate large scale, distributed systems and networks
  • Work with Engineering customers to understand new requirements and address them in a scalable and efficient manner
  • Actively work to improve the developer process and experience in all phases from coding to operation

What we’re looking for:

  • 2+ years of industry software engineering experience
  • Experience building & operating large scale distributed systems and/or networks
  • Experience in Python, Java, C++, or Go or another language and a willingness to learn
  • Bonus: Experience deploying and operating large scale workloads on a public cloud footprint

Available Hiring Teams: Cloud Delivery Platform (Infra Eng), Code & Language Runtime (Infra Eng), Traffic (Infra Eng), Cloud Systems (Infra Eng), Online Systems (Data Eng), Key Value Systems (Data Eng), Real Time Analytics (Data Eng), Storage & Caching (Data Eng), ML Serving Platform (Data Eng)

 

#LI-SG1

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

iOS Engineer
Warsaw, POL

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

What you’ll do:

  • Build product features into existing VOCHI app to enrich it with a lot of video/audio editing tools  (effects, filters, canvas, trim/split/merge, audio effects, speed and other)
  • Knit across teams by collaborating with Product managers and designers and other functions to build smooth Feed and Video editor experience
  • Prototype and create integrative solutions that can be utilized both in VOCHI and Pinterest mobile clients
  • Contribute best-in-class programming skills to develop highly innovative consumer-facing mobile products
  • Contribute to and lead each step of the product development process, from ideation to implementation to release; from rapidly prototyping, running A/B test, to architecting and building solutions that can scale to support millions of users

What we’re looking for:

  • 6+ years of software engineering experience
  • 4+ years of industry experience in developing iOS applications
  • Deep understanding of developing on iOS devices in Swift
  • Deep understanding of Clean Architecture principles, and different design patterns
  • Strong skills and great product sense
  • Knowledge on multi-threading, memory management and caching on mobile application
  • Strong communication skills

 

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

Head of Monetization Sciences and ML ...
San Francisco, CA, US; Palo Alto, CA, US; Seattle, WA, US

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our PinFlex landing page to learn more. 

You will lead a data science ML engineering organization that is responsible for data driven insights and ML solutions that aim to optimize the Ads marketplace at Pinterest spanning the both advertiser life-cycle and ads delivery funnel. Using your strong analytical skill sets, thorough understanding of machine learning, online auctions and experience in managing large engineering organizations you’ll advance the state of the art in ML and auction theory while at the same time unlocking Pinterest’s monetization potential.  In short, this is a unique position, where you’ll get the freedom to work across the organization to bring together pinners, content creators and advertisers in this unique marketplace.

What you’ll do:

  • Manage a large organization of ML engineers, data scientists and economists responsible for data drive insights and ML solutions that power the monetization efforts at Pinterest
  • Provide technical and organizational leadership to grow the organization in alignment with the business needs
  • Manage and grow managers of managers and principal ML experts
  • Collaborate with engineering and product leadership to define and execute the Monetization vision for Pinterest 
  • Build strong and effective XFN collaborations across the company with Sales, BizOps, Data eng and Core

What we’re looking for:

  • MSc. or Ph.D. degree in Economics, Statistics, Computer Science or related field
  • 12+ years of relevant industry experience
  • 8+ years of management experience
  • XFN collaborator and a strong communicator
  • Bridge builder who drives alignment across the engineering and product organization at Pinterest
  • Hands-on experience building large-scale ML systems and/or Ads domain knowledge
  • Strong mathematical skills with knowledge of statistical models

#LI-TG1

Our Commitment to Diversity:

At Pinterest, our mission is to bring everyone the inspiration to create a life they love—and that includes our employees. We’re taking on the most exciting challenges of our working lives, and we succeed with a team that represents an inclusive and diverse set of identities and backgrounds.

Verified by
Software Engineer
Sourcer
Software Engineer
Talent Brand Manager
Tech Lead, Big Data Platform
Security Software Engineer
You may also like