Scaling PostgreSQL at Thumbtack: Load Balancing And Health Checks


By Marco Almeida, Site Reliability Engineer at Thumbtack.


Introduction

Running PostgreSQL on a single primary (master) node is simple and convenient. There is a single source of truth, one instance to handle all reads and writes, one target for all clients to connect to, and only one configuration file to maintain. However, such a setup usually does not last forever. As traffic increases, so does the number of concurrent reads and writes; the read/write ratio may become too high for one node to serve, a fast and reliable recovery plan needs to exist, and the list goes on.

No single approach solves all possible scaling challenges, but there are quite a few options for scaling PostgreSQL depending on the requirements. When the read/write ratio is high enough, there is a fairly straightforward scaling strategy: set up secondary PostgreSQL nodes (replicas) that stream data from the primary node (master) and split SQL traffic by sending all writes (INSERT, DELETE, UPDATE, UPSERT) to the single master node and all reads (SELECT) to the replicas. There can be many replicas, so the higher the read/write ratio, the better this strategy scales. Replicas are also valuable for disaster recovery, since one of them can be promoted to master in the event of a failure.
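To make the read/write split concrete, here is a minimal sketch of a routing decision. It is only an illustration: the `route` function and the two target names are hypothetical, and the rule is a conservative heuristic (anything that is not clearly a plain SELECT goes to the master). Statements such as SELECT ... INTO, or SELECTs that call functions with side effects, would still need explicit routing.

```python
import re

# Heuristic patterns for this sketch only; not a full SQL parser.
_PLAIN_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
_ROW_LOCKING = re.compile(r"\bFOR\s+(UPDATE|SHARE)\b", re.IGNORECASE)


def route(sql: str) -> str:
    """Return "replica" for plain SELECTs and "master" for everything else."""
    if _PLAIN_SELECT.match(sql) and not _ROW_LOCKING.search(sql):
        return "replica"
    # Writes, DDL, row-locking SELECTs, and anything unrecognized.
    return "master"
```

In practice the decision is often made in the application (separate connection strings for reads and writes) rather than by inspecting SQL text; the sketch only shows the rule being applied.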

Context

In 2014, Thumbtack was running PostgreSQL 9.1 on two servers: a basic master – slave setup leveraging PostgreSQL’s built-in streaming replication. Our infrastructure consisted of a few dozen physical machines on SoftLayer running RHEL 5, and we were using HAProxy with Keepalived for load balancing. The future, already being planned for, would be powered by EC2 instances on AWS, running Debian 7 behind Elastic Load Balancers.

As traffic grew, we knew we would need to scale out PostgreSQL further. Thumbtack’s SQL traffic was (and still is) quite read-intensive, with less than 3% of all queries being executed on the master node. This was good news as it meant we could scale out by sending SELECT statements to a cluster of read-only replicas and leaving the master alone to process DML commands.

To implement this properly, we would need:

  • an arbitrary number of read-only replicas behind a load balancer;
  • a load balancer that is not itself a single point of failure;
  • a way of performing health checks on each server, executed from the load balancer, so that failed nodes are taken out of rotation and put back automatically once they recover;
  • support for both SoftLayer and AWS environments during the transition period.

Replication, high-availability, and load-balancing

We knew what we wanted the infrastructure to look like from a high-level perspective and had the tools available to implement almost all of it on both providers (Fig. 1).

Figure 1: Thumbtack Postgres architecture

One critical detail, however, was far from being a solved problem: health checks.

A basic ping on port 5432 was not enough. Performance and replication lag were (and still are!) very important factors to us: if a given replica lags behind by more than N seconds (N varies with the database and the cluster we’re connecting to), we prefer not to use it until it recovers, as it would otherwise serve stale reads.
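As a rough sketch of that rule, the lag can be measured on a standby with PostgreSQL's `pg_last_xact_replay_timestamp()` and compared against a per-cluster limit. The cluster names and threshold values below are made up for illustration, and `replica_is_fresh` is a hypothetical helper, not part of any tool described here.

```python
from typing import Optional

# Seconds elapsed since the last transaction was replayed on a standby.
LAG_QUERY = "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"

# Hypothetical per-cluster limits (the "N" above), in seconds.
MAX_LAG_SECONDS = {"web": 5.0, "analytics": 300.0}


def replica_is_fresh(lag_seconds: Optional[float], cluster: str) -> bool:
    """Decide whether a replica may serve reads for the given cluster.

    lag_seconds is the result of LAG_QUERY. The query yields NULL (None)
    when no transaction has been replayed yet; this sketch treats that
    case as unhealthy.
    """
    if lag_seconds is None:
        return False
    return lag_seconds <= MAX_LAG_SECONDS[cluster]
```

One caveat with this measurement: when the master is idle, the value keeps growing even though the replica has nothing left to replay, so thresholds need to account for quiet periods.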

Custom health checks

Not having found an open source tool that implements powerful enough health checks for PostgreSQL, we decided to write our own. These were the requirements:

  1. Work equally well on both environments: RHEL 5/HAProxy on SoftLayer and Debian 7/ELBs on AWS
  2. Check basic TCP connectivity, on an arbitrary port, with a configurable timeout
  3. Check server availability by running a test query with a time limit — if a server is under load, it may be responding to TCP but not able to process a simple query (SELECT 1). We need to distinguish between these two scenarios, and potentially take different actions
  4. Check replication lag (time elapsed since the last transaction was replayed)
  5. Support custom health checks in the form of SQL queries — extensible and future-proof
  6. Low memory footprint — avoid “stealing” memory from PostgreSQL
  7. Minimal list of external dependencies
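Requirements 2 and 3 can be sketched as follows. This is an illustration only: `tcp_check` and `diagnose` are hypothetical names, and `run_test_query` stands in for whatever runs SELECT 1 with a time limit and raises on failure or timeout.

```python
import socket
from typing import Callable


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def diagnose(host: str, port: int,
             run_test_query: Callable[[], None],
             timeout: float = 1.0) -> str:
    """Distinguish "port is closed" from "port is open but queries fail"."""
    if not tcp_check(host, port, timeout):
        return "unreachable"
    try:
        run_test_query()  # assumed to raise on error or timeout
    except Exception:
        return "not_answering"
    return "ok"
```

Keeping the two outcomes separate is what allows different actions for each case, for example alerting on "unreachable" while only removing a "not_answering" server from rotation.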

A web service, exposing a simple HTTP endpoint, would work in any environment and easily be able to test TCP connectivity. Simple queries and testing replication lag are just a special case of running arbitrary SQL queries as a health check, so we just focused on this one and implemented the others as a form of syntactic sugar.
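The "everything is a SQL check" idea can be illustrated with a small data structure. This is a sketch under assumptions, not the actual implementation: `SqlCheck`, `simple_query_check`, and `replication_lag_check` are hypothetical names, and `run_query` is a placeholder for a function that executes a query and returns a single numeric value.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SqlCheck:
    """A health check: run `query`, compare its single result to `expected`."""
    query: str
    compare: Callable[[float, float], bool]
    expected: float

    def passes(self, run_query: Callable[[str], float]) -> bool:
        try:
            return self.compare(run_query(self.query), self.expected)
        except Exception:
            # Query errors, timeouts, and NULL results count as failures.
            return False


# "Syntactic sugar": the built-in checks are just predefined SqlChecks.
def simple_query_check() -> SqlCheck:
    return SqlCheck("SELECT 1", operator.eq, 1)


def replication_lag_check(max_seconds: float) -> SqlCheck:
    return SqlCheck(
        "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())",
        operator.le,
        max_seconds,
    )
```

With this shape, adding a new check means adding a query and a comparison, with no new code paths in the service itself.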

Programming languages

One important decision for delivering a platform-independent solution with a low memory footprint and minimal dependencies was the choice of programming language. We considered a few, from Python (there was already a reasonably large Python code base at Thumbtack) to Go (we were taking our first steps with it) and even Rust (too immature at the time).

We ended up writing it in C. It was easy to meet all requirements with only one external dependency (for implementing the web server), there were no challenges running it on any of the Linux distributions we were maintaining, and it was arguably the implementation with the smallest memory footprint given the choices above.

The final result

We named the project pgDoctor and made it publicly available on our GitHub repository. It uses libmicrohttpd to implement a very simple web service that listens on port 8071, logs to the local7 syslog facility (configurable), and provides a reasonably rich set of configuration parameters. The behavior is quite simple: an HTTP GET request to :8071 returns 200 if all checks pass, 500 otherwise. All errors are logged.
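The 200/500 contract is simple enough to sketch with Python's standard library. This is not pgDoctor's code (which is written in C); `make_handler`, `serve`, and the shape of the `checks` mapping are assumptions made for the example.

```python
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict


def make_handler(checks: Dict[str, Callable[[], bool]]):
    """Build a handler that answers 200 if every check passes, 500 otherwise."""

    def passes(check: Callable[[], bool]) -> bool:
        try:
            return bool(check())
        except Exception:
            return False  # a crashing check is a failing check

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            failed = [name for name, check in checks.items() if not passes(check)]
            for name in failed:
                logging.error("health check failed: %s", name)
            body = b"FAILED\n" if failed else b"OK\n"
            self.send_response(500 if failed else 200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # silence the default per-request access log

    return HealthHandler


def serve(checks: Dict[str, Callable[[], bool]], port: int = 8071) -> None:
    """Serve health checks forever on the given port."""
    HTTPServer(("", port), make_handler(checks)).serve_forever()
```

A load balancer configured with an HTTP health check against this port only needs to look at the status code, which is what makes the same service usable behind both HAProxy and ELBs.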

pgDoctor has been running flawlessly on all our PostgreSQL replicas for roughly 3 years now, having gone through two major upgrades (9.1 –> 9.4 –> 9.6). As of now, there are 18 streaming replicas, all running pgDoctor alongside PostgreSQL, and distributed among 4 clusters. Each cluster supports different use cases and requires slightly different health checks.

PostgreSQL replicas are sometimes taken out of rotation. The most common reasons are temporary high replication lag or some transient issue with the underlying EC2 instance. As expected, they are added back to the cluster without any intervention once normality is restored and the health checks succeed.
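How a node leaves and re-enters rotation is decided by the load balancer, not by the health check service. A simplified model of that behavior, loosely based on the consecutive-success and consecutive-failure thresholds that HAProxy (`rise`/`fall`) and ELBs expose, might look like this; the `Rotation` class and its default values are illustrative assumptions.

```python
class Rotation:
    """Track whether one backend is in rotation, given a stream of check results."""

    def __init__(self, rise: int = 2, fall: int = 3):
        self.rise = rise          # consecutive successes needed to re-enter
        self.fall = fall          # consecutive failures needed to be removed
        self.in_rotation = True
        self._streak = 0          # consecutive results contradicting the state

    def observe(self, healthy: bool) -> bool:
        """Record one health check result and return the current state."""
        if healthy == self.in_rotation:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.rise if healthy else self.fall
            if self._streak >= needed:
                self.in_rotation = healthy
                self._streak = 0
        return self.in_rotation
```

Requiring several consecutive results in either direction keeps a single slow response, or a single lucky one, from flapping a replica in and out of the pool.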

Figure 2 shows a diagram of (a downsized version of) our production environment:

  • Three availability zones;
  • One master node and two hot-standby instances on different availability zones;
  • Three clusters of read-only replicas, streaming from the master, each with its own load balancer;
  • Several clients, on all availability zones, reading from one or more clusters and writing to the master.

Figure 2: Thumbtack Postgres architecture (downsized production environment)

Does this sound interesting? There is a lot more to be done. Join Thumbtack and help us build, scale, and operate a high reliability service!

Related work

http://www.severalnines.com/mysql-load-balancing-haproxy-tutorial#issues
https://www.digitalocean.com/community/tutorials/how-to-use-haproxy-to-set-up-mysql-load-balancing--3


Originally posted on Thumbtack Engineering
