Redux: Scaling LaunchDarkly From 4 to 200 Billion Feature Flags Daily

LaunchDarkly
Serving over 20 trillion feature flags daily to help software teams build better software, faster. LaunchDarkly helps eliminate risk for developers and operations teams from the software development cycle.

Written By John Kodumal, CTO and Co-Founder, LaunchDarkly


Background

LaunchDarkly is a feature management platform—we make it easy for software teams to adopt feature flags, helping them eliminate risk in their software development cycles. When we first wrote about our stack, we served about 4 billion feature flags a day. Last month, we averaged over 200 billion flags daily. To me, that's a mind-boggling number, and a testament to the degree to which we're able to change the way teams do software development. Some additional metrics:

  • Our global P99 flag update latency (the time it takes for a feature flag change on our dashboard to be reflected in your application) is under 500ms
  • Our primary Elasticsearch cluster indexes 175M+ docs / day
  • At daily peak, 1.5 million+ mobile devices and browsers and 500k+ servers are connected to our streaming APIs
  • Our event ingestion pipeline processes 40 billion events per day

We've scaled all our services through a process of gradual evolution, with an occasional bit of punctuated equilibrium. We've never rewritten a service from scratch, nor have we ever had to completely re-architect any of our services (we did migrate one service from a SaaS provider to a homegrown one; more on that later). In fact, from a high level, our stack is very similar to what we described in our earlier post:

  • A Go monolith that serves our REST API and UI (JS / React)
  • A Go microservice that powers our streaming API
  • An event ingestion / transformation pipeline implemented as a set of Go microservices

We use AWS as our cloud provider, and Fastly as our CDN.

Let's talk about some of the changes we've made to scale these systems.

Buy first, build if necessary

Over the past year, we've shifted our philosophy on managed services and have moved several critical parts of our infrastructure away from self-managed options. The most prominent was our shift away from HAProxy to AWS's managed application load balancers (ALBs). As we scaled, managing our HAProxy fleet became a larger and larger burden. We spent a significant amount of time tuning our configuration files and benchmarking different EC2 instance types to maximize throughput. Emerging needs like DDoS protection and auto scaling turned into large projects that we needed to schedule urgently. Instead of continuing this investment, we chose to shift to managed ALB instances. This was a large project, but it quickly paid for itself as we've nearly eliminated the time spent managing load balancers. We also gained DDoS protection and auto scaling "for free".

As we've evolved or added additional infrastructure to our stack, we've biased towards managed services:

  • Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data, which we make highly available (HA) with Patroni and Consul.
  • We also use managed ElastiCache instances instead of spinning up EC2 instances to run Redis workloads.
  • In our previous StackShare article, I wrote about a project to incorporate Kafka into our event ingestion pipeline. In keeping with our shift towards managed services, we shifted to Amazon's Kinesis instead of Kafka.

Managed services do have some drawbacks:

  • They're almost never cheaper (in raw dollars) than self-managed alternatives. Pricing is often more opaque, more variable, and hard to predict
  • Much less visibility into the operation, errors, and availability of the service
  • Vendor lock-in

Still, it's a false economy to compare only the raw cost of a managed service against that of a self-managed one. Factor in your team's time and the math is usually pretty clear.

There is one notable case where we've moved from a managed SaaS solution to a homegrown one. LaunchDarkly relies on a novel streaming architecture to push feature flag changes out in near real-time. Our SDKs create persistent outbound HTTPS connections to the LaunchDarkly streaming APIs. When you change a feature flag on your dashboard, that change is pushed out using the server-sent events (SSE) protocol. When we initially built our streaming service, we relied heavily on a third-party service, Fanout, to manage persistent connections. Fanout worked well for us, but over time we found that we could introduce domain-specific performance and cost optimizations if we built a custom service for our use case. We created a Go microservice that manages persistent connections and is heavily optimized for the unique workloads associated with feature flag delivery. We use NATS as a message broker to connect our REST API to a fleet of EC2 instances running this microservice. Each of these instances can manage over 50,000 concurrent SSE connections.
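To make the SSE mechanics concrete, here is a minimal sketch of a Go handler that holds a connection open and pushes each change as an event frame. It is illustrative only and not LaunchDarkly's actual service: the flagUpdate type, the "update" event name, the JSON shape, and the /stream path are all assumptions, and a plain Go channel stands in for the message broker.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// flagUpdate is a hypothetical payload; the real wire format is not
// described in this article.
type flagUpdate struct {
	Key     string
	Version int
}

// formatSSE renders one update as a server-sent event frame: an "event:"
// line, a "data:" line, and a blank line that terminates the frame.
func formatSSE(u flagUpdate) string {
	return fmt.Sprintf("event: update\ndata: {\"key\":%q,\"version\":%d}\n\n", u.Key, u.Version)
}

// streamHandler keeps the response open and writes each update as it
// arrives on the channel (a stand-in for a broker subscription).
func streamHandler(updates <-chan flagUpdate) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return // client disconnected
			case u, open := <-updates:
				if !open {
					return // no more updates will arrive
				}
				fmt.Fprint(w, formatSSE(u))
				flusher.Flush() // push the frame now instead of buffering it
			}
		}
	}
}

// demo sends one update through the handler and returns what a client
// would receive on the wire.
func demo() string {
	updates := make(chan flagUpdate, 1)
	updates <- flagUpdate{Key: "new-checkout", Version: 7}
	close(updates)
	rec := httptest.NewRecorder()
	streamHandler(updates)(rec, httptest.NewRequest("GET", "/stream", nil))
	return rec.Body.String()
}

func main() {
	fmt.Print(demo())
}
```

A real service would also need heartbeats, reconnection handling, and per-connection memory budgeting to hold tens of thousands of connections per instance; none of that is shown here.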

At scale, everything is a tight loop

Some of our analytics services receive tens of thousands of requests per second. One of the biggest things we've learned over the past year is that at this scale, there's almost no such thing as premature optimization. Because of the sheer volume of requests, every handler you write is effectively running in a tight loop. We found that to keep meeting our service level objectives and cost goals at scale, we had to do two things repeatedly:

  1. Profile aggressively to identify and address CPU and memory bottlenecks
  2. Apply a set of micro-patterns to handle specific workloads

Profiling must be done periodically, as new bottlenecks will constantly emerge as traffic scales and old bottlenecks are eliminated. As an example, at one point, we found that the "front-door" microservice for our analytics pipeline was CPU-bound parsing JSON. We switched from Go's built-in encoding/json package to easyjson, which uses compile-time specialization to eliminate slow runtime reflection in JSON parsing.
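As a rough illustration of the kind of profiling described above, the sketch below captures a CPU profile around a JSON-decoding hot loop using Go's standard runtime/pprof package. The event type, payload, and iteration count are made up for the example, and it uses encoding/json rather than easyjson, since easyjson relies on generated code that cannot be shown in a single file.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"runtime/pprof"
)

// event is a hypothetical analytics payload used only to generate CPU load.
type event struct {
	Kind string `json:"kind"`
	Key  string `json:"key"`
}

// parseMany decodes the same payload n times and returns how many
// decodes succeeded.
func parseMany(payload []byte, n int) int {
	ok := 0
	for i := 0; i < n; i++ {
		var e event
		if err := json.Unmarshal(payload, &e); err == nil {
			ok++
		}
	}
	return ok
}

// profileParse runs parseMany under the CPU profiler and returns the number
// of successful decodes plus the size in bytes of the captured profile.
func profileParse(n int) (int, int, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return 0, 0, err
	}
	ok := parseMany([]byte(`{"kind":"feature","key":"new-checkout"}`), n)
	pprof.StopCPUProfile() // flushes the profile into buf
	return ok, buf.Len(), nil
}

func main() {
	ok, size, err := profileParse(200000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded %d events, profile is %d bytes\n", ok, size)
}
```

In practice the profile would be written to a file, or exposed over HTTP with net/http/pprof, and then inspected with `go tool pprof` to see how much time is spent in reflection-based decoding.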

We also identified a set of "micro-patterns" that we have extracted as self-contained libraries so they can be applied in appropriate contexts. Some examples:

  • Read coalescing: In a read-heavy workload, concurrent calls for the same expensive fetch can wait on the first in-flight read and share its result, a kind of short-lived memoization. This pattern is encapsulated in Google's singleflight package (golang.org/x/sync/singleflight).
  • Write coalescing: The dual of read coalescing. In a write-heavy workload where the last write wins, pending writes can be queued and then discarded in favor of the latest write attempt.
  • Multi-layer caching: In scenarios where an in-process, in-memory cache is necessary for performance, horizontal scaling can reduce cache hit rates. We make our fleet more resilient to this effect by employing multiple layers of caching, for example by backing an in-memory cache with a shared Redis cache before finally falling back to a slower persistent disk-backed store.
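The read coalescing pattern can be sketched with only the standard library. The code below is a simplified stand-in for what singleflight does, not the library itself: the Group type, its waiting helper, and the "flags" key are hypothetical names, and the demo blocks the first fetch on purpose so that every other caller joins it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// call tracks one in-flight fetch and the result its waiters will share.
type call struct {
	wg      sync.WaitGroup
	val     string
	err     error
	waiters int // callers that joined this fetch; guarded by Group.mu
}

// Group coalesces concurrent reads for the same key into a single fetch.
type Group struct {
	mu       sync.Mutex
	inflight map[string]*call
}

// Do runs fetch at most once per key at a time. Callers that arrive while a
// fetch is in flight block until it finishes and receive the same result.
func (g *Group) Do(key string, fetch func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.inflight == nil {
		g.inflight = make(map[string]*call)
	}
	if c, ok := g.inflight[key]; ok {
		c.waiters++
		g.mu.Unlock()
		c.wg.Wait() // wait for the leader's fetch to finish
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.inflight[key] = c
	g.mu.Unlock()

	c.val, c.err = fetch()
	c.wg.Done()

	g.mu.Lock()
	delete(g.inflight, key) // later callers trigger a fresh fetch
	g.mu.Unlock()
	return c.val, c.err
}

// waiting reports how many callers are blocked on key's in-flight fetch.
func (g *Group) waiting(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.inflight[key]; ok {
		return c.waiters
	}
	return 0
}

// demo issues n concurrent reads for one key and returns how many fetches
// actually ran, along with the value each caller received.
func demo(n int) (int, []string) {
	var g Group
	var fetches int32
	started := make(chan struct{}, n)
	release := make(chan struct{})
	fetch := func() (string, error) {
		atomic.AddInt32(&fetches, 1)
		started <- struct{}{}
		<-release // simulate a slow backing store
		return "flag-config-v1", nil
	}
	results := make([]string, n)
	var wg sync.WaitGroup
	run := func(i int) {
		defer wg.Done()
		results[i], _ = g.Do("flags", fetch)
	}
	wg.Add(n)
	go run(0)
	<-started // the first fetch is now in flight
	for i := 1; i < n; i++ {
		go run(i)
	}
	for g.waiting("flags") < n-1 {
		runtime.Gosched() // let the followers join the in-flight call
	}
	close(release)
	wg.Wait()
	return int(atomic.LoadInt32(&fetches)), results
}

func main() {
	n, results := demo(5)
	fmt.Println(n, results)
}
```

Here five concurrent readers cause a single fetch. The same shape is what blunts a reconnection storm: many clients asking for the same data at once cost one backend read rather than one per client.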

These simple patterns improved performance at scale and also helped us deal with bad traffic patterns like reconnection storms.
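Multi-layer caching can be illustrated the same way. In this sketch, in-memory maps stand in for the in-process cache, the shared Redis cache, and the persistent store; the layer names and the flag key are assumptions for the example, and the code is not safe for concurrent use.

```go
package main

import "fmt"

// layer is a hypothetical cache tier backed by a map. In a real system the
// shared tier would be a network client and the last tier a database.
type layer struct {
	name string
	data map[string]string
}

func newLayer(name string) *layer {
	return &layer{name: name, data: map[string]string{}}
}

// Layered reads through its tiers from fastest to slowest.
type Layered struct {
	layers []*layer
}

// Get returns the value, the name of the tier that served it, and whether
// the key was found. On a hit it backfills every faster tier.
func (c *Layered) Get(key string) (string, string, bool) {
	for i, l := range c.layers {
		if v, ok := l.data[key]; ok {
			for _, faster := range c.layers[:i] {
				faster.data[key] = v
			}
			return v, l.name, true
		}
	}
	return "", "", false
}

// demo reads one key three times on a warm instance, then once on a newly
// scaled-out instance that starts with an empty in-process cache.
func demo() []string {
	mem, shared, db := newLayer("memory"), newLayer("shared"), newLayer("database")
	db.data["flag:new-checkout"] = "on"

	first := &Layered{layers: []*layer{mem, shared, db}}
	var served []string
	for i := 0; i < 3; i++ {
		_, from, _ := first.Get("flag:new-checkout")
		served = append(served, from)
	}

	// A new instance has a cold memory tier but sees the same shared tier.
	second := &Layered{layers: []*layer{newLayer("memory"), shared, db}}
	_, from, _ := second.Get("flag:new-checkout")
	return append(served, from)
}

func main() {
	fmt.Println(demo()) // prints [database memory memory shared]
}
```

The last read is the point of the pattern: the new instance misses in memory but is served from the shared tier, so scaling out does not send every cold read to the slowest store.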

Get good at managing change

Scaling up isn't just about improving your services and architecture. It requires equal investment in people, processes, and tools. One thing we really focused on, on the process and tools front, is understanding change. Better visibility into changes being made to the service had a massively positive impact on service reliability. Here are a few things we did to improve visibility:

  • Internal changelog service: This service catalogues intentional changes being made to the system. This includes deploys, instance type changes, configuration changes, feature flag changes, and more. Anything that could potentially impact the service (either in a positive or negative way) is catalogued here. We couldn't find anything off the shelf here, so we built something ourselves.
  • COGS (cost of goods sold) log: Very similar to our changelog, but focused on price changes to our services. If we scale out a service, or change instance types, or make reserved instance reservations, we add an entry to this log. For us, this is just a Confluence page.
  • Observability / APM: We use a number of services to gain observability into what is happening to our service at runtime. We use a mix of Graphite / Grafana and Honeycomb.io to give us the observability we need. We're big fans of Honeycomb here.
  • Operational and release feature flags: We feature flag most changes using LaunchDarkly. Most new changes are protected by release flags (short-lived flags that are used to protect the initial rollout and rollback of a feature). We also create operational flags—which are long-lived flags that act as control switches to the application. Observability lets us understand change, and feature flags allow us to react to change to maintain availability or improve user experience.
  • Spinnaker / Armory: LaunchDarkly is almost five years old, and our methodology for deploying was state of the art... for 2014. We recently undertook a project to modernize the way we deploy our software, moving from Ansible-based deploy scripts that executed on our local machines to Spinnaker (along with Terraform and Packer) as the basis of our deployment system. We've been using Armory's enterprise Spinnaker offering to make this project a reality.

Like the sound of this stack? Learn more about LaunchDarkly.
