How LaunchDarkly Serves Over 4 Billion Feature Flags Daily

LaunchDarkly
Serving over 200 billion feature flags daily to help software teams build better software, faster. LaunchDarkly helps eliminate risk for developers and operations teams from the software development cycle.

Editor's note: By John Kodumal, CTO, LaunchDarkly



LaunchDarkly Platform


Background

Feature flagging (wrapping a feature in a flag that’s controlled outside of deployment) is a technique for effective continuous delivery. For example, you can wrap a new signup form in a feature flag and then control which users see that form, all without having to redeploy code or modify a database. Engineering-driven companies (think Google, Facebook, Twitter) invest heavily in custom-built feature flag management systems to roll features out to whom they want, when they want. Smaller companies either build and maintain their own feature flagging infrastructure or use simple open source projects that often don't even have a UI. I was previously an engineering manager at Atlassian, where I’d seen a team work on an internal feature flagging system, so I was aware of the complexity of the problem and the investment required to build a product that addressed the needs of larger development teams and enterprises. That’s where we saw an opportunity to start LaunchDarkly.
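To make the signup-form example concrete, here is a minimal Go sketch. The `Flag` type, its fields, and `signupForm` are hypothetical; in a real deployment the flag's state would be fetched from a flag service at runtime rather than constructed in code, which is what lets you change who sees the form without a redeploy.

```go
package main

import "fmt"

// Flag is a hypothetical, minimal feature flag: a global switch plus an
// allow-list of user keys. Its state would normally come from a flag
// service at runtime instead of being hard-coded.
type Flag struct {
	On      bool            // turning this off hides the feature for everyone
	Targets map[string]bool // user keys that should see the feature
}

// Enabled reports whether the given user should see the flagged feature.
func (f Flag) Enabled(userKey string) bool {
	return f.On && f.Targets[userKey]
}

// signupForm picks which form to render. Both code paths ship in the same
// deploy; the flag decides at request time.
func signupForm(f Flag, userKey string) string {
	if f.Enabled(userKey) {
		return "new-signup-form"
	}
	return "old-signup-form"
}

func main() {
	f := Flag{On: true, Targets: map[string]bool{"alice": true}}
	fmt.Println(signupForm(f, "alice")) // new-signup-form
	fmt.Println(signupForm(f, "bob"))   // old-signup-form
}
```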




We're currently serving over 4 billion feature flag requests per day for companies like Microsoft, Atlassian, Ten-X, and CircleCI. Many of our customers report that we’ve changed the way they do development-- we de-risk new feature launches, eliminate the need for painful long-lived branches, and empower product managers, QA, and others to use feature flags to improve their users’ experience.

General Architecture

You can think of LaunchDarkly as being split up into three pieces: a monolithic web application, a streaming API that serves feature flags, and an analytics processing pipeline that's structured as a set of microservices. We've written almost all of this in Go.

Go has really worked well for us. We love that our services compile from scratch in seconds and produce small, statically linked binaries that are easy to deploy and run with a small footprint. I'd done a lot with Scala at Atlassian, but I'd grown frustrated with the slow compilation times and the overhead of the JVM. Our monolith has a memory footprint of about 6MB. Try that on the JVM!

I'm generally not a fan of large web frameworks like Django or Rails. Too much "magic" for me. I prefer to build on top of smaller libraries that serve specific needs. To that end, both our monolith and our microservices rely heavily on a home-built framework layer that uses libraries like Gorilla Mux.

Our framework makes it trivial to add a new resource to our REST API and get a ton of essential functionality out of the box-- with a few lines of code, you get authentication, APM with New Relic, metrics pumped to Graphite, CORS support, and more.

The web application monolith has a pretty standard architecture. Some of the technologies we use include:

  • MongoDB -- as our core application data store. It's popular to make fun of Mongo these days, but we've found it to be a great database technology as long as you don't store too many things in it. Anything you can count on your fingers and toes should be fine.
  • Elasticsearch -- handles user search and segmentation.
  • Redis -- caching, of course.
  • HAProxy -- as a load balancer.


LaunchDarkly Architecture


Serving feature flags, fast

One of the cool and novel parts of LaunchDarkly is our streaming architecture, which allows us to serve feature flag changes instantly. Think of it like a real-time, in-memory database containing feature flag settings. The closest comparison would be something like Firebase, except that Firebase is more focused on client-side web and mobile, whereas we serve both the client side and the server side.
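One way to picture the "in-memory database" idea is a local store that applies versioned updates as they arrive over a stream, so every flag read is a map lookup. This sketch is an assumption, not LaunchDarkly's SDK code: `FlagUpdate`, its `Version` field, and the rule of ignoring updates older than the stored version are illustrative choices, and a Go channel stands in for the streaming connection.

```go
package main

import (
	"fmt"
	"sync"
)

// FlagUpdate is a hypothetical event shape delivered over a streaming connection.
type FlagUpdate struct {
	Key     string
	Version int
	On      bool
}

// Store is an in-memory copy of flag settings kept current by streamed updates.
type Store struct {
	mu    sync.RWMutex
	flags map[string]FlagUpdate
}

func NewStore() *Store { return &Store{flags: make(map[string]FlagUpdate)} }

// Apply upserts an update unless we already hold the same or a newer version,
// so duplicated or out-of-order events cannot roll a flag backward.
func (s *Store) Apply(u FlagUpdate) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.flags[u.Key]; ok && cur.Version >= u.Version {
		return false
	}
	s.flags[u.Key] = u
	return true
}

// On returns the current value of a flag, or fallback if it is unknown.
func (s *Store) On(key string, fallback bool) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if f, ok := s.flags[key]; ok {
		return f.On
	}
	return fallback
}

func main() {
	store := NewStore()
	updates := make(chan FlagUpdate, 3) // stands in for the streaming connection
	updates <- FlagUpdate{Key: "new-signup", Version: 1, On: false}
	updates <- FlagUpdate{Key: "new-signup", Version: 3, On: true}
	updates <- FlagUpdate{Key: "new-signup", Version: 2, On: false} // stale, ignored
	close(updates)
	for u := range updates {
		store.Apply(u)
	}
	fmt.Println(store.On("new-signup", false)) // true
}
```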

We use several technologies to drive our streaming API. The most important is Pushpin / Fanout. These technologies abstract away the management of long-lived streaming connections, so we can focus on building simple REST APIs.

We also use Fastly as a CDN. Fastly is perfect for us-- we can use VCL to write custom caching rules, and can purge content in milliseconds. If you're caching dynamic content (as opposed to, say, cat GIFs), or you find yourself needing to purge content programmatically, or you want the flexibility of Varnish in addition to the global network of POPs a CDN can provide, Fastly is the best choice out there. Their support team is also fantastic.

When assembled together, these technologies allow our customers to change their feature flag settings on our dashboard and have their new rollout settings streamed to thousands of servers in a hundred milliseconds or less.

Analytics at scale

The other huge component of LaunchDarkly is our analytics processing pipeline. Our customers request over 4 billion feature flags per day, and we use analytics data from these requests to power a lot of the features in our product. A/B testing is an obvious example, but we also do things like determine when a feature flag has stopped being requested, so that you can manage technical debt and clean up old flags.

Our current pipeline involves an HTTP microservice that writes analytics data to DynamoDB. If we need to do any further processing (say, for A/B testing), then we enqueue another job into SQS. Another microservice reads jobs off of the SQS queue and processes them. Right now, we're actively evolving this pipeline. We've found that when we're under heavy load, we need to buffer calls to DynamoDB while we expand capacity instead of trying to process them immediately. Kafka is perfect for this-- so we're splitting that HTTP microservice into a smaller HTTP service that simply queues events to Kafka, and another service that processes Kafka queues.

We actually use LaunchDarkly to control this evolution. We have a feature flag that controls whether a request goes through our old analytics pipeline, or the new Kafka-based pipeline we're rolling out. Once the new pipeline is enabled for all customers, we can clean up the code and switch over completely to the Kafka pipeline. This is a use case that surprises a lot of customers-- they think of feature flags in terms of controlling user-visible features (release toggles), but they are extremely valuable for other use cases like ops toggles, experiments, and permission management.


As we scaled this service out to handle tens of thousands of requests per second, we learned an important lesson about microservice construction. When we first built many of these services, we thought in terms of building a separate service per concern. For example, we’d build a service that would read in analytics events and also serve the autocomplete functionality on the site. The web application would make a sub-request to this service whenever the site issued an autocomplete request.

We quickly learned that the need for fault tolerance and isolation trumps the conceptual neatness of having a service per concern. With fault tolerance in mind, we sliced our services along a different axis-- separating high-throughput analytics writes from the lower-volume read requests coming from the site. This shift dramatically improved the performance of our site, as well as our ability to evolve and scale the huge write load we see on the analytics side.

Infrastructure

As you might have inferred, we use AWS as our hosting provider. We’re fairly conservative when it comes to adopting new technologies-- deployment for us consists of a set of Ansible scripts that spin up EC2 boxes for our various services. We don’t yet use ECS or Docker containers-- which by extension means we don’t use anything for container orchestration. A long while back, we spiked a migration to Mesosphere, but we ran into enough issues that we didn’t proceed. We do think that these technologies are the future, but that future is not now, at least for us.

So maturity is one issue that prevents us from adopting some of the latest whiz-bang ops technology. There are other technologies that we find interesting, like Amazon’s API Gateway, but the pricing models just don’t work for us-- at tens of thousands of requests per second, they’re non-starters.
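The arithmetic shows why per-request pricing adds up: 4 billion flag requests a day averages out to roughly 46,000 requests per second, and even 10,000 requests per second sustained over a 30-day month is about 26 billion requests. The sketch keeps the price as a parameter because API Gateway pricing is tiered by volume and has changed over time; the $1-per-million figure in `main` is a placeholder, not a quoted price.

```go
package main

import "fmt"

// monthlyRequests converts a sustained request rate into a monthly volume.
func monthlyRequests(requestsPerSecond float64, days int) float64 {
	return requestsPerSecond * 86400 * float64(days)
}

// monthlyCost applies a flat per-million-request price. Real pricing is
// tiered, so pricePerMillion is a simplifying placeholder.
func monthlyCost(requestsPerSecond float64, days int, pricePerMillion float64) float64 {
	return monthlyRequests(requestsPerSecond, days) / 1e6 * pricePerMillion
}

func main() {
	// 4 billion requests/day expressed as an average per-second rate.
	fmt.Printf("%.0f req/s\n", 4e9/86400.0) // 46296 req/s
	fmt.Printf("%.0f requests/month at 10k req/s\n", monthlyRequests(10000, 30)) // 25920000000
	fmt.Printf("$%.0f/month at a placeholder $1 per million\n", monthlyCost(10000, 30, 1.0)) // $25920
}
```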

Other services

For customer communications and support, we use Intercom, Slack, and GrooveHQ. We also recently started using elevio, and we've found it's a great way to turn Intercom questions into trackable support tickets.

We use ReadMe.io for our product and developer API documentation, GitHub holds all our code hostage, and CircleCI helps us integrate continuously.

What’s next?

We’re constantly evolving our service to improve efficiency and scale. Besides the Kafka switchover, we’re looking at using Cassandra for some of the work that DynamoDB is doing right now. We also are keenly interested in Disque as a queuing solution, especially because we’ve had so much positive experience with Redis.

More aspirationally, we might try spiking some of our new services in Rust. I’m a functional programmer at heart, and while I am appreciative of the speed and tooling around Go, it would be nice to regain some of the expressiveness and elegance of a functional language while retaining what we like about Go (the fast compilation times, ease of deployment). If we do try it out, we’ll do so in a cautious manner, and isolate the trial to a new microservice somewhere.
