How imgix Built A Stack To Serve 100,000 Images Per Second


By Kelly Sutton, Chief Product Officer at imgix.


What We Do

With over 60% of the average webpage’s weight being image content, serving the best image in the smallest payload is an increasingly critical concern for both businesses and developers. According to KISSmetrics, every additional second of page load time can reduce conversions by 7%. Imagine losing 7% of the revenue from an e-commerce website simply because its images were suboptimal!

imgix is a real-time image processing and delivery service that works with your existing images. It was designed from the ground up to empower businesses and developers to serve the optimal image under any circumstance. With a simple URL API, images are fetched and transformed in real time, then served anywhere in the world via CDN.
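For example, a single URL like the one below (the domain is a placeholder; w, h, and fit are real imgix parameters) fetches a master image and returns a 400×300 cropped derivative:

    https://example.imgix.net/unicorn.jpg?w=400&h=300&fit=crop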

Individual customers have been able to eliminate millions of images from storage and generate an effectively unlimited number of derivative images on the fly. By resizing, cropping, and adjusting their image content dynamically, they are able to avoid wasting precious bytes and keep their websites snappy.

The Technical Challenge

imgix has more than 80 different URL parameters that can be layered and combined for sophisticated effects. For instance, imgix exposes controls for adjusting lossiness, chroma subsampling rates, color quantization, and more. In addition, imgix can respond to any number of inputs from the various browsers, phones, tablets, and screens displaying an image. The sheer number of parameter and input combinations that can affect an image output grows exponentially.
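To make the combinatorics concrete: even if each of those 80+ parameters were a simple on/off switch, a single master image would have more than 2^80 possible renditions, and in reality most parameters take many values. A URL layering a few of the compression controls mentioned above might look like this (the domain and values are illustrative):

    https://example.imgix.net/photo.jpg?w=800&q=45&chromasub=420&colorquant=128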



In designing our infrastructure, we have operated under the presumption that every image request we receive will be uncacheable and must be dynamically created. This fundamentally changes how one thinks about building and scaling a service. Over the years, we have built and rebuilt our infrastructure to handle more than 100,000 images per second with a 90th percentile input filesize of 4.5 MB. With the current stack, imgix is able to offer the highest quality images at the fastest speeds and the lowest prices.
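A quick back-of-the-envelope shows the stakes of that presumption: at 100,000 requests per second, if every request truly missed and pulled a p90-sized 4.5 MB master, the fetch path would need to move on the order of 100,000 × 4.5 MB ≈ 450 GB/s in the worst case. That ceiling is why the layers described below are engineered for data-path efficiency as much as for raw compute.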

Engineering at imgix

The engineering team at imgix is highly skilled and proven at working at large scale, with members drawn from ops and engineering teams at YouTube, Google, Dropbox, Foursquare, and Yahoo. While all of our engineers are polyglots, capable of moving between languages and frameworks as needed, they trend towards C, Go, Python, and Lua when given the opportunity. The vast majority of our technology is written in one of these languages.

The Early Versions

The initial architecture of imgix was built on top of Amazon EC2. After onboarding the initial batch of customers, it became apparent that we would not be able to offer the performance and quality we wanted to target. Most importantly, having worked for large-scale Internet companies and managed their opex spend, our architects knew from experience the corners we could be painted into by scaling up in the cloud. We made an early and critical decision to build all core infrastructure on our own metal. The question then became: what should we build on?

An early Facebook Photos engineer mentioned he had seen remarkable performance and quality coming out of the Apple Core Graphics stack. Though we initially thought it was crazy, our tests quickly showed otherwise. Confronted by this unexpected reality, we made the decision to transition our stack to Apple's Core Graphics. Mac Minis were, at the time, the most effective dollar-per-gigaflop machines on the market specifically for images, so we quickly maxed out our credit cards on Mac Minis at the local Apple Store. Some friends who were spinning down old Linux hardware donated their servers to us for all of the non-image-processing services we would need to run.

In the very early days, a handful of these servers were located in the CEO’s living room. (Don’t worry, they all had redundancies.) Over time, we began to expand into hosted colo facilities in Las Vegas (macminicolo.net) and Dallas (Linode). As we continued to grow, some old colleagues from YouTube offered us space in their server cabinet at Equinix SV1 in San Jose. We quickly outgrew the 4U of space they gave us there and contracted for two more cabinets of our own. As our traffic kept climbing, we realized we needed to move into our own datacenter space.

The Current Stack

The core infrastructure of imgix is composed of many service layers. There is the origin fetching layer, the origin caching layer, the image processing layer, the load balancing and distribution layer, and the content delivery layer. Additionally, each layer interfaces with omnipresent configuration, logging, monitoring, and supervision services.
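In rough terms, a fully uncached request traverses those layers like this (a simplified sketch that omits the configuration, logging, monitoring, and supervision services):

    client
      → content delivery layer (edge cache miss)
      → load balancing / distribution layer
      → image processing layer
      → origin caching layer (miss)
      → origin fetching layer → customer origin

The rendered image then travels back up the same path, getting cached where possible along the way.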

Our fetching and caching layers are largely custom built, using MogileFS, nginx, and HAProxy as underlying technologies. The load balancing and distribution layer is based on custom C code and a LuaJIT framework we created called Levee, which is capable of servicing 40K requests per second on a single machine; we will be open-sourcing it once we feel it’s mature enough. By switching some services from Python to LuaJIT, we have seen 20x performance increases. For boundaries, we run a combination of HAProxy, nginx, and OpenResty.

Our image processing layer is our most highly tuned and custom layer. We run a very high-performance custom image processing server that we built using C, Objective-C, and Core Graphics. Since the image operations themselves take a fraction of a millisecond on the GPU, most of our performance work has gone into optimizing the path from the network interface and local memory cache into the GPU texture buffer. For images that fit entirely within the GPU texture buffer, we see end-to-end performance in the sub-50ms range. All changes to the image-processing layer are run through a suite of regression tests to make sure we do not introduce any visual disparities between builds.
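The test harness itself is internal, but the core idea can be sketched in a few lines of Python (Pillow and NumPy stand in for our actual tooling, and the tolerance shown is an assumption, not our real threshold):

    # Hypothetical visual regression check: compare the current build's
    # output for a URL against a stored reference rendering.
    import numpy as np
    from PIL import Image

    def max_pixel_delta(reference_path, candidate_path):
        ref = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.int16)
        out = np.asarray(Image.open(candidate_path).convert("RGB"), dtype=np.int16)
        if ref.shape != out.shape:
            raise AssertionError("output geometry changed between builds")
        return int(np.abs(ref - out).max())

    # A per-channel tolerance of 2/255 is an illustrative threshold.
    if max_pixel_delta("reference.png", "candidate.png") > 2:
        raise SystemExit("visual regression detected")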

For last-mile content delivery, we use Fastly, which allows us to hyper-optimize our traffic at the edge using Varnish. With more than 20 global Fastly POPs, all imgix customers receive their images quickly. All of this working in unison means our 90th percentile end-to-end response time is under 700ms for first-fetch, uncached images during peak hours.

Logging at each layer is critical to imgix, so we have built a comprehensive logging pipeline. We currently use Heka to handle much of the raw aggregation of data, then feed it downstream to Riemann, Hosted Graphite, and Google BigQuery (for real-time data, statistics, and analytics, respectively).
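Conceptually, the fan-out works something like the Python sketch below; in production Heka does this job, and the sink callables here are placeholders rather than real client libraries:

    # Illustrative only: one aggregated event stream feeding three
    # consumers with different needs.
    def fan_out(events, riemann, graphite, bigquery):
        for event in events:
            riemann(event)                     # real-time alerting data
            if "metric" in event:
                graphite(event["metric"])      # statistics time series
            bigquery(event)                    # full-fidelity analytics

    # Example wiring with stub sinks that just print:
    fan_out(
        [{"service": "render", "metric": ("render.p90_ms", 48)}],
        riemann=print, graphite=print, bigquery=print,
    )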

We leverage several open source projects to make managing and monitoring this stack easier. Ansible handles our configuration management, while Consul manages service discovery. We use Prometheus for monitoring, which plugs into the company PagerDuty account, and StatusPage.io to report the current infrastructure status to our customers.
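As a flavor of how a service might locate its peers through Consul, here is a hedged Python sketch against Consul's standard HTTP health API (the "render" service name is invented for illustration):

    # Hypothetical lookup of healthy instances for a service via
    # Consul's /v1/health/service endpoint.
    import requests

    def healthy_instances(service, consul="http://127.0.0.1:8500"):
        resp = requests.get(
            consul + "/v1/health/service/" + service,
            params={"passing": "true"},
            timeout=2,
        )
        resp.raise_for_status()
        return [
            # Service.Address may be empty, in which case the node's
            # address is the one to use.
            (entry["Service"]["Address"] or entry["Node"]["Address"],
             entry["Service"]["Port"])
            for entry in resp.json()
        ]

    print(healthy_instances("render"))  # e.g. [("10.0.0.12", 8080)]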

Our web front-end services are completely separate from our core infrastructure. They are built using Angular, Ember, or Tornado depending on the task. These services provide web interfaces to configure and administer your imgix account. We build separate Docker containers for development, testing, and production for each front-end project. We use CircleCI for our internal services and Travis CI for our open-source projects and libraries.

imgix practices continuous integration and often deploys several times per day. We use GitHub to host the repository for each service, GitHub Issues to track work in progress, and Trello to plan our roadmap. We practice master-only development and GitHub Flow for iterating on each service.

Discussions around these services happen in Slack or around the proverbial water cooler.

Between the founding of imgix and now, Apple has released hardware that better fits our needs. New image processing nodes are now Mac Pros, which we mount in custom 46U racks in our data center. Surprisingly, these passive racks have better space and power utilization than most Linux-based solutions.



In short, imgix is quite a bit more than just ImageMagick running on EC2.

Making Every Image Better

imgix was started to remove the headaches associated with managing images for websites and apps. Creating a performant, fetch-based API is not easy. We are hard at work leveling up the current infrastructure to provide some very powerful improvements and to reach new levels of scale. Although this stack already handles over 100,000 images per second, we are excited to reach for 1M images per second and beyond.

We are hiring.

If you have any questions about our stack, we’re more than happy to answer them in the comments. If you are tired of running your duct-tape ImageMagick setup, signing up for imgix is free.

