E-Commerce at Scale: Inside Shopify's Tech Stack

Shopify
Shopify is the leading omni-channel commerce platform. Merchants use Shopify to design, set up, and manage their stores across multiple sales channels, including mobile, web, social media, marketplaces, brick-and-mortar locations, and pop-up shops. The platform also provides merchants with a powerful back-office and a single view of their business, from payments to shipping. The Shopify platform was engineered for reliability and scale, making enterprise-level technology available to businesses of all sizes. Headquartered in Ottawa, Canada, Shopify currently powers over 1,000,000 businesses in approximately 175 countries and is trusted by brands such as Allbirds, Gymshark, PepsiCo, Staples, and many more.

Written by Kir Shatrov, Production Engineer at Shopify


Background

Shopify is a multi-channel commerce platform for small and medium businesses that lets you create a shop and sell products wherever you want: online via a web store or social media, and offline with a POS card reader. Shopify powers 600K merchants and serves 80K requests per second at peak.

While helping aspiring entrepreneurs to launch their stores, Shopify also holds some of the world's largest sales for the Super Bowl, Kylie Cosmetics, and celebrities like Justin Bieber and Kanye West. These "flash sales" are tricky from an engineering point of view because of their unpredictably large volumes of traffic.

My name is Kir Shatrov and I'm a Senior Production Engineer at Shopify working on the Service Patterns team. Our team owns areas like sharding, scalability, and reliability of the platform. We provide guidelines and APIs on how to write software that scales by default, which essentially makes the rest of the developers at Shopify our customers. Our team's motto is "make scale invisible for developers".


Engineering at Shopify

Before 2015, we had an Operations and Performance team. Around that time, we decided to create the Production Engineering department and merge the teams. The department is responsible for building and maintaining the common infrastructure that allows the rest of the product development teams to run their code.

Production Engineering and all the product development teams share responsibility for the ongoing operation of our end-user applications. This means all technical roles share monitoring and incident response, with escalation happening laterally to bring in any skill set required to restore service when problems occur.


Initial architecture and stack

In 2004, Shopify’s CEO and founder, Tobi Lütke, was building out an e-commerce store for snowboarding products. Unsatisfied with the existing e-commerce products on the market, Tobi decided to build his own SaaS platform using Ruby on Rails.

At that time, Rails wasn't even 1.0 yet, and the only version of the framework was exchanged as a .zip archive by email. Tobi joined Rails creator David Heinemeier Hansson (DHH) and started contributing to Ruby on Rails while building Shopify.

Shopify is now one of the world's largest and oldest Rails apps. It’s never been rewritten and still uses the original codebase, though it has matured considerably over the past decade. All of Tobi’s original commits are still in the version control history.

The bet on Rails greatly shaped how we think at Shopify and empowered us to deliver product as fast as possible. While there are parts of the framework that sometimes make it harder to scale (e.g. ActiveRecord callbacks and code organization), many of us tend to agree with Tobi that Rails is what allowed Shopify to move from a garage startup to a public company.

The core Shopify app has remained a Rails monolith, but we also have hundreds of other Rails apps across the organization. These are not microservices, but domain-specific apps: Shipping (talks with various shipping providers), Identity (single sign-on across all Shopify stores), and App Store, to name a few. Managing a hundred apps and keeping them up to date with security updates can be tough, so we've developed ServicesDB, an internal app that keeps track of all production services and helps developers make sure that they don't miss anything important.


[Figure: ServicesDB in action]


ServicesDB keeps a checklist for each app: ownership, uptime, logs, on-call rotation, exception reporting, and gem security updates. If there are problems with any of those, ServicesDB opens a GitHub issue and pings the owners of the app to ask them to address it. ServicesDB also makes it easy to query the infrastructure and answer questions like, “How many apps are on Rails 4.2? How many apps are using an outdated version of gem X? Which apps are calling this service?”
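To make that kind of query concrete, here is a small in-memory sketch. It is not ServicesDB's schema or API, which this article does not describe; the field names, service names, and version numbers are hypothetical, and only the checklist items come from the text above.

```python
from dataclasses import dataclass, field

# Checklist items named in the article; the identifiers are illustrative.
CHECKLIST = ("ownership", "uptime", "logs", "on_call_rotation",
             "exception_reporting", "gem_security_updates")


@dataclass
class Service:
    name: str
    rails_version: str
    passed_checks: set = field(default_factory=set)

    def failing_checks(self):
        """Return unsatisfied checklist items, in checklist order."""
        return [check for check in CHECKLIST if check not in self.passed_checks]


def apps_on_rails(services, major_minor):
    """Answer 'which apps are on Rails X.Y?' by comparing major.minor only."""
    return [s.name for s in services
            if ".".join(s.rails_version.split(".")[:2]) == major_minor]


# Hypothetical inventory, for illustration only.
services = [
    Service("shipping", "4.2.10", set(CHECKLIST)),
    Service("identity", "5.1.4", set(CHECKLIST) - {"on_call_rotation"}),
    Service("app-store", "4.2.8", set(CHECKLIST) - {"logs", "uptime"}),
]
```

In a setup like this, any service with a non-empty `failing_checks()` result would be the trigger for opening an issue and pinging its owners.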


Our current stack

As is common in the Rails stack, we've stayed since the very beginning with MySQL as our relational database, memcached for key/value storage, and Redis for queues and background jobs.


[Figure: Shopify Rails stack]


In 2014, we could no longer store all our data in a single MySQL instance, even by buying better hardware. We decided to use sharding and split all of Shopify into dozens of database partitions.

Sharding worked well for us because Shopify merchants are isolated from each other, so we were able to put a subset of merchants on a single shard. It would have been harder if our business assumed shared data between customers.
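The idea of placing each merchant on exactly one shard can be sketched as a small directory. This is a simplified illustration, not Shopify's implementation: the class name, shard names, and the "fewest shops" placement policy are all assumptions.

```python
class ShardDirectory:
    """Maps each shop to exactly one database shard (tenant-based sharding)."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        self._shop_to_shard = {}

    def assign(self, shop_id):
        """Place a new shop on the shard that currently holds the fewest shops."""
        if shop_id in self._shop_to_shard:
            return self._shop_to_shard[shop_id]
        counts = {name: 0 for name in self.shard_names}
        for shard in self._shop_to_shard.values():
            counts[shard] += 1
        # Assumed policy: least-loaded shard, ties broken by list order.
        target = min(self.shard_names, key=lambda name: counts[name])
        self._shop_to_shard[shop_id] = target
        return target

    def shard_for(self, shop_id):
        """Every query for a shop goes to the single shard that owns it."""
        return self._shop_to_shard[shop_id]
```

Because a shop's data never spans shards, no query needs to join across them, which is what makes this partitioning straightforward for isolated tenants.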

The sharding project bought us some time regarding database capacity, but as we soon found out, there was a huge single point of failure in our infrastructure. All those shards were still using a single Redis. At one point, an outage of that Redis took down all of Shopify, causing a major disruption we later called “Redismageddon”. This taught us an important lesson: avoid any resources that are shared across all of Shopify.

Over the years, we moved from shards to the concept of "pods". A pod is a fully isolated instance of Shopify with its own datastores like MySQL, Redis, memcached. A pod can be spawned in any region. This approach has helped us eliminate global outages. As of today, we have more than a hundred pods, and since moving to this architecture we haven't had any major outages that affected all of Shopify. An outage today only affects a single pod or region.
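A rough model of why pods limit the blast radius of an outage follows. The hostnames, regions, and shop names are made up, and the `healthy` flag is a stand-in for real health checking.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """A fully isolated slice of the platform with its own datastores."""
    pod_id: int
    region: str
    mysql_host: str
    redis_host: str
    memcached_host: str
    healthy: bool = True


def affected_shops(shop_to_pod, pods):
    """Return the shops hit by an outage: only those living on unhealthy pods."""
    down = {pod.pod_id for pod in pods if not pod.healthy}
    return sorted(shop for shop, pod_id in shop_to_pod.items() if pod_id in down)


# Hypothetical layout, for illustration only.
pods = [
    Pod(1, "us-east", "mysql-1.internal", "redis-1.internal", "memcached-1.internal"),
    Pod(2, "us-central", "mysql-2.internal", "redis-2.internal", "memcached-2.internal"),
]
shop_to_pod = {"snowboards": 1, "cosmetics": 2, "sneakers": 2}

# Simulate the Redis of pod 2 failing. Pod 1 has its own Redis, so it is unaffected.
pods[1].healthy = False
```

With a single shared Redis, the same failure would have appeared in every entry of `shop_to_pod`; with per-pod datastores, it touches only the shops mapped to pod 2.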


[Figure: Shopify pods architecture]


As we grew into hundreds of shards and pods, it became clear that we needed a solution to orchestrate those deployments. Today, we use Docker, Kubernetes, and Google Kubernetes Engine to make it easy to bootstrap resources for new Shopify pods. At the load balancer level, we use Nginx, Lua, and OpenResty, which allow us to write scriptable load balancers.

The client-side stack of Shopify Admin has been a long journey. It started with HTML templates, jQuery, and prototype.js. In 2013, we moved to Batman.js, our in-house single-page application (SPA) framework. Then, we re-evaluated our approach and moved back to statically rendered HTML and vanilla JavaScript. As the front-end ecosystem matured, we felt that it was time to rethink our approach again. Last year, we started working on moving Shopify Admin to React and TypeScript.

Many things have changed since the days of jQuery and Batman. JavaScript execution is much faster. We can easily render our apps on the server to do less work on the client, and the resources and tooling for developers are substantially better with React than we ever had with Batman.

Another notable difference is that we now have a much better solution for ensuring business logic does not leak into the client: GraphQL. The Admin becomes just another GraphQL client and follows the same patterns established by the mobile apps: no data persistence, no reliance on the server for anything that needs to be shared between clients, and extremely efficient fetching of resources for a view.


How we build, test, and deploy

The Shopify monolith has around 100K unit tests. Many of those involve heavy ORM calls, so they aren't very fast. To keep the shipping pipeline fast, we've invested heavily in our CI infrastructure.

We use Buildkite as a CI platform. What makes Buildkite unique is that it lets you run tests in your own way, on your own hardware, while Buildkite orchestrates builds and provides the user interface.


[Figure: Shopify Buildkite]


The build of our monolith takes 15-20 minutes and involves hundreds of parallel CI workers to run all 100K tests. Parallel test workers allow us to keep shipping. Otherwise, a single build could take days. We have hundreds of developers shipping new features and improvements every day, and it’s crucial that we keep the continuous integration pipeline fast.
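The "could take days" claim is easy to check with back-of-the-envelope arithmetic. The sketch below also shows one simple way to split tests across workers. The 100K test count and the 15-20 minute build time come from the text; the figure of 300 workers is an assumed stand-in for "hundreds", and round-robin is only one possible splitting strategy, not necessarily the one used at Shopify.

```python
def partition_tests(test_ids, worker_count):
    """Deal tests round-robin so each worker gets a near-equal share."""
    buckets = [[] for _ in range(worker_count)]
    for index, test_id in enumerate(test_ids):
        buckets[index % worker_count].append(test_id)
    return buckets


def serial_build_days(parallel_minutes, worker_count):
    """Estimate a single-worker build, assuming work divides evenly across workers."""
    return parallel_minutes * worker_count / (60 * 24)


# 100K tests is from the article; 300 workers is an assumption.
tests = range(100_000)
buckets = partition_tests(tests, 300)

low = serial_build_days(15, 300)   # 15-minute parallel build run serially
high = serial_build_days(20, 300)  # 20-minute parallel build run serially
```

Under those assumptions, a serial run would take roughly three to four days, which matches the order of magnitude stated above.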

When the build is green, it's time to deploy changes to production. We don't practice staging or canary deploys; instead, we rely on feature flags and fast rollbacks in case something goes wrong.
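As an illustration of the feature-flag half of that approach, here is a minimal percentage-based flag. It is a generic sketch, not Shopify's flagging system: the class, the flag name, and the hashing scheme are assumptions.

```python
import hashlib


class FeatureFlags:
    """Percentage-based flags: code ships disabled and is enabled per shop at runtime."""

    def __init__(self):
        self._rollout = {}  # flag name -> percentage of shops enabled (0-100)

    def set_rollout(self, flag, percent):
        """Clamp the percentage to the 0-100 range and store it."""
        self._rollout[flag] = max(0, min(100, percent))

    def enabled(self, flag, shop_id):
        """Hash flag and shop into a stable bucket 0-99 and compare to the rollout."""
        percent = self._rollout.get(flag, 0)
        digest = hashlib.sha256(f"{flag}:{shop_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent
```

Because the bucket is derived from a hash rather than a random draw, a given shop gets the same answer on every request, and setting the rollout back to 0 switches the feature off without a new deploy.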


[Figure: Shopify ShipIt engine]


ShipIt, our deployment tool, is at the heart of Continuous Delivery at Shopify. ShipIt is an orchestrator that runs and tracks the progress of any deploy script that you provide for a project. It supports deploying to Rubygems, Pip, Heroku and Capistrano out of the box. For us, it's mostly kubernetes-deploy, or Capistrano for legacy projects.


[Figure: A ShipIt Slack notification sent when your code is being deployed]


We use a slightly tweaked GitHub flow, with feature development going in branches and the master branch being the source of truth for the state of things in production. When your PR is ready, you add it to the Merge Queue in ShipIt. The idea behind the Merge Queue is to control the rate at which code is merged into the master branch. During busy hours, many developers want to merge their PRs, but we don't want to introduce too many changes to the system at once. Merge Queue limits deploys to 5-10 commits at a time, which makes it easier to identify issues and roll back if we notice any unexpected behaviour after the deploy.
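The rate-limiting idea behind the Merge Queue can be sketched in a few lines. This is not ShipIt's code: the class and method names are hypothetical, and the sketch treats each pull request as one commit for simplicity.

```python
from collections import deque


class MergeQueue:
    """Holds ready pull requests and releases them to master in bounded batches."""

    def __init__(self, max_batch_size=10):
        self.max_batch_size = max_batch_size
        self._pending = deque()

    def enqueue(self, pull_request):
        """Add a ready pull request to the back of the queue."""
        self._pending.append(pull_request)

    def next_batch(self):
        """Pop up to max_batch_size pull requests, oldest first, for one deploy."""
        batch = []
        while self._pending and len(batch) < self.max_batch_size:
            batch.append(self._pending.popleft())
        return batch
```

Keeping each batch small means that when a deploy misbehaves, there are only a handful of candidate changes to inspect or revert.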

We use a browser extension to make Merge Queue play nicely with the Merge button on GitHub:


[Figure: Shopify GitHub flow]


Both ShipIt and kubernetes-deploy are open source, and we've heard quite a few success stories from companies that have adopted our flow.


Next Challenges

All systems at Shopify have to be designed with scale in mind. At the same time, it still feels like you're working on a classic Rails app. The amount of engineering effort put into this is incredible. For a developer writing a database migration, it looks just like it would for any other Rails app, but under the hood that migration is asynchronously applied to 100+ database shards with zero downtime. The story is similar for every other aspect of our infrastructure, from CI and tests to deploys.
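The fan-out part of that migration story can be sketched as follows. This shows only the idea of applying one change to many shards asynchronously with bounded concurrency; it does not model how zero downtime is achieved, and the function names, shard names, migration name, and concurrency limit are all assumptions.

```python
import asyncio


async def apply_to_shard(shard, migration, results):
    """Stand-in for running one schema change on one shard."""
    await asyncio.sleep(0)  # placeholder for the real, slow database work
    results[shard] = f"applied {migration}"


async def migrate_all(shards, migration, max_concurrent=10):
    """Apply one migration to every shard, a bounded number at a time."""
    limit = asyncio.Semaphore(max_concurrent)
    results = {}

    async def guarded(shard):
        # The semaphore caps how many shards are being altered at once.
        async with limit:
            await apply_to_shard(shard, migration, results)

    await asyncio.gather(*(guarded(shard) for shard in shards))
    return results


# Hypothetical shard names and migration name, for illustration only.
shards = [f"shard_{n}" for n in range(100)]
results = asyncio.run(migrate_all(shards, "add_index_to_orders"))
```

The developer-facing input is a single migration; the loop over shards and the concurrency cap live in the infrastructure layer.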

In Production Engineering, we've put a lot of effort into migrating our infrastructure to Kubernetes. Some approaches and design decisions had to be re-evaluated as they were not ready for cloud environments. At the same time, many of those investments in Kubernetes have already started to pay off. What used to take me days of writing Chef cookbooks is now a matter of a couple of changes to Kubernetes YAML. I expect that our Kubernetes foundation will mature and unlock even more possibilities for us to scale.

With tools like Semian and Toxiproxy, we've done a great job at shaping our monolith towards high reliability and resiliency. At the same time, we’re approaching one hundred other production services running at the company, most of them using Rails. With a tool like ServicesDB, we can verify that all of them are using the same patterns as the monolith, spreading the lessons we learned from a decade of operating Rails apps at scale.
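Semian is a Ruby library that provides circuit breakers and bulkheads around external resources. The sketch below is not Semian's API; it is a minimal, generic illustration of the circuit-breaker pattern, with hypothetical names and thresholds, to show what "failing fast" looks like when a dependency is down.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected without touching the failing resource."""


class CircuitBreaker:
    def __init__(self, error_threshold=3, reset_timeout=10.0, clock=time.monotonic):
        self.error_threshold = error_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._errors = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("resource marked unhealthy; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._errors += 1
            # Open on reaching the threshold, or re-open after a failed trial call.
            if self._errors >= self.error_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the error count.
        self._errors = 0
        self._opened_at = None
        return result
```

Once the circuit is open, callers get an immediate error instead of waiting on timeouts, which keeps one slow dependency from tying up every worker.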

Many of these services also need to talk to each other in some way, and how they do it is currently up to them. Some services communicate via a message log like Kafka and some use a REST API over HTTP. Lately, we've been looking into options for Shopify-wide RPC and Service Mesh. I expect that over the next year, we'll define how applications will communicate on our platform in a way that will be resilient and scalable by default.


Like the sound of this stack? Shopify is hiring. Come help us to make commerce better for everyone. Or join Production Engineering, and help us continue to evolve the stack that makes commerce better at Shopify than anywhere else in the world.
