How Mixmax Uses Node and Go to Process 250M Events a day

Mixmax
Bringing email into the 21st century

Background

Mixmax is the product that your team uses to communicate with the outside world. What Slack did for internal communication, we’re doing for email and external communication.

Building a communication platform means processing a TON of data. Our backend, built primarily in Node and Go, processes up to 250M events a day with 200k/minute at peak load. As the glue for an organization’s communication, not only are we processing a huge number of internal events, but we’re also processing data from external sources like CRMs and ATSs totalling 3.2 million events and amounting to a data volume exceeding 14 GB each hour. We've already scaled our platform up 2x in the past 3 months and plan to grow another 10x this year, all while maintaining strict "three 9's" uptime that our customers expect, as they rely on Mixmax all day to get their work done.
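To put those figures in perspective, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above and assumes that "three 9's" means 99.9% availability measured over a 365-day year; the constant and function names are illustrative, not taken from our codebase.

```typescript
// Figures quoted in the paragraph above.
const EVENTS_PER_DAY = 250_000_000; // "up to 250M events a day"
const PEAK_EVENTS_PER_MINUTE = 200_000; // "200k/minute at peak load"
const MINUTES_PER_DAY = 24 * 60;

// Average per-minute rate if the busiest day's events were spread evenly.
const avgEventsPerMinute = EVENTS_PER_DAY / MINUTES_PER_DAY;

// The quoted peak, expressed per second.
const peakEventsPerSecond = PEAK_EVENTS_PER_MINUTE / 60;

// Hours of downtime per year permitted by an availability target.
// Assumption: a 365-day year and availability given as a fraction (0.999).
function downtimeHoursPerYear(availability: number): number {
  const hoursPerYear = 365 * 24;
  return (1 - availability) * hoursPerYear;
}

console.log(Math.round(avgEventsPerMinute)); // 173611
console.log(Math.round(peakEventsPerSecond)); // 3333
console.log(downtimeHoursPerYear(0.999).toFixed(2)); // 8.76
```

Under these assumptions, a 250M-event day averages roughly 174k events per minute, so the quoted peak of 200k/minute sits only modestly above that average, and a 99.9% target leaves under nine hours of downtime per year.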

I’m the Head of Platform Engineering at Mixmax, which means that I spend most of my time supporting and unblocking our engineering teams. I’ve spent most of my time working in SaaS with a stint in security.

Mixmax Engineering

Our engineering team currently consists of 15 engineers with highly varied backgrounds. Today, everyone on the team is a full-stack engineer, although all of us have our own strengths (e.g. Elasticsearch, MongoDB, DevOps, security). It makes for an amazing mix with everyone bringing their own superpower to the table. Our team is also highly distributed, with engineers in Australia, Canada, Mexico, and the US.

We’re self-organized into a constantly varying number of teams. We have two evergreen teams, our Core and Support teams, which are responsible for the two pillars of engineering departments - stability and quality. Beyond those two teams, we create teams around our product priorities. This means each product team lives for the duration of the development life-cycle, and no longer. This dynamic nature allows us to share and distribute knowledge across the team more seamlessly, so that we’re all constantly learning and growing. Teams are also cross-functional, which helps us keep feedback consistent and open between everyone in engineering, product and design. Each team defines what its success and failure metrics are, as well as how it will measure its own progress (some teams do two-week sprints, some do Kanban, etc.). The one constant is that every team agrees on and publicizes the metrics that it uses to monitor its own success.

Orthogonal to our teams, we have two guilds: our web guild and our platform guild. Our guilds are for helping us improve and develop our individual strengths, not for siloing knowledge. Our guilds focus on elevating best practices for their areas of ownership, as well as helping to mentor and provide safety nets for members outside their guild. One clear distinction that we draw is that we explicitly ensure that guild members are not the only ones doing work that would fall into their areas. First and foremost, all engineers are expected to focus on their team, helping them to achieve their goals - as an example, this means that “platform work” is not meant to be worked on by only platform guild members.

Initial architecture and application evolution

Mixmax was originally built using Meteor as a single monolithic app. As more users began to onboard, we started noticing scaling issues, and so we broke out our first microservice: our Compose service, for writing emails and Sequences, was born as a Node.js service. Soon after that, we broke out all recipient searching and storage functionality to another Node.js microservice, our Contacts service. We continued this practice as we broke out many more microservices: being explicit about each microservice’s responsibilities helped the system scale more appropriately.

This resulted in a system with many Node.js microservices and one still fairly large Meteor service. All of these Node.js services did, and still do, run on Elastic Beanstalk in AWS, as we optimized for developer velocity by using a managed deployment platform. The Meteor app ran in Galaxy, which necessitated a subdomain-based microservice approach for that main Meteor app to talk to the other microservices.

As we began to scale super quickly, with more and more customers joining the platform, we started to see that the Meteor app was still having a lot of trouble scaling due to how it tried to provide its reactivity layer. To be honest, this led to a brutal summer of playing Galaxy container whack-a-mole as containers would saturate their CPU and become unresponsive. I’ll never forget hacking away at building a new microservice to relieve the load on the system so that we’d stop getting paged every 30-40 minutes. Luckily, we’ve never had to do that again! After stabilizing the system, we had to build out two more microservices to provide the necessary reactivity and authentication layers as we rebuilt our Meteor app from the ground up in Node. This also had the added benefit of being able to deploy the entire application in the same AWS VPCs. Thankfully, AWS had also released their ALB product so that we didn’t have to build and maintain our own websocket layer in EC2. All of our microservices, except for one special Go one, are now in Node with an Nginx frontend on each instance, all behind ELBs or ALBs running in Elastic Beanstalk.

Data storage at Mixmax

Originally, we had a single Mongo replica set that we stored everything on. As we scaled, we realized two things:

  • A single Mongo replica set wasn’t going to cut it for our many quickly growing collections.
  • Analytics and rich searching don’t scale well in Mongo.

To solve for the first item, we now run multiple large-scale Mongo deployments with a mix of replica sets and sharded replica sets (depending on the application activity for the given database). In solving for the second item, we now run multiple large Elasticsearch deployments to provide the majority of our rich searching functionality.

We also heavily use Redis across the entire platform for things like distributed locking, caching, and backing part of our job queuing layer. This has led to our most recent (and ongoing!) scaling challenge.

(here’s a screenshot of the tool that we use to administer our worker queues that live on Redis)
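As an illustration of the distributed locking use case mentioned above, here is a minimal sketch of the common Redis locking pattern: acquire with `SET key token NX PX ttl`, and release only if the stored token still matches the caller's. This is not our production implementation. `FakeLockStore`, the key name, and the worker tokens are hypothetical, and the class is an in-memory stand-in so the example runs without a Redis server. Time is passed in explicitly to keep the example deterministic.

```typescript
// Hypothetical in-memory stand-in for the two Redis operations the pattern needs.
// Against real Redis, acquisition would be `SET key token NX PX ttl`, and release
// would be an atomic compare-and-delete (commonly a small Lua script).
class FakeLockStore {
  private entries = new Map<string, { token: string; expiresAt: number }>();

  // Mirrors SET NX PX: succeeds only if the key is absent or has expired.
  setIfAbsent(key: string, token: string, ttlMs: number, now: number): boolean {
    const current = this.entries.get(key);
    if (current !== undefined && current.expiresAt > now) {
      return false;
    }
    this.entries.set(key, { token, expiresAt: now + ttlMs });
    return true;
  }

  // Mirrors compare-and-delete: only the current holder's token may release.
  deleteIfMatches(key: string, token: string): boolean {
    const current = this.entries.get(key);
    if (current === undefined || current.token !== token) {
      return false;
    }
    this.entries.delete(key);
    return true;
  }
}

const store = new FakeLockStore();
const key = "lock:crm-sync:42"; // hypothetical key name

const aAcquired = store.setIfAbsent(key, "worker-a", 5000, 0); // true
const bBlocked = store.setIfAbsent(key, "worker-b", 5000, 1000); // false: A still holds it
const bWrongRelease = store.deleteIfMatches(key, "worker-b"); // false: not B's lock
const bAfterExpiry = store.setIfAbsent(key, "worker-b", 5000, 6000); // true: A's TTL lapsed
const aStaleRelease = store.deleteIfMatches(key, "worker-a"); // false: B holds it now

console.log(aAcquired, bBlocked, bWrongRelease, bAfterExpiry, aStaleRelease);
```

The token check on release is the important design choice: without it, a worker whose lock had already expired could delete a lock that now belongs to someone else.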

Asynchronous processing at Mixmax

At Mixmax, we have multiple queueing systems running that all exhibit very different behaviours, due to all the different ways that our platform is used. We went through quite a few Redis-backed job queueing technologies before arriving at our current setup (from Kue to bull-queue to bee-queue to a mix of bee-queue and AWS Kinesis). Our current stack, a mix of bee-queue and AWS Kinesis, allows us to both seamlessly handle our steadily active queues (e.g. for sending emails) and weather the storm of work that powers our CRM syncing engines. This has been a really fun challenge, as part of this system handles jobs in the high hundreds of millions a day, with sporadic spikes of millions of jobs per minute. We’ve made huge progress here, and we still have a lot of progress to make as we continue to scale this asynchronous processing system.
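To make the difference between steady and spiky workloads concrete, here is a small, simplified simulation of how a queue absorbs a burst. It does not model bee-queue or Kinesis specifically. The function name, the per-tick capacity, and the arrival numbers are all made up for illustration.

```typescript
// Simulates a queue drained at a fixed rate. Jobs that exceed the per-tick
// capacity wait in the backlog and are worked off in later ticks.
function simulateBacklog(arrivalsPerTick: number[], capacityPerTick: number): number[] {
  const backlogAfterEachTick: number[] = [];
  let pending = 0;
  for (const arrived of arrivalsPerTick) {
    pending = Math.max(0, pending + arrived - capacityPerTick);
    backlogAfterEachTick.push(pending);
  }
  return backlogAfterEachTick;
}

// Hypothetical workload: steady traffic with one large spike in the third tick.
const burstArrivals = [100, 100, 900, 100, 100, 100];
const backlogPerTick = simulateBacklog(burstArrivals, 300);

console.log(backlogPerTick); // [ 0, 0, 600, 400, 200, 0 ]
```

With capacity for 300 jobs per tick, the steady traffic never queues, while the 900-job spike takes three further ticks to drain. That drain time is the quantity to watch when sizing workers for bursty sources such as CRM syncs.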

How we ship

Our workflow centers around getting code live ASAP. Our CI pipeline is centered around GitHub as our VCS tied into TravisCI. Our CD pipeline then continues on from there using AWS Elastic Beanstalk to deploy new application versions.

All developers are able to work on a local copy of the entire infrastructure. Once a developer has their code ready, it goes through review on GitHub - side note, we’re loving all the work that they’re putting into their code review tooling. After code is reviewed and good to go, it lands on our staging environment, where we manually QA a few core flows before we’ll elevate the code to be released on our production environment. For running all of our services locally, we currently use a mix of supervisord and a tool built by one of our engineers named custody.

A huge part of our continuous deployment practices is to have granular alerting and monitoring across the platform. To do this, we run Sentry on-premise, inside our VPCs, for our event alerting, and we run an awesome observability and monitoring system consisting of StatsD, Graphite and Grafana. We have dashboards using this system to monitor our core subsystems so that we can know the health of any given subsystem at any moment. This system ties into our PagerDuty rotation, as do alerts from some of our CloudWatch alarms (we’re looking to migrate all of these to our internal monitoring system soon).
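For readers unfamiliar with StatsD, metrics are sent as short plain-text lines of the form `name:value|type`, optionally followed by `|@sampleRate`. The sketch below formats such lines for counters (`c`), timers (`ms`) and gauges (`g`). The helper function and the metric names are hypothetical and only illustrate the wire format. They are not the names we use internally, and actually sending the lines (typically over UDP) is left out.

```typescript
// The three most common StatsD metric types: counter, timer, gauge.
type MetricType = "c" | "ms" | "g";

// Formats one StatsD line: `<name>:<value>|<type>` plus an optional `|@<rate>`.
// Hypothetical helper for illustration; it only builds the string.
function formatStatsdLine(
  name: string,
  value: number,
  type: MetricType,
  sampleRate?: number
): string {
  const base = `${name}:${value}|${type}`;
  if (sampleRate !== undefined && sampleRate < 1) {
    return `${base}|@${sampleRate}`;
  }
  return base;
}

// Hypothetical metric names.
const statsdLines = [
  formatStatsdLine("compose.emails_sent", 1, "c"),
  formatStatsdLine("contacts.search_latency", 320, "ms"),
  formatStatsdLine("queues.pending_jobs", 42, "g"),
  formatStatsdLine("compose.emails_sent", 1, "c", 0.1),
];

for (const line of statsdLines) {
  console.log(line);
}
```

Sampling (the `|@0.1` suffix) is worth knowing about at high event volumes, since it lets a service report only a fraction of its increments while the StatsD server scales the count back up.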

(screenshot of our monitoring cluster monitoring our strongDM gateways)

Security hygiene at distributed scale

Being a distributed team is in our DNA. One challenge that we’ve faced as a part of being such a distributed team is providing auditable, available, secure and stable access to databases in our private networks for engineers that are authorized and need to have access to them. In a distributed world, auditing database access, credential management and rotation, and onboarding can be a nightmare. Someone running a query on a staging DB that’s taking down the test environment for everyone? Good luck hunting that down. Have a new engineer onboard and they need to run an audit query on the staging DB to see if their new code might break an old schema? Have fun configuring that. Need to run your periodic credential rotation? Enjoy. This was a huge pain point not only for our team but for me personally, and then strongDM came into the picture.

strongDM acts as a control plane to manage access to every database and server. By centralizing all database credentials and SSH keys in strongDM, onboarding and offboarding become much faster. Simply add a user to a group; since the user never has access to the DB credentials (strongDM handles that), you never need to worry about rotating credentials purely due to employee offboarding. For auditing, since strongDM knows and can monitor each user’s connection, you have direct insight into every single query or access that a user makes - a godsend for auditing. When it comes to periodically rotating keys, it’s even simpler, as you’re rotating credential sets instead of credentials per user, without any action needed from a single other engineer - it simply works. Our engineers have enjoyed strongDM so much that some have even tweeted about it in moments of pure joy.

I seriously cannot imagine working without strongDM now. It’s one of those tools that seamlessly fits into your workflow and you can’t envision work without it.

(screenshots of strongDM in action)

What’s next? Processing all the things.

It’s an exciting time to be at Mixmax: our entire company is scaling quickly (we’ve grown 4x in the last two years, and this trend isn’t slowing down) along with our customer base. This means that we’re processing more data than ever before, and we’re having to get more and more creative to keep up with the amount of data coming in.

We’re currently prototyping our next-generation processing systems, building them out in different languages, with different tech - it’s a fantastic time to join and help us figure out our future direction as an engineering team, all while working on a platform that our customers love!
