Rust at OneSignal

OneSignal is a high volume mobile push, web push, email, and in-app messaging service.

This post is by Joe Wilm of OneSignal

Earlier last year, we announced OnePush, our notification delivery system written in Rust.

In this post, we will cover improvements in our delivery capabilities since then, an interactive tour of OnePush’s subsystems, and reflections on our experience shipping production Rust code. We hope you'll find it insightful!

Delivery Stats

OnePush was built to scale deliveries at OneSignal. To know whether this endeavor was a success, we collect metrics such as historical delivery counts and delivery throughput. Here's how OnePush is performing:

  • OneSignal had ~10,000 users at the start of 2016 and has over 110,000 at the time of publishing this post. (Over 10x growth!)
  • We've increased the number of daily notifications sent by 20x in the same period.
  • OnePush delivers over 2 billion notifications per week.
  • OnePush is fast: we've observed sustained deliveries up to 125,000/second and spikes up to 175,000/second.

The title image on this post is a screenshot from our live delivery monitoring. Each bar represents deliveries occurring in that second, and each vertical division denotes 5,000 deliveries. The colors represent different platforms like iOS, Android, Chrome WebPush, etc. Every single one of them was delivered by OnePush.


OnePush is composed of several subsystems for loading notifications, delivering notifications across HTTP/1.1 and HTTP/2, and processing events and results.

Choosing Rust

Choosing the programming language for a core system is a big decision. If not careful, one could end up with months of time invested and get stuck writing library code instead of the application itself. This is less of a concern with programming languages that have a mature ecosystem, but that's not exactly Rust just yet. On the other hand, Rust enables one to build robust, complex systems quickly and fearlessly thanks to its powerful type system and ownership rules.

Given that we now have a production system written in Rust, it's obvious which side of this trade we landed on. Our experience has been positive overall and indeed we have had fantastic results. The following sections discuss the specific pros and cons we considered for building OnePush in Rust, what risks we accepted at the outset, the successes we had, and issues we ran into.

Reasons to not use Rust

The Rust ecosystem is young. Even if there exists a library for your purpose, it's not guaranteed to be robust enough for a production deployment. Additionally, many libraries today have a "truck factor" of 1. If the library's developer gets hit by a truck, it's going to be on you to maintain it.

Next, Rust's tooling story is weak. You can use tools like Racer and YCM to get pretty far, but they fail in a lot of cases. Good tooling is a necessity, especially for developers that are getting up-to-speed.

Having team members (who may be unfamiliar with Rust) contribute to the project may take a lot of "ramp-up" time. This risk has turned out to be quite real, but it hasn't stopped other members of our team from contributing patches to the project. Mentoring from team members more proficient with the language and familiar with the code base helped a lot here.

Finally, iteration times can be long. This wasn't something we anticipated up front, but build times have become onerous for us. A build from scratch now falls into the category of "go make coffee and play some ping-pong." Recompiling a couple of changes isn't exactly quick either.

Before settling on Rust, we considered writing OnePush in Go. Go has a lot going for it for this sort of application - its concurrency model is perfectly suited for managing many async TCP connections, and the ecosystem has good libraries for HTTP requests, Redis and PostgreSQL clients, and serialization. Go is also more approachable for someone unfamiliar with the language; this makes the code base more accessible to the rest of your team. Go's developer tools have also had more time to mature than Rust's.

Why choose Rust

Despite the negatives and the presence of a good alternative, Rust has a lot going for it that makes it a good choice for us. As mentioned earlier,

"Rust enables one to build robust, complex systems quickly and fearlessly thanks to its powerful type system and ownership rules."

This is huge. Being able to encode constraints of your application in the type system makes it possible to refactor, modify, or replace large swaths of code with confidence. The type system is our ultimate "move quickly and don't break things" secret weapon.
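As a minimal sketch of what encoding a constraint in the type system can look like, consider a match over delivery platforms with no wildcard arm. The `Platform` and `Protocol` enums and the `protocol_for` function are hypothetical names for illustration, not OnePush's actual types, and the platform-to-protocol mapping is an assumption.

```rust
/// Hypothetical platform list; illustrative only, not OnePush's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Platform {
    Ios,
    Android,
    ChromeWebPush,
}

/// Hypothetical transport choice for a delivery.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Protocol {
    Http1,
    Http2,
}

/// There is deliberately no `_ =>` arm. Adding a new `Platform` variant makes
/// this function fail to compile until the new case is handled, which is what
/// makes large refactors safe to attempt.
fn protocol_for(platform: Platform) -> Protocol {
    match platform {
        Platform::Ios => Protocol::Http2,
        // Assumed mapping for the sake of the example.
        Platform::Android => Protocol::Http1,
        Platform::ChromeWebPush => Protocol::Http1,
    }
}

fn main() {
    let platforms = [Platform::Ios, Platform::Android, Platform::ChromeWebPush];
    for p in platforms.iter() {
        println!("{:?} -> {:?}", p, protocol_for(*p));
    }
}
```

When a variant is added or removed, the compiler lists every match that needs attention, so the change cannot silently miss a call site.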

Rust's error handling model forces developers to handle every corner case. Even if there is a system with the potential to panic, it can be moved into its own thread for isolation. More recently, it has become possible to catch panics within a thread instead of only at the boundary. Languages like Go make it too easy to ignore errors.
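The two isolation options mentioned above can be sketched with the standard library alone. `risky_parse` is a hypothetical stand-in for any subsystem that might panic; `run_isolated` and `run_caught` are illustrative helper names, not part of OnePush.

```rust
use std::panic;
use std::thread;

/// Hypothetical subsystem that panics on bad input.
fn risky_parse(input: &str) -> u32 {
    input.parse::<u32>().expect("input was not a number")
}

/// Option 1: run the work on its own thread. A panic inside the thread
/// surfaces as an `Err` from `join` instead of taking down the caller.
fn run_isolated(input: String) -> Result<u32, String> {
    thread::spawn(move || risky_parse(&input))
        .join()
        .map_err(|_| "worker thread panicked".to_string())
}

/// Option 2: catch the panic within the current thread.
fn run_caught(input: &str) -> Result<u32, String> {
    panic::catch_unwind(|| risky_parse(input))
        .map_err(|_| "panic caught in place".to_string())
}

fn main() {
    println!("{:?}", run_isolated("42".to_string()));
    println!("{:?}", run_isolated("oops".to_string()));
    println!("{:?}", run_caught("7"));
    println!("{:?}", run_caught("oops"));
}
```

Both approaches rely on the default unwinding panic strategy; a binary built with `panic = "abort"` would terminate instead. The panic message is still printed to stderr by the default panic hook.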

Next, OnePush needed to be fast. Rust makes writing multithreaded programs quite easy. The Send and Sync traits work together to ensure such programs are free from data races.
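A small sketch of what this looks like in practice: several worker threads update one shared counter. The function name `deliver_in_parallel` and the counter are hypothetical; the point is that `Arc<AtomicUsize>` is `Send + Sync`, so the compiler accepts sharing it across threads, while a non-thread-safe type such as `Rc<Cell<usize>>` would be rejected at compile time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `workers` threads that each record `per_worker` deliveries on a
/// shared counter, then returns the total.
fn deliver_in_parallel(workers: usize, per_worker: usize) -> usize {
    let delivered = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle to the same counter.
            let delivered = Arc::clone(&delivered);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    delivered.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // Joining every worker guarantees all increments are visible below.
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    delivered.load(Ordering::Relaxed)
}

fn main() {
    println!("delivered: {}", deliver_in_parallel(4, 1_000));
}
```

Replacing `Arc` with `Rc` in this sketch produces a compile error at `thread::spawn`, because `Rc` does not implement `Send`.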

At the end of the day, our OnePush service is just a program optimized for sending a lot of HTTP requests. The library ecosystem offered everything we needed to build this system: An async HTTP/2 client, an async HTTP/1.1 client, a Redis client library and a PostgreSQL client library. We are fortunate that the Rust community is full of talented and ambitious developers who have already published a great deal of quality libraries that suit our specific needs.

Finally, the developer leading the effort had experience and a strong preference for Rust. There are plenty of technologies that would have met our requirements, but deferring to a personal preference made a lot of sense. Having engineers excited about what they are working on is incredibly valuable. Such intrinsic motivation increases developer happiness and reduces burnout. Imagine going to work every day and getting to work on something you're excited about! Developer happiness is important to us as a company. Being able to provide so much by going with one technology versus another was a no-brainer.


Risks

Aside from the general risks of choosing Rust described above, we had a few additional concerns for this particular project.

As a glorified HTTP client, OnePush needed to be able to send lots of HTTP/1.1 requests very quickly. In the beginning, this wasn't quite as true because of our scale and because Android notifications could be batched into single requests. Going forward, we expected a huge increase in HTTP/1.1 outgoing request volume due to growth and the new WebPush specification with encrypted payloads. Hyper (Rust's HTTP library) had an async branch that was just a prototype when we started. We hoped that, by the time we truly needed an async client, it would be ready.

As it turned out, the initial async Rotor-based branch of Hyper never stabilized because tokio and futures were announced in August 2016. By the time we really needed the async branch, we ended up having to spend a week or two debugging, stress-testing and fixing the Rotor-based hyper::Client. This turned out to be OK since it was a chance to give back to the Rust community.

Since we would be on the nightly channel for serde derive and clippy lints, another risk was spending a lot of time doing rustc upgrades. We avoided this situation by pinning to specific versions of the compiler and upgrading infrequently. When we did upgrade, the process required finding a recent rustc that was supported by both libraries. This will become less of an issue very soon with the advent of Macros 1.1.

Finally, Solicit (Rust's HTTP/2 library) uses three threads per connection. Although this is fine in isolation, having 20,000 connections quickly becomes expensive. We've mitigated this issue by using a short keep-alive to limit the number of active connections and by taking advantage of Apple's HTTP/2 provider API (APNs), which allows 500 requests in-flight per connection.
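To put rough numbers on the thread cost, here is a back-of-the-envelope sketch using only the figures quoted above (three threads per connection, 500 requests in flight per connection). The function names are hypothetical, and the 20,000 concurrent requests in `main` is an illustrative workload, not a measured one.

```rust
/// Threads Solicit uses per connection, as stated in the text.
const THREADS_PER_CONNECTION: u64 = 3;
/// In-flight requests allowed per APNs HTTP/2 connection, as stated in the text.
const MAX_IN_FLIGHT_PER_CONNECTION: u64 = 500;

/// Total threads needed to hold `connections` open.
fn threads_for_connections(connections: u64) -> u64 {
    connections * THREADS_PER_CONNECTION
}

/// Fewest connections that can carry `in_flight` concurrent requests
/// (ceiling division).
fn connections_for_in_flight(in_flight: u64) -> u64 {
    (in_flight + MAX_IN_FLIGHT_PER_CONNECTION - 1) / MAX_IN_FLIGHT_PER_CONNECTION
}

fn main() {
    // One request per connection: 20,000 connections.
    println!("threads, unmultiplexed: {}", threads_for_connections(20_000));

    // Multiplexing 20,000 concurrent requests at 500 per connection.
    let connections = connections_for_in_flight(20_000);
    println!(
        "connections: {}, threads: {}",
        connections,
        threads_for_connections(connections)
    );
}
```

Under these assumptions, 20,000 connections cost 60,000 threads, while multiplexing the same number of concurrent requests needs 40 connections and 120 threads.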

Unexpected Issues

For the most part, we knew what we were getting into building such a system in Rust. However, one thorn in our side that we didn't anticipate was rust-openssl upgrades. We are stuck on an earlier version of rust-openssl since the Solicit library depends on an API that has been removed since v0.8.0. This means that we are unable to upgrade other dependencies which rely on rust-openssl until we fix the Solicit issue.

Another minor issue at one time was the limited test framework. A common feature for test frameworks is to have some setup and teardown steps that run before and after a test. We say this issue was minor because we were able to work around its absence by generating many tests declaratively with macros (discussed below).


Successes

Writing OnePush in Rust has been hugely successful for us. We've been able to easily meet our performance and scaling goals with the application. OnePush is capable of delivering over 100k notifications per second and efficiently maximizes the use of system resources. Despite being highly multithreaded, race conditions have not been an issue for us. Even better, OnePush needs very little attention. We were able to leave it running without any issues through the holiday break.

Regressions are very infrequent. There's a huge class of bugs in languages like Ruby that just aren't possible in Rust. When combined with good test coverage, it becomes difficult to break things - all thanks to Rust's fantastic type system. This isn't just about regressions either. The compiler and type system make refactoring basically fool-proof. We like to say that Rust enables belligerent refactoring - making dramatic changes and then working with the compiler to bring your project back to a working state.

The macro system has been another big win. Our favorite example of how this saves us engineering time is using macros for writing tests declaratively. For example, a large set of tests we have are for the Terminal. Each test takes some Events as input, and then the state of Redis and Postgres are checked to be correct after processing the event. The macro system enabled us to remove all of the boilerplate for these tests and declaratively say what the event is and what the expected outcome should be. Writing a test for this system today looks like this:

// Invoking terminal test-writing macro
push_test! {
    // The part before the arrow ends up being the test name.
    // The `response` describes an `Event`, and the rest describes the system
    // state after processing it. There are more parameters that can be
    // specified, but the default values are acceptable in this case.
    apns_success => {
        response: apns::Response::Success,
        success: 1,
        sending_done: true
    }
    // .. and so on
}

Writing a lot of similar tests in this fashion enables us to get a lot of coverage without a lot of work. It also helps us work around the lack of features in the Rust test system (such as before/after hooks).
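For readers who have not written such a macro, here is a self-contained sketch of the technique. It is not the `push_test!` macro from OnePush: the `event_tests!` macro, the `Response` and `State` types, and the `process` function are all hypothetical stand-ins, and the real tests check Redis and Postgres rather than an in-memory struct.

```rust
/// Hypothetical event type fed into the system under test.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Response {
    Success,
    InvalidToken,
}

/// Hypothetical system state observed after processing one event.
#[derive(Debug, Default, PartialEq)]
struct State {
    success: u32,
    failed: u32,
}

/// Stand-in for the code under test.
fn process(response: Response) -> State {
    let mut state = State::default();
    match response {
        Response::Success => state.success += 1,
        Response::InvalidToken => state.failed += 1,
    }
    state
}

/// Expands each `name => { ... }` entry into its own `#[test]` function,
/// so shared setup and assertions live in one place.
macro_rules! event_tests {
    ($($name:ident => {
        response: $response:expr,
        success: $success:expr,
        failed: $failed:expr
    })*) => {
        $(
            #[test]
            fn $name() {
                let state = process($response);
                assert_eq!(state, State { success: $success, failed: $failed });
            }
        )*
    };
}

event_tests! {
    success_is_counted => { response: Response::Success, success: 1, failed: 0 }
    invalid_token_is_counted => { response: Response::InvalidToken, success: 0, failed: 1 }
}

fn main() {
    println!("{:?}", process(Response::Success));
}
```

Any setup or teardown that every test needs can go inside the generated function body, which is how a macro like this can substitute for before/after hooks.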

The final thing we want to comment on here is serde. This library enables adding a #[derive(Deserialize)] attribute to a struct and getting a deserialize implementation. Combined with our serde-redis library, this makes it possible to load data out of Redis like so:

/// A person has a name and an ID.
/// This is just some data with a derived
/// Deserialize implementation
#[derive(Deserialize)]
struct Person {
    name: String,
    id: u64
}

// Gets a `Person` out of redis
let person: Person = redis.hgetall("person")?;

On the left hand side of the line fetching person, there's a binding name with a type annotation. On the right hand side, there's a call to Redis with HGETALL, and a ?. The ? is a bit of error handling; if the request is successful and deserialization works, person will be a valid Person, and the name and id fields can be used directly with knowledge that they were returned from Redis. If something goes wrong, like Redis is unreachable or there is data missing for the Person (such as a missing id), an error is returned from the current function.

This is really powerful! We can just describe our data, add this derive attribute and then safely load the data out of Redis. To get the same effect in a dynamic language, one would need to load this dictionary out of Redis and write a bunch of boilerplate to validate that the returned fields are correct. This sort of thing makes Rust more expressive than many high-level languages.
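To make that comparison concrete, the sketch below shows roughly the kind of hand-written validation the derive saves. It uses only the standard library and assumes the hash arrives as a map of string fields to string values; `person_from_hash` and `LoadError` are hypothetical names and are not part of serde or serde-redis.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Person {
    name: String,
    id: u64,
}

/// Hypothetical error type for the hand-written loader.
#[derive(Debug, PartialEq)]
enum LoadError {
    MissingField(&'static str),
    InvalidId(String),
}

/// Validates every field by hand. Each `?` returns early with an error,
/// mirroring what the single `?` does in the derived version.
fn person_from_hash(hash: &HashMap<String, String>) -> Result<Person, LoadError> {
    let name = hash
        .get("name")
        .ok_or(LoadError::MissingField("name"))?
        .clone();
    let raw_id = hash.get("id").ok_or(LoadError::MissingField("id"))?;
    let id = raw_id
        .parse::<u64>()
        .map_err(|_| LoadError::InvalidId(raw_id.clone()))?;
    Ok(Person { name, id })
}

fn main() {
    let mut hash = HashMap::new();
    hash.insert("name".to_string(), "Ada".to_string());
    hash.insert("id".to_string(), "7".to_string());
    println!("{:?}", person_from_hash(&hash));
}
```

Every new field adds another lookup, another conversion, and another error case to a loader like this, whereas the derived implementation picks the field up from the struct definition.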

Open Source

Early adoption in an ecosystem means there are lots of opportunities for open source contributions. The most notable of our contributions is a project called serde-redis, a Redis deserialization backend for serde. We've also had the opportunity to contribute several patches to Hyper's Rotor-based async client. We use that client in OnePush and have made billions of HTTP requests with it.

What's next

We've come far with OnePush, but there's still more work to do! Here are just a few of our upcoming projects related to OnePush:

  • Upgrade to Hyper's Tokio-based async implementation. We probably won't be super early adopters here since we've got an HTTP client with a lot of production miles on it right now.
  • Rework result processing to use futures. The Terminal's concurrency from threads is limited, whereas something backed by mio could have much higher throughput. This would require futures-compatible Redis and Postgres clients.
  • Replace Solicit's thread-based async client with a mio-based one. We've actually got a prototype of something from earlier in 2016.

We also have a new internal application written in Rust which we hope to blog about soon! It's a core piece of our monitoring which is responsible for collecting statistics from our production systems and storing them in InfluxDB.


We've had fantastic results building one of our core systems in Rust. It has delivered many billions of notifications, and it's delivering more and more each day. We hope that sharing our experience as early adopters in the Rust ecosystem will be helpful to others when making similar decisions. We've certainly found Rust to be a secret weapon for quickly building robust systems.

Like what we're doing? We're hiring!
