Serverless vs Apache Spark: What are the differences?
Serverless: The most widely adopted toolkit for building serverless applications. Build applications composed of microservices that run in response to events, auto-scale for you, and only charge you when they run. This lowers the total cost of maintaining your apps, enabling you to build more logic, faster. The Framework uses event-driven compute services such as AWS Lambda, Google Cloud Functions, and more.
Apache Spark: Fast and general engine for large-scale data processing. Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or in Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and newer workloads like streaming, interactive queries, and machine learning.
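To make the "run in response to events" model a little more concrete, here is a minimal sketch of a function written in the style AWS Lambda uses for Python, where the platform calls a handler with an event and a context object. The function name, the "name" field in the event, and the response shape are assumptions chosen for illustration; they do not come from the Serverless Framework documentation or from any project mentioned on this page.

```python
import json


def handler(event, context):
    """Hypothetical handler: invoked once per incoming event."""
    # "name" is an assumed field in the event payload, used only for this sketch.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-built event; no cloud account is needed.
    print(handler({"name": "Checkly"}, None))
```

Nothing runs, and nothing is billed, until an event arrives. That contrasts with a Spark job, which is normally submitted to a cluster to process a large dataset.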
Serverless can be classified as a tool in the "Serverless / Task Processing" category, while Apache Spark is grouped under "Big Data Tools".
"API integration" is the primary reason developers give for choosing Serverless over its competitors, whereas "Open-source" is the key factor cited for picking Apache Spark.
Serverless and Apache Spark are both open source tools. Serverless has more GitHub stars (30.5K, with 3.38K forks) than Apache Spark (22.3K stars), although Spark has far more forks (19.3K).
Slack, Shopify, and SendGrid are some of the popular companies that use Apache Spark, whereas Serverless is used by Droplr, Plista GmbH, and Hammerhead. Apache Spark has broader adoption, being mentioned in 263 company stacks and 111 developer stacks, compared to Serverless, which is listed in 112 company stacks and 43 developer stacks.
When adding a new feature to Checkly or rearchitecting some older piece, I tend to pick Heroku for rolling it out. But not always, because sometimes I pick AWS Lambda. The short story:
- Developer Experience trumps everything.
- AWS Lambda is cheap. Up to a limit though. This impacts not only your wallet.
- If you need geographic spread, AWS is lonely at the top.
Recently, I was doing a brainstorm at a startup here in Berlin on the future of their infrastructure. They were ready to move on from their initial, almost 100% EC2 + Chef-based setup. Everything was on the table. But we crossed out a lot quite quickly:
- Pure, uncut, self hosted Kubernetes — way too much complexity
- Managed Kubernetes in various flavors — still too much complexity
- Zeit — Maybe, but no Docker support
- Elastic Beanstalk — Maybe, bit old but does the job
It became clear a mix of PaaS and FaaS was the way to go. What a surprise! That is exactly what I use for Checkly! But when do you pick which model?
I chopped that question up into the following categories:
- Developer Experience / DX 🤓
- Ops Experience / OX 🐂 (?)
- Cost 💵
- Lock in 🔐
Read the full post linked below for all details