Serverless vs Apache Spark: What are the differences?

Serverless: The most widely adopted toolkit for building serverless applications. Build applications composed of microservices that run in response to events, auto-scale for you, and only charge you when they run. This lowers the total cost of maintaining your apps, enabling you to build more logic, faster. The Framework uses event-driven compute services, like AWS Lambda, Google Cloud Functions, and more.

Apache Spark: A fast and general engine for large-scale data processing, compatible with Hadoop data. It can run in Hadoop clusters through YARN or in Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and newer workloads like streaming, interactive queries, and machine learning.
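
To make the "batch processing (similar to MapReduce)" description more concrete, here is a minimal sketch of the map and reduce steps in plain Python. It deliberately does not use the Spark API; the function names (`map_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration. In Spark, the same two steps would be distributed across a cluster rather than run in one process.

```python
from collections import Counter
from functools import reduce


def map_phase(line: str) -> Counter:
    """Map step: count the words in a single input record."""
    return Counter(line.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(lines) -> Counter:
    """Apply the map step to every record, then fold the partial results."""
    return reduce(reduce_phase, map(map_phase, lines), Counter())


# Hypothetical sample input; a real batch job would read from HDFS, Hive, etc.
sample = ["spark runs batch jobs", "spark runs streaming jobs"]
counts = word_count(sample)
```

Because the reduce step only merges two partial results, the partial counts can be combined in any order, which is what makes this pattern easy to parallelize.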

Serverless can be classified as a tool in the "Serverless / Task Processing" category, while Apache Spark is grouped under "Big Data Tools".

"API integration" is the primary reason developers choose Serverless over its competitors, whereas "Open-source" is the key factor cited for picking Apache Spark.

Serverless and Apache Spark are both open source tools. By GitHub stars, Serverless (30.5K stars, 3.38K forks) appears to be more popular than Apache Spark (22.3K stars), although Spark has far more forks (19.3K).

Slack, Shopify, and SendGrid are some of the popular companies that use Apache Spark, whereas Serverless is used by Droplr, Plista GmbH, and Hammerhead. Apache Spark has broader adoption, being mentioned in 263 company stacks and 111 developer stacks, compared to Serverless, which is listed in 112 company stacks and 43 developer stacks.

Decisions about Serverless and Apache Spark

When adding a new feature to Checkly or rearchitecting some older piece, I tend to pick Heroku for rolling it out. But not always, because sometimes I pick AWS Lambda. The short story:

  • Developer Experience trumps everything.
  • AWS Lambda is cheap. Up to a limit though. This impacts not only your wallet.
  • If you need geographic spread, AWS is lonely at the top.
The setup

Recently, I was doing a brainstorm at a startup here in Berlin on the future of their infrastructure. They were ready to move on from their initial, almost 100% EC2 + Chef-based setup. Everything was on the table. But we crossed out a lot quite quickly:

  • Pure, uncut, self hosted Kubernetes — way too much complexity
  • Managed Kubernetes in various flavors — still too much complexity
  • Zeit — Maybe, but no Docker support
  • Elastic Beanstalk — Maybe, bit old but does the job
  • Heroku
  • Lambda

It became clear a mix of PaaS and FaaS was the way to go. What a surprise! That is exactly what I use for Checkly! But when do you pick which model?

I chopped that question up into the following categories:

  • Developer Experience / DX 🤓
  • Ops Experience / OX 🐂 (?)
  • Cost 💵
  • Lock in 🔐

Read the full post linked below for all details

Pros of Serverless (upvotes in parentheses)
  • API integration (12)
  • Supports cloud functions for Google, Azure, and IBM (6)
  • Lower cost (2)
  • Auto scale (1)
  • OpenWhisk (0)

Pros of Apache Spark (upvotes in parentheses)
  • Open-source (58)
  • Fast and Flexible (47)
  • One platform for every big data problem (7)
  • Easy to install and to use (6)
  • Great for distributed SQL-like applications (6)
  • Works well for most data science use cases (3)
  • Machine learning library, streaming in real time (2)
  • In-memory computation (2)
  • Interactive Query (0)

Cons of Serverless
  • None listed

Cons of Apache Spark
  • Speed (2)

What are some alternatives to Serverless and Apache Spark?

AWS Lambda
AWS Lambda is a compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security.

Terraform
With Terraform, you describe your complete infrastructure as code, even as it spans multiple service providers. Your servers may come from AWS, your DNS may come from CloudFlare, and your database may come from Heroku. Terraform will build all these resources across all these providers in parallel.

Zappa
Zappa makes it super easy to deploy all Python WSGI applications on AWS Lambda + API Gateway. Think of it as "serverless" web hosting for your Python web apps. That means infinite scaling, zero downtime, zero maintenance, and at a fraction of the cost of your current deployments!

Kubernetes
Kubernetes is an open source orchestration system for Docker containers. It handles scheduling onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the users' declared intentions.

Azure Functions
Azure Functions is an event-driven, compute-on-demand experience that extends the existing Azure application platform with capabilities to implement code triggered by events occurring in virtually any Azure or third-party service as well as on-premises systems.
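
As a rough sketch of the event-driven model that the AWS Lambda and Azure Functions descriptions refer to, the following Python function follows the `handler(event, context)` convention used by AWS Lambda's Python runtime. The event field `name` and the greeting are hypothetical, and the `statusCode`/`body` response shape is assumed here as a common convention for HTTP-style responses; a real function would match whatever event source triggers it.

```python
import json


def handler(event, context):
    """Run once per incoming event; nothing runs (or is billed) between events."""
    # "name" is a hypothetical field; real events depend on the trigger.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation for illustration; in production the platform calls handler().
response = handler({"name": "StackShare"}, None)
```

The function holds no state between calls, which is what lets the platform start and stop copies of it as the event rate changes.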