Need advice about which tool to choose?Ask the StackShare community!

AWS Lambda

23.9K
18.6K
+ 1
432
Hadoop

2.5K
2.3K
+ 1
56
Add tool

AWS Lambda vs Hadoop: What are the differences?

Introduction

This markdown code provides a comparison between AWS Lambda and Hadoop, highlighting their key differences.

  1. Scalability: AWS Lambda is a serverless computing platform that automatically scales based on demand. It can handle individual requests or events in parallel and can scale up or down as needed. On the other hand, Hadoop is a distributed computing framework that allows for the storage and processing of large datasets across a cluster of computers. It provides parallel processing and fault tolerance for data-intensive applications.

  2. Execution Model: AWS Lambda follows an event-driven execution model, where functions are triggered by events and run in response to those events. It supports a wide range of event sources, including data changes, API calls, or scheduled events. In contrast, Hadoop runs on a batch processing model, where data is processed in a batch fashion. It requires explicit submission and processing of jobs.

  3. Managed Service vs Distributed Framework: AWS Lambda is a fully managed service provided by Amazon Web Services. It abstracts away server management and infrastructure concerns, allowing developers to focus on writing code. Hadoop, on the other hand, is a distributed framework that requires manual setup and configuration of a cluster of machines to operate.

  4. Use Cases: AWS Lambda is often used for serverless application development, event-driven computing, and building serverless microservices. It provides a flexible and cost-effective solution for executing small, isolated tasks in response to events. Hadoop, on the other hand, is designed for big data processing, including tasks such as data ingestion, data preparation, and data analysis. It excels in processing large volumes of data in parallel.

  5. Programming Models: AWS Lambda supports multiple programming languages, including Node.js, Python, Java, and C#. Developers can choose the language they are most comfortable with to write their functions. Hadoop, on the other hand, primarily supports Java for writing MapReduce jobs. It also provides support for other programming languages through high-level interfaces like Pig Latin and Hive Query Language.

  6. Data Storage: AWS Lambda does not provide built-in data storage. It is designed to integrate with other AWS services like S3, DynamoDB, and RDS to store and retrieve data. Hadoop, on the other hand, comes with its own distributed file system called Hadoop Distributed File System (HDFS). HDFS provides fault-tolerant storage for large datasets across a cluster of machines.

In summary, AWS Lambda is a managed service that provides serverless computing capabilities with an event-driven execution model, while Hadoop is a distributed computing framework suitable for big data processing with a batch processing model.

Advice on AWS Lambda and Hadoop

Need advice on what platform, systems and tools to use.

Evaluating whether to start a new digital business for which we will need to build a website that handles all traffic. Website only right now. May add smartphone apps later. No desktop app will ever be added. Website to serve various countries and languages. B2B and B2C type customers. Need to handle heavy traffic, be low cost, and scale well.

We are open to either build it on AWS or on Microsoft Azure.

Apologies if I'm leaving out some info. My first post. :) Thanks in advance!

See more
Replies (2)
Anis Zehani

I recommend this : -Spring reactive for back end : the fact it's reactive (async) it consumes half of the resources that a sync platform needs (so less CPU -> less money). -Angular : Web Front end ; it's gives you the possibility to use PWA which is a cheap replacement for a mobile app (but more less popular). -Docker images. -Kubernetes to orchestrate all the containers. -I Use Jenkins / blueocean, ansible for my CI/CD (with Github of course) -AWS of course : u can run a K8S cluster there, make it multi AZ (availability zones) to be highly available, use a load balancer and an auto scaler and ur good to go. -You can store data by taking any managed DB or u can deploy ur own (cheap but risky).

You pay less money, but u need some technical 2 - 3 guys to make that done.

Good luck

See more

My advice will be Front end: React Backend: Language: Java, Kotlin. Database: SQL: Postgres, MySQL, Aurora NOSQL: Mongo db. Caching: Redis. Public : Spring Webflux for async public facing operation. Admin api: Spring boot, Hibrernate, Rest API. Build Container image. Kuberenetes: AWS EKS, AWS ECS, Google GKE. Use Jenkins for CI/CD pipeline. Buddy works is good for AWS. Static content: Host on AWS S3 bucket, Use Cloudfront or Cloudflare as CDN.

Serverless Solution: Api gateway Lambda, Serveless Aurora (SQL). AWS S3 bucket.

See more
Needs advice
on
HadoopHadoopInfluxDBInfluxDB
and
KafkaKafka

I have a lot of data that's currently sitting in a MariaDB database, a lot of tables that weigh 200gb with indexes. Most of the large tables have a date column which is always filtered, but there are usually 4-6 additional columns that are filtered and used for statistics. I'm trying to figure out the best tool for storing and analyzing large amounts of data. Preferably self-hosted or a cheap solution. The current problem I'm running into is speed. Even with pretty good indexes, if I'm trying to load a large dataset, it's pretty slow.

See more
Replies (1)
Recommends
on
DruidDruid

Druid Could be an amazing solution for your use case, My understanding, and the assumption is you are looking to export your data from MariaDB for Analytical workload. It can be used for time series database as well as a data warehouse and can be scaled horizontally once your data increases. It's pretty easy to set up on any environment (Cloud, Kubernetes, or Self-hosted nix system). Some important features which make it a perfect solution for your use case. 1. It can do streaming ingestion (Kafka, Kinesis) as well as batch ingestion (Files from Local & Cloud Storage or Databases like MySQL, Postgres). In your case MariaDB (which has the same drivers to MySQL) 2. Columnar Database, So you can query just the fields which are required, and that runs your query faster automatically. 3. Druid intelligently partitions data based on time and time-based queries are significantly faster than traditional databases. 4. Scale up or down by just adding or removing servers, and Druid automatically rebalances. Fault-tolerant architecture routes around server failures 5. Gives ana amazing centralized UI to manage data sources, query, tasks.

See more
Decisions about AWS Lambda and Hadoop

When adding a new feature to Checkly rearchitecting some older piece, I tend to pick Heroku for rolling it out. But not always, because sometimes I pick AWS Lambda . The short story:

  • Developer Experience trumps everything.
  • AWS Lambda is cheap. Up to a limit though. This impact not only your wallet.
  • If you need geographic spread, AWS is lonely at the top.
The setup

Recently, I was doing a brainstorm at a startup here in Berlin on the future of their infrastructure. They were ready to move on from their initial, almost 100% Ec2 + Chef based setup. Everything was on the table. But we crossed out a lot quite quickly:

  • Pure, uncut, self hosted Kubernetes — way too much complexity
  • Managed Kubernetes in various flavors — still too much complexity
  • Zeit — Maybe, but no Docker support
  • Elastic Beanstalk — Maybe, bit old but does the job
  • Heroku
  • Lambda

It became clear a mix of PaaS and FaaS was the way to go. What a surprise! That is exactly what I use for Checkly! But when do you pick which model?

I chopped that question up into the following categories:

  • Developer Experience / DX 🤓
  • Ops Experience / OX 🐂 (?)
  • Cost 💵
  • Lock in 🔐

Read the full post linked below for all details

See more
Manage your open source components, licenses, and vulnerabilities
Learn More
Pros of AWS Lambda
Pros of Hadoop
  • 129
    No infrastructure
  • 83
    Cheap
  • 70
    Quick
  • 59
    Stateless
  • 47
    No deploy, no server, great sleep
  • 12
    AWS Lambda went down taking many sites with it
  • 6
    Event Driven Governance
  • 6
    Extensive API
  • 6
    Auto scale and cost effective
  • 6
    Easy to deploy
  • 5
    VPC Support
  • 3
    Integrated with various AWS services
  • 39
    Great ecosystem
  • 11
    One stack to rule them all
  • 4
    Great load balancer
  • 1
    Amazon aws
  • 1
    Java syntax

Sign up to add or upvote prosMake informed product decisions

Cons of AWS Lambda
Cons of Hadoop
  • 7
    Cant execute ruby or go
  • 3
    Compute time limited
  • 1
    Can't execute PHP w/o significant effort
    Be the first to leave a con

    Sign up to add or upvote consMake informed product decisions

    - No public GitHub repository available -

    What is AWS Lambda?

    AWS Lambda is a compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security.

    What is Hadoop?

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

    Need advice about which tool to choose?Ask the StackShare community!

    What companies use AWS Lambda?
    What companies use Hadoop?
    Manage your open source components, licenses, and vulnerabilities
    Learn More

    Sign up to get full access to all the companiesMake informed product decisions

    What tools integrate with AWS Lambda?
    What tools integrate with Hadoop?

    Sign up to get full access to all the tool integrationsMake informed product decisions

    Blog Posts

    MySQLKafkaApache Spark+6
    4
    2046
    Aug 28 2019 at 3:10AM

    Segment

    PythonJavaAmazon S3+16
    7
    2603
    GitHubPythonNode.js+47
    55
    72666
    What are some alternatives to AWS Lambda and Hadoop?
    Serverless
    Build applications comprised of microservices that run in response to events, auto-scale for you, and only charge you when they run. This lowers the total cost of maintaining your apps, enabling you to build more logic, faster. The Framework uses new event-driven compute services, like AWS Lambda, Google CloudFunctions, and more.
    Azure Functions
    Azure Functions is an event driven, compute-on-demand experience that extends the existing Azure application platform with capabilities to implement code triggered by events occurring in virtually any Azure or 3rd party service as well as on-premises systems.
    AWS Elastic Beanstalk
    Once you upload your application, Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring.
    AWS Step Functions
    AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.
    Google App Engine
    Google has a reputation for highly reliable, high performance infrastructure. With App Engine you can take advantage of the 10 years of knowledge Google has in running massively scalable, performance driven systems. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow.
    See all alternatives