AWS Lambda vs Hadoop: What are the differences?
Introduction
This comparison highlights the key differences between AWS Lambda and Hadoop.
Scalability: AWS Lambda is a serverless computing platform that automatically scales based on demand. It can handle individual requests or events in parallel and can scale up or down as needed. On the other hand, Hadoop is a distributed computing framework that allows for the storage and processing of large datasets across a cluster of computers. It provides parallel processing and fault tolerance for data-intensive applications.
Execution Model: AWS Lambda follows an event-driven execution model, where functions are triggered by events and run in response to those events. It supports a wide range of event sources, including data changes, API calls, or scheduled events. In contrast, Hadoop runs on a batch processing model, where data is processed in a batch fashion. It requires explicit submission and processing of jobs.
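To make the event-driven model concrete, here is a minimal sketch of a Python function in the `(event, context)` shape that Lambda invokes. The payload field `name` and the response layout are assumptions chosen for illustration, not something the comparison above specifies.

```python
import json


def handler(event, context):
    """Entry point in the (event, context) shape Lambda uses for Python."""
    # Assumption: the triggering event is a dict with an optional "name" key
    # (a hypothetical payload used only for this example).
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation with a fake event; no AWS account is needed to try this.
    print(handler({"name": "StackShare"}, None))
```

The function holds no state between calls, so each incoming event can be handled by a separate invocation. A Hadoop job, by contrast, is submitted explicitly and runs over a whole dataset.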
Managed Service vs Distributed Framework: AWS Lambda is a fully managed service provided by Amazon Web Services. It abstracts away server management and infrastructure concerns, allowing developers to focus on writing code. Hadoop, on the other hand, is a distributed framework that requires manual setup and configuration of a cluster of machines to operate.
Use Cases: AWS Lambda is often used for serverless application development, event-driven computing, and building serverless microservices. It provides a flexible and cost-effective solution for executing small, isolated tasks in response to events. Hadoop, on the other hand, is designed for big data processing, including tasks such as data ingestion, data preparation, and data analysis. It excels in processing large volumes of data in parallel.
Programming Models: AWS Lambda supports multiple programming languages, including Node.js, Python, Java, and C#. Developers can choose the language they are most comfortable with to write their functions. Hadoop, on the other hand, primarily supports Java for writing MapReduce jobs. It also provides support for other programming languages through high-level interfaces like Pig Latin and Hive Query Language.
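As a rough illustration of the MapReduce model mentioned above, the sketch below runs a word count in plain Python. It only imitates the map, shuffle, and reduce phases in a single process; it does not use Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster the map and reduce steps would run on different machines over blocks of a much larger input, which is where the parallelism comes from.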
Data Storage: AWS Lambda does not provide built-in data storage. It is designed to integrate with other AWS services like S3, DynamoDB, and RDS to store and retrieve data. Hadoop, on the other hand, comes with its own distributed file system called Hadoop Distributed File System (HDFS). HDFS provides fault-tolerant storage for large datasets across a cluster of machines.
In summary, AWS Lambda is a managed service that provides serverless computing capabilities with an event-driven execution model, while Hadoop is a distributed computing framework suitable for big data processing with a batch processing model.
Need advice on what platform, systems and tools to use.
Evaluating whether to start a new digital business for which we will need to build a website that handles all traffic. Website only right now. May add smartphone apps later. No desktop app will ever be added. Website to serve various countries and languages. B2B and B2C type customers. Need to handle heavy traffic, be low cost, and scale well.
We are open to either build it on AWS or on Microsoft Azure.
Apologies if I'm leaving out some info. My first post. :) Thanks in advance!
I recommend this:
- Spring reactive for the back end: because it's reactive (async), it consumes half of the resources that a sync platform needs (so less CPU, which means less money).
- Angular for the web front end: it gives you the possibility to use a PWA, which is a cheap replacement for a mobile app (but less popular).
- Docker images.
- Kubernetes to orchestrate all the containers.
- I use Jenkins / Blue Ocean and Ansible for my CI/CD (with GitHub, of course).
- AWS, of course: you can run a K8s cluster there, make it multi-AZ (availability zones) to be highly available, use a load balancer and an autoscaler, and you're good to go.
- You can store data in any managed DB, or you can deploy your own (cheap but risky).
You pay less money, but you need 2-3 technical people to get that done.
Good luck
My advice would be:
- Front end: React
- Back end language: Java, Kotlin
- Database: SQL: Postgres, MySQL, Aurora. NoSQL: MongoDB
- Caching: Redis
- Public API: Spring WebFlux for async public-facing operations
- Admin API: Spring Boot, Hibernate, REST API
- Build a container image
- Kubernetes: AWS EKS, AWS ECS, Google GKE
- CI/CD: use Jenkins for the pipeline. Buddy.Works is good for AWS.
- Static content: host on an AWS S3 bucket, and use CloudFront or Cloudflare as the CDN
Serverless solution: API Gateway, Lambda, Serverless Aurora (SQL), AWS S3 bucket.
I have a lot of data that's currently sitting in a MariaDB database: a lot of tables that weigh 200 GB with indexes. Most of the large tables have a date column which is always filtered, but there are usually 4-6 additional columns that are filtered and used for statistics. I'm trying to figure out the best tool for storing and analyzing large amounts of data, preferably self-hosted or a cheap solution. The current problem I'm running into is speed. Even with pretty good indexes, loading a large dataset is pretty slow.
Druid could be an amazing solution for your use case. My understanding and assumption is that you are looking to export your data from MariaDB for an analytical workload. Druid can be used as a time series database as well as a data warehouse, and it can be scaled horizontally once your data increases. It's pretty easy to set up in any environment (cloud, Kubernetes, or a self-hosted *nix system). Some important features make it a good fit for your use case:
1. It can do streaming ingestion (Kafka, Kinesis) as well as batch ingestion (files from local and cloud storage, or databases like MySQL and Postgres). In your case that is MariaDB, which uses the same drivers as MySQL.
2. It is a columnar database, so you can query just the fields that are required, which makes your queries faster automatically.
3. Druid intelligently partitions data based on time, and time-based queries are significantly faster than in traditional databases.
4. Scale up or down by just adding or removing servers, and Druid automatically rebalances. The fault-tolerant architecture routes around server failures.
5. It gives you an amazing centralized UI to manage data sources, queries, and tasks.
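The time-partitioning point can be illustrated with a small sketch. This is not how Druid is implemented; it is a toy model in plain Python, with hypothetical function names and a hypothetical `day` field, showing why a date filter gets cheaper when rows are grouped by time and only matching groups are scanned.

```python
from collections import defaultdict
from datetime import date


def partition_by_day(rows):
    """Group rows into one bucket per calendar day (row["day"] is a date)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["day"]].append(row)
    return partitions


def query_range(partitions, start, end):
    """Return rows with start <= day <= end and how many rows were scanned.

    Buckets outside the range are skipped without reading their rows.
    """
    hits = []
    scanned = 0
    for day, rows in partitions.items():
        if start <= day <= end:
            scanned += len(rows)
            hits.extend(rows)
    return hits, scanned


if __name__ == "__main__":
    rows = [
        {"day": date(2024, 1, 1), "value": 10},
        {"day": date(2024, 1, 1), "value": 20},
        {"day": date(2024, 1, 2), "value": 30},
        {"day": date(2024, 1, 3), "value": 40},
    ]
    parts = partition_by_day(rows)
    hits, scanned = query_range(parts, date(2024, 1, 2), date(2024, 1, 3))
    print(len(hits), "rows matched,", scanned, "of", len(rows), "rows scanned")
```

Since the large tables in the question are always filtered on a date column, this kind of pruning applies to every query.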
When adding a new feature to Checkly or rearchitecting some older piece, I tend to pick Heroku for rolling it out. But not always, because sometimes I pick AWS Lambda. The short story:
- Developer Experience trumps everything.
- AWS Lambda is cheap. Up to a limit though. This impacts not only your wallet.
- If you need geographic spread, AWS is lonely at the top.
Recently, I was doing a brainstorm at a startup here in Berlin on the future of their infrastructure. They were ready to move on from their initial, almost 100% EC2 + Chef based setup. Everything was on the table. But we crossed out a lot quite quickly:
- Pure, uncut, self hosted Kubernetes — way too much complexity
- Managed Kubernetes in various flavors — still too much complexity
- Zeit — Maybe, but no Docker support
- Elastic Beanstalk — Maybe, bit old but does the job
- Heroku
- Lambda
It became clear a mix of PaaS and FaaS was the way to go. What a surprise! That is exactly what I use for Checkly! But when do you pick which model?
I chopped that question up into the following categories:
- Developer Experience / DX 🤓
- Ops Experience / OX 🐂 (?)
- Cost 💵
- Lock in 🔐
Pros of AWS Lambda
- No infrastructure (129)
- Cheap (83)
- Quick (70)
- Stateless (59)
- No deploy, no server, great sleep (47)
- AWS Lambda went down taking many sites with it (12)
- Event Driven Governance (6)
- Extensive API (6)
- Auto scale and cost effective (6)
- Easy to deploy (6)
- VPC Support (5)
- Integrated with various AWS services (3)
Pros of Hadoop
- Great ecosystem (39)
- One stack to rule them all (11)
- Great load balancer (4)
- Amazon AWS (1)
- Java syntax (1)
Cons of AWS Lambda
- Can't execute Ruby or Go (7)
- Compute time limited (3)
- Can't execute PHP w/o significant effort (1)