Amazon S3 logo

Amazon S3

Store and retrieve any amount of data, at any time, from anywhere on the web
30.9K
20.4K
+ 1
2K

What is Amazon S3?

Amazon Simple Storage Service provides a fully redundant data storage infrastructure for storing and retrieving any amount of data, at any time, from anywhere on the web
Amazon S3 is a tool in the Cloud Storage category of a tech stack.

Who uses Amazon S3?

Companies
5879 companies reportedly use Amazon S3 in their tech stacks, including Airbnb, Pinterest, and Netflix.

Developers
24264 developers on StackShare have stated that they use Amazon S3.

Amazon S3 Integrations

Travis CI, Gatsby, Auth0, Fastly, and Papertrail are some of the popular tools that integrate with Amazon S3. Here's a list of all 143 tools that integrate with Amazon S3.
Public Decisions about Amazon S3

Here are some stack decisions, common use cases and reviews by companies and developers who chose Amazon S3 in their tech stack.

Priit Kaasik
Engineering Lead at Katana MRP | 7 upvotes 路 55.2K views

Sometimes #ad-blocking addons can cause a real headache when working with JavaScript apps. Onboarding assistants (Appcues + elevio ), chat (Intercom) and product usage insight (Hotjar) have all landed on their blacklists. I guess there is a perfectly good reason for this that I just don't know.

In order to fix this, we had to set up our own content delivery service. We chose Amazon CloudFront and Amazon S3 to do the job because it has a good synergy with Heroku PaaS we are already using.

See more
Johnny Bell
Software Engineer at Weedmaps | 6 upvotes 路 7.6K views
Shared insights
on
Netlify
Buddy
Amazon S3

So if you look through my decisions you will see I recently wrote a decision about moving from Netlify to Buddy and Amazon S3.

I want to write another decision saying that I tried this out and actually moved back to Netlify. Buddy was great until they deleted my account and all my pipelines I setup without warning me because I didn't login for a month.

Netlify is amazing and way easier to setup, support is great and they have so many amazing options... I did learn things about Amazon S3 by moving over to there but I'm sticking with Netlify for the long run now.

See more
Praveen Mooli
Engineering Manager at Taylor and Francis | 13 upvotes 路 1.5M views

We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with Microservices architecture as we wanted scale. Microservice architecture style is an approach to developing an application as a suite of small independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism and cost-effective scaling by deploying services across many distributed servers. Microservices modularity facilitates independent updates/deployments, and helps to avoid single point of failure, which can help prevent large-scale outages. We also decided to use Event Driven Architecture pattern which is a popular distributed asynchronous architecture pattern used to produce highly scalable applications. The event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.

To build our #Backend capabilities we decided to use the following: 1. #Microservices - Java with Spring Boot , Node.js with ExpressJS and Python with Flask 2. #Eventsourcingframework - Amazon Kinesis , Amazon Kinesis Firehose , Amazon SNS , Amazon SQS, AWS Lambda 3. #Data - Amazon RDS , Amazon DynamoDB , Amazon S3 , MongoDB Atlas

To build #Webapps we decided to use Angular 2 with RxJS

#Devops - GitHub , Travis CI , Terraform , Docker , Serverless

See more
Ashish Singh
Tech Lead, Big Data Platform at Pinterest | 34 upvotes 路 579.1K views

To provide employees with the critical need of interactive querying, we鈥檝e worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest鈥檚 scale has involved resolving quite a few challenges like, supporting deeply nested and huge thrift schemas, slow/ bad worker detection and remediation, auto-scaling cluster, graceful cluster shutdown and impersonation support for ldap authenticator.

Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.

We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters are comprised of a fleet of 450 r4.8xl EC2 instances. Presto clusters together have over 100 TBs of memory and 14K vcpu cores. Within Pinterest, we have close to more than 1,000 monthly active users (out of total 1,600+ Pinterest employees) using Presto, who run about 400K queries on these clusters per month.

Each query submitted to Presto cluster is logged to a Kafka topic via Singer. Singer is a logging agent built at Pinterest and we talked about it in a previous post. Each query is logged when it is submitted and when it finishes. When a Presto cluster crashes, we will have query submitted events without corresponding query finished events. These events enable us to capture the effect of cluster crashes over time.

Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. Kubernetes platform provides us with the capability to add and remove workers from a Presto cluster very quickly. The best-case latency on bringing up a new worker on Kubernetes is less than a minute. However, when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Some other advantages of deploying on Kubernetes platform is that our Presto deployment becomes agnostic of cloud vendor, instance types, OS, etc.

#BigData #AWS #DataScience #DataEngineering

See more

Hi, I'm building a machine learning pipelines to store image bytes and image vectors in the backend.

So, when users query for the random access image data (key), we return the image bytes and perform machine learning model operations on it.

I'm currently considering going with Amazon S3 (in the future, maybe add Redis caching layer) as the backend system to store the information (s3 buckets with sharded prefixes).

As the latency of S3 is 100-200ms (get/put) and it has a high throughput of 3500 puts/sec and 5500 gets/sec for a given bucker/prefix. In the future I need to reduce the latency, I can add Redis cache.

Also, s3 costs are way fewer than HBase (on Amazon EC2 instances with 3x replication factor)

I have not personally used HBase before, so can someone help me if I'm making the right choice here? I'm not aware of Hbase latencies and I have learned that the MOB feature on Hbase has to be turned on if we have store image bytes on of the column families as the avg image bytes are 240Kb.

See more
Simon Reymann
Senior Fullstack Developer at QUANTUSflow Software GmbH | 27 upvotes 路 1.8M views

Our whole DevOps stack consists of the following tools:

  • GitHub (incl. GitHub Pages/Markdown for Documentation, GettingStarted and HowTo's) for collaborative review and code management tool
  • Respectively Git as revision control system
  • SourceTree as Git GUI
  • Visual Studio Code as IDE
  • CircleCI for continuous integration (automatize development process)
  • Prettier / TSLint / ESLint as code linter
  • SonarQube as quality gate
  • Docker as container management (incl. Docker Compose for multi-container application management)
  • VirtualBox for operating system simulation tests
  • Kubernetes as cluster management for docker containers
  • Heroku for deploying in test environments
  • nginx as web server (preferably used as facade server in production environment)
  • SSLMate (using OpenSSL) for certificate management
  • Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
  • PostgreSQL as preferred database system
  • Redis as preferred in-memory database/store (great for caching)

The main reason we have chosen Kubernetes over Docker Swarm is related to the following artifacts:

  • Key features: Easy and flexible installation, Clear dashboard, Great scaling operations, Monitoring is an integral part, Great load balancing concepts, Monitors the condition and ensures compensation in the event of failure.
  • Applications: An application can be deployed using a combination of pods, deployments, and services (or micro-services).
  • Functionality: Kubernetes as a complex installation and setup process, but it not as limited as Docker Swarm.
  • Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
  • Scalability: All-in-one framework for distributed systems.
  • Other Benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), huge community among container orchestration tools, it is an open source and modular tool that works with any OS.
See more

Blog Posts

Jun 24, 2020 at 4:42PM
https://img.stackshare.io/stack/673323/default_9dbdf311e98bc0550cc0b928a4f7144ba95c89fc.png logo

Pinterest

4
842
Aug 28, 2019 at 3:10AM
https://img.stackshare.io/stack/505487/default_e35b8bd5e615e01dc9b420dbd2a444fcbaeff755.png logo

Segment

5
1784
Jul 2, 2019 at 9:34PM
https://img.stackshare.io/stack/374725/default_11e3fd22f9363f9e6b2bd10a6a73e7a93b3a221c.png logo

Segment

10
4831

Amazon S3's Features

  • Write, read, and delete objects containing from 1 byte to 5 terabytes of data each. The number of objects you can store is unlimited.
  • Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
  • A bucket can be stored in one of several Regions. You can choose a Region to optimize for latency, minimize costs, or address regulatory requirements. Amazon S3 is currently available in the US Standard, US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), South America (Sao Paulo), and GovCloud (US) Regions. The US Standard Region automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps.
  • Objects stored in a Region never leave the Region unless you transfer them out. For example, objects stored in the EU (Ireland) Region never leave the EU.
  • Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
  • Options for secure data upload/download and encryption of data at rest are provided for additional data protection.
  • Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
  • Built to be flexible so that protocol or functional layers can easily be added. The default download protocol is HTTP. A BitTorrent protocol interface is provided to lower costs for high-scale distribution.
  • Provides functionality to simplify manageability of data through its lifetime. Includes options for segregating data by buckets, monitoring and controlling spend, and automatically archiving data to even lower cost storage options. These options can be easily administered from the Amazon S3 Management Console.
  • Reliability backed with the Amazon S3 Service Level Agreement.

Amazon S3 Alternatives & Comparisons

What are some alternatives to Amazon S3?
Amazon Glacier
In order to keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. With Amazon Glacier, customers can reliably store large or small amounts of data for as little as $0.01 per gigabyte per month, a significant savings compared to on-premises solutions.
Amazon EBS
Amazon EBS volumes are network-attached, and persist independently from the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.
Amazon EC2
It is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
Google Drive
Keep photos, stories, designs, drawings, recordings, videos, and more. Your first 15 GB of storage are free with a Google Account. Your files in Drive can be reached from any smartphone, tablet, or computer.
Microsoft Azure
Azure is an open and flexible cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. You can build applications using any language, tool or framework. And you can integrate your public cloud applications with your existing IT environment.
See all alternatives

Amazon S3's Followers
20355 developers follow Amazon S3 to keep up with related blogs and decisions.