What is Amazon S3 and what are its top alternatives?
Top Alternatives to Amazon S3
In order to keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. With Amazon Glacier, customers can reliably store large or small amounts of data for as little as $0.01 per gigabyte per month, a significant savings compared to on-premises solutions. ...
Amazon EBS volumes are network-attached, and persist independently from the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage. ...
It is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. ...
Keep photos, stories, designs, drawings, recordings, videos, and more. Your first 15 GB of storage are free with a Google Account. Your files in Drive can be reached from any smartphone, tablet, or computer. ...
Azure is an open and flexible cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. You can build applications using any language, tool or framework. And you can integrate your public cloud applications with your existing IT environment. ...
It is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions. ...
Amazon RDS gives you access to the capabilities of a familiar MySQL, Oracle or Microsoft SQL Server database engine. This means that the code, applications, and tools you already use today with your existing databases can be used with Amazon RDS. Amazon RDS automatically patches the database software and backs up your database, storing the backups for a user-defined retention period and enabling point-in-time recovery. You benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your Database Instance (DB Instance) via a single API call. ...
Harness the power of Dropbox. Connect to an account, upload, download, search, and more. ...
Amazon S3 alternatives & related posts
related Amazon Glacier posts
related Amazon EBS posts
We are looking for a centralised monitoring solution for our application deployed on Amazon EKS. We would like to monitor using metrics from Kubernetes, AWS services (NeptuneDB, AWS Elastic Load Balancing (ELB), Amazon EBS, Amazon S3, etc) and application microservice's custom metrics.
We are expected to use around 80 microservices (not replicas). I think a total of 200-250 microservices will be there in the system with 10-12 slave nodes.
We tried Prometheus but it looks like maintenance is a big issue. We need to manage scaling, maintaining the storage, and dealing with multiple exporters and Grafana. I felt this itself needs few dedicated resources (at least 2-3 people) to manage. Not sure if I am thinking in the correct direction. Please confirm.
You mentioned Datadog and Sysdig charges per host. Does it charge per slave node?
related Amazon EC2 posts
To provide employees with the critical need of interactive querying, we’ve worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest’s scale has involved resolving quite a few challenges like, supporting deeply nested and huge thrift schemas, slow/ bad worker detection and remediation, auto-scaling cluster, graceful cluster shutdown and impersonation support for ldap authenticator.
Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.
We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters are comprised of a fleet of 450 r4.8xl EC2 instances. Presto clusters together have over 100 TBs of memory and 14K vcpu cores. Within Pinterest, we have close to more than 1,000 monthly active users (out of total 1,600+ Pinterest employees) using Presto, who run about 400K queries on these clusters per month.
Each query submitted to Presto cluster is logged to a Kafka topic via Singer. Singer is a logging agent built at Pinterest and we talked about it in a previous post. Each query is logged when it is submitted and when it finishes. When a Presto cluster crashes, we will have query submitted events without corresponding query finished events. These events enable us to capture the effect of cluster crashes over time.
Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. Kubernetes platform provides us with the capability to add and remove workers from a Presto cluster very quickly. The best-case latency on bringing up a new worker on Kubernetes is less than a minute. However, when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Some other advantages of deploying on Kubernetes platform is that our Presto deployment becomes agnostic of cloud vendor, instance types, OS, etc.
#BigData #AWS #DataScience #DataEngineering
Our whole DevOps stack consists of the following tools:
- GitHub (incl. GitHub Pages/Markdown for Documentation, GettingStarted and HowTo's) for collaborative review and code management tool
- Respectively Git as revision control system
- SourceTree as Git GUI
- Visual Studio Code as IDE
- CircleCI for continuous integration (automatize development process)
- Prettier / TSLint / ESLint as code linter
- SonarQube as quality gate
- Docker as container management (incl. Docker Compose for multi-container application management)
- VirtualBox for operating system simulation tests
- Kubernetes as cluster management for docker containers
- Heroku for deploying in test environments
- nginx as web server (preferably used as facade server in production environment)
- SSLMate (using OpenSSL) for certificate management
- Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
- PostgreSQL as preferred database system
- Redis as preferred in-memory database/store (great for caching)
The main reason we have chosen Kubernetes over Docker Swarm is related to the following artifacts:
- Key features: Easy and flexible installation, Clear dashboard, Great scaling operations, Monitoring is an integral part, Great load balancing concepts, Monitors the condition and ensures compensation in the event of failure.
- Applications: An application can be deployed using a combination of pods, deployments, and services (or micro-services).
- Functionality: Kubernetes as a complex installation and setup process, but it not as limited as Docker Swarm.
- Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
- Scalability: All-in-one framework for distributed systems.
- Other Benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), huge community among container orchestration tools, it is an open source and modular tool that works with any OS.
related Google Drive posts
We use Nextcloud for company-file-management, personal work-documents and for collaborative work (through collabora), organize our #TODOs, that are not covered by the Bugtracker. Existing solutions either were very expensive ( Google Drive ), missed a lot of features ( Trello ) or were pretty much overloaded with features ( Wekan within Sandstorm ).
That made Nextcloud ud our natural fit for our company management and we're convinced of its integrations and flexibility.
We originally used Dropbox as an easy way to store and share documents, but moved to the much more powerful and convenient Google Drive, although we still use Dropbox occasionally.
related Microsoft Azure posts
We are hardcore Kubernetes users and contributors. We loved the automation it provides. However, as our team grew and added more clusters and microservices, capacity and resources management becomes a massive pain to us. We started suffering from a lot of outages and unexpected behavior as we promote our code from dev to production environments. Luckily we were working on our AI-powered tools to understand different dependencies, predict usage, and calculate the right resources and configurations that should be applied to our infrastructure and microservices. We dogfooded our agent (http://github.com/magalixcorp/magalix-agent) and were able to stabilize as the #autopilot continuously recovered any miscalculations we made or because of unexpected changes in workloads. We are open sourcing our agent in a few days. Check it out and let us know what you think! We run workloads on Microsoft Azure Google Kubernetes Engine and Amazon EC2 and we're all about Go and Python!
CodeFactor being a #SAAS product, our goal was to run on a cloud-native infrastructure since day one. We wanted to stay product focused, rather than having to work on the infrastructure that supports the application. We needed a cloud-hosting provider that would be reliable, economical and most efficient for our product.
CodeFactor.io aims to provide an automated and frictionless code review service for software developers. That requires agility, instant provisioning, autoscaling, security, availability and compliance management features. We looked at the top three #IAAS providers that take up the majority of market share: Amazon's Amazon EC2 , Microsoft's Microsoft Azure, and Google Compute Engine.
AWS has been available since 2006 and has developed the most extensive services ant tools variety at a massive scale. Azure and GCP are about half the AWS age, but also satisfied our technical requirements.
It is worth noting that even though all three providers support Docker containerization services, GCP has the most robust offering due to their investments in Kubernetes. Also, if you are a Microsoft shop, and develop in .NET - Visual Studio Azure shines at integration there and all your existing .NET code works seamlessly on Azure. All three providers have serverless computing offerings (AWS Lambda, Azure Functions, and Google Cloud Functions). Additionally, all three providers have machine learning tools, but GCP appears to be the most developer-friendly, intuitive and complete when it comes to #Machinelearning and #AI.
The prices between providers are competitive across the board. For our requirements, AWS would have been the most expensive, GCP the least expensive and Azure was in the middle. Plus, if you #Autoscale frequently with large deltas, note that Azure and GCP have per minute billing, where AWS bills you per hour. We also applied for the #Startup programs with all three providers, and this is where Azure shined. While AWS and GCP for startups would have covered us for about one year of infrastructure costs, Azure Sponsorship would cover about two years of CodeFactor's hosting costs. Moreover, Azure Team was terrific - I felt that they wanted to work with us where for AWS and GCP we were just another startup.
In summary, we were leaning towards GCP. GCP's advantages in containerization, automation toolset, #Devops mindset, and pricing were the driving factors there. Nevertheless, we could not say no to Azure's financial incentives and a strong sense of partnership and support throughout the process.
Bottom line is, IAAS offerings with AWS, Azure, and GCP are evolving fast. At CodeFactor, we aim to be platform agnostic where it is practical and retain the flexibility to cherry-pick the best products across providers.
related Amazon Redshift posts
Back in 2014, I was given an opportunity to re-architect SmartZip Analytics platform, and flagship product: SmartTargeting. This is a SaaS software helping real estate professionals keeping up with their prospects and leads in a given neighborhood/territory, finding out (thanks to predictive analytics) who's the most likely to list/sell their home, and running cross-channel marketing automation against them: direct mail, online ads, email... The company also does provide Data APIs to Enterprise customers.
I had inherited years and years of technical debt and I knew things had to change radically. The first enabler to this was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel, and build around managed/scalable services.
For the SaaS product, we kept on working with Rails as this was what my team had the most knowledge in. We've however broken up the monolith and decoupled the front-end application from the backend thanks to the use of Rails API so we'd get independently scalable micro-services from now on.
Our various applications could now be deployed using AWS Elastic Beanstalk so we wouldn't waste any more efforts writing time-consuming Capistrano deployment scripts for instance. Combined with Docker so our application would run within its own container, independently from the underlying host configuration.
Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side: Amazon RDS / MySQL initially. Ultimately migrated to Amazon RDS for Aurora / MySQL when it got released. Once again, here you need a managed service your cloud provider handles for you.
Future improvements / technology decisions included:
Caching: Amazon ElastiCache / Memcached CDN: Amazon CloudFront Systems Integration: Segment / Zapier Data-warehousing: Amazon Redshift BI: Amazon Quicksight / Superset Search: Elasticsearch / Amazon Elasticsearch Service / Algolia Monitoring: New Relic
As our usage grows, patterns changed, and/or our business needs evolved, my role as Engineering Manager then Director of Engineering was also to ensure my team kept on learning and innovating, while delivering on business value.
One of these innovations was to get ourselves into Serverless : Adopting AWS Lambda was a big step forward. At the time, only available for Node.js (Not Ruby ) but a great way to handle cost efficiency, unpredictable traffic, sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we've started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.
Looker , Stitch , Amazon Redshift , dbt
We recently moved our Data Analytics and Business Intelligence tooling to Looker . It's already helping us create a solid process for reusable SQL-based data modeling, with consistent definitions across the entire organizations. Looker allows us to collaboratively build these version-controlled models and push the limits of what we've traditionally been able to accomplish with analytics with a lean team.
For Data Engineering, we're in the process of moving from maintaining our own ETL pipelines on AWS to a managed ELT system on Stitch. We're also evaluating the command line tool, dbt to manage data transformations. Our hope is that Stitch + dbt will streamline the ELT bit, allowing us to focus our energies on analyzing data, rather than managing it.
related Amazon RDS posts
I'm planning to create a web application and also a mobile application to provide a very good shopping experience to the end customers. Shortly, my application will be aggregate the product details from difference sources and giving a clear picture to the user that when and where to buy that product with best in Quality and cost.
I have planned to develop this in many milestones for adding N number of features and I have picked my first part to complete the core part (aggregate the product details from different sources).
As per my work experience and knowledge, I have chosen the followings stacks to this mission.
Service: I have planned to use Java as the main business layer language as I have 7+ years of experience on this I believe I can do better work using Java than other languages. In addition, I'm thinking to use the stacks Node.js.
Database and ORM: I'm gonna pick MySQL as DB and Hibernate as ORM since I have a piece of good knowledge and also work experience on this combination.
Search Engine: I need to deal with a large amount of product data and it's in-detailed info to provide enough details to end user at the same time I need to focus on the performance area too. so I have decided to use Solr as a search engine for product search and suggestions. In addition, I'm thinking to replace Solr by Elasticsearch once explored/reviewed enough about Elasticsearch.
Host: As of now, my plan to complete the application with decent features first and deploy it in a free hosting environment like Docker and Heroku and then once it is stable then I have planned to use the AWS products Amazon S3, EC2, Amazon RDS and Amazon Route 53. I'm not sure about Microsoft Azure that what is the specialty in it than Heroku and Amazon EC2 Container Service. Anyhow, I will do explore these once again and pick the best suite one for my requirement once I reached this level.
Build and Repositories: I have decided to choose Apache Maven and Git as these are my favorites and also so popular on respectively build and repositories.
Additional Utilities :) - I would like to choose Codacy for code review as their Startup plan will be very helpful to this application. I'm already experienced with Google CheckStyle and SonarQube even I'm looking something on Codacy.
Happy Coding! Suggestions are welcome! :)
As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.
We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.
related Dropbox posts
We originally used Dropbox as an easy way to store and share documents, but moved to the much more powerful and convenient Google Drive, although we still use Dropbox occasionally.
I'm looking for a tool or set of tools to enable searching across all of our platforms including Confluence and Jira, Zoho CRM, Gmail, Gdrive for business, Dropbox and iCloud.
Any ideas. Something like X1? IBM Watson Discovery?
(And local Disk of course)