Hello! I have a mobile app with nearly 100k MAU, and I want to add a cloud file storage service to my app.
My app will allow users to store their image, video, and audio files and retrieve them to their device when necessary.
I have already decided to use PHP & Laravel as my backend, and I use Contabo VPS. Now, I need an object storage service for my app, and my options are:
Amazon S3 : It sounds to me like the best option but the most expensive. Closest to my users (MENA Region) for other services, I will have to go to Europe. Not sure how important this is?
DigitalOcean Spaces : Seems like my best option for price/service, but I am still not sure
Wasabi: the best price (6 USD/MONTH/TB) and free bandwidth, but I am not sure if it fits my needs as I want to allow my users to preview audio and video files. They don't recommend their service for streaming videos.
Backblaze B2 Cloud Storage: Good price but not sure about them.
There is also the self-hosted s3 compatible option, but I am not sure about that.
Any thoughts will be helpful. Also, if you think I should post in a different sub, please tell me.
We need to perform ETL from several databases into a data warehouse or data lake. We want to
- keep raw and transformed data available to users to draft their own queries efficiently
- give users the ability to give custom permissions and SSO
- move between open-source on-premises development and cloud-based production environments
We want to use inexpensive Amazon EC2 instances only on medium-sized data set 16GB to 32GB feeding into Tableau Server or PowerBI for reporting and data analysis purposes.
You could also use AWS Lambda and use Cloudwatch event schedule if you know when the function should be triggered. The benefit is that you could use any language and use the respective database client.
But if you orchestrate ETLs then it makes sense to use Apache Airflow. This requires Python knowledge.
Though we have always built something custom, Apache airflow (https://airflow.apache.org/) stood out as a key contender/alternative when it comes to open sources. On the commercial offering, Amazon Redshift combined with Amazon Kinesis (for complex manipulations) is great for BI, though Redshift as such is expensive.
You may want to look into a Data Virtualization product called Conduit. It connects to disparate data sources in AWS, on prem, Azure, GCP, and exposes them as a single unified Spark SQL view to PowerBI (direct query) or Tableau. Allows auto query and caching policies to enhance query speeds and experience. Has a GPU query engine and optimized Spark for fallback. Can be deployed on your AWS VM or on prem, scales up and out. Sounds like the ideal solution to your needs.
Minio is a free and open source object storage system. It can be self-hosted and is S3 compatible. During the early stage it would save cost and allow us to move to a different object storage when we scale up. It is also fast and easy to set up. This is very useful during development since it can be run on localhost.
Cloud Data-warehouse is the centerpiece of modern Data platform. The choice of the most suitable solution is therefore fundamental.
Our benchmark was conducted over BigQuery and Snowflake. These solutions seem to match our goals but they have very different approaches.
BigQuery is notably the only 100% serverless cloud data-warehouse, which requires absolutely NO maintenance: no re-clustering, no compression, no index optimization, no storage management, no performance management. Snowflake requires to set up (paid) reclustering processes, to manage the performance allocated to each profile, etc. We can also mention Redshift, which we have eliminated because this technology requires even more ops operation.
BigQuery can therefore be set up with almost zero cost of human resources. Its on-demand pricing is particularly adapted to small workloads. 0 cost when the solution is not used, only pay for the query you're running. But quickly the use of slots (with monthly or per-minute commitment) will drastically reduce the cost of use. We've reduced by 10 the cost of our nightly batches by using flex slots.
Finally, a major advantage of BigQuery is its almost perfect integration with Google Cloud Platform services: Cloud functions, Dataflow, Data Studio, etc.
BigQuery is still evolving very quickly. The next milestone, BigQuery Omni, will allow to run queries over data stored in an external Cloud platform (Amazon S3 for example). It will be a major breakthrough in the history of cloud data-warehouses. Omni will compensate a weakness of BigQuery: transferring data in near real time from S3 to BQ is not easy today. It was even simpler to implement via Snowflake's Snowpipe solution.
We also plan to use the Machine Learning features built into BigQuery to accelerate our deployment of Data-Science-based projects. An opportunity only offered by the BigQuery solution
We offer our customer HIPAA compliant storage. After analyzing the market, we decided to go with Google Storage. The Nodejs API is ok, still not ES6 and can be very confusing to use. For each new customer, we created a different bucket so they can have individual data and not have to worry about data loss. After 1000+ customers we started seeing many problems with the creation of new buckets, with saving or retrieving a new file. Many false positive: the Promise returned ok, but in reality, it failed.
That's why we switched to S3 that just works.
Sign up to add or upvote prosMake informed product decisions
Sign up to add or upvote consMake informed product decisions
What is Amazon RDS?
What is Amazon Redshift?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
Red Hat, Inc.
Insanely low prices, quite easy to use, and they're fast. Plus they provide great support. And they're integrated with other AWS services, like CloudFront.
Seriously, this is the best service of it's kind out there.
While we initially started off running our own Postgres cluster, we evaluated RDS and found it to be an excellent fit for us.
The failovers, manual scaling, replication, Postgres upgrades, and pretty much everything else has been super smooth and reliable.
We'll probably need something a little more complex in the future, but RDS performs admirably for now.
We are using RDS for managing PostgreSQL and legacy MSSQL databases.
Unfortunately while RDS works great for managing the PostgreSQL systems, MSSQL is very much a second class citizen and they don't offer very much capability. Infact, in order to upgrade instance storage for MSSQL we actually have to spin up a new cluster and migrate the data over.
We store the software components that CloudRepo stores for its customers here for the following reasons:
- Data is Encrypted at Rest
- Data is stored across multiple physical locations
- Pricing is competitive
- Reliability is industry leading and our customers need to be able to access their data at all times list text here
In October 2008 we moved to using scribe (now a custom branch), which has served us very well over the past 5+ years that we’ve been using it. We take the logs scribe aggregates and move them into Amazon S3 for storage, which makes using EMR on AWS seamless.
S3 serves as zero-knowledge temporary storage. Files are encrypted in the browser before being uploaded in chunks to S3. When the target recipient downloads them the chunks are reassembled and decrypted in the browser. Files expire after a week and the encrypted chunks are permanently deleted from S3.
Since we generate a static website for our website, AWS S3 provides hosting for us so that we don't have to run our own servers just to serve up static content.
The pricing is great as you only pay for what you use.
This object storage is always evolving and getting harder to explain. We use it for 1) hosting every static websites, 2) datalake to store every transaction and 3) query with Athena / S3 Select.
Aggressive archiving of historical data to keep the production database as small as possible. Using our in-house soon-to-be-open-sourced ETL library, SharpShifter.
Our PostgreSQL servers, where we keep the bulk of Wirkn data, are hosted on the fantastically easy and reliable AWS RDS platform.
We use Aurora for our OLTP database, it provides significant speed increases on top of MySQL without the need to manage it
RDS allows us to replicate the development databases locally as well as making it available to CircleCI.