Technical Lead at Incred Financial Solutions
Needs advice on Amazon S3, Metabase, and Presto

Hi,

We currently store our data in Amazon S3 in Apache Parquet format. We use Presto to query the data in S3, with the tables catalogued in the AWS Glue Data Catalog. Metabase sits on top of Presto, and that is where our reports live. Presto is becoming too costly for us, so we are looking for alternatives to it, but we want to keep as much of the remaining setup (S3, Metabase) as possible. Please suggest alternative approaches.
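Since the tables are already registered in the AWS Glue Data Catalog, any engine that reads Glue (Athena, or Presto's Hive connector configured against Glue) can reuse the same definitions. As a rough sketch, assuming hypothetical database, table, column and bucket names, the Python below generates the Hive-style `CREATE EXTERNAL TABLE` DDL for a Parquet dataset in S3, optionally partitioned by a string date column:

```python
from typing import List, Optional, Tuple


def create_table_ddl(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    location: str,
    partition_col: Optional[str] = None,
) -> str:
    """Build Hive-style DDL for an external Parquet table in S3.

    `columns` holds (name, type) pairs for the data columns only; in Hive DDL
    the partition column must NOT be repeated in this list.
    All names passed in are illustrative, not taken from a real schema.
    """
    # One backtick-quoted column definition per line.
    cols = ",\n".join(f"  `{name}` {typ}" for name, typ in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n{cols}\n)\n"
    if partition_col:
        # Partition values come from the dt=... part of the S3 prefix.
        ddl += f"PARTITIONED BY (`{partition_col}` string)\n"
    ddl += f"STORED AS PARQUET\nLOCATION '{location}'"
    return ddl


if __name__ == "__main__":
    print(create_table_ddl(
        "analytics", "events",
        [("user_id", "string"), ("amount", "double")],
        "s3://example-bucket/events/",
        partition_col="dt",
    ))
```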

Replies (1)
Co-founder at Transloadit

Hey there, the trick to keeping costs under control is to partition your data. That means splitting up your source files by date, and also restricting your queries to specific dates, so that Athena only scans the few files needed for those dates. I hope that makes sense (and I also hope I understood your question correctly). This article explains it better: https://aws.amazon.com/blogs/big-data/analyze-your-amazon-cloudfront-access-logs-at-scale/
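A minimal sketch of the idea, assuming a Hive-style `dt=YYYY-MM-DD` prefix layout and a hypothetical bucket `s3://example-bucket/events/`: the helpers below list which date prefixes a query over a date range actually touches, and build the matching `WHERE` predicate on the partition column. A partition-aware engine such as Athena can then skip every other prefix.

```python
from datetime import date, timedelta
from typing import List


def partition_prefix(base: str, day: date) -> str:
    """Hive-style prefix for one day, e.g. s3://example-bucket/events/dt=2024-01-31/"""
    return f"{base.rstrip('/')}/dt={day.isoformat()}/"


def prefixes_for_range(base: str, start: date, end: date) -> List[str]:
    """Prefixes a query over [start, end] (both inclusive) needs to read."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [partition_prefix(base, start + timedelta(days=i)) for i in range(days)]


def date_filter_sql(start: date, end: date, column: str = "dt") -> str:
    """WHERE-clause predicate on the partition column (hypothetical name `dt`)."""
    # ISO YYYY-MM-DD strings sort chronologically, so a string BETWEEN is safe.
    return f"{column} BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"


if __name__ == "__main__":
    base = "s3://example-bucket/events/"
    for p in prefixes_for_range(base, date(2024, 1, 30), date(2024, 2, 2)):
        print(p)
    print("WHERE " + date_filter_sql(date(2024, 1, 30), date(2024, 2, 2)))
```

For example, a one-week query over a year of daily partitions only has to read 7 of the 365 prefixes. New `dt=` prefixes also have to be registered in the catalog before they are queryable, for example with `MSCK REPAIR TABLE` or `ALTER TABLE ... ADD PARTITION`.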

Kevin van Zonneveld
