Aditya Tyagi's Stack Decision

Pavithra Nagaraj

Mar 12, 2020

Needs advice

Amazon Athena, Amazon Redshift, and AWS Glue

Hi all,

We currently need to ingest data from Amazon S3 into a database, either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query performance in Athena/Redshift is not up to the mark: queries are too slow, to the point of timing out, compared with Google BigQuery. How can I improve performance and query response time? Can anyone please help me out?

3 upvotes·520.4K views

Replies (4)

Aditya Tyagi

Mar 13, 2021

You can convert your PSV-format data to the Parquet file format with AWS Glue, which should improve your query performance.
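For a rough idea of what the conversion can look like, here is a minimal sketch. Note that it does not use a Glue job: it builds an Athena CTAS (CREATE TABLE AS SELECT) statement, which is a different route to the same Parquet output. The table names `sales_psv` and `sales_parquet`, the column `sale_date`, and the bucket path are hypothetical, and it assumes the PSV files are already registered as an Athena table.

```python
from typing import Sequence


def build_parquet_ctas(
    source_table: str,
    target_table: str,
    output_location: str,
    partition_columns: Sequence[str] = (),
) -> str:
    """Build an Athena CTAS statement that rewrites a table as Parquet."""
    properties = [
        "format = 'PARQUET'",
        f"external_location = '{output_location}'",
    ]
    if partition_columns:
        quoted = ", ".join(f"'{column}'" for column in partition_columns)
        properties.append(f"partitioned_by = ARRAY[{quoted}]")
    with_clause = ",\n  ".join(properties)
    # Athena expects partition columns to come last in the SELECT list,
    # so SELECT * only works if the source table already has them last.
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n  {with_clause}\n)\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names, for illustration only.
statement = build_parquet_ctas(
    source_table="sales_psv",
    target_table="sales_parquet",
    output_location="s3://example-bucket/sales_parquet/",
    partition_columns=["sale_date"],
)
print(statement)
```

The resulting string would be submitted through the Athena console or an API client of your choice. For 200 GB it may be necessary to run the conversion in several batches rather than one statement.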

1 upvote·217.9K views

Recommends

You can use the AWS Glue service to convert your pipe-delimited data to Parquet format, which also gives you data compression. Since your data is very large, you should choose Redshift and copy the data into it. To manage the data, partition it in the S3 bucket and also distribute it across the Redshift cluster.
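To illustrate the S3 partitioning part, here is a small sketch that groups pipe-separated rows under Hive-style `column=value` prefixes, which is the layout Athena and Glue recognize as partitions. The column names, sample rows, and bucket path are made up for the example, and it only computes the target prefixes in memory; it does not upload anything.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_rows(
    lines: Iterable[str], partition_column: str, base_prefix: str
) -> Dict[str, List[dict]]:
    """Group PSV rows by the S3 prefix their partition value maps to."""
    reader = csv.DictReader(lines, delimiter="|")  # first line is the header
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        # Hive-style layout: <base>/<column>=<value>/
        prefix = f"{base_prefix.rstrip('/')}/{partition_column}={row[partition_column]}/"
        groups[prefix].append(row)
    return dict(groups)


# Hypothetical sample data, for illustration only.
sample = [
    "id|sale_date|amount",
    "1|2020-03-01|10.5",
    "2|2020-03-01|7.0",
    "3|2020-03-02|3.2",
]
groups = partition_rows(sample, "sale_date", "s3://example-bucket/sales")
for prefix, rows in sorted(groups.items()):
    print(prefix, len(rows))
```

With the data laid out this way, a query that filters on the partition column only has to scan the matching prefixes instead of the full 200 GB.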

7 upvotes·218.6K views
