Amazon Athena vs Impala: What are the differences?
What is Amazon Athena? Query S3 Using SQL. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
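As a rough illustration of "query S3 using SQL", the sketch below submits a query to Athena through boto3's `start_query_execution` call. The database name `weblogs`, the table `access_logs`, and the results bucket are hypothetical placeholders, and the helper names are ours rather than part of any AWS API.

```python
def build_athena_request(sql, database, output_s3_uri):
    """Assemble the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def run_athena_query(sql, database, output_s3_uri):
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_s3_uri)
    )
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
EXAMPLE_REQUEST = build_athena_request(
    "SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    "weblogs",
    "s3://example-bucket/athena-results/",
)
```

The call is asynchronous: it returns an execution ID, and the results are fetched once the query finishes.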
What is Impala? Real-time Query for Hadoop. Impala is a modern, open source, MPP SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can run SELECT, JOIN, and aggregate queries in real time against data stored in HDFS or Apache HBase.
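To make the SELECT, JOIN, and aggregate combination concrete, here is a minimal sketch of such a query run through a Python DB-API cursor. The `orders` and `customers` tables are hypothetical. Against a real cluster, the cursor would typically come from the third-party impyla package; so that the example runs without a cluster, an in-memory SQLite database stands in for Impala here.

```python
import sqlite3

# A JOIN plus an aggregate, written in SQL that both Impala and SQLite accept.
TOP_CUSTOMERS_SQL = """
SELECT c.name, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total DESC
LIMIT 5
"""


def top_customers(cursor):
    """Run the query on any DB-API cursor and return the rows."""
    cursor.execute(TOP_CUSTOMERS_SQL)
    return cursor.fetchall()


def demo_with_sqlite():
    """Stand-in demo: SQLite replaces Impala so the sketch runs locally."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    cur.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "acme"), (2, "globex")]
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 10), (1, 15), (2, 40)]
    )
    rows = top_customers(cur)
    conn.close()
    return rows


# With impyla installed, the cursor would instead be created roughly like
# this (the host name is a hypothetical placeholder):
#   from impala.dbapi import connect
#   cursor = connect(host="impala.example.internal", port=21050).cursor()
```

Because `top_customers` only relies on the DB-API `execute` and `fetchall` methods, the same function works with either cursor.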
Amazon Athena and Impala are both primarily classified as "Big Data" tools.
"Use SQL to analyze CSV files" is the primary reason developers cite for choosing Amazon Athena over its competitors, whereas "Super fast" is cited as the key factor in picking Impala.
Impala is an open source tool with 2.18K GitHub stars and 824 GitHub forks.
According to the StackShare community, Amazon Athena has broader approval: it is mentioned in 50 company stacks and 18 developer stacks, compared to Impala, which is listed in 15 company stacks and 5 developer stacks.