Amazon Athena vs Azure Cosmos DB

Overview

Azure Cosmos DB

Stacks 594

Followers 1.1K

Votes 130

Amazon Athena

Stacks 521

Followers 840

Votes 49

Amazon Athena vs Azure Cosmos DB: What are the differences?

Introduction: In this comparison, we will explore the key differences between Amazon Athena and Azure Cosmos DB.

Querying and Data Types: Amazon Athena runs SQL queries over files stored in S3 (for example CSV, JSON, Parquet, or ORC), which makes it suitable for structured and semi-structured data. Azure Cosmos DB, on the other hand, is a globally distributed database service that supports several data models, including key-value, document, and graph, giving more flexibility in data types and structures.
Database Architecture: Amazon Athena is a serverless interactive query service that executes queries directly on data stored in S3 without the need for infrastructure management. In contrast, Azure Cosmos DB is a fully managed NoSQL database service that provides automatic scaling, high availability, and low latency access to data globally, making it suitable for mission-critical applications with varied workloads.
Consistency Models: Amazon Athena has no tunable consistency levels; as a query service, it reads whatever objects are in S3 when the query runs. Azure Cosmos DB, however, offers five consistency levels (strong, bounded staleness, session, consistent prefix, and eventual), so users can choose the trade-off between consistency, latency, and availability that best fits their application.
Scalability and Pricing: Amazon Athena charges by the amount of data each query scans, which makes it cost-effective for occasional or sporadic querying. Azure Cosmos DB charges for throughput (request units per second, provisioned manually or with autoscale, or billed per request in serverless mode), storage, and the number of regions, so users can scale up or down to balance performance and cost.
Integration and Ecosystem: Amazon Athena integrates with other AWS services, such as AWS Glue for data cataloging and Amazon QuickSight for visualization, so it fits into the broader AWS analytics ecosystem. Azure Cosmos DB, in comparison, integrates with Azure services such as Azure Functions, Logic Apps, and Power BI, providing a platform for building cloud-native applications.
Consistency in Query Performance: Amazon Athena's query time grows with the amount of data a query has to scan, and queries run on shared, on-demand capacity, so queries over large, unpartitioned text datasets can be slow or wait in a queue. Azure Cosmos DB, by contrast, is built for consistently low-latency, high-throughput reads and writes across globally distributed data, backed by SLAs covering latency, throughput, consistency, and availability.
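Because Athena is serverless, running a query amounts to submitting SQL and polling for its status; there is no cluster to start or size. The following is a minimal sketch in Python assuming the boto3 Athena client methods `start_query_execution` and `get_query_execution`; the database name and S3 output prefix are placeholders, and the client is passed in so the function can be exercised without AWS credentials.

```python
import time
from typing import Any

# States after which Athena will not change a query's status any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client: Any, sql: str, database: str,
                     output_location: str, poll_seconds: float = 1.0) -> tuple[str, str]:
    """Submit a query and poll until Athena reports a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns (query_execution_id, final_state).
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # S3 prefix for result files
    )
    query_id = response["QueryExecutionId"]
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING: wait, then poll again
```

With a real client you would pass `boto3.client("athena")` and, once the state is `SUCCEEDED`, fetch rows with `get_query_results(QueryExecutionId=...)`. A production version would also want an overall timeout rather than polling forever.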
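For contrast with Athena's file-oriented SQL, querying Cosmos DB's document model (the API for NoSQL) means sending a parameterized SQL-like query to a container of JSON documents. This sketch assumes the `azure-cosmos` Python SDK's `ContainerProxy.query_items`; the `orders` container, the `customerId` property, and its use as the partition key are hypothetical.

```python
from typing import Any, Iterable, Protocol


class ContainerLike(Protocol):
    """The subset of azure.cosmos ContainerProxy used here (assumption)."""

    def query_items(self, query: str, **kwargs: Any) -> Iterable[dict]: ...


def find_orders(container: ContainerLike, customer_id: str) -> list[dict]:
    """Return id and total of every order document for one customer.

    "customerId" is a hypothetical document property that is also assumed
    to be the container's partition key.
    """
    items = container.query_items(
        query="SELECT c.id, c.total FROM c WHERE c.customerId = @cid",
        parameters=[{"name": "@cid", "value": customer_id}],  # avoids string concatenation
        partition_key=customer_id,  # keeps the query inside one logical partition
    )
    return list(items)


# Hypothetical wiring with the real SDK (account URL and key are placeholders):
# from azure.cosmos import CosmosClient
# client = CosmosClient(ACCOUNT_URL, credential=ACCOUNT_KEY)
# container = client.get_database_client("shop").get_container_client("orders")
# print(find_orders(container, "c-42"))
```

Scoping the query to a single partition key is a deliberate choice here: cross-partition queries also work but typically consume more request units.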
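To make Cosmos DB's session consistency level more concrete, here is a toy model in Python of how a session token gives a client read-your-writes behavior while an eventual read may return stale data. It is a conceptual illustration under simplifying assumptions (one writable replica, a single integer sequence number per write), not Cosmos DB's actual replication protocol or SDK; `Replica`, `Session`, and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    applied_lsn: int = 0                         # highest write sequence number applied here
    data: dict = field(default_factory=dict)


@dataclass
class Session:
    token: int = 0                               # highest sequence number this client wrote


def write(primary: Replica, session: Session, key: str, value: str) -> int:
    """Apply a write on the primary and record it in the client's session token."""
    primary.applied_lsn += 1
    primary.data[key] = value
    session.token = primary.applied_lsn
    return primary.applied_lsn


def replicate(source: Replica, target: Replica) -> None:
    """Bring a lagging replica fully up to date (asynchronous in real systems)."""
    target.data = dict(source.data)
    target.applied_lsn = source.applied_lsn


def session_read(replicas: list[Replica], session: Session, key: str) -> tuple:
    """Session consistency: only a replica that has caught up to the
    client's token may answer, which guarantees read-your-writes."""
    for replica in replicas:
        if replica.applied_lsn >= session.token:
            return replica.name, replica.data.get(key)
    raise RuntimeError("no replica has caught up to the session token yet")


def eventual_read(replica: Replica, key: str) -> tuple:
    """Eventual consistency: any replica may answer, possibly with stale data."""
    return replica.name, replica.data.get(key)
```

For example, after `write(primary, s, "cart", "3 items")`, an eventual read from a lagging replica returns `None`, while `session_read([lagging, primary], s, "cart")` skips the lagging replica and returns the client's own write; once the lagging replica is replicated, it can serve the session read too.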
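The two pricing models in the Scalability and Pricing point can be compared with back-of-the-envelope arithmetic. The sketch below assumes that Athena bills bytes scanned rounded up to the nearest megabyte with a 10 MB minimum per query, and that Cosmos DB provisioned throughput is billed per 100 RU/s per hour in each region; prices are parameters rather than constants because they vary by region and change over time, so check the current AWS and Azure pricing pages before relying on any number.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4


def athena_query_cost(bytes_scanned: int, usd_per_tb: float) -> float:
    """Estimated charge for one Athena query.

    Assumes per-megabyte rounding and a 10 MB minimum per query.
    """
    billed_mb = max(math.ceil(bytes_scanned / MB), 10)
    return billed_mb * MB / TB * usd_per_tb


def cosmos_provisioned_cost(ru_per_sec: int, hours: float, regions: int,
                            usd_per_100_ru_hour: float) -> float:
    """Estimated throughput charge for manually provisioned RU/s.

    Assumes billing per 100 RU/s per hour, multiplied by the number of
    regions; storage, autoscale, serverless, and multi-region-write
    pricing are not modeled.
    """
    return (ru_per_sec / 100) * hours * regions * usd_per_100_ru_hour
```

The structural difference shows up directly in the signatures: Athena's cost depends only on how much data each query touches, while Cosmos DB's provisioned cost accrues per hour whether or not the throughput is used.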
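Because Athena's query time and cost track the bytes it reads from S3, the main performance lever is reading less data. This simplified model, which assumes evenly sized columns, ignores compression, and uses hypothetical partition names and sizes, estimates bytes scanned with and without partition pruning and with column projection in a columnar format such as Parquet.

```python
from typing import Callable, Mapping, Optional


def estimate_bytes_scanned(partitions: Mapping[str, int],
                           keep: Optional[Callable[[str], bool]] = None,
                           column_fraction: float = 1.0) -> float:
    """Rough estimate of the bytes an Athena query reads.

    partitions: partition key (e.g. "dt=2020-03-02") -> bytes stored there.
    keep: predicate on the partition key; None means the WHERE clause does
          not filter on the partition column, so every partition is read.
    column_fraction: share of each row's bytes that must be read; 1.0 for
          row-oriented text such as CSV/PSV, smaller for a columnar format
          when the query selects only some columns.
    """
    if not 0.0 < column_fraction <= 1.0:
        raise ValueError("column_fraction must be in (0, 1]")
    if keep is None:
        selected = sum(partitions.values())                      # full scan
    else:
        selected = sum(size for key, size in partitions.items() if keep(key))  # pruned
    return selected * column_fraction
```

For example, with three daily partitions of 100, 120, and 80 bytes, a query filtering on one day reads 120 bytes instead of 300, and reading a quarter of the columns from Parquet cuts that to 30.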

In summary, Amazon Athena is better suited to ad-hoc querying of structured and semi-structured data in S3, while Azure Cosmos DB provides a globally distributed, multi-model database service with flexible data types, tunable consistency, flexible scaling options, and deep integration with Azure services for modern cloud-native applications.

Advice on Azure Cosmos DB, Amazon Athena

Pavithra

Mar 12, 2020

Needs advice on

Amazon S3

Amazon Athena

Amazon Redshift

Hi all,

Currently, we need to ingest data from Amazon S3 into either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query times in Athena/Redshift are not up to the mark; they are too slow compared to Google BigQuery. How can I optimize performance and query response time? Can anyone please help me out?
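One common way to approach this kind of problem in Athena is to declare an external table over the pipe-separated files and then rewrite them once as partitioned Parquet with a CREATE TABLE AS SELECT (CTAS) query, so later queries scan less data. The Python helpers below only build the SQL text; the table names, columns, S3 locations, and partition column are hypothetical, and the statements would be run in the Athena query editor or through an API client.

```python
def psv_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Athena DDL for an external table over pipe-separated text files in S3."""
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY '|'\n"   # the .PSV delimiter
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def parquet_ctas(source: str, target: str, columns: list[str],
                 partition_col: str, location: str) -> str:
    """CTAS statement that rewrites the PSV table as partitioned Parquet.

    Athena requires partition columns to come last in the SELECT list,
    so the partition column is moved to the end.
    """
    ordered = [c for c in columns if c != partition_col] + [partition_col]
    return (
        f"CREATE TABLE {target}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{location}',\n"
        f"  partitioned_by = ARRAY['{partition_col}']\n"
        ")\n"
        f"AS SELECT {', '.join(ordered)} FROM {source}"
    )
```

Athena documents a limit of 100 partitions written per CTAS or INSERT INTO query, so a dataset with more partition values may need to be converted in several INSERT INTO batches.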

522k views

Detailed Comparison

Description
Azure Cosmos DB: Azure Cosmos DB (formerly Azure DocumentDB) is a fully managed NoSQL database service built for fast and predictable performance, high availability, elastic scaling, global distribution, and ease of development.
Amazon Athena: Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Features
Azure Cosmos DB: Fully managed with a 99.99% availability SLA; elastically and highly scalable (both throughput and storage); predictable low latency: <10 ms at P99 for reads and <15 ms at P99 for fully indexed writes; globally distributed with multi-region replication; rich SQL queries over schema-agnostic automatic indexing; JavaScript language-integrated multi-record ACID transactions with snapshot isolation; well-defined tunable consistency models: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual
Amazon Athena: none listed
Statistics
Azure Cosmos DB: Stacks 594, Followers 1.1K, Votes 130
Amazon Athena: Stacks 521, Followers 840, Votes 49
Pros & Cons
Azure Cosmos DB pros: Best-of-breed NoSQL features (28); High scalability (22); Globally distributed (15); Automatic indexing over flexible JSON data model (14); Tunable consistency (10)
Azure Cosmos DB cons: Pricing (18); Poor No SQL query support (4)
Amazon Athena pros: Use SQL to analyze CSV files (16); Glue crawlers give easy data catalogue (8); Cheap (7); Query all my data without running servers 24x7 (6); No database servers yay (4)
Integrations
Azure Cosmos DB: Azure Machine Learning, MongoDB, Hadoop, Java, Azure Functions, Azure Container Service, Azure Storage, Azure Websites, Apache Spark, Python
Amazon Athena: Amazon S3, Presto

What are some alternatives to Azure Cosmos DB, Amazon Athena?

Amazon DynamoDB

With Amazon DynamoDB, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use.

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Cloud Firestore

Cloud Firestore is a NoSQL document database that lets you easily store, sync, and query data for your mobile and web apps - at global scale.

Presto

Distributed SQL Query Engine for Big Data

Apache Flink

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

lakeFS

It is an open-source data version control system for data lakes. It provides a “Git for data” platform enabling you to implement best practices from software engineering on your data lake, including branching and merging, CI/CD, and production-like dev/test environments.

Druid

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.

Cloudant

Cloudant’s distributed database as a service (DBaaS) allows developers of fast-growing web and mobile apps to focus on building and improving their products, instead of worrying about scaling and managing databases on their own.

Google Cloud Bigtable

Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years—it's the database driving major applications such as Google Analytics and Gmail.

Apache Kylin

Apache Kylin™ is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark supporting extremely large datasets, originally contributed from eBay Inc.
