Amazon Redshift vs Microsoft SQL Server: What are the differences?
Introduction
In this article, we will explore the key differences between Amazon Redshift and Microsoft SQL Server. Amazon Redshift is a cloud data warehouse, while Microsoft SQL Server is a general-purpose relational database that many organizations also use for data warehousing. There are some significant differences between the two.
Architecture and Scalability: Amazon Redshift is built on a massively parallel processing (MPP) architecture: a leader node splits each query across compute nodes, which process their share of the data in parallel. It scales horizontally by adding nodes to the cluster, which suits large data warehousing workloads. Microsoft SQL Server, by contrast, runs each database on a single server. It scales vertically with more powerful hardware and can parallelize a query across that server's cores, but it does not spread a single query across a cluster of nodes the way Redshift does.
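To make the scatter-gather idea concrete, here is a minimal Python sketch, not Redshift's actual implementation: rows are assigned to a hypothetical number of "slices" by hashing a distribution key, each slice aggregates only its own rows, and a leader step merges the partial results. Threads stand in for compute nodes purely to show the data flow; in CPython they do not provide real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def distribute(rows, key, n_slices):
    """Assign each row to a slice by hashing its distribution key."""
    slices = [[] for _ in range(n_slices)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        slot = crc32(str(row[key]).encode("utf-8")) % n_slices
        slices[slot].append(row)
    return slices


def partial_sum(slice_rows, group_key, value_key):
    """Scatter step: each slice aggregates only the rows it holds."""
    totals = {}
    for row in slice_rows:
        totals[row[group_key]] = totals.get(row[group_key], 0) + row[value_key]
    return totals


def mpp_group_sum(rows, group_key, value_key, n_slices=4):
    """Hypothetical GROUP BY ... SUM(...) executed slice by slice, then merged."""
    slices = distribute(rows, group_key, n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(lambda s: partial_sum(s, group_key, value_key), slices))
    # Gather step: the leader merges the per-slice partial results.
    result = {}
    for part in partials:
        for group, subtotal in part.items():
            result[group] = result.get(group, 0) + subtotal
    return result


sales = [
    {"region": "eu", "amount": 10},
    {"region": "us", "amount": 5},
    {"region": "eu", "amount": 7},
]
print(sorted(mpp_group_sum(sales, "region", "amount").items()))  # [('eu', 17), ('us', 5)]
```

Distributing on the grouping column keeps all rows of a group on one slice, so each partial result is already final for its groups; a single-server engine does the same aggregation without the distribution step.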
Data Storage: Redshift stores data by column, which suits analytical queries that read a few columns across many rows, because only the referenced columns have to be scanned. Storing similar values together also compresses efficiently, reducing the amount of storage required. SQL Server, on the other hand, uses row-based storage by default, which is better suited for transactional workloads. While SQL Server does offer columnstore indexes for columnar storage, it may not perform as well as Redshift for large-scale analytical workloads.
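A toy Python illustration of why a columnar layout compresses well: once rows are transposed into columns, repeated values sit next to each other and a simple run-length encoding shrinks them. Redshift offers run-length encoding as one of several column compression encodings; the table and values here are hypothetical.

```python
from itertools import groupby


def to_columns(rows, column_names):
    """Transpose row tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def run_length_encode(values):
    """Compress a column as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


rows = [
    (1, "2024-01-01", "eu"),
    (2, "2024-01-01", "eu"),
    (3, "2024-01-01", "us"),
    (4, "2024-01-02", "us"),
]
columns = to_columns(rows, ["id", "day", "region"])

# A query filtering on region only needs to read columns["region"].
print(run_length_encode(columns["day"]))     # [('2024-01-01', 3), ('2024-01-02', 1)]
print(run_length_encode(columns["region"]))  # [('eu', 2), ('us', 2)]
```

In a row layout the same values are interleaved with other columns, so runs like these never form.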
Data Loading: Redshift supports bulk data loading with its COPY command, which loads large amounts of data in parallel from sources such as Amazon S3, Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. SQL Server also provides bulk loading options, such as BULK INSERT, the bcp utility, and SQL Server Integration Services (SSIS), but setting them up may take more configuration.
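For illustration, the two loading paths look roughly like this. The helper functions below only build the SQL text; the table names, S3 prefix, IAM role ARN, and file path are hypothetical, and values are interpolated without escaping, so this is only suitable for trusted input.

```python
def redshift_copy(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY statement that loads CSV files under an S3 prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"  # skip the header row of each file
    )


def sqlserver_bulk_insert(table, file_path):
    """Build a SQL Server BULK INSERT statement for one CSV file.

    FORMAT = 'CSV' requires SQL Server 2017 or later; FIRSTROW = 2 skips the header.
    """
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{file_path}'\n"
        "WITH (FORMAT = 'CSV', FIRSTROW = 2);"
    )


print(redshift_copy(
    "analytics.sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
))
print(sqlserver_bulk_insert("dbo.sales", r"C:\data\sales.csv"))
```

COPY reads every file under the prefix in parallel across the cluster, while BULK INSERT reads a single file visible to the SQL Server instance.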
Query Optimization: Redshift's query optimizer is designed specifically for analytical workloads and handles complex queries over large datasets efficiently. It relies on techniques such as columnar scans, query compilation, and parallel execution across nodes. SQL Server's cost-based optimizer is general-purpose and serves transactional workloads well; with columnstore indexes it can also use batch-mode execution for analytics, but on a single server it may not match Redshift for complex analytical queries over very large datasets.
Pricing Model: Redshift offers a pay-as-you-go pricing model, allowing users to scale their resources up or down as needed, as well as reserved nodes that reduce costs for long-term usage. SQL Server, on the other hand, follows a traditional licensing model (per core, or server plus client access licenses), which may require upfront investment in hardware and software licenses.
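Whether reserved capacity actually saves money depends on how many hours the cluster would otherwise run. A minimal sketch of the break-even arithmetic, assuming on-demand capacity is billed only while it runs (for example, if the cluster is paused outside business hours) and reserved capacity is billed for every hour of the term. The rates are made-up placeholders, not real AWS prices.

```python
def break_even_utilization(on_demand_hourly, reserved_effective_hourly):
    """Share of hours a node must run for reserved capacity to pay off.

    Reserved capacity costs reserved_effective_hourly for every hour of the term;
    on-demand costs on_demand_hourly only for hours used. Reserved wins when
    on_demand_hourly * utilization > reserved_effective_hourly.
    """
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_effective_hourly / on_demand_hourly


def cheaper_option(on_demand_hourly, reserved_effective_hourly, hours_used, hours_in_term):
    """Compare total cost over one term for a given number of hours actually used."""
    on_demand_total = on_demand_hourly * hours_used
    reserved_total = reserved_effective_hourly * hours_in_term
    return "reserved" if reserved_total < on_demand_total else "on-demand"


# Hypothetical rates: 2.00/hour on demand, 1.50/hour effective for reserved capacity.
print(break_even_utilization(2.00, 1.50))         # 0.75
print(cheaper_option(2.00, 1.50, 8760, 8760))     # reserved (running 24/7 for a year)
print(cheaper_option(2.00, 1.50, 4000, 8760))     # on-demand (running about 46% of the time)
```

With these placeholder rates, the cluster would have to run at least 75% of the time before a reservation pays off.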
Integration with Ecosystem: Amazon Redshift integrates well with other AWS services, such as Amazon S3 for data storage, Amazon EMR for big data processing, and Amazon QuickSight for data visualization. It also supports various third-party tools and technologies. SQL Server, on the other hand, is tightly integrated with the Microsoft ecosystem, providing seamless integration with other Microsoft products such as Azure, Power BI, and Visual Studio.
In summary, Amazon Redshift and Microsoft SQL Server differ in their architecture, scalability, data storage model, data loading mechanisms, query optimization capabilities, pricing models, and ecosystem integration. The choice between the two depends on the specific requirements and use cases of the organization.
We need to perform ETL from several databases into a data warehouse or data lake. We want to
- keep raw and transformed data available to users to draft their own queries efficiently
- support custom user permissions and single sign-on (SSO)
- move between open-source on-premises development and cloud-based production environments
We want to use only inexpensive Amazon EC2 instances, on a medium-sized data set (16 GB to 32 GB) that feeds into Tableau Server or Power BI for reporting and data analysis.
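One way to meet the first requirement is to land raw extracts unchanged and build the transformed layer as views on top, so users can query either layer. The sketch below uses Python's built-in sqlite3 purely as a stand-in for whichever warehouse is chosen; the table, view, and column names are hypothetical.

```python
import sqlite3


def build_layers(conn, raw_orders):
    """Land source rows unchanged in a raw table, then expose a cleaned view on top."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(id INTEGER, customer TEXT, amount_cents INTEGER, source_db TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)
    # The transformed layer is a view over the raw table, so both layers stay
    # queryable and the transformation never discards raw data.
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS orders_clean AS
        SELECT id,
               TRIM(LOWER(customer)) AS customer,
               amount_cents / 100.0 AS amount,
               source_db
        FROM raw_orders
        WHERE amount_cents IS NOT NULL
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_layers(conn, [
    (1, " Alice ", 1250, "crm"),
    (2, "BOB", None, "billing"),
    (3, "bob", 300, "billing"),
])
print(conn.execute(
    "SELECT customer, SUM(amount) FROM orders_clean GROUP BY customer ORDER BY customer"
).fetchall())
# [('alice', 12.5), ('bob', 3.0)]
```

The `source_db` column records which of the several source databases each row came from, which helps when users write their own queries against the raw layer.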
You could also use AWS Lambda with a CloudWatch Events schedule if you know when the function should be triggered. The benefit is that you can use any language with the corresponding database client.
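A minimal sketch of such a function, assuming the Python runtime. `lambda_handler(event, context)` is the standard Python handler signature, and scheduled events include an ISO 8601 `time` field; `run_etl` is a hypothetical placeholder for your own extract and load code. A schedule expression such as `cron(0 2 * * ? *)` would run it daily at 02:00 UTC.

```python
import json
from datetime import datetime, timezone


def run_etl(run_date):
    """Hypothetical placeholder: extract, transform, and load one day's data."""
    # Real code would connect to the source databases and the warehouse here.
    return {"run_date": run_date, "rows_loaded": 0}


def lambda_handler(event, context):
    """Entry point invoked by a scheduled CloudWatch Events rule.

    Scheduled events carry a 'time' field; fall back to the current UTC time
    when the function is invoked manually without one.
    """
    fired_at = event.get("time") or datetime.now(timezone.utc).isoformat()
    run_date = fired_at[:10]  # keep only YYYY-MM-DD
    result = run_etl(run_date)
    return {"statusCode": 200, "body": json.dumps(result)}
```

Because each invocation is independent, this works best for a few self-contained jobs; chains of dependent jobs are where an orchestrator earns its keep.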
But if you are orchestrating multiple ETL jobs, it makes sense to use Apache Airflow. This requires Python knowledge.
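The core of what an orchestrator adds is dependency management between tasks. The sketch below is not Airflow; it uses Python's standard `graphlib` module (Python 3.9+) to order a hypothetical set of ETL tasks into batches whose members could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks, each mapped to the set of tasks it depends on.
dependencies = {
    "extract_crm": set(),
    "extract_billing": set(),
    "transform_customers": {"extract_crm", "extract_billing"},
    "load_warehouse": {"transform_customers"},
    "refresh_dashboards": {"load_warehouse"},
}


def run_order(graph):
    """Return batches of tasks; tasks within a batch do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the dependencies form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # pretend every ready task ran successfully
    return batches


print(run_order(dependencies))
# [['extract_billing', 'extract_crm'], ['transform_customers'], ['load_warehouse'], ['refresh_dashboards']]
```

In Airflow you would declare the same dependencies between tasks in a DAG (for example with the `>>` operator), and the scheduler would also handle scheduling, retries, and monitoring.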
Although we have always built something custom, Apache Airflow (https://airflow.apache.org/) stood out as a key open-source contender. Among commercial offerings, Amazon Redshift combined with Amazon Kinesis (for complex manipulations) is great for BI, though Redshift itself is expensive.
You may want to look into a data virtualization product called Conduit. It connects to disparate data sources in AWS, on premises, Azure, and GCP, and exposes them as a single unified Spark SQL view to Power BI (DirectQuery) or Tableau. It supports automatic query and caching policies to improve query speed and the user experience, and it has a GPU query engine with optimized Spark as a fallback. It can be deployed on your AWS VM or on premises, and it scales up and out. It sounds like the ideal solution for your needs.
I am a Microsoft SQL Server programmer who is a bit out of practice. I have been asked to assist on a new project. The overall purpose is to organize a large number of recordings so that they can be searched. I have an enormous music library but my songs are several hours long. I need to include things like time, date and location of the recording. I don't have a problem with the general database design. I have two primary questions:
- I need to use either MySQL or PostgreSQL on a Linux based OS. Which would be better for this application?
- I have not dealt with a sound-based data type before. How do I store that and put it in a table? Thank you.
Hi Erin,
Honestly both databases will do the job just fine. I personally prefer Postgres.
Much more important is how you store the audio. While you could technically use a blob-type column, it's really not ideal to store audio files that are "several hours long" in a database row. Instead, consider storing the audio files in an object store (hosted options include Backblaze B2 or AWS S3) and persisting the key that references each object in a database column.
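A minimal sketch of that split, using Python's built-in sqlite3 as a stand-in for MySQL or PostgreSQL: the table holds the searchable metadata (time, date, location) plus the object key, while the audio itself would live in the object store. The table layout and key scheme are hypothetical, and the actual upload call is omitted.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    recorded_at TEXT NOT NULL,        -- ISO 8601 timestamp of the recording
    location TEXT,
    duration_seconds INTEGER,
    object_key TEXT NOT NULL UNIQUE   -- key of the audio file in the object store
)
"""


def make_object_key(recorded_at, filename):
    """Hypothetical key scheme: group objects by date, add a UUID to avoid collisions."""
    return f"recordings/{recorded_at[:10]}/{uuid.uuid4().hex}-{filename}"


def add_recording(conn, title, recorded_at, location, duration_seconds, filename):
    key = make_object_key(recorded_at, filename)
    # The audio bytes would be uploaded to S3 / B2 under `key` (not shown);
    # only the key and the searchable metadata go into the database.
    conn.execute(
        "INSERT INTO recordings (title, recorded_at, location, duration_seconds, object_key) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, recorded_at, location, duration_seconds, key),
    )
    conn.commit()
    return key


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_recording(conn, "Evening session", "2023-06-14T19:30:00", "Studio A", 3 * 3600, "session.flac")
rows = conn.execute(
    "SELECT title, object_key FROM recordings WHERE location = ? AND recorded_at >= ?",
    ("Studio A", "2023-06-01"),
).fetchall()
```

In PostgreSQL, `recorded_at` would more naturally be a `timestamptz` column; searches then run entirely against the small metadata table, and the application fetches the audio by key only when someone plays it.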
Hi Erin, chances are you would want to store the files in a blob type. Both MySQL and Postgres support this. Can you explain a little more about your need to store the files in the database? It may be more effective to store the files on a file system or in something like S3. To answer your question based on what you are describing, I would slightly lean towards PostgreSQL, since it tends to be a little better on the data warehousing side.
Hey Erin! I would recommend checking out Directus before you start building your own app for them. I just stumbled upon it, and so far I'm extremely happy with its functionality. If your client is just looking for a simple web app for their own data, Directus may be a great option. It offers "database mirroring", so you can connect it to any database and set up functionality around it!
Hi Erin! First of all, you'd probably want to go with a managed service. Don't spin up your own MySQL installation on your own Linux box. If you are on AWS, they have different database offerings: standard RDS vs. Aurora. Aurora would be my preferred choice given the benefits it offers, the storage optimizations it comes with, etc. Managed services like these make it easy to apply security patches and upgrades, set up backups, replication, etc. Doing this on your own would be risky or inefficient, or you might just give up. As for which database to choose, you'll have the choice between PostgreSQL, MySQL, MariaDB, SQL Server, etc. I personally would recommend MySQL (the latest available version), as its official tooling (MySQL Workbench) is great, stable, and free. Other database services exist; I'd recommend you also explore DynamoDB.
Regardless, you'd certainly keep only high-level records and metadata in the database, and the actual files most likely in S3, so that you keep all options open for what you'll do with them.
Hi Erin,
- Coming from "Big" DB engines, such as Oracle or MSSQL, go for PostgreSQL. You'll get all the features you need with PostgreSQL.
- Your case seems to point to a "NoSQL" or document database use case. PostgreSQL covers this too, with excellent performance on JSON-based objects, which is a second reason to choose PostgreSQL. MongoDB might be an excellent option as well if you need sharding and strong map-reduce mechanisms for very massive data sets. You really should investigate the NoSQL option for your use case.
- Starting with AWS Aurora is excellent advice, since vendor lock-in is limited, but I did not check its JSON-based object / NoSQL features.
- If you stick to a Linux server, the PostgreSQL or MySQL packages provided with your distribution are straightforward to install (e.g. apt install postgresql). For PostgreSQL, make sure you're comfortable with pg_hba.conf, especially for IP restrictions and access rules.
Regards,
I recommend Postgres as well. Superior performance overall and a more robust architecture.
A cloud data warehouse is the centerpiece of a modern data platform, so choosing the most suitable solution is fundamental.
Our benchmark was conducted on BigQuery and Snowflake. Both solutions seem to match our goals, but they take very different approaches.
BigQuery is notably the only 100% serverless cloud data warehouse, requiring absolutely NO maintenance: no re-clustering, no compression, no index optimization, no storage management, no performance management. Snowflake requires setting up (paid) reclustering processes, managing the performance allocated to each profile, etc. We can also mention Redshift, which we eliminated because it requires even more operational work.
BigQuery can therefore be set up at almost zero human-resource cost. Its on-demand pricing is particularly well suited to small workloads: there is no cost when the solution is not used, and you only pay for the queries you run. As usage grows, slots (with a monthly or per-minute commitment) drastically reduce the cost. We cut the cost of our nightly batches tenfold by using flex slots.
Finally, a major advantage of BigQuery is its almost perfect integration with Google Cloud Platform services: Cloud functions, Dataflow, Data Studio, etc.
BigQuery is still evolving very quickly. The next milestone, BigQuery Omni, will allow running queries over data stored on another cloud platform (Amazon S3, for example). It will be a major breakthrough in the history of cloud data warehouses. Omni will compensate for a weakness of BigQuery: transferring data from S3 to BigQuery in near real time is not easy today, whereas it was simpler to implement with Snowflake's Snowpipe.
We also plan to use the machine learning features built into BigQuery to accelerate the deployment of our data-science projects, an opportunity that only BigQuery offers.
Pros of Amazon Redshift
- Data Warehousing (41)
- Scalable (27)
- SQL (17)
- Backed by Amazon (14)
- Encryption (5)
- Cheap and reliable (1)
- Isolation (1)
- Best Cloud DW Performance (1)
- Fast columnar storage (1)
Pros of Microsoft SQL Server
- Reliable and easy to use (139)
- High performance (101)
- Great with .NET (95)
- Works well with .NET (65)
- Easy to maintain (56)
- Azure support (21)
- Full Index Support (17)
- Always on (17)
- Enterprise manager is fantastic (10)
- In-Memory OLTP Engine (9)
- Security is forefront (2)
- Easy to setup and configure (2)
- Docker Delivery (1)
- Columnstore indexes (1)
- Great documentation (1)
- Faster Than Oracle (1)
- Decent management tools (1)
Cons of Amazon Redshift
Cons of Microsoft SQL Server
- Expensive Licensing (4)
- Microsoft (2)