Hadoop vs Microsoft SQL Server vs Oracle

Need advice about which tool to choose?Ask the StackShare community!

Hadoop

2.5K
2.3K
+ 1
56
Microsoft SQL Server

19.9K
15.3K
+ 1
540
Oracle

2.3K
1.8K
+ 1
113

Hadoop vs Microsoft SQL Server vs Oracle: What are the differences?

  1. Scalability: Hadoop is designed to handle massive amounts of data and can easily scale by adding more nodes to the cluster. On the other hand, Microsoft SQL Server and Oracle have limitations in terms of scalability, especially when dealing with petabytes of data.
  2. Cost: Hadoop is an open-source framework, making it more cost-effective compared to proprietary solutions like Microsoft SQL Server and Oracle, which require substantial licensing fees and ongoing maintenance costs.
  3. Data Processing Model: Hadoop uses a distributed processing model, enabling parallel processing of data across multiple nodes in a cluster. In contrast, Microsoft SQL Server and Oracle primarily rely on a centralized processing model, which can be a bottleneck for handling large volumes of data.
  4. Data Type Support: Hadoop is well-suited for processing unstructured and semi-structured data, such as text documents and log files, while Microsoft SQL Server and Oracle excel in handling structured data stored in relational databases.
  5. Ecosystem: Hadoop offers a rich ecosystem of tools and technologies for data processing and analytics, including Apache Spark, Hive, and HBase. In comparison, the ecosystem around Microsoft SQL Server and Oracle is more tightly integrated with their respective platforms, providing a more streamlined but less flexible environment.

In Summary, Hadoop excels in scalability, cost-effectiveness, data processing model, handling unstructured data, and its diverse ecosystem compared to Microsoft SQL Server and Oracle.

Advice on Hadoop, Microsoft SQL Server, and Oracle
Needs advice
on
HadoopHadoopMarkLogicMarkLogic
and
SnowflakeSnowflake

For a property and casualty insurance company, we currently use MarkLogic and Hadoop for our raw data lake. Trying to figure out how snowflake fits in the picture. Does anybody have some good suggestions/best practices for when to use and what data to store in Mark logic versus Snowflake versus a hadoop or all three of these platforms redundant with one another?

See more
Needs advice
on
HadoopHadoopMarkLogicMarkLogic
and
SnowflakeSnowflake

for property and casualty insurance company we current Use marklogic and Hadoop for our raw data lake. Trying to figure out how snowflake fits in the picture. Does anybody have some good suggestions/best practices for when to use and what data to store in Mark logic versus snowflake versus a hadoop or all three of these platforms redundant with one another?

See more
Replies (1)
Ivo Dinis Rodrigues
none of you bussines at Marklogic · | 1 upvotes · 21.1K views
Recommends

As i see it, you can use Snowflake as your data warehouse and marklogic as a data lake. You can add all your raw data to ML and curate it to a company data model to then supply this to Snowflake. You could try to implement the dw functionality on marklogic but it will just cost you alot of time. If you are using Aws version of Snowflake you can use ML spark connector to access the data. As an extra you can use the ML also as an Operational report system if you join it with a Reporting tool lie PowerBi. With extra apis you can also provide data to other systems with ML as source.

See more
Needs advice
on
HadoopHadoopInfluxDBInfluxDB
and
KafkaKafka

I have a lot of data that's currently sitting in a MariaDB database, a lot of tables that weigh 200gb with indexes. Most of the large tables have a date column which is always filtered, but there are usually 4-6 additional columns that are filtered and used for statistics. I'm trying to figure out the best tool for storing and analyzing large amounts of data. Preferably self-hosted or a cheap solution. The current problem I'm running into is speed. Even with pretty good indexes, if I'm trying to load a large dataset, it's pretty slow.

See more
Replies (1)
Recommends
on
DruidDruid

Druid Could be an amazing solution for your use case, My understanding, and the assumption is you are looking to export your data from MariaDB for Analytical workload. It can be used for time series database as well as a data warehouse and can be scaled horizontally once your data increases. It's pretty easy to set up on any environment (Cloud, Kubernetes, or Self-hosted nix system). Some important features which make it a perfect solution for your use case. 1. It can do streaming ingestion (Kafka, Kinesis) as well as batch ingestion (Files from Local & Cloud Storage or Databases like MySQL, Postgres). In your case MariaDB (which has the same drivers to MySQL) 2. Columnar Database, So you can query just the fields which are required, and that runs your query faster automatically. 3. Druid intelligently partitions data based on time and time-based queries are significantly faster than traditional databases. 4. Scale up or down by just adding or removing servers, and Druid automatically rebalances. Fault-tolerant architecture routes around server failures 5. Gives ana amazing centralized UI to manage data sources, query, tasks.

See more

I am a Microsoft SQL Server programmer who is a bit out of practice. I have been asked to assist on a new project. The overall purpose is to organize a large number of recordings so that they can be searched. I have an enormous music library but my songs are several hours long. I need to include things like time, date and location of the recording. I don't have a problem with the general database design. I have two primary questions:

  1. I need to use either MySQL or PostgreSQL on a Linux based OS. Which would be better for this application?
  2. I have not dealt with a sound based data type before. How do I store that and put it in a table? Thank you.
See more
Replies (6)

Hi Erin,

Honestly both databases will do the job just fine. I personally prefer Postgres.

Much more important is how you store the audio. While you could technically use a blob type column, it's really not ideal to be storing audio files which are "several hours long" in a database row. Instead consider storing the audio files in an object store (hosted options include backblaze b2 or aws s3) and persisting the key (which references that object) in your database column.

See more
Aaron Westley
Recommends
on
PostgreSQLPostgreSQL

Hi Erin, Chances are you would want to store the files in a blob type. Both MySQL and Postgres support this. Can you explain a little more about your need to store the files in the database? I may be more effective to store the files on a file system or something like S3. To answer your qustion based on what you are descibing I would slighly lean towards PostgreSQL since it tends to be a little better on the data warehousing side.

See more
Julien DeFrance
Principal Software Engineer at Tophatter · | 3 upvotes · 479.9K views
Recommends
on
Amazon AuroraAmazon Aurora

Hi Erin! First of all, you'd probably want to go with a managed service. Don't spin up your own MySQL installation on your own Linux box. If you are on AWS, thet have different offerings for database services. Standard RDS vs. Aurora. Aurora would be my preferred choice given the benefits it offers, storage optimizations it comes with... etc. Such managed services easily allow you to apply new security patches and upgrades, set up backups, replication... etc. Doing this on your own would either be risky, inefficient, or you might just give up. As far as which database to chose, you'll have the choice between Postgresql, MySQL, Maria DB, SQL Server... etc. I personally would recommend MySQL (latest version available), as the official tooling for it (MySQL Workbench) is great, stable, and moreover free. Other database services exist, I'd recommend you also explore Dynamo DB.

Regardless, you'd certainly only keep high-level records, meta data in Database, and the actual files, most-likely in S3, so that you can keep all options open in terms of what you'll do with them.

See more
Christopher Wray
Web Developer at Soltech LLC · | 3 upvotes · 480.3K views
Recommends
on
DirectusDirectus
at

Hey Erin! I would recommend checking out Directus before you start work on building your own app for them. I just stumbled upon it, and so far extremely happy with the functionalities. If your client is just looking for a simple web app for their own data, then Directus may be a great option. It offers "database mirroring", so that you can connect it to any database and set up functionality around it!

See more
Recommends
on
PostgreSQLPostgreSQL

Hi Erin,

  • Coming from "Big" DB engines, such as Oracle or MSSQL, go for PostgreSQL. You'll get all the features you need with PostgreSQL.
  • Your case seems to point to a "NoSQL" or Document Database use case. Since you get covered on this with PostgreSQL which achieves excellent performances on JSON based objects, this is a second reason to choose PostgreSQL. MongoDB might be an excellent option as well if you need "sharding" and excellent map-reduce mechanisms for very massive data sets. You really should investigate the NoSQL option for your use case.
  • Starting with AWS Aurora is an excellent advise. since "vendor lock-in" is limited, but I did not check for JSON based object / NoSQL features.
  • If you stick to Linux server, the PostgreSQL or MySQL provided with your distribution are straightforward to install (i.e. apt install postgresql). For PostgreSQL, make sure you're comfortable with the pg_hba.conf, especially for IP restrictions & accesses.

Regards,

See more
Klaus Nji
Staff Software Engineer at SailPoint Technologies · | 1 upvotes · 480K views
Recommends
on
PostgreSQLPostgreSQL

I recommend Postgres as well. Superior performance overall and a more robust architecture.

See more
Decisions about Hadoop, Microsoft SQL Server, and Oracle
Daniel Moya
Data Engineer at Dimensigon · | 4 upvotes · 464.6K views

We have chosen Tibero over Oracle because we want to offer a PL/SQL-as-a-Service that the users can deploy in any Cloud without concerns from our website at some standard cost. With Oracle Database, developers would have to worry about what they implement and the related costs of each feature but the licensing model from Tibero is just 1 price and we have all features included, so we don't have to worry and developers using our SQLaaS neither. PostgreSQL would be open source. We have chosen Tibero over Oracle because we want to offer a PL/SQL that you can deploy in any Cloud without concerns. PostgreSQL would be the open source option but we need to offer an SQLaaS with encryption and more enterprise features in the background and best value option we have found, it was Tibero Database for PL/SQL-based applications.

See more

We wanted a JSON datastore that could save the state of our bioinformatics visualizations without destructive normalization. As a leading NoSQL data storage technology, MongoDB has been a perfect fit for our needs. Plus it's open source, and has an enterprise SLA scale-out path, with support of hosted solutions like Atlas. Mongo has been an absolute champ. So much so that SQL and Oracle have begun shipping JSON column types as a new feature for their databases. And when Fast Healthcare Interoperability Resources (FHIR) announced support for JSON, we basically had our FHIR datalake technology.

See more

In the field of bioinformatics, we regularly work with hierarchical and unstructured document data. Unstructured text data from PDFs, image data from radiographs, phylogenetic trees and cladograms, network graphs, streaming ECG data... none of it fits into a traditional SQL database particularly well. As such, we prefer to use document oriented databases.

MongoDB is probably the oldest component in our stack besides Javascript, having been in it for over 5 years. At the time, we were looking for a technology that could simply cache our data visualization state (stored in JSON) in a database as-is without any destructive normalization. MongoDB was the perfect tool; and has been exceeding expectations ever since.

Trivia fact: some of the earliest electronic medical records (EMRs) used a document oriented database called MUMPS as early as the 1960s, prior to the invention of SQL. MUMPS is still in use today in systems like Epic and VistA, and stores upwards of 40% of all medical records at hospitals. So, we saw MongoDB as something as a 21st century version of the MUMPS database.

See more
Manage your open source components, licenses, and vulnerabilities
Learn More
Pros of Hadoop
Pros of Microsoft SQL Server
Pros of Oracle
  • 39
    Great ecosystem
  • 11
    One stack to rule them all
  • 4
    Great load balancer
  • 1
    Amazon aws
  • 1
    Java syntax
  • 139
    Reliable and easy to use
  • 101
    High performance
  • 95
    Great with .net
  • 65
    Works well with .net
  • 56
    Easy to maintain
  • 21
    Azure support
  • 17
    Always on
  • 17
    Full Index Support
  • 10
    Enterprise manager is fantastic
  • 9
    In-Memory OLTP Engine
  • 2
    Easy to setup and configure
  • 2
    Security is forefront
  • 1
    Great documentation
  • 1
    Faster Than Oracle
  • 1
    Columnstore indexes
  • 1
    Decent management tools
  • 1
    Docker Delivery
  • 1
    Max numar of connection is 14000
  • 44
    Reliable
  • 33
    Enterprise
  • 15
    High Availability
  • 5
    Hard to maintain
  • 5
    Expensive
  • 4
    Maintainable
  • 4
    Hard to use
  • 3
    High complexity

Sign up to add or upvote prosMake informed product decisions

Cons of Hadoop
Cons of Microsoft SQL Server
Cons of Oracle
    Be the first to leave a con
    • 4
      Expensive Licensing
    • 2
      Microsoft
    • 1
      Data pages is only 8k
    • 1
      Allwayon can loose data in asycronious mode
    • 1
      Replication can loose the data
    • 1
      The maximum number of connections is only 14000 connect
    • 14
      Expensive

    Sign up to add or upvote consMake informed product decisions

    - No public GitHub repository available -
    - No public GitHub repository available -

    What is Hadoop?

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

    What is Microsoft SQL Server?

    Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

    What is Oracle?

    Oracle Database is an RDBMS. An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism is called an object-relational database management system (ORDBMS). Oracle Database has extended the relational model to an object-relational model, making it possible to store complex business models in a relational database.

    Need advice about which tool to choose?Ask the StackShare community!

    What companies use Hadoop?
    What companies use Microsoft SQL Server?
    What companies use Oracle?

    Sign up to get full access to all the companiesMake informed product decisions

    What tools integrate with Hadoop?
    What tools integrate with Microsoft SQL Server?
    What tools integrate with Oracle?

    Sign up to get full access to all the tool integrationsMake informed product decisions

    Blog Posts

    MySQLKafkaApache Spark+6
    2
    2065
    Aug 28 2019 at 3:10AM

    Segment

    PythonJavaAmazon S3+16
    7
    2627
    What are some alternatives to Hadoop, Microsoft SQL Server, and Oracle?
    Cassandra
    Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
    MongoDB
    MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
    Elasticsearch
    Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
    Splunk
    It provides the leading platform for Operational Intelligence. Customers use it to search, monitor, analyze and visualize machine data.
    Snowflake
    Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.
    See all alternatives