© 2025 StackShare. All rights reserved.

Apache Parquet vs MySQL

Overview

  • MySQL: 129.6K stacks, 108.6K followers, 3.8K votes, 11.8K GitHub stars, 4.1K GitHub forks
  • Apache Parquet: 97 stacks, 190 followers, 0 votes

Apache Parquet vs MySQL: What are the differences?

Introduction

Apache Parquet and MySQL are two popular technologies used for managing and analyzing data. While both store and retrieve data, they differ in several key ways. This section summarizes those differences.

  1. Data Structure and Organization: Apache Parquet is a columnar storage file format, whereas MySQL is a relational database management system. Parquet stores the values of each column together, which makes analytical queries on large datasets efficient because a query reads only the columns it needs. MySQL also presents data as tables of rows and columns, but its default InnoDB engine keeps each row's values together on disk, a layout designed for transactional processing.

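  2. Compression: Parquet compresses each column with codecs such as Snappy or Gzip, on top of encodings such as dictionary and run-length encoding, which reduces storage space and the amount of data a query has to read. MySQL's InnoDB engine supports table and page compression, but because it stores whole rows together, it cannot exploit the column-by-column similarity that makes Parquet's compression so effective, which can limit storage efficiency and analytical query speed.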

  3. Schema Evolution: Parquet files are immutable, but each file carries its own schema, so compatible readers can combine files written with older and newer schemas. Adding columns is the well-supported case; removing columns or changing their types depends on the reader and is more limited. MySQL requires explicit schema modifications (ALTER TABLE) and migrations to accommodate changes, making it less flexible in terms of schema evolution.

  4. Performance: Parquet's columnar storage and compression lead to faster query execution for analytical workloads that scan and aggregate a few columns across many rows. MySQL's row-based storage and indexing, by contrast, make it better suited to transactional processing, where individual rows are read and written frequently.

  5. Data Volume Flexibility: Parquet is designed to handle large volumes of data efficiently: files are divided into row groups that processing frameworks can read in parallel, which makes it well suited to big data analytics. MySQL can store large amounts of data, but a single server is bounded by one machine's resources, so very large or fast-growing datasets typically require replication, partitioning, or sharding to scale.

  6. Data Processing Capabilities: Parquet integrates well with distributed data processing frameworks such as Apache Spark, enabling parallel, distributed processing for large-scale data analytics. MySQL, with its InnoDB engine, supports ACID transactions, foreign keys, and row-level locking, making it a reliable choice for transactional processing and maintaining data integrity.
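The layout and compression differences in items 1 and 2 can be sketched in plain Python. This is a toy model under stated assumptions, not Parquet's on-disk format: the records are hypothetical, and real Parquet combines dictionary encoding with a hybrid run-length/bit-packing scheme and then applies a codec such as Snappy or Gzip. The point is only that pivoting rows into columns lets an aggregate touch a single column, and that a column of repeated values encodes compactly.

```python
from itertools import groupby

# Hypothetical sample records (row-oriented, as a row store would keep them).
rows = [
    {"id": 1, "country": "US", "amount": 10},
    {"id": 2, "country": "US", "amount": 25},
    {"id": 3, "country": "DE", "amount": 5},
    {"id": 4, "country": "DE", "amount": 40},
    {"id": 5, "country": "DE", "amount": 15},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [r[name] for r in records] for name in records[0]}


def dictionary_encode(values):
    """Replace each value with a small integer index into a sorted dictionary."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


columns = to_columns(rows)

# An analytical query such as SUM(amount) reads only the "amount" column.
total = sum(columns["amount"])

# A repetitive column shrinks once dictionary- and run-length-encoded.
dictionary, codes = dictionary_encode(columns["country"])
runs = run_length_encode(codes)

print(total)              # 95
print(dictionary, codes)  # ['DE', 'US'] [1, 1, 0, 0, 0]
print(runs)               # [(1, 2), (0, 3)]
```

A row store would have to walk every record (and every field in it) to compute the same sum, which is why the columnar layout favors scans and aggregates.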

In summary, Apache Parquet and MySQL differ in their data structure, compression techniques, schema evolution flexibility, performance characteristics, scalability for large data volumes, and compatibility with data processing frameworks. The choice between them depends on specific requirements, with Parquet being favorable for analytical processing and MySQL for transactional processing.
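The schema-evolution point (item 3 above) can be made concrete with a small sketch. Because each Parquet file stores its own schema, a reader can reconcile files written at different times; Spark, for example, does this for Parquet when its `mergeSchema` read option is enabled. The in-memory "files" and the `merge_schemas`/`read_all` helpers below are hypothetical stand-ins, and only the common case of added columns is modelled. In MySQL, the equivalent change would be an explicit `ALTER TABLE ... ADD COLUMN` on the live table.

```python
# Hypothetical in-memory stand-ins for two Parquet files written at different
# times; each carries its own schema, as real Parquet files do in their footer.
file_v1 = {"schema": ["id", "name"], "rows": [(1, "a"), (2, "b")]}
file_v2 = {"schema": ["id", "name", "email"], "rows": [(3, "c", "c@example.com")]}


def merge_schemas(files):
    """Union of column names across files, keeping first-seen order."""
    merged = []
    for f in files:
        for col in f["schema"]:
            if col not in merged:
                merged.append(col)
    return merged


def read_all(files):
    """Project every file onto the merged schema; absent columns become None."""
    schema = merge_schemas(files)
    out = []
    for f in files:
        for row in f["rows"]:
            record = dict(zip(f["schema"], row))
            out.append(tuple(record.get(col) for col in schema))
    return schema, out


schema, rows = read_all([file_v1, file_v2])
print(schema)  # ['id', 'name', 'email']
print(rows)    # [(1, 'a', None), (2, 'b', None), (3, 'c', 'c@example.com')]
```

Old files never need rewriting here; the reader fills the missing column with nulls, which is why adding columns is the easy case.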

Advice on MySQL, Apache Parquet

Kyle

Web Application Developer at Redacted DevWorks

Dec 3, 2019

Decided on PostGIS

While there have been some very clever techniques that allow geo queries to be performed without native support, they are incredibly slow in the long run and error-prone at best.

MySQL finally introduced its own geo functions and special indexing operations for GIS data. I prototyped with this, as MySQL is the database most familiar to me. But no matter what I did with it, however much tuning I gave it and however much I played with it, the results came back inconsistent.

It was very disappointing.

I figured, at this point, that SQL Server, an enterprise solution from Microsoft, one of the largest software developers in the world, might contain some decent GIS support.

I was very disappointed.

Postgres is a database solution I'm still getting familiar with, but I noticed it had no built-in support for GIS, so I hilariously didn't pay it much attention. That was until I stumbled upon PostGIS, and my world changed forever.

Ido

Mar 6, 2020

Decided

My data was inherently hierarchical, but there was not enough content in each level of the hierarchy to justify a relational (SQL) database with a one-to-many approach. It was also far easier to share data between the frontend (Angular), backend (Node.js), and DB (MongoDB), since they all pass JSON around natively. This allowed me to skip the translation layer from relational to hierarchical. You do need to think about correct indexes in MongoDB and make sure the objects have a finite size. For instance, an object in your DB shouldn't have an array property that grows over time without limit. In addition, I did use MySQL for other types of data, such as a product catalog that (a) has a lot of data, (b) is flat rather than hierarchical, and (c) needs very fast queries.
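The advice about arrays that grow without limit is often handled with a "bucketing" pattern: cap how many items a single document holds and open a new document when the cap is reached. The sketch below is a hypothetical illustration, not Ido's code: plain Python dicts stand in for MongoDB documents (no driver calls), and `MAX_EVENTS_PER_BUCKET`, `append_event`, and the field names are assumptions. For context, MongoDB limits a single BSON document to 16 MB, so an unbounded array eventually fails outright, not just slows down.

```python
# Hypothetical cap; a real value would depend on the size of each event.
MAX_EVENTS_PER_BUCKET = 3


def append_event(buckets, owner_id, event):
    """Append to the owner's newest bucket, opening a new bucket when it is full."""
    owned = [b for b in buckets if b["owner_id"] == owner_id]
    if owned and len(owned[-1]["events"]) < MAX_EVENTS_PER_BUCKET:
        owned[-1]["events"].append(event)
    else:
        # "seq" orders the buckets so they can be read back in sequence.
        buckets.append({"owner_id": owner_id, "seq": len(owned), "events": [event]})


buckets = []  # stands in for a collection of bucket documents
for n in range(7):
    append_event(buckets, "user-1", {"n": n})

print([len(b["events"]) for b in buckets])  # [3, 3, 1]
```

Every document stays bounded no matter how many events arrive; the cost is that reading a full history means querying several bucket documents instead of one.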

Navraj

CEO at SuPragma

Apr 16, 2020

Needs advice on MySQL and PostgreSQL

I asked my last question incorrectly. Rephrasing it here.

I am looking for the most secure open source database for my project I'm starting: https://github.com/SuPragma/SuPragma/wiki

Which database is more secure? MySQL or PostgreSQL? Are there others I should be considering? Is it possible to change the encryption keys dynamically?

Thanks,

Raj

Detailed Comparison

MySQL: The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.

Apache Parquet: A columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.

Key features

  • MySQL: none listed
  • Apache Parquet: columnar storage format; type-specific encoding; Pig integration; Cascading integration; Crunch integration; Apache Arrow integration; Scrooge integration; adaptive dictionary encoding; predicate pushdown; column stats
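Predicate pushdown and column stats work together: a reader compares a query's filter against per-chunk statistics and skips whole chunks that cannot match, without decoding them. The sketch below models this with a hypothetical in-memory `RowGroup` whose min/max are computed once at construction, standing in for the statistics Parquet keeps in its file metadata. It is an illustration of the idea, not the parquet-mr or Apache Arrow API.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group holding one integer column."""
    values: list[int]
    # Stats computed once when the group is "written", like Parquet's stored stats.
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_greater_than(row_groups, threshold):
    """Return values > threshold and how many row groups were actually scanned."""
    matches, groups_scanned = [], 0
    for rg in row_groups:
        # Pushdown: if even the max cannot satisfy the predicate, skip the group.
        if rg.max_value <= threshold:
            continue
        groups_scanned += 1
        matches.extend(v for v in rg.values if v > threshold)
    return matches, groups_scanned


groups = [RowGroup([1, 4, 9]), RowGroup([12, 15, 20]), RowGroup([3, 7, 8])]
matches, scanned = scan_greater_than(groups, 10)
print(matches, scanned)  # [12, 15, 20] 1
```

Only one of the three groups is read for `value > 10`. A `value < x` filter would use `min_value` the same way; skipping works best when data is sorted or clustered on the filtered column.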
Pros & Cons

MySQL pros (votes):
  • SQL (800)
  • Free (679)
  • Easy (562)
  • Widely used (528)
  • Open source (490)

MySQL cons (votes):
  • Owned by a company with their own agenda (16)
  • Can't roll back schema changes (3)

Apache Parquet: no community feedback yet
Integrations

  • MySQL: no integrations listed
  • Apache Parquet: Hadoop, Java, Apache Impala, Apache Thrift, Apache Hive, Pig

What are some alternatives to MySQL, Apache Parquet?

MongoDB

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

PostgreSQL

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

Microsoft SQL Server

Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

SQLite

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.
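Python's standard-library `sqlite3` module makes the single-file, serverless property easy to see. In this sketch the file name `example.db`, the `products` table, its index, and the `cheap_products` view are hypothetical; the program creates them, closes the connection, then reopens the same file and finds everything still there, with no server process involved at any point.

```python
import os
import sqlite3
import tempfile

# The whole database (tables, indices, views) lives in this one ordinary file.
path = os.path.join(tempfile.mkdtemp(), "example.db")

conn = sqlite3.connect(path)  # no server: the library reads/writes the file directly
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("CREATE INDEX idx_products_name ON products(name)")
conn.execute("CREATE VIEW cheap_products AS SELECT name FROM products WHERE price < 10")
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("lamp", 24.0), ("mug", 8.0)],
)
conn.commit()
conn.close()

# Reopening the same file sees every table, index, view, and row written above.
conn = sqlite3.connect(path)
cheap = [row[0] for row in conn.execute("SELECT name FROM cheap_products ORDER BY name")]
objects = sorted(row[0] for row in conn.execute("SELECT name FROM sqlite_master"))
conn.close()

print(cheap)    # ['mug', 'pen']
print(objects)  # ['cheap_products', 'idx_products_name', 'products']
```

Copying `example.db` to another machine copies the entire database, which is what makes SQLite convenient for embedded and single-user use.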

Cassandra

Cassandra is a partitioned row store. Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner, and it automatically repartitions as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
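Automatic repartitioning rests on placing nodes and partition keys on a hash "token ring". The sketch below is a simplified model, not Cassandra's implementation: each node gets a single token, there are no virtual nodes or replication, and it hashes with MD5 (Cassandra's default partitioner uses Murmur3; the older RandomPartitioner used MD5). The node names and keys are hypothetical. It shows the property that matters: removing a node moves only the keys that node owned.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5 stand-in; Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, no virtual nodes, no replication."""

    def __init__(self, nodes):
        self._entries = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        bisect.insort(self._entries, (token(node), node))

    def remove(self, node: str) -> None:
        self._entries = [(t, n) for t, n in self._entries if n != node]

    def owner(self, key: str) -> str:
        # The first node whose token is >= the key's token owns it; wrap at the end.
        tokens = [t for t, _ in self._entries]
        i = bisect.bisect_left(tokens, token(key)) % len(tokens)
        return self._entries[i][1]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user-{i}" for i in range(100)]
before = {k: ring.owner(k) for k in keys}

ring.remove("node-d")  # simulate a machine leaving the cluster
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` previously belonged to `node-d`; all other keys stay where they were, so the application never has to track placement itself.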

Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
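Caching the results of database calls is commonly done with the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with an expiry. The sketch below is a hypothetical illustration and not a memcached client: `TTLCache` is a tiny in-process stand-in for a memcached-style store, and `get_user`/`slow_db_lookup` are invented names. An injectable clock keeps the expiry behaviour testable. A real memcached server additionally evicts least-recently-used items under memory pressure, which this sketch does not model.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a memcached-style key-value cache with expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # expired: drop it and report a miss
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_user(cache, db_lookup, user_id, ttl_seconds=60):
    """Cache-aside: try the cache, fall back to the database, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db_lookup(user_id)
    cache.set(key, value, ttl_seconds)
    return value


calls = []  # records each hit on the hypothetical database


def slow_db_lookup(user_id):
    calls.append(user_id)
    return {"id": user_id, "name": f"user {user_id}"}


now = [0.0]  # fake clock so expiry can be demonstrated deterministically
cache = TTLCache(clock=lambda: now[0])
first = get_user(cache, slow_db_lookup, 7)   # miss: queries the database
second = get_user(cache, slow_db_lookup, 7)  # hit: served from the cache
now[0] = 61.0
third = get_user(cache, slow_db_lookup, 7)   # expired: queries the database again
print(calls)  # [7, 7]
```

The TTL bounds how stale a cached result can get; choosing it is a trade-off between database load and freshness.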

MariaDB

Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry. MariaDB is designed as a drop-in replacement of MySQL(R) with more features, new storage engines, fewer bugs, and better performance.

RethinkDB

RethinkDB is built to store JSON documents and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and it is easy to set up and learn.

ArangoDB

A distributed free and open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.

InfluxDB

InfluxDB is a scalable datastore for metrics, events, and real-time analytics. It has a built-in HTTP API, so you don't have to write any server-side code to get up and running. InfluxDB is designed to be scalable, simple to install and manage, and fast to get data in and out.
