Need advice about which tool to choose?Ask the StackShare community!
PostgreSQL vs Presto: What are the differences?
Key differences between PostgreSQL and Presto
PostgreSQL and Presto are both popular open-source databases that serve different purposes and have different features. Here are the key differences between the two:
Data Storage and Query Engine: PostgreSQL is a relational database management system (RDBMS) that uses a traditional row-based storage model. It is designed for handling structured data and supports ACID transactions. Presto, on the other hand, is a distributed SQL query engine that is built for fast, interactive analytics across various data sources. It can query data from diverse sources like Hadoop, Cassandra, and relational databases.
Scalability and Performance: PostgreSQL is known for its scalability and can handle large amounts of data efficiently, especially with proper indexing and optimization. However, Presto is specifically designed for distributed computing and can process massive amounts of data across multiple nodes, providing high performance even for complex queries.
Data Types and Query Language: PostgreSQL supports a wide range of data types and has a rich set of built-in functions and operators. It also provides an extensive query language (SQL) with support for complex queries and advanced features like window functions. Presto supports a subset of PostgreSQL data types and provides its own query language called PrestoSQL, which is similar to SQL but also includes some additional functions and syntax.
Use Cases and Workloads: PostgreSQL is commonly used as a general-purpose database for various applications and workloads, including web applications, content management systems, and data warehousing. It is well-suited for transactional use cases and OLTP workloads. Presto, on the other hand, is primarily used for analytical queries and data exploration. It is often deployed in data lakes or data warehouses for ad-hoc queries and interactive analytics.
Ecosystem and Integration: PostgreSQL has a mature ecosystem with a wide range of tools and libraries that integrate seamlessly with it. It supports a large number of extensions and has a strong community support. Presto, being a distributed query engine, integrates with various data sources and can query data from different systems using connectors. It also supports pluggable connectors for extending its capabilities.
Ease of Use and Administration: PostgreSQL has a reputation for being easy to use and has excellent documentation and community resources. It provides a graphical interface (pgAdmin) and various command-line tools for managing and administering the database. Presto, being primarily a query engine, requires more setup and configuration compared to PostgreSQL. It is typically managed using tools like Presto Manager and requires more expertise in distributed systems.
In summary, PostgreSQL is a powerful relational database management system with advanced features and broad applicability, while Presto is a distributed SQL query engine specifically designed for high-performance analytics and querying diverse data sources. The choice between them depends on the specific use case and requirements of the project.
For learning purposes, I am trying to design a dashboard that displays the total revenue from all connected webshops/marketplaces, displaying incoming orders, total orders, etc.
So I will need to get the data (using Node backend) from the Shopify and marketplace APIs, storing this in the database, and get the data from the back end.
My question is:
What kind of database should I use? Is MongoDB fine for storing this kind of data? Or should I go with a SQL database?
Postgres is a solid database with a promising background. In the relational side of database design, I see Postgres as an absolute; Now the arguments and conflicts come in when talking about NoSQL data types. The truth is jsonb in Postgres is efficient and gives a good performance and storage. In a comparison with MongoDB with the same resources (such as RAM and CPU) with better tools and community, I think you should go for Postgres and use jsonb for some of the data. All in all, don't use a NoSQL database just cause you have the data type matching this tech, have both SQL and NoSQL at the same time.
I have found MongoDB easier to work with. Postgres and SQL in general, in my experience, is harder to work with. While Postgres does provide data consistency, MongoDB provides flexibility. I've found the MongoDB ecosystem to be really great with a good community. I've worked with MongoDB in production and it's been great. I really like the aggregation system and using query operators such as $in, $pull, $push.
While my opinion may be unpopular, I have found MongoDB really great for relational data, using aggregations from a code perspective. In general, data types are also more flexible with MongoDB.
I will use PostgreSQL because you have more powerfull feature for data agregation and views (the raw data from shopify and others could be stored as is) and then use views to produce diff. kind of reports unless you wanna create those aggregations/views in nodejs code. HTH
I want to store the data retrieved from multiple APIs and perform some analytics on it. The data stored in DB will never/hardly change. First, I thought it would be better to retrieve the data and create table columns for them, but some data might have different columns than others. So I thought about storing the JSON response from API directly to the table and use it. So which database will be the better choice, PostgreSQL or MongoDB.
Hey Krunal, your requirement sounds pretty clear and specific to what you want to do with that data. My recommendation to you, would be to use MongoDB. Since schema-less IO is faster in MongoDB, your general speed of reading / writing from and to the database would be quick. Additionally, the aggregate framework is very powerful with large data so that is also something that you can use in computing your analytics.
I suggest you to go with MongoDB
, because it is schema-less, i.e., it permits you to easily manipulate the schema of a table. If you want to add a column, it can be done without much effort. Moreover, MongoDB
can deal with more types of data, since the latest is stored as key-value pair. I do not what kind of analysis you are going to do, but NoSQL
is not the best choice if you are going to use complex queries. In addition, if you are working with huge amount of data and you are interested in optimising the performance, I suggest you PostgreSQL
.
Since you are speaking about API and JSON, I guess that you may using Node JS
for fetching API. I suggest you to try Mongoose
, which facilitate the use of MongoDB
with Node JS
.
Looks like the use case is to store JSON data. mongoDB and Postgres differ in so many aspects like scaling and consistency. Postgres has excellent JSON support now with the power of SQL. MongoDB is good in handling schema less data. However in this case it seems these differences don’t matter that much. I’d recommend you go with what you are most comfortable with.
I don't have an unquestionable opinion regarding your use case. I only trend to pick the MongoDB since it is schemaless avoiding null columns that you not always know when it is used (it depends on the source of the data). The only drawback that I could consider is the query's complexity in MongoDB, sometimes it is a bit tricky, when compared to the traditional SQL queries.
This is largely a matter of opinion. I see that someone else responded and recommended MongoDB but since you are doing data analytics, I highly recommend you go with SQL. You're going to have a really hard time normalizing the data when you can't manipulate relationships and bulk edit with a nice update query.
I'm much more experienced with MySQL than any other database and I am having a hard time getting on board with noSQL entirely because it's really hard to query complex data with relationships using noSQL. I'm using Firestore with one of my apps and MongoDB with another app but they both use MySQL for the heavy lifting and then a document database for things like permissions, caching, etc.
It sounds like the type of problem you need to reverse engineer. I'm sure you can imagine what the data sets would look like if you use MongoDB or Postgres. I suspect that putting in a little bit more work up front will pay high dividends and productivity once the data is normalized.
Again - it's largely a matter of preference but I prefer SQL almost every time.
I need urgent advice from you all! I am making a web-based food ordering platform which includes 3 different ordering methods (Dine-in using QR code scanning + Take away + Home Delivery) and a table reservation system. We are using React for the front-end, and I need your advice if I should use NestJS or ExpressJS for the backend. And regarding the database, which database should I use, MongoDB or PostgreSQL? Which combination will be better? PS. We want to follow the microservice architecture as scalability, reliability, and usability are the most important Non Functional requirements. Expert advice is needed, please. A load of thanks in advance. Kind Regards, Miqdad
I can't speak for the NestJS vs ExpressJS discussion, but I can given a viewpoint on databases.
The main thing to consider around database choice, is what "shape" the data will be in, and the kind of read/write patterns you expect of that data. The blog example shows up so much for DBMS like MongoDB, because it showcases what NoSQL / document storage is very scalable and performant in: mostly isolated documents with a few views / ways to order them and filter them. In your case, I can imagine a number of "relations" already, which suggest a more traditional SQL solution would work well: You have restaurants, they have maybe a few menus (regular, gluten-free etc), with menu items in, which have different prices over time (25% discount on christmas food just after christmas, 50% off pizzas on wednesdays). Then there's a whole different set of "relations" for people ordering, like showing them past orders, which need to refer to the restaurant etc, and credit card transaction information for refunds etc. That to me suggests PostgreSQL, which will scale quite well if you database design is okay.
PostgreSQL also offers you some extensions, which are just amazing for your use-case. https://postgis.net/ for example will let you query for restaurants based on location, without the big cost that comes from constantly using something like Google Maps API to work out which restaurants are near to someone ordering. Partitioning and window functions will be great for your own use internally too, like answering questions of "What types of takeways perform the best for us, Italian, Mexican?" or in combination with PostGIS, answering questions like "What kind of takeways do we need to market to, to improve our selection?".
While these things can all be implemented in MongoDB, you tend to lose some of the convenience of ACID or have to deal with things like eventual consistency, which requires more thinking on the part of your engineers. PostgreSQL offers decent (if more complex) scalablity and redundancy solutions, and is honestly very well proven and plenty of documentation exists on optimising queries.
Hello, i build microservice systems using Angular And Spring (Java) so i can't help with with ur back end choice, BUT, i definitely advice you to use a Nosql database, thus MongoDB of course or even Cassandra if your looking for extreme scalability with zero point of failure. Anyway, Nosql if much more faster then Sql (in your case Postresql DB). All you wanna do with sql can also be done by nosql (not the opposite of course).I also advice you to use docker containers + kubernetes to orchestrate them, if you need scalability and replication, that way your app can support auto scalability (in case ur users number goes high). Best of luck
About PostgreSQL vs MongoDB: short answer. Both are great. Choose what you like the most. Only if you expect millions of users, I‘ll incline with MongoDB.
I need to add a DBMS to my stack, but I don't know which. I'm tempted to learn SQLite since it would be useful to me with its focus on local access without concurrency. However, doing so feels like I would be defeating the purpose of trying to expand my skill set since it seems like most enterprise applications have the opposite requirements.
To be able to apply what I learn to more projects, what should I try to learn? MySQL? PostgreSQL? Something else? Is there a comfortable middle ground between high applicability and ease of use?
You can easily start with SQlite. Really easy to startup since it doesn't require you to install any additional software since is self-contained. It has interfaces in almost any language and also GUIs. Start learning SQL basics and simpler data models and structures. There are many tutorials, also available in the official website. From there you will easily migrate to another database. MySQL could be next, sonce it's easier to learn at first and has more resources available. PostgreSQL is less widespread, more challenging and has the fewer resorces, but once you have some experience with MySQL is really easy to learn as well. All these technologies are really widespread and used accross the industry so you won't make a wrong decision with any of these.
A question you might want to think about is "What kind of experience do I want to gain, by using a DBMS?". If your aim is to have experience with SQL and any related libraries and frameworks for your language of choice (python, I think?), then it kind of doesn't matter too much which you pick so much. As others have said, SQLite would offer you the ability to very easily get started, and would give you a reasonably standard (if a little basic) SQL dialect to work with.
If your aim is actually to have a bit of "operational" experience, in terms of things like what command line tools might be available as standard for the DBMS, understanding how the DBMS handles multiple databases, when to use multiple schemas vs multiple databases, some basic privilege management etc. Then I would recommend PostgreSQL. SQLite's simplicity actually avoids most of these experiences, which is not helpful to you if that is what you hope to learn. MySQL has a few "quirks" to how it manages things like multiple databases, which may lead you to making less good decisions if you tried to take your experience over to different DBMS, especially in bigger enterprise roles. PostgreSQL is kind of a happy middle ground here, with the ability to start PostgreSQL servers via docker or docker-compose making the actual day-to-day management pretty easy, while still giving you experience of the kinds of considerations I have listed above.
At Vital Beats we make use of PostgreSQL, largely because it offers us a happy balance between good management and backup of data, and good standard command line tools, which is essential for us where we are deploying our solutions within Kubernetes / docker, and so more graphical tools are not always appropriate for us. PostgreSQL is also pretty universally supported in terms of language libraries and frameworks, without having to make compromises on how we want to store and layout our data.
MySQL's very popular, easy to install, is also available as a managed service across most popular cloud offerings. The support/default tooling (such as MySQL Query Workbench) certainly is a little more baked than what you'll find for Postgres.
Hi all. I am an informatics student, and I need to realise a simple website for my friend. I am planning to realise the website using Node.js and Mongoose, since I have already done a project using these technologies. I also know SQL, and I have used PostgreSQL and MySQL previously.
The website will show a possible travel destination and local transportation. The database is used to store information about traveling, so only admin will manage the content (especially photos). While clients will see the content uploaded by the admin. I am planning to use Mongoose because it is very simple and efficient for this project. Please give me your opinion about this choice.
The use case you are describing would benefit from a self-hosted headless CMS like contentful. You can also go for Strapi with a database of your choice but here you would have to host Strapi and the underlying database (if not using SQLite) yourself. If you want to use Strapi, you can ease your work by using something like PlanetSCaleDB as the backing database for Strapi.
Your requirements seem nothing special. on the other hand, MongoDB is commonly used with Node. you could use Mongo without defining a Schema, does it give you any benefits? Also, note that development speed matters. In most cases RDBMS are the best choice, Learn and use Postgres for life!
MongoDB and Mongoose are commonly used with Node.js and the use case doesn't seem to be requiring any special considerations as of now. However using MongoDB now will allow you to easily expand and modify your use case in future.
If not MongoDB, then my second choice will be PostgreSQL. It's a generic purpose database with jsonb support (if you need it) and lots of resources online. Nobody was fired for choosing PostgreSQL.
SQL is not so good at query lat long out of the box. you might need to use additional tools for that like UTM coordinates or Uber's H3.
If you use mongoDB, it support 2d coordinate query out of the box.
Any database will be a great choice for your app, which is less of a technical challenge and more about great content. Go for it, the geographical search features maybe be actually handy for you.
Any database engine should work well but I vote for Postgres because of PostGIS extension that may be handy for travel related site. There's nothing special about your requirements.
Hi, Maxim! Most likely, the site is almost ready. But we would like to share our development with you. https://falcon.web-automation.ru/ This is a constructor for web application. With it, you can create almost any site with different roles which have different levels of access to information and different functionality. The platform is managed via sql. knowing sql, you will be able to change the business logic as necessary and during further project maintenance. We will be glad to hear your feedback about the platform.
I am going to work on a real estate project and have to decide on a database. Now, SQL databases can be very efficient if appropriately designed. More relations between the data and less redundancy. But with a #NoSQL database, the development time is reduced, and it is easy to query. Since this is my first time working on the real estate domain, I would like to pick a database that would be efficient in the long run.
I recommend PostgreSQL as it’s the most powerful out of the 3 databases you mentioned. It supports JSON objects so you can mimic the MongoDB functionality, but I would also argue that SQL is actually quite powerful and in many cases significantly easier to work with than with NoSQL databases.
Stay away from foreign keys, keep it fast and simple. Define your data structures well in advance. Try to model your data structures based on your system’s vision; based on where it’s going and not based solely on what you currently need it to do. This will help you avoid drastic changes to your database after your system is launched. Populate the database with fake data and run tests. PostgreSQL allows you to create Views from multiple tables. Try to create those views and make sure you can easily create useful views from multiple tables. Run an Explain on those view queries to make sure you created your indexes correctly. Make sure it’s fast!
Any of those three databases are going to be efficient, scalable, and reliable in the long term if you configure and use them correctly. They all also have solid hosting solutions.
All things being equal, I would agree with other posters that Postgres is my preference among the three, but there are caveats.
MongoDB and MySQL have better support for mutli-region replication in your big three cloud environments. Azure recently bought Citus Data, which was a best-in-class Postgres replication solution, so they might be the only one I trust to provide cross-region replication at the moment.
If you have a single region deployment and are on AWS, I can't recommend Aurora Postgres highly enough. It's a very good implementation and extremely performant.
I'll second another piece of advice. Postgresql's JSON columns are a dream when it comes to productivity and I use them frequently with our Rails application. In these cases, no migration is required to change schema. We store payloads with dozens or hundreds of keys and performance has not been an issue. We also have a lot of relational tables, so the joins we get with SQL are very important to us and hard to replicate with a NoQL solution.
That really depends of where do you see you application in the long run. On any application, any of those choices are excellent. You could argue about good support on JSON binaries, but even MySQL has an excellent support for that on the latest versions.
On the long run, when your application gets hundreds of thousands of requests per second, you might start thinking about how many inputs you will have in the database compared to the outputs. PostgresSQL it’s excellent at giving you outputs, but table corruption can happen when you start receiving this massive number of inputs (Which was the reason Uber switched from Postgres to MySQL)
On our OPS Platform at CTO.ai , we decided to use Postgres, because we need a reliable and agile way to send the output to our users, so that was out best choice in the long run for our product.
I am one of those who believes that MongoDB can be used for everything, this thanks to the advertising of MongoDB.
We are creating an e-commerce platform, we know that it has many relationships, but with MongoDB we can avoid some, but in the end, some relationships have to exist.
A single developer to create two native applications in Flutter, a web application with React, create the backend with multiple microservices hosted with Google Cloud Run. PostgreSQL can be heavy because it should be used with an ORM, on the contrary, with MongoDB you can avoid some relationships and avoid ORM / ODM.
We need advice from someone who has the experience and has had to choose between these two databases for an e-commerce site.
The real question here is not about the technology but rather your real needs and your data. Do you need to manage data that has core concepts and relations ? (such as a family, with parents and children) or do you need to manage a basic collection of similar data (such as blog entries)? PostgreSQL is definitely a relational database for managing entities and their relationships whereas MongoDB (I may be strongly opinionated here ;-) ) is more targeted at managing collection of entities (such as the blog entries). For an e-commerce site (with some products, products categories, user ratings and comments, prices, bundles...) I would go for PostgreSQL as it will support/guide you in creating a structured data set with all your products, organized in categories and with user ratings/comments attached to them. HTH
Had exactly the same question when selecting data storage for our new product. Not e-commerce though, rather interactive and content-focused HR SaaS for SME.
The key arguments for PostgreSQL
It gives you the opportunity to use relationships where you really need it and just go with key-value tables where you don't.
With Jsonb datatype you can store documents/objects/arrays as JSON then use JSON elements in queries and even indexes.
There are more tools/integrations working with PostgreSQL which you can use out of the box, e.g. Hasura
I am in your spot, exactly. A few months ago, I had decided to use Postgres because since its version 9 it showed a lot of progress for being a high-availability database. However, frankly, I didn't want to model statically all data, since I have several distinct schemas (like for different product types) and I wanted some flexibility to add or remove as I saw fit. One of the main challenges with analyzing a NoSQL database being familiar in the SQL ways, is that it's easy to look for "analogies" for what makes SQL useful, like relationship enforcing, transactions and the cascading effect on deletes, updates and inserts, and that limit your vision a lot when analyzing a tool like Mongo, especially in a micro-services pattern. Now-a-days, I really found my solution in Mongo. Not just because of it being NoSQL, but because all of the support I find in the NodeJS community through packages and utilities that make it dead easy to use it for several use-cases. Whatever Postgres offers, Mongo does it a little easier and better, like text search and geo-queries. What you need to see is to model your data in a way that makes sense with Mongo. For instance, I've got a User service that has all auth related information of a user. But then, I have the same user in the Profile service, with the same id, but totally different fields. You have two de facto ways to connect data, by reference and embedding, which in Ecommerce, both have big uses. Like using references to relate a User to a Profile, and an embed to relate a Product to an Order. There's even a third, albeit a little more "manual" implementation here, the graph relationship in which you can model data, in which you can easily model event-driven documents, like a Purchase that goes from "a customer" to "a store", which you can later use for much easier and deep analytics than with the classical SQL stance. MariaDB has it readily available, and also has many improvements over MySQL and Postgres, especially for NoSQL features and scalability. Sadly it is just seen as a MySQL clone, but it offers more than that (although its documentation could be improved). Using Mongo in a micro-service environment is even better because your models can be smaller, meaning less burden on relationships, although you do compensate with a bit of duplication, but a well-designed schema will have minimal impact on that. Whatever tool might do the job, but I want to cheer on the newer generation. Hope it helps.
Hello,
I am trying to design an online ordering app similar to Doordash or Uber Eats. I'm having a hard time trying to finalise on what database (or mixture of databases) to use. I'm leaning towards using a relational database like MySQL or PostgreSQL. But, when the application grows, I don't want to join on 20 tables to get a data. Any help would be greatly appreciated. Thank you for your time.
Hello Suhas , We build our product www.voilacabs.com which is in the same lines as yours but we have used a combination of Mysql and MongoDB. When using MySQL, i would recommend doing the following: 1. Use Mysql only for storage only and for realtime updates we recommend MongoDB. 2. Don't try to Join more than 3 tables. ( the moment you reach 3 join stop there and try to un-normalized database. 3. Never or very rarely use Auto-increments. ( we recommend using UUIDS ) . Use UUIDS always for Auto increments for MYSQL. If you using Postgre SQL then i would suggest you to please check this https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c There is a stored procedure that generated unique keys instead of auto-increment keys and that will help you sharding or clustering database without sync errors. 4. Also For MongoDB if you can put a layer of REDIS Cache then that will boost your api performance under large loads. 5. Use Node.js programing language as that function asynchronously .
Let me know if you still need any suggestion's . Thanks & Regards Rupen Makhecha CTO @ Voila Cab's www.voilacabs.com
I would recommend a mixture of MySQL and MongoDB. Using MongoDB for the Content Distribution Network (CDN) will make it easy to store high volume incoming data. MySQL is recommended to be used for business logic. PostgreSQL is not recommended since you will be faced with inefficient database replication features and constant migration from one PostgreSQL version to another.
I'm currently developing an app that ranks trending stuff ( such as games, memes or movies, etc. ) or events in a particular country or region. Here are the specs: My app does not require registration and requires cookies and localStorage to track users. Users can add new entries to each trending category provided that their country of origin is recorded in cookies. If each category contains more than 100 items then the oldest items get deleted. The question is: what kind of database should I use for managing this app? Thanks in advance
I think your best and cheapest choice is going to be MongoDB, Although Postgres is probably going to be the more scaleable approach, you likely have a good idea of how you want to present your data, and the app seems small enough that you shouldn't need to worry about scaling issues. It also sounds like your app can grow in a linear capacity based on the number of users, and the amount of data, which is the perfect use-case for noSQL databases (linear, predictable scaling).
Correct me if I have any of these assumptions wrong. 1. You're looking to have a relatively high-read with a lower write volume 2. Your app is essentially a list of objects that can belong to a category 3. users can create objects in this list.
I think Mongo is going to be what you're looking for on the following basis: 1. you absolutely need a database that is shared by all users of your app, therefor IndexedDB is out of the question. 2. You have semi-structured data 3. you probably want the cheapest solution.
I think Postgres is wrong for the following reasons: 1. your app is pretty simple in concept, SQL databases will add unnecessary complexity to your system, either through ORMs or SQL queries. (use an ORM if you go with SQL) 2. Hosting SQL databases for production is not cheap! the cheapest solution I know of for Postgres is ElephantSQL. It provides 20MB for free with 5 concurrent connections, you should be okay to manage these limitations if you decide to go Postgres in the end. Whereas mongoDB Atlas has some great free-tier options.
Although your data might be easier to model in Postgres, you can certainly model your data as a single list of items that have a category attached.
I don't want to officially recommend another tool, but you should really checkout prisma, firebase, amplify, or Azure App Services for this app! Just go completely backend-less [Firebase] https://firebase.google.com/ [Amplify] https://aws.amazon.com/amplify/ [Prisma] https://www.prisma.io/ [Azure App Services] https://azure.microsoft.com/en-us/services/app-service/?v=18.51
Hi everybody, I'm developing an application to be used in a gym setting where athletes fill out a health survey, and coaches can analyze the results. However, due to the dynamic nature of some aspects of the app and more static aspects of the other, I am wondering if/how I would integrate MongoDB with my existing PostgreSQL database. I would like to store things like registrations, license information, and club information in Postgres
, while I am thinking about moving things like user surveys, logging, and user settings information over to MongoDB
. Some fields on the survey are integers, some large blocks of text, and some are arrays. My thought is, if I moved that data to MongoDB
, it would give us greater flexibility in terms of adding and removing fields and data to them, and it would scale a lot easier than Postgres
. Not to mention it will be easier to organize that kind of data. Is that overkill or am I approaching this issue the right way? Thank you!
You can have your cake and eat it too. If you really need the flexibility of a document store, Postgresql's JSONB support allows you to mix and match relational data and document data within the same database/table. You can just as easily run analytical queries against JSONB data in Postgresql as you can against "normal" relational data. MongoDB comes with a significant operational overhead and cost (hello replica sets), so unless you really need MongoDB's sharding capabilities (which you shouldn't until you get to extreme scaling numbers), then just stick with Postgresql and use JSONB where you need it.
With PostgreSQL you could easily integrate JSON or array type columns and develope a simple interface to add columns on your application. Anyway handling all the data this way will require some intermediate skill with PostgreSQL dialect and a mix and match of syntaxes for your analitical queryes. Also you will need to have a good design for you backend to handle all this. MongoDB will handle all this in a more natural way and I believe will be more easily integrated with a Node.js backend.
How are you managing your PostgreSQL schema? It doesn't have to be hard to add or remove fields. We're working on a SQL database client at BaseDash that lets you add/remove columns in a couple clicks.
If you decide to migrate some of your data to MongoDB, you can definitely manage the two databases in parallel. For any records that need to be linked, you can treat it just like a foreign key by creating a column that points to an ID in the other database. For example, you might store user settings in MongoDB, and include a UserId
field that points to your User record in your Postgres database.
Those types of things should fit fine in a postgres json column. You'll actually have more flexibility with postgres because you can have a field as a normal column or in a json column, and you can have constraints and indexes on fields within a json column (or not).
Backend:
- Considering that our main app functionality involves data processing, we chose
Python
as the programming language because it offers many powerful math libraries for data-related tasks. We will useFlask
for the server due to its good integration with Python. We will use a relational database because it has good performance and we are mostly dealing with CSV files that have a fixed structure. We originally choseSQLite
, but after realizing the limitations of file-based databases, we decided to switch toPostgreSQL
, which has better compatibility with our hosting service,Heroku
.
I try to follow the 80-20 rule when it comes to my choice of tools. This means my stack consists of about 80% software I already know well, but I do allow myself 20% of the stack to explore tech I have less experience with.
The exact ratio is not what’s important here, it’s more the fact that you should lean towards using proven technologies.
I wrote more about this on my blog post on Choosing Boring Technology: https://panelbear.com/blog/boring-tech/
We have chosen Tibero over Oracle because we want to offer a PL/SQL-as-a-Service that the users can deploy in any Cloud without concerns from our website at some standard cost. With Oracle Database, developers would have to worry about what they implement and the related costs of each feature but the licensing model from Tibero is just 1 price and we have all features included, so we don't have to worry and developers using our SQLaaS neither. PostgreSQL would be open source. We have chosen Tibero over Oracle because we want to offer a PL/SQL that you can deploy in any Cloud without concerns. PostgreSQL would be the open source option but we need to offer an SQLaaS with encryption and more enterprise features in the background and best value option we have found, it was Tibero Database for PL/SQL-based applications.
The Gentlent Tech Team made lots of updates within the past year. The biggest one being our database:
We decided to migrate our #PostgreSQL -based database systems to a custom implementation of #Cassandra . This allows us to integrate our product data perfectly in a system that just makes sense. High availability and scalability are supported out of the box.
MySQL has a lot of strengths working for it. It's simple and easy to set up and use. It's JSON engine is also really good these days. Mongo is also simple to setup and use, and it's speed as a document-object storage engine is first class.
Where Postgres has both beat is in it's combining of all of the features that make both MySQL and Mongo great, while adding on enterprise grade level scalability and replication. It's Postgres' stability and robustness, while still fulfilling the roles of it's contemporaries extremely well that edge Postgre for me.
We used Mongo for the first iterations of our app, but the relational nature of our data was an awkward fit for a database that is not relational. We sorely lacked relational database integrity features that needed to be done on the application side (poorly) and it was a huge relief when we managed to port our application over to Postgres, which performs great and never gives us trouble, while having very user friendly extensions like JSON and PubSub that made the transition easy.
PostgreSQL is enterprise level database with transactions, full-text indexes, vector indexes, JSON, BLOB, geo-spatial data and a lot more. Highly scalable, configurable and easily maintainable. all that on an open source RDBMS database and you are still looking for GPL licensed MySQL with limited features? Look again.
While there's been some very clever techniques that has allowed non-natively supported geo querying to be performed, it is incredibly slow in the long game and error prone at best.
MySQL finally introduced it's own GEO functions and special indexing operations for GIS type data. I prototyped with this, as MySQL is the most familiar database to me. But no matter what I did with it, how much tuning i'd give it, how much I played with it, the results would come back inconsistent.
It was very disappointing.
I figured, at this point, that SQL Server, being an enterprise solution authored by one of the biggest worldwide software developers in the world, Microsoft, might contain some decent GIS in it.
I was very disappointed.
Postgres is a Database solution i'm still getting familiar with, but I noticed it had no built in support for GIS. So I hilariously didn't pay it too much attention. That was until I stumbled upon PostGIS and my world changed forever.
To provide employees with the critical need of interactive querying, we’ve worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest’s scale has involved resolving quite a few challenges like, supporting deeply nested and huge thrift schemas, slow/ bad worker detection and remediation, auto-scaling cluster, graceful cluster shutdown and impersonation support for ldap authenticator.
Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.
We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters are comprised of a fleet of 450 r4.8xl EC2 instances. Presto clusters together have over 100 TBs of memory and 14K vcpu cores. Within Pinterest, we have close to more than 1,000 monthly active users (out of total 1,600+ Pinterest employees) using Presto, who run about 400K queries on these clusters per month.
Each query submitted to Presto cluster is logged to a Kafka topic via Singer. Singer is a logging agent built at Pinterest and we talked about it in a previous post. Each query is logged when it is submitted and when it finishes. When a Presto cluster crashes, we will have query submitted events without corresponding query finished events. These events enable us to capture the effect of cluster crashes over time.
Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. Kubernetes platform provides us with the capability to add and remove workers from a Presto cluster very quickly. The best-case latency on bringing up a new worker on Kubernetes is less than a minute. However, when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Some other advantages of deploying on Kubernetes platform is that our Presto deployment becomes agnostic of cloud vendor, instance types, OS, etc.
#BigData #AWS #DataScience #DataEngineering
The platform deals with time series data from sensors aggregated against things( event data that originates at periodic intervals). We use Cassandra as our distributed database to store time series data. Aggregated data insights from Cassandra is delivered as web API for consumption from other applications. Presto as a distributed sql querying engine, can provide a faster execution time provided the queries are tuned for proper distribution across the cluster. Another objective that we had was to combine Cassandra table data with other business data from RDBMS or other big data systems where presto through its connector architecture would have opened up a whole lot of options for us.
Pros of PostgreSQL
- Relational database763
- High availability510
- Enterprise class database439
- Sql383
- Sql + nosql304
- Great community173
- Easy to setup147
- Heroku131
- Secure by default130
- Postgis113
- Supports Key-Value50
- Great JSON support48
- Cross platform34
- Extensible33
- Replication28
- Triggers26
- Multiversion concurrency control23
- Rollback23
- Open source21
- Heroku Add-on18
- Stable, Simple and Good Performance17
- Powerful15
- Lets be serious, what other SQL DB would you go for?13
- Good documentation11
- Scalable9
- Free8
- Reliable8
- Intelligent optimizer8
- Transactional DDL7
- Modern7
- One stop solution for all things sql no matter the os6
- Relational database with MVCC5
- Faster Development5
- Full-Text Search4
- Developer friendly4
- Excellent source code3
- Free version3
- Great DB for Transactional system or Application3
- Relational datanbase3
- search3
- Open-source3
- Text2
- Full-text2
- Can handle up to petabytes worth of size1
- Composability1
- Multiple procedural languages supported1
- Native0
Pros of Presto
- Works directly on files in s3 (no ETL)18
- Open-source13
- Join multiple databases12
- Scalable10
- Gets ready in minutes7
- MPP6
Sign up to add or upvote prosMake informed product decisions
Cons of PostgreSQL
- Table/index bloatings10