Need advice about which tool to choose?Ask the StackShare community!
For a property and casualty insurance company, we currently use MarkLogic and Hadoop for our raw data lake. Trying to figure out how snowflake fits in the picture. Does anybody have some good suggestions/best practices for when to use and what data to store in Mark logic versus Snowflake versus a hadoop or all three of these platforms redundant with one another?
for property and casualty insurance company we current Use marklogic and Hadoop for our raw data lake. Trying to figure out how snowflake fits in the picture. Does anybody have some good suggestions/best practices for when to use and what data to store in Mark logic versus snowflake versus a hadoop or all three of these platforms redundant with one another?
As i see it, you can use Snowflake as your data warehouse and marklogic as a data lake. You can add all your raw data to ML and curate it to a company data model to then supply this to Snowflake. You could try to implement the dw functionality on marklogic but it will just cost you alot of time. If you are using Aws version of Snowflake you can use ML spark connector to access the data. As an extra you can use the ML also as an Operational report system if you join it with a Reporting tool lie PowerBi. With extra apis you can also provide data to other systems with ML as source.
Hey, we want to build a referral campaign mechanism that will probably contain millions of records within the next few years. We want fast read access based on IDs or some indexes, and isolation is crucial as some listeners will try to update the same document at the same time. What's your suggestion between Couchbase and MongoDB? Thanks!
I am biased (work for Scylla) but it sounds like a KV/wide column would be better in this use case. Document/schema free/lite DBs data stores are easier to get up and running on but are not as scalable (generally) as NoSQL flavors that require a more rigid data model like ScyllaDB. If your data volumes are going to be 10s of TB and transactions per sec 10s of 1000s (or more), look at Scylla. We have something called lightweight transactions (LWT) that can get you consistency.
I have found MongoDB highly consistent and highly available. It suits your needs. We usually trade off partion tolerance fot this. Having said that, I am little biased in recommendation as I haven't had much experience with couchbase on production.
I'm planning to build a freelance marketplace website, using tools like Next.js, Firebase Authentication, Node.js, but I need to know which type of database is suitable with performance and powerful features. I'm trying to figure out what the best stack is for this project. If anyone has advice please, I’d love to hear more details. Thanks.
Postgres and MySQL are very similar, but Mongo has differences in terms of storage type and the CAP theorem. For your requirement, I prefer Postgres (or MySQL) over MongoDB. Mongo gives you no schema which is not always good. on the other hand, it is more common in NodeJS community, so you may find more articles about Node-Mongo stuff. I suggest to stay with RDBMS if possible.
This is a little about experience. Postgresql is fine. You can use either the related table structure or the json table structure.
We have a ready-made engine for the online exchange and marketplace. To customize it, you only need to know sql. Connecting any database is not a problem. https://falconspace.site/list/solutions
Developing a solution that collects Telemetry Data from different devices, nearly 1000 devices minimum and maximum 12000. Each device is sending 2 packets in 1 second. This is time-series data, and this data definition and different reports are saved on PostgreSQL. Like Building information, maintenance records, etc. I want to know about the best solution. This data is required for Math and ML to run different algorithms. Also, data is raw without definitions and information stored in PostgreSQL. Initially, I went with TimescaleDB due to PostgreSQL support, but to increase in sites, I started facing many issues with timescale DB in terms of flexibility of storing data.
My major requirement is also the replication of the database for reporting and different purposes. You may also suggest other options other than Druid and Cassandra. But an open source solution is appreciated.
Hi Umair, Did you try MongoDB. We are using MongoDB on a production environment and collecting data from devices like your scenario. We have a MongoDB cluster with three replicas. Data from devices are being written to the master node and real-time dashboard UI is using the secondary nodes for read operations. With this setup write operations are not affected by read operations too.
For learning purposes, I am trying to design a dashboard that displays the total revenue from all connected webshops/marketplaces, displaying incoming orders, total orders, etc.
So I will need to get the data (using Node backend) from the Shopify and marketplace APIs, storing this in the database, and get the data from the back end.
My question is:
What kind of database should I use? Is MongoDB fine for storing this kind of data? Or should I go with a SQL database?
Postgres is a solid database with a promising background. In the relational side of database design, I see Postgres as an absolute; Now the arguments and conflicts come in when talking about NoSQL data types. The truth is jsonb in Postgres is efficient and gives a good performance and storage. In a comparison with MongoDB with the same resources (such as RAM and CPU) with better tools and community, I think you should go for Postgres and use jsonb for some of the data. All in all, don't use a NoSQL database just cause you have the data type matching this tech, have both SQL and NoSQL at the same time.
I have found MongoDB easier to work with. Postgres and SQL in general, in my experience, is harder to work with. While Postgres does provide data consistency, MongoDB provides flexibility. I've found the MongoDB ecosystem to be really great with a good community. I've worked with MongoDB in production and it's been great. I really like the aggregation system and using query operators such as $in, $pull, $push.
While my opinion may be unpopular, I have found MongoDB really great for relational data, using aggregations from a code perspective. In general, data types are also more flexible with MongoDB.
I will use PostgreSQL because you have more powerfull feature for data agregation and views (the raw data from shopify and others could be stored as is) and then use views to produce diff. kind of reports unless you wanna create those aggregations/views in nodejs code. HTH
I want to store the data retrieved from multiple APIs and perform some analytics on it. The data stored in DB will never/hardly change. First, I thought it would be better to retrieve the data and create table columns for them, but some data might have different columns than others. So I thought about storing the JSON response from API directly to the table and use it. So which database will be the better choice, PostgreSQL or MongoDB.
Hey Krunal, your requirement sounds pretty clear and specific to what you want to do with that data. My recommendation to you, would be to use MongoDB. Since schema-less IO is faster in MongoDB, your general speed of reading / writing from and to the database would be quick. Additionally, the aggregate framework is very powerful with large data so that is also something that you can use in computing your analytics.
I suggest you to go with MongoDB
, because it is schema-less, i.e., it permits you to easily manipulate the schema of a table. If you want to add a column, it can be done without much effort. Moreover, MongoDB
can deal with more types of data, since the latest is stored as key-value pair. I do not what kind of analysis you are going to do, but NoSQL
is not the best choice if you are going to use complex queries. In addition, if you are working with huge amount of data and you are interested in optimising the performance, I suggest you PostgreSQL
.
Since you are speaking about API and JSON, I guess that you may using Node JS
for fetching API. I suggest you to try Mongoose
, which facilitate the use of MongoDB
with Node JS
.
Looks like the use case is to store JSON data. mongoDB and Postgres differ in so many aspects like scaling and consistency. Postgres has excellent JSON support now with the power of SQL. MongoDB is good in handling schema less data. However in this case it seems these differences don’t matter that much. I’d recommend you go with what you are most comfortable with.
I don't have an unquestionable opinion regarding your use case. I only trend to pick the MongoDB since it is schemaless avoiding null columns that you not always know when it is used (it depends on the source of the data). The only drawback that I could consider is the query's complexity in MongoDB, sometimes it is a bit tricky, when compared to the traditional SQL queries.
This is largely a matter of opinion. I see that someone else responded and recommended MongoDB but since you are doing data analytics, I highly recommend you go with SQL. You're going to have a really hard time normalizing the data when you can't manipulate relationships and bulk edit with a nice update query.
I'm much more experienced with MySQL than any other database and I am having a hard time getting on board with noSQL entirely because it's really hard to query complex data with relationships using noSQL. I'm using Firestore with one of my apps and MongoDB with another app but they both use MySQL for the heavy lifting and then a document database for things like permissions, caching, etc.
It sounds like the type of problem you need to reverse engineer. I'm sure you can imagine what the data sets would look like if you use MongoDB or Postgres. I suspect that putting in a little bit more work up front will pay high dividends and productivity once the data is normalized.
Again - it's largely a matter of preference but I prefer SQL almost every time.
I need urgent advice from you all! I am making a web-based food ordering platform which includes 3 different ordering methods (Dine-in using QR code scanning + Take away + Home Delivery) and a table reservation system. We are using React for the front-end, and I need your advice if I should use NestJS or ExpressJS for the backend. And regarding the database, which database should I use, MongoDB or PostgreSQL? Which combination will be better? PS. We want to follow the microservice architecture as scalability, reliability, and usability are the most important Non Functional requirements. Expert advice is needed, please. A load of thanks in advance. Kind Regards, Miqdad
I can't speak for the NestJS vs ExpressJS discussion, but I can given a viewpoint on databases.
The main thing to consider around database choice, is what "shape" the data will be in, and the kind of read/write patterns you expect of that data. The blog example shows up so much for DBMS like MongoDB, because it showcases what NoSQL / document storage is very scalable and performant in: mostly isolated documents with a few views / ways to order them and filter them. In your case, I can imagine a number of "relations" already, which suggest a more traditional SQL solution would work well: You have restaurants, they have maybe a few menus (regular, gluten-free etc), with menu items in, which have different prices over time (25% discount on christmas food just after christmas, 50% off pizzas on wednesdays). Then there's a whole different set of "relations" for people ordering, like showing them past orders, which need to refer to the restaurant etc, and credit card transaction information for refunds etc. That to me suggests PostgreSQL, which will scale quite well if you database design is okay.
PostgreSQL also offers you some extensions, which are just amazing for your use-case. https://postgis.net/ for example will let you query for restaurants based on location, without the big cost that comes from constantly using something like Google Maps API to work out which restaurants are near to someone ordering. Partitioning and window functions will be great for your own use internally too, like answering questions of "What types of takeways perform the best for us, Italian, Mexican?" or in combination with PostGIS, answering questions like "What kind of takeways do we need to market to, to improve our selection?".
While these things can all be implemented in MongoDB, you tend to lose some of the convenience of ACID or have to deal with things like eventual consistency, which requires more thinking on the part of your engineers. PostgreSQL offers decent (if more complex) scalablity and redundancy solutions, and is honestly very well proven and plenty of documentation exists on optimising queries.
Hello, i build microservice systems using Angular And Spring (Java) so i can't help with with ur back end choice, BUT, i definitely advice you to use a Nosql database, thus MongoDB of course or even Cassandra if your looking for extreme scalability with zero point of failure. Anyway, Nosql if much more faster then Sql (in your case Postresql DB). All you wanna do with sql can also be done by nosql (not the opposite of course).I also advice you to use docker containers + kubernetes to orchestrate them, if you need scalability and replication, that way your app can support auto scalability (in case ur users number goes high). Best of luck
About PostgreSQL vs MongoDB: short answer. Both are great. Choose what you like the most. Only if you expect millions of users, I‘ll incline with MongoDB.
Hello, I am developing a new project with an internal chat between users. Also, there are complex relationships between the other project entities but I wolud like to build something scalable and fast and right now I am designing the data model. What kind of database would you recommend me to manage all application data? relational like MySQL, no relational like MongoDB or a mixed one? Thank you
In MongoDB, a write operation is atomic on the level of a single document, so it's harder to deal with consistency without transactions.
MongoDB supports horizontal scaling through Sharding , distributing data across several machines and facilitating high throughput operations with large sets of data. ... Sharding allows you to add additional instances to increase capacity when required
If you are trying with "complex relationships", give a chance to learn ArangoDB and Graph databases. Its database structures allow doing this with faster and simpler queries. The database is not as strict as others and allows arbitrary data. The data model is really like a neural network and you will never need foreign keys tables anymore. In Udemy there is a free course about it to get started.
The most important question is where are you planning to host? On-premise, or in the cloud.
Particularly if you are planning to host in either AWS or Azure, then your first point of call should be the PaaS (Platform as a Service) databases supplied by these vendors, as you will find yourself requiring a lot less effort to support them, much easier Disaster Recovery options, and also, depending on how PAYG the database is that you use, potentially also much cheaper costs than having a dedicated database server.
Your question regards 'Relational or not' is obviously key, and you need to consider both your required data structure, as well as the ACID requirements of your application model, as well as the non-functional requirements in terms of scalability, resilience, whether you want security authorisation at the highest application tier, or right down to 'row' level in the database, etc. - however please don't fall into the trap of considering 'NoSQL' as being single category. MongoDB, with its document-store type solution is a very different model to key-value-pair stores (like AWS DynamoDB), or column stores (like AWS RedShift) or for more complex data relationships, Entity Graph Stores (like AWS Neptune), to stores designed for tokenisation and text search (ElasticSearch) etc.
Also critical in all this is how many items you believe you need to index by. RDBMS/SQL stores are great for having as many indexes as you want, other than the slow-down in write speed, whereas databases like Amazon DynamoDB provide blisteringly fast read/write performance, but are very limited on key indexing capabilities.
It feels like you have most experience with SQL/RDBMS technologies, so for the simplest learning curve, and if your application fits it, then I'd personally start by looking at AWS Aurora https://aws.amazon.com/rds/aurora/ .
FIrstly, it may help if you explain what you mean by "complex relationships between project entities". Secondly, you can build a fast and scalable solution using either. With that said however, the data sounds relational so I would recommend MySQL.
I think, Its depend of your project type and your skills. MySQL is good and simple for maintenance but MongoDB need more skills and knowledge. If you work on little project, use MySQL. For your project type, MySQL is enough after you can migrate with PostgreSQL
I am going to work on a real estate project and have to decide on a database. Now, SQL databases can be very efficient if appropriately designed. More relations between the data and less redundancy. But with a #NoSQL database, the development time is reduced, and it is easy to query. Since this is my first time working on the real estate domain, I would like to pick a database that would be efficient in the long run.
I recommend PostgreSQL as it’s the most powerful out of the 3 databases you mentioned. It supports JSON objects so you can mimic the MongoDB functionality, but I would also argue that SQL is actually quite powerful and in many cases significantly easier to work with than with NoSQL databases.
Stay away from foreign keys, keep it fast and simple. Define your data structures well in advance. Try to model your data structures based on your system’s vision; based on where it’s going and not based solely on what you currently need it to do. This will help you avoid drastic changes to your database after your system is launched. Populate the database with fake data and run tests. PostgreSQL allows you to create Views from multiple tables. Try to create those views and make sure you can easily create useful views from multiple tables. Run an Explain on those view queries to make sure you created your indexes correctly. Make sure it’s fast!
Any of those three databases are going to be efficient, scalable, and reliable in the long term if you configure and use them correctly. They all also have solid hosting solutions.
All things being equal, I would agree with other posters that Postgres is my preference among the three, but there are caveats.
MongoDB and MySQL have better support for mutli-region replication in your big three cloud environments. Azure recently bought Citus Data, which was a best-in-class Postgres replication solution, so they might be the only one I trust to provide cross-region replication at the moment.
If you have a single region deployment and are on AWS, I can't recommend Aurora Postgres highly enough. It's a very good implementation and extremely performant.
I'll second another piece of advice. Postgresql's JSON columns are a dream when it comes to productivity and I use them frequently with our Rails application. In these cases, no migration is required to change schema. We store payloads with dozens or hundreds of keys and performance has not been an issue. We also have a lot of relational tables, so the joins we get with SQL are very important to us and hard to replicate with a NoQL solution.
That really depends of where do you see you application in the long run. On any application, any of those choices are excellent. You could argue about good support on JSON binaries, but even MySQL has an excellent support for that on the latest versions.
On the long run, when your application gets hundreds of thousands of requests per second, you might start thinking about how many inputs you will have in the database compared to the outputs. PostgresSQL it’s excellent at giving you outputs, but table corruption can happen when you start receiving this massive number of inputs (Which was the reason Uber switched from Postgres to MySQL)
On our OPS Platform at CTO.ai , we decided to use Postgres, because we need a reliable and agile way to send the output to our users, so that was out best choice in the long run for our product.
MongoDB's document-oriented paradigm is nicely suited to the results of our ML model. We felt that this compatibility offered some time savings on figuring out and implementing an extensive data formatting and processing system. MongoDB's flexible schemas schemas (due to it being non-relational) were also attractive as a source of additional agility for our development process. The MongoDB ecosystem also has great GUI tools to simplify testing.
At Pushnami we were looking at several alternative databases that would support following architectural requirements: - very quick prototyping for an unknown domain - ability to support large amounts of data - native ability to replicate and fail over - full stack approach for Node.js development After careful consideration MongoDB came on top, and 3 years later we are still very happy with that decision. Currently we keep almost 2TB of data in our cluster, and start thinking about sharding.
After using couchbase for over 4 years, we migrated to MongoDB and that was the best decision ever! I'm very disappointed with Couchbase's technical performance. Even though we received enterprise support and were a listed Couchbase Partner, the experience was horrible. With every contact, the sales team was trying to get me on a $7k+ license for access to features all other open source NoSQL databases get for free.
Here's why you should not use Couchbase
Full-text search Queries The full-text search often returns a different number of results if you run the same query multiple types
N1QL queries Configuring the indexes correctly is next to impossible. It's poorly documented and nobody seems to know what to do, even the Couchbase support engineers have no clue what they are doing.
Community support I posted several problems on the forum and I never once received a useful answer
Enterprise support It's very expensive. $7k+. The team constantly tried to get me to buy even though the community edition wasn't working great
Autonomous Operator It's actually just a poorly configured Kubernetes role that no matter what I did, I couldn't get it to work. The support team was useless. Same lack of documentation. If you do get it to work, you need 6 servers at least to meet their minimum requirements.
Couchbase cloud Typical for Couchbase, the user experience is awful and I could never get it to work.
Minimum requirements
The minimum requirements in production are 6 servers. On AWS the calculated monthly cost would be ~$600
. We achieved better performance using a $16
MongoDB instance on the Mongo Atlas Cloud
writing queries is a nightmare While N1QL is similar to SQL and it's easier to write because of the familiarity, that isn't entirely true. The "smart index" that Couchbase advertises is not smart at all. Creating an index with 5 fields, and only using 4 of them won't result in Couchbase using the same index, so you have to create a new one.
Couchbase UI
The UI that comes with every database deployment is full of bugs, barely functional and the developer experience is poor. When I asked Couchbase about it, they basically said they don't care because real developers use SQL directly from code
Consumes too much RAM
Couchbase is shipped with a smaller Memcached instance to handle the in-memory cache. Memcached ends up using 8 GB of RAM for 5000 documents
! I'm not kidding! We had less than 5000 docs on a Couchbase instance and less than 20 indexes and RAM consumption was always over 8 GB
Memory allocations are useless I asked the Couchbase team a question: If a bucket has 1 GB allocated, what happens when I have more than 1GB stored? Does it overflow? Does it cache somewhere? Do I get an error? I always received the same answer: If you buy the Couchbase enterprise then we can guide you.
We actually use both Mongo and SQL databases in production. Mongo excels in both speed and developer friendliness when it comes to geospatial data and queries on the geospatial data, but we also like ACID compliance hence most of our other data (except on-site logs) are stored in a SQL Database (MariaDB for now)
MySQL has a lot of strengths working for it. It's simple and easy to set up and use. It's JSON engine is also really good these days. Mongo is also simple to setup and use, and it's speed as a document-object storage engine is first class.
Where Postgres has both beat is in it's combining of all of the features that make both MySQL and Mongo great, while adding on enterprise grade level scalability and replication. It's Postgres' stability and robustness, while still fulfilling the roles of it's contemporaries extremely well that edge Postgre for me.
When I was new with web development, I was using PHP for backend and MySQL for database. But after improving my JS skills, I chosen Node.js. Because of too many reasons including npm, express, community, fast coding and etc. MongoDB is so good for using with Node.js. If your JS skills are enough good, I recommend to migrate to Node.js and MongoDB.
My data was inherently hierarchical, but there was not enough content in each level of the hierarchy to justify a relational DB (SQL) with a one-to-many approach. It was also far easier to share data between the frontend (Angular), backend (Node.js) and DB (MongoDB) as they all pass around JSON natively. This allowed me to skip the translation layer from relational to hierarchical. You do need to think about correct indexes in MongoDB, and make sure the objects have finite size. For instance, an object in your DB shouldn't have a property which is an array that grows over time, without limit. In addition, I did use MySQL for other types of data, such as a catalog of products which (a) has a lot of data, (b) flat and not hierarchical, (c) needed very fast queries.
We used Mongo for the first iterations of our app, but the relational nature of our data was an awkward fit for a database that is not relational. We sorely lacked relational database integrity features that needed to be done on the application side (poorly) and it was a huge relief when we managed to port our application over to Postgres, which performs great and never gives us trouble, while having very user friendly extensions like JSON and PubSub that made the transition easy.
We wanted a JSON datastore that could save the state of our bioinformatics visualizations without destructive normalization. As a leading NoSQL data storage technology, MongoDB has been a perfect fit for our needs. Plus it's open source, and has an enterprise SLA scale-out path, with support of hosted solutions like Atlas. Mongo has been an absolute champ. So much so that SQL and Oracle have begun shipping JSON column types as a new feature for their databases. And when Fast Healthcare Interoperability Resources (FHIR) announced support for JSON, we basically had our FHIR datalake technology.
In the field of bioinformatics, we regularly work with hierarchical and unstructured document data. Unstructured text data from PDFs, image data from radiographs, phylogenetic trees and cladograms, network graphs, streaming ECG data... none of it fits into a traditional SQL database particularly well. As such, we prefer to use document oriented databases.
MongoDB is probably the oldest component in our stack besides Javascript, having been in it for over 5 years. At the time, we were looking for a technology that could simply cache our data visualization state (stored in JSON) in a database as-is without any destructive normalization. MongoDB was the perfect tool; and has been exceeding expectations ever since.
Trivia fact: some of the earliest electronic medical records (EMRs) used a document oriented database called MUMPS as early as the 1960s, prior to the invention of SQL. MUMPS is still in use today in systems like Epic and VistA, and stores upwards of 40% of all medical records at hospitals. So, we saw MongoDB as something as a 21st century version of the MUMPS database.
Pros of Cassandra
- Distributed119
- High performance98
- High availability81
- Easy scalability74
- Replication53
- Reliable26
- Multi datacenter deployments26
- Schema optional10
- OLTP9
- Open source8
- Workload separation (via MDC)2
- Fast1
Pros of MarkLogic
- RDF Triples5
- JSON3
- Marklogic is absolutely stable and very fast3
- REST API3
- JavaScript3
- Enterprise3
- Semantics2
- Multi-model DB2
- Bitemporal1
- Tiered Storage1
Pros of MongoDB
- Document-oriented storage828
- No sql593
- Ease of use553
- Fast464
- High performance410
- Free255
- Open source218
- Flexible180
- Replication & high availability145
- Easy to maintain112
- Querying42
- Easy scalability39
- Auto-sharding38
- High availability37
- Map/reduce31
- Document database27
- Easy setup25
- Full index support25
- Reliable16
- Fast in-place updates15
- Agile programming, flexible, fast14
- No database migrations12
- Easy integration with Node.Js8
- Enterprise8
- Enterprise Support6
- Great NoSQL DB5
- Support for many languages through different drivers4
- Schemaless3
- Aggregation Framework3
- Drivers support is good3
- Fast2
- Managed service2
- Easy to Scale2
- Awesome2
- Consistent2
- Good GUI1
- Acid Compliant1
Sign up to add or upvote prosMake informed product decisions
Cons of Cassandra
- Reliability of replication3
- Size1
- Updates1
Cons of MarkLogic
Cons of MongoDB
- Very slowly for connected models that require joins6
- Not acid compliant3
- Proprietary query language2