Needs advice on Couchbase and MongoDB

We have thousands of PDF documents generated from the same form but with a lot of variability. We need to extract data from free text and, more importantly, from tables inside the documents. The output of Couchbase/MongoDB will be one row per document for backend processing. Adobe renders the tables in an unusable form.

9 upvotes·220.9K views
Replies (3)
Freelancer at havlicekpetr.cz · Recommends MongoDB

I prefer MongoDB based on my own experience migrating an old archive of PDFs and metadata to a new “archive”. The biggest advantage is the speed of filter results: the new archive is much faster and more reliable than the old one. MongoDB is also easy to program against, with many code snippets and examples available. I have no personal experience with Couchbase so far. From an architecture point of view both options are OK, so go for the one you like.

12 upvotes·213.5K views
Director - NGO "Informational Culture" / Ambassador - OKFN Russia at Infoculture · Recommends ArangoDB

I would like to suggest MongoDB or ArangoDB (I can't choose both, so ArangoDB). MongoDB is more mature, but ArangoDB is more interesting if you need to bring graph database ideas into the solution. For example, if some of the data or documents are interlinked, ArangoDB is probably the better choice.

To process tables we used the ABBYY software stack. It is great at table extraction.
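
For illustration only, here is a minimal Python sketch of the "one row per document" shape the question asks for. It assumes an extraction tool has already produced the free-text fields and the tables as lists of rows. The function `build_record`, the field names, and the sample values are all hypothetical and are not taken from any of the products mentioned. The result is plain JSON, which any of the document stores discussed here can hold. The `_id` key follows MongoDB's convention; Couchbase keeps the document key outside the body, so it would be passed separately there.

```python
import json


def build_record(doc_id, fields, tables):
    """Collapse one PDF's extracted data into a single JSON-ready record.

    fields: dict of free-text values (hypothetical names).
    tables: dict mapping a table name to a list of rows, first row = header.
    """
    record = {"_id": doc_id, **fields}
    for name, rows in tables.items():
        if not rows:
            record[name] = []
            continue
        header, *body = rows
        # Turn each table row into a dict keyed by the header cells.
        record[name] = [dict(zip(header, row)) for row in body]
    return record


# Hypothetical extraction output for one form.
fields = {"applicant": "Jane Doe", "form_date": "2020-01-15"}
tables = {"items": [["sku", "qty"], ["A-1", "2"], ["B-7", "5"]]}

record = build_record("form-0001", fields, tables)
print(json.dumps(record, indent=2))
```

Embedding each table as a list of small objects keeps everything for one PDF in a single record, so the backend can read one document per form without joins.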

7 upvotes·213.6K views

Petr Havlicek

Freelancer at havlicekpetr.cz