We are evaluating Presto against Denodo for building a virtualization layer on top of the Cloudera Data Warehouse. Our customer and transaction data live in the Cloudera Data Warehouse, and we want the virtualization layer to span multiple datasets as well as Cloudera DW.
Good question. If you are looking for an engine for on-prem data lakes like HDFS or cloud data lakes like S3 and GCS, Presto would be the right choice. That is the key reason Facebook created Presto: it replaced Hive. In addition, it is built for open formats like Apache Parquet and Apache ORC. In fact, Presto is becoming the de facto engine for data lakes. Like Facebook, many users are migrating from Hive and other Hadoop-era engines to Presto. Presto is more than a virtualization layer: it is built like a database engine, has an optimizer, and can push predicates down to the underlying objects and files, which limits how much data is read. In addition, Presto is open source under the Linux Foundation, and you can participate in its evolution. This short article may help a bit: https://ahana.io/answers/how-do-i-query-a-data-lake-with-presto/
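To make the predicate pushdown point a bit more concrete, here is a rough Python sketch of the general idea. It is not Presto's actual code. Columnar formats such as Parquet and ORC keep min/max statistics per chunk of rows, so a reader can skip any chunk whose statistics prove that no row can match the filter. The names `RowGroup`, `make_group` and `scan_with_pushdown` are made up for illustration, and the data is invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A chunk of rows plus the min/max statistics stored alongside it (hypothetical)."""
    min_amount: int  # would serve "<" predicates; unused in this sketch
    max_amount: int
    amounts: List[int]


def make_group(amounts: List[int]) -> RowGroup:
    """Build a row group and compute its statistics, as a file writer would."""
    return RowGroup(min(amounts), max(amounts), list(amounts))


def scan_with_pushdown(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Evaluate `amount > threshold`; return (matches, number of row groups read)."""
    matches: List[int] = []
    groups_read = 0
    for group in row_groups:
        # The statistics prove no row here can match, so skip the read entirely.
        if group.max_amount <= threshold:
            continue
        groups_read += 1
        matches.extend(a for a in group.amounts if a > threshold)
    return matches, groups_read


# Invented transaction amounts split into three row groups.
row_groups = [
    make_group([5, 12, 40]),
    make_group([150, 900, 75]),
    make_group([3, 8, 99]),
]

matches, groups_read = scan_with_pushdown(row_groups, threshold=100)
print(matches, groups_read)  # [150, 900] 1
```

Only one of the three row groups is read in this example. A real engine applies the same kind of reasoning to files and partitions, which is where the savings on a large data lake come from.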
By comparison, Denodo was built for traditional relational databases, not for data lakes. If you are federating across SQL Server, Oracle, etc., Denodo would be a good choice. There is a good tutorial here: https://community.denodo.com/tutorials/browse/bi/2virtualization1
To come clean, I may be biased towards Presto as a leader of the Presto Foundation and founder of a Presto company, but I have been building databases and distributed systems for over 15 years, and I try to help users make the right decisions about the data problems they are trying to solve. Hope this helps.
I recommend Presto, or more importantly Trino (https://trino.io/). The creators of Presto left Facebook and forked the code into what is now Trino, and Trino now looks like the thriving project. For an enterprise distribution and compatibility with Cloudera, see Starburst Enterprise: https://docs.starburst.io/358-e/connector/starburst-hive-cdp.html
As for Denodo, it is the enterprise silver bullet; Trino is the community silver bullet. Nothing I've ever seen performs as well as Trino for aggregate queries across databases.