Need advice about which tool to choose?Ask the StackShare community!

OpenRefine

31
66
+ 1
0
Talend

150
247
+ 1
0
Add tool

Talend vs OpenRefine: What are the differences?

Developers describe Talend as "A single, unified suite for all integration needs". It is an open source software integration platform helps you in effortlessly turning data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers and get optimized performance on all platforms. On the other hand, OpenRefine is detailed as "Desktop application for data cleanup and transformation". It is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

Talend and OpenRefine can be primarily classified as "Big Data" tools.

OpenRefine is an open source tool with 6.62K GitHub stars and 1.18K GitHub forks. Here's a link to OpenRefine's open source repository on GitHub.

Advice on OpenRefine and Talend
karunakaran karthikeyan
Needs advice
on
DremioDremio
and
TalendTalend

I am trying to build a data lake by pulling data from multiple data sources ( custom-built tools, excel files, CSV files, etc) and use the data lake to generate dashboards.

My question is which is the best tool to do the following:

  1. Create pipelines to ingest the data from multiple sources into the data lake
  2. Help me in aggregating and filtering data available in the data lake.
  3. Create new reports by combining different data elements from the data lake.

I need to use only open-source tools for this activity.

I appreciate your valuable inputs and suggestions. Thanks in Advance.

See more
Replies (1)
Rod Beecham
Partnering Lead at Zetaris · | 3 upvotes · 63.9K views
Recommends
on
DremioDremio

Hi Karunakaran. I obviously have an interest here, as I work for the company, but the problem you are describing is one that Zetaris can solve. Talend is a good ETL product, and Dremio is a good data virtualization product, but the problem you are describing best fits a tool that can combine the five styles of data integration (bulk/batch data movement, data replication/data synchronization, message-oriented movement of data, data virtualization, and stream data integration). I may be wrong, but Zetaris is, to the best of my knowledge, the only product in the world that can do this. Zetaris is not a dashboarding tool - you would need to combine us with Tableau or Qlik or PowerBI (or whatever) - but Zetaris can consolidate data from any source and any location (structured, unstructured, on-prem or in the cloud) in real time to allow clients a consolidated view of whatever they want whenever they want it. Please take a look at www.zetaris.com for more information. I don't want to do a "hard sell", here, so I'll say no more! Warmest regards, Rod Beecham.

See more
Get Advice from developers at your company using StackShare Enterprise. Sign up for StackShare Enterprise.
Learn More
- No public GitHub repository available -

What is OpenRefine?

It is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

What is Talend?

It is an open source software integration platform helps you in effortlessly turning data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers and get optimized performance on all platforms.

Need advice about which tool to choose?Ask the StackShare community!

What companies use OpenRefine?
What companies use Talend?
See which teams inside your own company are using OpenRefine or Talend.
Sign up for StackShare EnterpriseLearn More

Sign up to get full access to all the companiesMake informed product decisions

What tools integrate with OpenRefine?
What tools integrate with Talend?
What are some alternatives to OpenRefine and Talend?
Trifacta
It is an Intelligent Platform that Interoperates with Your Data Investments. It sits between the data storage and processing environments and the visualization, statistical or machine learning tools used downstream
R Language
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
Python
Python is a general purpose programming language created by Guido Van Rossum. Python is most praised for its elegant syntax and readable code, if you are just beginning your programming career python suits you best.
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
RapidMiner
It is a software platform for data science teams that unites data prep, machine learning, and predictive model deployment.
See all alternatives