StreamSets

#137 in Databases · 0 discussions · 133 followers

50 Alternatives to StreamSets

Compare StreamSets to these popular alternatives based on real-world usage and developer feedback.

Grooper

Grooper empowers rapid innovation for organizations that process and integrate large quantities of difficult data. Created by a team of developers frustrated by the limitations of existing solutions, it is an intelligent document and digital data integration platform that combines patented image processing, capture technology, machine learning, and natural language processing.

1 stack · 0 votes · 2 followers

Compare StreamSets vs Grooper →

Kafka

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
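To make "partitioned commit log" concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API: the class `PartitionedLog`, its methods, and the CRC32-based partitioner are hypothetical stand-ins, and replication and persistence are left out entirely. It shows the core idea: records with the same key go to the same append-only partition, each record gets a sequential offset, and consumers read from an offset without removing anything.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class PartitionedLog:
    """Toy in-memory partitioned commit log: no replication, no persistence."""

    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        # Each partition is an append-only list; a record's index is its offset.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, key: str, value: str) -> tuple:
        # Hash the key (CRC32 here, an arbitrary choice) so that every record
        # with the same key lands in the same partition and keeps its order.
        partition = zlib.crc32(key.encode()) % self.num_partitions
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int, max_records: int = 10) -> list:
        # Consumers track their own offsets; reading never removes records.
        return self.partitions[partition][offset:offset + max_records]


log = PartitionedLog(num_partitions=3)
p, first = log.append("user-42", "login")
log.append("user-42", "click")
print(log.read(p, first))  # [('user-42', 'login'), ('user-42', 'click')]
```

Because reads do not consume records, several independent consumers can replay the same partition from whatever offset they choose.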

24,170 stacks · 607 votes · 22,283 followers

Why developers like Kafka:

✓ High-throughput (126)
✓ Distributed (119)
✓ Scalable (92)

Compare StreamSets vs Kafka →

RabbitMQ

RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.

21,823 stacks · 558 votes · 18,944 followers

Why developers like RabbitMQ:

✓ It's fast and it works with good metrics/monitoring (235)
✓ Ease of configuration (80)
✓ I like the admin interface (60)

Compare StreamSets vs RabbitMQ →

NumPy

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

4,299 stacks · 15 votes · 799 followers

Why developers like NumPy:

✓ Great for data analysis (10)
✓ Faster than list (4)

Compare StreamSets vs NumPy →

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

3,075 stacks · 140 votes · 3,530 followers

Why developers like Apache Spark:

✓ Open-source (61)
✓ Fast and Flexible (48)
✓ Great for distributed SQL-like applications (8)

Compare StreamSets vs Apache Spark →

Amazon SQS

Transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available. With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use.

2,759 stacks · 171 votes · 1,986 followers

Why developers like Amazon SQS:

✓ Easy to use, reliable (62)
✓ Low cost (40)
✓ Simple (28)

Compare StreamSets vs Amazon SQS →

Pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.

2,113 stacks · 23 votes · 1,311 followers

Why developers like Pandas:

✓ Easy data frame management (21)

Compare StreamSets vs Pandas →

Celery

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
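The task-queue idea behind Celery can be illustrated with nothing but the standard library. This sketch assumes a single process: `queue.Queue` stands in for the message broker (in a real Celery deployment this is an external broker such as RabbitMQ or Redis) and threads stand in for worker processes. The names `worker` and `run_tasks` are hypothetical and are not Celery's API.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    # Pull (task_id, func, args) jobs until a None stop sentinel arrives.
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        task_id, func, args = item
        value = func(*args)
        with lock:
            results[task_id] = value
        tasks.task_done()


def run_tasks(jobs, num_workers: int = 2) -> dict:
    tasks: queue.Queue = queue.Queue()
    results: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)  # producer side: enqueue work and move on
    for _ in threads:
        tasks.put(None)  # one stop sentinel per worker, queued after all jobs
    for t in threads:
        t.join()
    return results


results = run_tasks([(1, pow, (2, 10)), (2, sum, ([1, 2, 3],))])
print(sorted(results.items()))  # [(1, 1024), (2, 6)]
```

The results are sorted before printing because workers may finish jobs in any order.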

1,683 stacks · 280 votes · 1,623 followers

Why developers like Celery:

✓ Task queue (99)
✓ Python integration (63)
✓ Django integration (40)

Compare StreamSets vs Celery →

SciPy

Python-based ecosystem of open-source software for mathematics, science, and engineering. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

1,504 stacks · 0 votes · 180 followers

Compare StreamSets vs SciPy →

ActiveMQ

Apache ActiveMQ is fast, supports many cross-language clients and protocols, comes with easy-to-use Enterprise Integration Patterns and many advanced features, and fully supports JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License.

880 stacks · 77 votes · 1,276 followers

Why developers like ActiveMQ:

✓ Easy to use (18)
✓ Open source (14)
✓ Efficient (13)

Compare StreamSets vs ActiveMQ →

Dataform

Dataform helps you manage all data processes in your cloud data warehouse. Publish tables, write data tests and automate complex SQL workflows in a few minutes, so you can spend more time on analytics and less time managing infrastructure.

818 stacks · 0 votes · 53 followers

Compare StreamSets vs Dataform →

Splunk

Splunk provides the leading platform for Operational Intelligence. Customers use it to search, monitor, analyze, and visualize machine data.

773 stacks · 20 votes · 1,023 followers

Why developers like Splunk:

✓ API for searching logs, running reports (3)
✓ Alert system based on custom query results (3)

Compare StreamSets vs Splunk →

MQTT

MQTT was designed as an extremely lightweight publish/subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium.

637 stacks · 7 votes · 577 followers

Why developers like MQTT:

✓ Varying levels of Quality of Service to fit a range of (3)

Compare StreamSets vs MQTT →

Azure Service Bus

Azure Service Bus is a cloud messaging system for connecting apps and devices across public and private clouds. You can depend on it when you need a highly reliable cloud messaging service between applications and services, even when one or more of them is offline.

553 stacks · 7 votes · 536 followers

Why developers like Azure Service Bus:

✓ Easy integration with .NET (4)

Compare StreamSets vs Azure Service Bus →

Apache Flink

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

534 stacks · 38 votes · 879 followers

Why developers like Apache Flink:

✓ Unified batch and stream processing (16)
✓ Out-of-the-box connectors to Kinesis, S3, HDFS (8)
✓ Easy-to-use streaming APIs (8)

Compare StreamSets vs Apache Flink →

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

521 stacks · 49 votes · 840 followers

Why developers like Amazon Athena:

✓ Use SQL to analyze CSV files (16)
✓ Glue crawlers give an easy data catalogue (8)
✓ Cheap (7)

Compare StreamSets vs Amazon Athena →

PySpark

PySpark is the Python API for Apache Spark. It lets you combine the simplicity of Python with the power of Apache Spark to tame Big Data.

491 stacks · 0 votes · 295 followers

Compare StreamSets vs PySpark →

Apache Hive

Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.

488 stacks · 0 votes · 475 followers

Compare StreamSets vs Apache Hive →

AWS Glue

A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

462 stacks · 9 votes · 819 followers

Why developers like AWS Glue:

✓ Managed Hive Metastore (9)

Compare StreamSets vs AWS Glue →

Anaconda

A free and open-source distribution of the Python and R programming languages for scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system conda.

440 stacks · 0 votes · 490 followers

Compare StreamSets vs Anaconda →

Presto

Distributed SQL Query Engine for Big Data

394 stacks · 66 votes · 1,032 followers

Why developers like Presto:

✓ Works directly on files in S3 (no ETL) (18)
✓ Open-source (13)
✓ Join multiple databases (12)

Compare StreamSets vs Presto →

Apache NiFi

An easy-to-use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

393 stacks · 65 votes · 692 followers

Why developers like Apache NiFi:

✓ Visual Data Flows using Directed Acyclic Graphs (DAGs) (17)
✓ Free (Open Source) (8)
✓ Simple-to-use (7)

Compare StreamSets vs Apache NiFi →

Druid

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.

376 stacks · 32 votes · 867 followers

Why developers like Druid:

✓ Real Time Aggregations (15)
✓ Batch and Real-Time Ingestion (6)
✓ OLAP (5)

Compare StreamSets vs Druid →

Confluent

Confluent is a data streaming platform based on Apache Kafka: a full-scale streaming platform capable not only of publish-and-subscribe, but also of storing and processing data within the stream.

337 stacks · 14 votes · 239 followers

Why developers like Confluent:

✓ Free for casual use (4)
✓ No hypercloud lock-in (3)
✓ Dashboard for Kafka insight (3)

Compare StreamSets vs Confluent →

Talend

Talend is an open-source software integration platform that helps you turn data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers with optimized performance on every platform.

297 stacks · 0 votes · 249 followers

Compare StreamSets vs Talend →

ZeroMQ

The 0MQ lightweight messaging kernel is a library which extends the standard socket interfaces with features traditionally provided by specialised messaging middleware products. 0MQ sockets provide an abstraction of asynchronous message queues, multiple messaging patterns, message filtering (subscriptions), seamless access to multiple transport protocols and more.
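The "message filtering (subscriptions)" feature works by prefix matching: a subscriber receives only messages that start with one of the byte prefixes it subscribed to, and an empty prefix matches everything. The toy `Publisher` and `Subscriber` classes below are hypothetical and purely in-process; they filter on the subscriber side for simplicity and do not model ZeroMQ sockets, transports, or where filtering actually happens. With pyzmq, a real subscription is set with `sock.setsockopt(zmq.SUBSCRIBE, b"weather.")`.

```python
class Subscriber:
    """Toy subscriber that keeps only messages matching a subscribed prefix."""

    def __init__(self) -> None:
        self.prefixes = set()
        self.inbox = []

    def subscribe(self, prefix: bytes) -> None:
        self.prefixes.add(prefix)

    def deliver(self, message: bytes) -> None:
        # Prefix match; an empty prefix b"" matches every message, and a
        # subscriber with no prefixes at all receives nothing.
        if any(message.startswith(p) for p in self.prefixes):
            self.inbox.append(message)


class Publisher:
    """Toy publisher that fans each message out to all subscribers."""

    def __init__(self) -> None:
        self.subscribers = []

    def publish(self, message: bytes) -> None:
        for sub in self.subscribers:
            sub.deliver(message)


pub = Publisher()
weather, everything = Subscriber(), Subscriber()
weather.subscribe(b"weather.")
everything.subscribe(b"")
pub.subscribers.extend([weather, everything])
pub.publish(b"weather.nyc 21C")
pub.publish(b"stocks.ACME 10")
print(weather.inbox)     # [b'weather.nyc 21C']
print(everything.inbox)  # [b'weather.nyc 21C', b'stocks.ACME 10']
```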

258 stacks · 71 votes · 586 followers

Why developers like ZeroMQ:

✓ Fast (23)
✓ Lightweight (20)
✓ Transport agnostic (11)

Compare StreamSets vs ZeroMQ →

Azure Data Factory

It is a service designed to allow developers to integrate disparate data sources. It is a platform somewhat like SSIS in the cloud to manage the data you have both on-prem and in the cloud.

254 stacks · 0 votes · 484 followers

Compare StreamSets vs Azure Data Factory →

MassTransit

MassTransit is free, open-source, .NET-based Enterprise Service Bus software that helps Microsoft developers route messages over MSMQ, RabbitMQ, TIBCO, and ActiveMQ service buses, with native support for MSMQ and RabbitMQ.

167 stacks · 0 votes · 176 followers

Compare StreamSets vs MassTransit →

Apache Impala

Impala is a modern, open source, MPP SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time.

145 stacks · 18 votes · 301 followers

Why developers like Apache Impala:

✓ Super fast (11)

Compare StreamSets vs Apache Impala →

NSQ

NSQ is a realtime distributed messaging platform designed to operate at scale, handling billions of messages per day. It promotes distributed and decentralized topologies without single points of failure, enabling fault tolerance and high availability coupled with a reliable message delivery guarantee.

142 stacks · 148 votes · 356 followers

Why developers like NSQ:

✓ It's in golang (29)
✓ Lightweight (20)
✓ Distributed (20)

Compare StreamSets vs NSQ →

Mosquitto

Mosquitto is lightweight and suitable for use on all devices, from low-power single-board computers to full servers. The MQTT protocol provides a lightweight method of carrying out messaging using a publish/subscribe model. This makes it suitable for Internet of Things messaging, such as with low-power sensors or mobile devices like phones, embedded computers, or microcontrollers.

136 stacks · 14 votes · 306 followers

Why developers like Mosquitto:

✓ Simple and light (10)
✓ Performance (4)

Compare StreamSets vs Mosquitto →

MediatR

MediatR is a low-ambition library trying to solve a simple problem: decoupling the in-process sending of messages from the handling of messages. It is cross-platform, supporting .NET Framework 4.6.1 and netstandard2.0.
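Decoupling the code that sends a message from the code that handles it is the mediator pattern. MediatR itself is a .NET library; the Python sketch below only illustrates the pattern, and `Mediator`, `register`, `send`, and `Ping` are hypothetical names rather than MediatR's API. Senders call `send` with a request object and never reference the handler directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


class Mediator:
    """Routes each request to the single handler registered for its type."""

    def __init__(self) -> None:
        self._handlers: Dict[type, Callable[[Any], Any]] = {}

    def register(self, request_type: type, handler: Callable[[Any], Any]) -> None:
        self._handlers[request_type] = handler

    def send(self, request: Any) -> Any:
        # The sender knows only the request object, never the handler.
        handler = self._handlers.get(type(request))
        if handler is None:
            raise LookupError(f"no handler registered for {type(request).__name__}")
        return handler(request)


@dataclass
class Ping:
    message: str


mediator = Mediator()
mediator.register(Ping, lambda req: f"Pong: {req.message}")
print(mediator.send(Ping("hi")))  # Pong: hi
```

Swapping the handler only touches the `register` call; code that sends `Ping` requests does not change.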

134 stacks · 0 votes · 41 followers

Compare StreamSets vs MediatR →

Mule runtime engine

Its mission is to connect the world’s applications, data and devices. It makes connecting anything easy with Anypoint Platform™, the only complete integration platform for SaaS, SOA and APIs. Thousands of organizations in 60 countries, from emerging brands to Global 500 enterprises, use it to innovate faster and gain competitive advantage.

127 stacks · 8 votes · 129 followers

Why developers like Mule runtime engine:

✓ Open Source (4)

Compare StreamSets vs Mule runtime engine →

WCF

It is a framework for building service-oriented applications. Using this, you can send data as asynchronous messages from one service endpoint to another. A service endpoint can be part of a continuously available service hosted by IIS, or it can be a service hosted in an application.

125 stacks · 5 votes · 107 followers

Why developers like WCF:

✓ Classes (5)

Compare StreamSets vs WCF →

Apache Pulsar

Apache Pulsar is a distributed messaging solution developed and released to open source at Yahoo. Pulsar supports both pub-sub messaging and queuing in a platform designed for performance, scalability, and ease of development and operation.

119 stacks · 24 votes · 199 followers

Why developers like Apache Pulsar:

✓ Simple (7)
✓ Scalable (4)
✓ High-throughput (3)

Compare StreamSets vs Apache Pulsar →

Dremio

Dremio, the data lake engine, operationalizes your data lake storage and speeds up your analytics with a high-performance, high-efficiency query engine, while also democratizing data access for data scientists and analysts.

116 stacks · 8 votes · 348 followers

Why developers like Dremio:

✓ Nice GUI to enable more people to work with Data (3)

Compare StreamSets vs Dremio →

Dask

Dask is a versatile tool that supports a variety of workloads. It is composed of two parts: (1) dynamic task scheduling optimized for computation, similar to Airflow, Luigi, Celery, or Make but optimized for interactive computational workloads; and (2) Big Data collections, such as parallel arrays, dataframes, and lists, that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
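Task scheduling can be pictured as evaluating a graph of tasks in dependency order. The sketch below uses a dict that maps each key to either a literal value or a `(function, *args)` tuple whose string arguments may name other keys, loosely modeled on the style of Dask's low-level task graphs. The `execute` function is hypothetical: it runs tasks one at a time in a fixed topological order via `graphlib` (Python 3.9+), whereas Dask's schedulers run independent tasks in parallel and decide the order at run time.

```python
from graphlib import TopologicalSorter
from operator import add, mul


def execute(graph: dict) -> dict:
    """Evaluate a dict-of-tasks graph sequentially, in dependency order."""

    def is_task(value) -> bool:
        # A task is a tuple whose first element is callable: (func, *args).
        return isinstance(value, tuple) and len(value) > 0 and callable(value[0])

    def is_key(arg) -> bool:
        # A string argument that names another entry is a dependency.
        return isinstance(arg, str) and arg in graph

    dependencies = {
        key: ({a for a in value[1:] if is_key(a)} if is_task(value) else set())
        for key, value in graph.items()
    }
    results = {}
    for key in TopologicalSorter(dependencies).static_order():
        value = graph[key]
        if is_task(value):
            func, *args = value
            # Substitute already-computed results for key references.
            results[key] = func(*(results[a] if is_key(a) else a for a in args))
        else:
            results[key] = value  # literal input value
    return results


graph = {"x": 1, "y": 2, "z": (add, "x", "y"), "w": (mul, "z", 10)}
print(execute(graph)["w"])  # 30
```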

116 stacks · 0 votes · 142 followers

Compare StreamSets vs Dask →

Pentaho Data Integration

Pentaho Data Integration enables users to ingest, blend, cleanse, and prepare diverse data from any source. With visual tools that eliminate coding and complexity, it puts the best-quality data at the fingertips of IT and the business.

112 stacks · 0 votes · 79 followers

Compare StreamSets vs Pentaho Data Integration →

Delta Lake

An open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.

105 stacks · 0 votes · 315 followers

Compare StreamSets vs Delta Lake →

Azure Synapse

Azure Synapse is an analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, at scale, using either serverless on-demand or provisioned resources. It brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

104 stacks · 10 votes · 230 followers

Why developers like Azure Synapse:

✓ ETL (4)
✓ Security (3)

Compare StreamSets vs Azure Synapse →

Amazon Redshift Spectrum

With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake”, without having to load or transform any data.

99 stacks · 3 votes · 147 followers

Compare StreamSets vs Amazon Redshift Spectrum →

Apache Parquet

It is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
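A columnar format stores all values of a column together instead of storing whole rows. The sketch below is a simplified illustration, not Parquet's file layout or any Parquet library's API: the hypothetical `to_columns` pivots row records (assumed to share the same keys) into per-column lists, and `run_length_encode` shows why repeated values within a column compress well. Parquet does use encodings such as run-length and dictionary encoding, but its actual on-disk format is considerably more involved.

```python
def to_columns(rows: list) -> dict:
    """Pivot row records (dicts sharing the same keys) into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: list) -> list:
    """Collapse consecutive repeated values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


rows = [
    {"country": "US", "amount": 5},
    {"country": "US", "amount": 7},
    {"country": "DE", "amount": 3},
]
cols = to_columns(rows)
print(sum(cols["amount"]))                 # 15, reading only the "amount" column
print(run_length_encode(cols["country"]))  # [('US', 2), ('DE', 1)]
```

An aggregate over one column touches a single contiguous list, which is the main reason analytical engines favor columnar layouts.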

97 stacks · 0 votes · 190 followers

Compare StreamSets vs Apache Parquet →

Vertica

It provides a best-in-class, unified analytics platform that will forever be independent from underlying infrastructure.

90 stacks · 16 votes · 120 followers

Why developers like Vertica:

✓ Shared nothing or shared everything architecture (3)

Compare StreamSets vs Vertica →

Gearman

Gearman allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events.

77 stacks · 45 votes · 144 followers

Why developers like Gearman:

✓ Free (11)
✓ Ease of use and very simple APIs (11)
✓ Polyglot (6)

Compare StreamSets vs Gearman →

NServiceBus

Performance, scalability, pub/sub, reliable integration, workflow orchestration, and everything else you could possibly want in a service bus.

76 stacks · 2 votes · 132 followers

Compare StreamSets vs NServiceBus →

Apache Kudu

A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's storage layer to enable fast analytics on fast data.

71 stacks · 10 votes · 259 followers

Why developers like Apache Kudu:

✓ Realtime Analytics (10)

Compare StreamSets vs Apache Kudu →

XMPP

It is a set of open technologies for instant messaging, presence, multi-party chat, voice and video calls, collaboration, lightweight middleware, content syndication, and generalized routing of XML data.

71 stacks · 0 votes · 138 followers

Compare StreamSets vs XMPP →

ScratchDB

It is an open-source alternative to BigQuery, Redshift, and Snowflake: a wrapper around ClickHouse that lets you input arbitrary JSON and perform analytical queries against it. It automatically creates tables and columns when new data is added.

64 stacks · 0 votes · 2 followers

Compare StreamSets vs ScratchDB →

CloudAMQP

Fully managed, highly available RabbitMQ servers and clusters, on all major compute platforms.

62 stacks · 7 votes · 84 followers

Why developers like CloudAMQP:

✓ Some of the best customer support you'll ever find (4)
✓ Easy to provision (3)

Compare StreamSets vs CloudAMQP →