Kafka

Distributed, fault-tolerant, high-throughput pub-sub messaging system

What is Kafka?

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
Kafka is a tool in the Message Queue category of a tech stack.
Kafka is an open source tool with 30.1K GitHub stars and 14.4K GitHub forks.
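
As a rough illustration of what "partitioned commit log" means, the sketch below models a topic as a set of append-only lists in plain Python. It is not Kafka's implementation or client API: `TinyLog`, `append`, and `read` are hypothetical names, choosing a partition by CRC32 of the key is an assumption made for the example, and replication is left out entirely.

```python
import zlib


class TinyLog:
    """Toy in-memory model of a partitioned, append-only log (not Kafka itself)."""

    def __init__(self, num_partitions=3):
        # Each partition is an ordered list; a record's index is its offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Assumption: records with the same key always map to the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        # Reading never removes records, so any offset can be re-read later.
        return self.partitions[partition][offset:]


log = TinyLog(num_partitions=3)
p1, o1 = log.append("user-1", "page_view")
p2, o2 = log.append("user-1", "click")
```

Both records share a key, so they land in the same partition in order, and calling `log.read(p1, o1)` repeatedly returns the same two records each time.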

Who uses Kafka?

Companies
1609 companies reportedly use Kafka in their tech stacks, including Uber, Shopify, and Spotify.

Developers
21527 developers on StackShare have stated that they use Kafka.

Kafka Integrations

Datadog, Apache Flink, Presto, Databricks, and Couchbase are some of the popular tools that integrate with Kafka. In total, 105 tools are listed as integrating with Kafka.

Pros of Kafka

  • High-throughput (126)
  • Distributed (119)
  • Scalable (92)
  • High-performance (86)
  • Durable (66)
  • Publish-subscribe (38)
  • Simple to use (19)
  • Open source (18)
  • Written in Scala and Java; runs on the JVM (12)
  • Message broker + streaming system (9)
  • KSQL (4)
  • Avro schema integration (4)
  • Robust (4)
  • Supports multiple clients (3)
  • Extremely good parallelism constructs (2)
  • Partitioned, replayable log (2)
  • Simple publisher / multi-subscriber model (1)
  • Flexible (1)
  • Fun (1)

Decisions about Kafka

Here are some stack decisions, common use cases and reviews by companies and developers who chose Kafka in their tech stack.

Needs advice on Kafka and Vert.x

We send and receive messages continuously with the help of Kafka. But Kafka is provided by both Apache and Vert.x, as Kafka and Vert.x Kafka respectively. I need to know which one is better in terms of performance, and why, as well as the purpose of each.

William Seota

I would like to build a mobile app that can scale to around 1M users over 1 year. We are currently testing with 100 users without any real load issues. We use the MERN stack with React Native Expo, and Google Cloud Services for GCB. We also use Google Cloud Run. We use a microservices architecture that we manage ourselves but thought of using Kafka. However, I need advice on optimising the app in terms of:

  1. load balancing,
  2. caching,
  3. database optimisation,
  4. autoscaling,
  5. load testing, and
  6. continuous optimisation frameworks

Any help would be appreciated! Thanks:)

I have been using an older version of OpenFaaS, but the new version now requires payment for things we did on the older version. I have been looking for alternatives to OpenFaaS that have Kafka integrations and scale-to-zero capabilities.

I looked at Apache OpenWhisk, but we run on RKE2, and my initial install of OpenWhisk appears to be too out of date to support RKE2 and is missing images from docker.io. So I am now looking at Knative. What are your thoughts?

We need to be able to process about 10k functions a minute, with execution times that vary between milliseconds and minutes. So I am looking for horizontal scaling that can be controlled by metrics other than just CPU and RAM utilization, for example: if the wait is over 5, scale out.

The issue with the older OpenFaaS was that scaling on RKE2 was not working well. For example, I could get it to scale from 5 to 20 pods, but only 12 of them would ever have data, while my backlog would have hundreds of thousands of files waiting. So even though it scaled up, it was as if the distribution of work was married to specific pods. If I killed the pods that had no work, they came up again with no work; if I killed one with work, another pod would scale up and another pod would start to get work. And on occasion, within hours, it would reset down to the original deployment allotment of pods and never scale up again until I went into Kubernetes and told it to add more pods.

So I am hoping to find a solution that doesn't require as much triage to make scaling work, since at some points in time we are at higher volume and at others there may be no volume.

Zvonko Pino
Senior Enterprise Architect at Zetaware SRL · 4 upvotes · 130 views
Needs advice on confluent-kafka and kafka-python

Which is the most portable and performant Kafka library? I am evaluating confluent-kafka and kafka-python.

Needs advice on Kafka and RabbitMQ

I want to collect the dependency data of Java applications built with Maven, using CI/CD tools. I want to know how to pick the collection technology, and what the pros and cons of Kafka and RabbitMQ are.

Thanks!

Needs advice on Druid, Kafka, and Apache Spark

My process is like this: I would get data once a month, either from Google BigQuery or as parquet files from Azure Blob Storage. I have a script that does some cleaning and then stores the result as partitioned parquet files because the following process cannot handle loading all data to memory.

The next process is making a heavy computation in a parallel fashion (per partition), and storing 3 intermediate versions as parquet files: two used for statistics, and the third will be filtered and create the final files.

I make a report based on the two files in Jupyter notebook and convert it to HTML.

  • Everything is done with vanilla Python and Pandas.
  • Sometimes I may get a different format of data.
  • The cloud service is Microsoft Azure.

What I'm considering is the following:

Get the data with Kafka or with native Python, do the first processing, and store the data in Druid. The second processing will be done with Apache Spark, getting data from Apache Druid.

The intermediate states can be stored in Druid too, and visualization would be with Apache Superset.


Blog Posts

Dec 22 2021 at 5:41AM · Pinterest
MySQL, Kafka, Druid +3 · 3 · 642

Amazon S3, Kafka, Zookeeper +5 · 8 · 1686

Mar 24 2021 at 12:57PM · Pinterest
Git, Jenkins, Kafka +7 · 3 · 2259


Kafka's Features

  • Written at LinkedIn in Scala
  • Used by LinkedIn to offload processing of all page and other views
  • Defaults to using persistence and uses the OS disk cache for hot data (has higher throughput than any of the above with persistence enabled)
  • Supports both online and offline processing
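
One way to picture the last feature is two readers on the same persisted log, each tracking its own position: an "online" reader that keeps up as records arrive, and an "offline" reader that processes everything later in one batch. The plain-Python sketch below is only an illustration under that assumption. `Reader` and `poll` are hypothetical names, and no Kafka client library is involved.

```python
class Reader:
    """Hypothetical reader that remembers how far into the log it has read."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # index of the next record to read

    def poll(self):
        # Return every record added since the last poll, then advance.
        records = self.log[self.position:]
        self.position = len(self.log)
        return records


log = []                   # stands in for one persisted partition
online = Reader(log)
offline = Reader(log)

log.append("view:/home")
first = online.poll()      # the online reader sees the record right away
log.append("view:/jobs")
second = online.poll()

batch = offline.poll()     # the offline reader catches up in one batch
```

Because reading does not remove records, the offline reader still receives both records even though the online reader has already consumed them.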

Kafka Alternatives & Comparisons

What are some alternatives to Kafka?
ActiveMQ
Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License.
RabbitMQ
RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.
Amazon Kinesis
Amazon Kinesis can collect and process hundreds of gigabytes of data per second from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data.
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
Akka
Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.

Kafka's Followers
22177 developers follow Kafka to keep up with related blogs and decisions.