Kafka: Publish and Consume messages

In my earlier posts, I have explained about Kafka and the how to install and run Kafka on your system.

Now we will see how to publish and consume messages in Kafka.

Step 1: Create a Topic

As we know in Kafka publisher publish messages to topic and Kafka will decides this message will be assigned to which partition in a topic.

So first we will create a topic named greetings. For this lets open a new command prompt and navigate to bin/windows folder. Then by using kafka-topic.bat we can create a topic.

Notice the bootstrap server port 9092. This is the default port of Kafka server.

$ kafka-topics.bat --create --topic greetings --bootstrap-server localhost:9092

So now we have successfully created the topic.

We can pass the –describe parameter to kafka-topic.bat to get the information about the topic.

$ kafka-topics.bat --describe --topic greetings --bootstrap-server localhost:9092

Step 2: Publish some events

Now let’s write some message or publish some event to the topic.

To do so open a new command prompt and navigate to bin/windows and type below command.

$ kafka-console-producer.bat --topic greetings --bootstrap-server localhost:9092

Then type the message you want to publish. By default each line you enter will trigger a separate event to the topic.

We can stop the publisher any time by pressing Ctrl + C.

Step 3: Consume events

Open an another terminal and by using kafka-console-consumer.bat you will be able to consume messages.

$ kafka-console-consumer.bat --topic greetings --from-beginning --bootstrap-server localhost:9092

Great 👏

Now you are publishing and consuming messages using Kafka.

Summery :

In this article we have demonstrate how you can create topic in Kafka and produce and consume messages by using Kafka’s producer and consumer console library.

Prev -> Kafka: Install and Run Apache Kafka on windows

Kafka: Install and Run Apache Kafka on windows

Install Apache Kafka on Windows

STEP 1: Install JAVA SDK >8

For this we need java-jdk installed on our system.

STEP 2: Download and Install Apache Kafka binaries

You can download the Apache Kafka binaries from Apache kafka official page:

https://kafka.apache.org/downloads

STEP 3: Extract the binary

Extract the binary to some folder. Create a ‘data‘ folder at bin level.

Inside data folder create zookeeper and kafka folder.

STEP 4: Update configuration value

Update zookeeper data directory path in “config/zookeeper.Properties” configuration file.

With the zookeeper folder path that you have created in data.

Update Apache Kafka log file path in “config/server.properties” configuration file.

STEP 5:  Start Zookeeper

Now we will start zookeeper from command prompt. Go to kafka bin\windows and execute zookeeper-server-start.bat command with config/zookeeper.Properties configuration file.

Here we are using default properties that already bundled with kafka bindary and persist into the config folder. later we can update this according to our uses.

To validate if zookeeper starts successfully check for below logs.

STEP 6:  Start Apache Kafka

Finally we will start Apache Kafka from command prompt just in the same way we started zookeeper. Open an another command prompt, run kafka-server-start.bat command with kafka config/server.properties configuration file.

Summery:

To proceed with kafka you need install and run kafka and zookeeper server on your machine. with the above steps.

Next-> Kafka: Publish and Consume messages

Prev-> Kafka: Introduction to Kafka

Kafka: Introduction to Kafka

In this world of data where things and systems started depending on data, it is very important to get the right data at a right time to get the most of it. In this a great architecture of data streaming – “Apache Kafka” has introduced in 2011

Here I am brining a short course for Kafka where try to provide a basic understanding of Kafka with it’s core architecture and some hands-on on the producer consumer code.

So let’s get started 😊

What is Kafka?

Apache Kafka was originated at LinkedIn and later became an open-sourced Apache project in 2011,  then a first-class Apache project in 2012. Kafka is written in Scala and Java.

Apache Kafka is a publisher-subscriber concept based on a fault-tolerant messaging system. It is fast, scalable, and distributed by design.

“Kafka is an Event Streaming architecture.”

Event streaming is capturing data in real-time from various event sources like databases, cloud services, software applications, etc.

Why Kafka?

Kafka is a messaging system. This is typically suits for the application that requires high throughput and low latency. It can be used for real-time analytics.

Kafka can work with Flume/Flafka, Spark Streaming, Storm, HBase, Flink, and Spark for real-time ingesting, analysis and processing of streaming data. Kafka is a data stream used to feed Hadoop BigData lakes. Kafka brokers support massive message streams for a low-latency follow-up analysis in Hadoop or Spark.

Basics of Kafka:

Apache.org states that:

  • Kafka runs as a cluster on one or more servers.
  • The Kafka cluster stores a stream of records in categories called topics.
  • Each record consists of a key, a value, and a timestamp.

Key Concepts :

Events and Offset :

Kafka uses Log data structure to store the Event/Messages. Each message/Event has a unique Key. Kafka ensures that the message should not be duplicate and must be in sequence.

Offsets are the pointers to understand from where data needs to be picked.

Events/Messages can stay in the partition for very long period and even forever.

Topic and Partitions :

Topic is a uniquely defined category in which producer publishes messages.

Each topic contain one or many partitions. Partitions contains messages.

Messages are written to topics and kafka uses round robin to selects which partition to write the message to.

To make sure that some particular type of messages should go to same partition we can assign Key to the messages, attaching a key to messages will ensure messages with the same key always go to the same partition in a topic. Kafka guarantees order within a partition, but not across partitions in a topic.

Cluster and Broker :

Kafka cluster can have multiple brokers inside it, to maintain load balancing. A single Kafka server is called as Kafka broker. Kafka cluster is stateless hence to maintain cluster state Kafka uses Zookeeper.

I’ll cover zookeeper in the next point. For now let’s understand what is broker.

Broker receives messages from producer and assign offset to it and then store it on local disk.

Broker is also responsible to serve message fetch request coming from consumer.

Each broker contains one or more Topics. Each topic along with their partitions can be assigned to multiple broker but the owner or leader will be only one.

For example in the below diagram Partition 0 is replicated along with topic X in Broker 1 and Broker 2, but the leader will always be only one. The replica is used as a backup of partition. So that if any particular broker fails then the replicator takes leadership.

Producer and consumer only connects to the Leader partition.

Zookeeper:

Kafka uses Zookeeper to maintain and coordinate between brokers.

Zookeeper is also sends notification to the Producer and consumer about the presence of any new broker or if any new leader created. So that according to that they can make decision and start coordinating  the task accordingly.

Consumer Group:

A consumer group is a platform where we can have multiple consumers. Each consumer group has one unique Id.

Only one consume in the group can pull the messages from a particular partition. Same consumer group can not have multiple consumers of same partition.  

Multiple consumers can consume messages from same partition but they must be from different consumer groups.

If the consumers are more in same group and partitions are less then there are changes to have some inactive consumers in the group.

Summery:

Kafka is an event based messaging system. Mostly suited for applications where big amount of real time data needs to be processed.

In the complete architecture of Kafka it provides load balancing, data backup, maintain message order, facility to read messages from a particular position, message storage for longer period, message can be fetched by multiple consumers of different groups.

Next -> Kafka: Install and Run Apache Kafka on windows