Kafka with KRaft (Kafka Raft)

(Image credit: Apache Kafka)

1. What is Kafka?

Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data streaming. It is used for publish-subscribe messaging, event sourcing, log processing, and real-time analytics.

Key Features:

  • Scalability: Distributes data across multiple brokers.
  • Durability: Stores data persistently.
  • High Throughput: Handles millions of messages per second.
  • Fault Tolerance: Replicates data across nodes.

Core Components:

  • Producers: Send messages (events) to Kafka.
  • Topics: Logical channels where messages are stored.
  • Partitions: Sub-divisions of topics for parallel processing.
  • Consumers: Read messages from topics.
  • Brokers: Kafka servers that store and manage data.
  • Zookeeper (or KRaft in newer versions): Manages metadata and leader election.
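To make these pieces concrete, here is a toy in-memory model of a topic (the `Topic` class and its method are invented for illustration; real Kafka partitions are persistent, replicated logs managed by brokers):

```python
# Toy in-memory model of a Kafka topic (illustration only, not Kafka's API):
# a topic is a set of partitions, and each partition is an append-only log
# in which every message gets a monotonically increasing offset.

class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]  # one log per partition

    def append(self, partition, message):
        """Append a message to a partition and return its offset."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offset = position within the partition log

orders = Topic("orders-topic", num_partitions=3)
print(orders.append(0, "order-1001"))  # 0 -- first offset in partition 0
print(orders.append(0, "order-1002"))  # 1 -- offsets grow per partition
```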

2. How Kafka Works (Step-by-Step)

Step 1: Producers Publish Messages

  • Producers send messages to a topic.
  • Kafka assigns each message to a partition: messages with a key are hashed to a fixed partition (preserving per-key ordering), while keyless messages are spread across partitions (e.g. round-robin).
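The partition-selection step can be sketched as follows. This is a simplified illustration with a made-up `choose_partition` helper; Kafka's real default partitioner uses murmur2 hashing and, in recent versions, a "sticky" strategy for keyless messages.

```python
import itertools

NUM_PARTITIONS = 3
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key=None):
    """Pick a partition for a message: hash the key if there is one,
    otherwise rotate round-robin across all partitions."""
    if key is None:
        return next(_round_robin)          # keyless: spread evenly
    return hash(key) % NUM_PARTITIONS      # same key -> same partition

# Messages sharing a key land in the same partition, preserving per-key order.
assert choose_partition("user-42") == choose_partition("user-42")
```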

Step 2: Brokers Store Messages

  • Brokers receive data and store it in partition logs.
  • Messages persist based on retention policies.
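Retention can be pictured with a small sketch (the `purge_expired` helper is hypothetical; real Kafka deletes whole log segments based on settings such as `retention.ms` or `retention.bytes`, not individual messages):

```python
RETENTION_MS = 60_000  # e.g. retention.ms=60000 (one minute)

def purge_expired(log, now_ms):
    """Keep only (timestamp, message) pairs still inside the retention window.
    (Real Kafka deletes whole log segments, not individual messages.)"""
    return [(ts, msg) for ts, msg in log if now_ms - ts <= RETENTION_MS]

log = [(0, "old"), (55_000, "recent"), (59_000, "fresh")]
print(purge_expired(log, now_ms=61_000))  # "old" has aged out of the window
```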

Step 3: Consumers Subscribe to Topics

  • Consumers read messages sequentially from partitions.
  • Each consumer tracks an offset (the position of the next message to read) per partition.
  • Consumers commit offsets back to Kafka so they can resume from the right place after a failure or restart.
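These points can be illustrated with a toy consumer (the `Consumer` class below is invented for illustration and is not the real client API):

```python
class Consumer:
    """Toy consumer: reads sequentially and commits its position."""

    def __init__(self, log):
        self.log = log
        self.committed = 0  # offset of the next message to read

    def poll(self, max_records=2):
        records = self.log[self.committed:self.committed + max_records]
        self.committed += len(records)  # commit after processing
        return records

c = Consumer(["evt-0", "evt-1", "evt-2"])
print(c.poll())  # ['evt-0', 'evt-1']
print(c.poll())  # ['evt-2'] -- resumes from the committed offset
```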

Step 4: Message Processing

  • Consumers process messages and may send them to other Kafka topics, databases, or services.

Example: Kafka in Action

Imagine an e-commerce platform:

  • Order Service (Producer) sends order events to orders-topic.
  • Inventory Service (Consumer) updates stock.
  • Shipping Service (Consumer) prepares delivery.
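Because each consumer group keeps its own offsets, both services independently receive every order event. A minimal sketch, assuming a single shared partition log and made-up group names:

```python
log = ["order-1001", "order-1002"]  # shared partition log of order events
offsets = {"inventory-service": 0, "shipping-service": 0}  # per-group offsets

def consume(group):
    """Return all events the group has not seen yet, then advance its offset."""
    records = log[offsets[group]:]
    offsets[group] = len(log)
    return records

# Each group independently reads the full stream:
assert consume("inventory-service") == ["order-1001", "order-1002"]
assert consume("shipping-service") == ["order-1001", "order-1002"]
```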

3. What is Zookeeper in Kafka?

Apache Zookeeper is a distributed coordination service used by Kafka for:

  1. Broker Metadata Management: Keeps track of brokers in the cluster.
  2. Leader Election: Ensures high availability by selecting leader brokers.
  3. Topic and Partition Management: Stores information about topics and their partitions.

Limitations of Zookeeper:

  • Critical Dependency: if the Zookeeper ensemble becomes unavailable, Kafka cannot elect leaders or apply metadata changes.
  • Operational Overhead: Requires separate setup and maintenance.
  • Scalability Issues: Slower performance in large clusters.

(Image credit: Confluent)

4. KRaft (Kafka Raft) – Replacing Zookeeper

Kafka Raft (KRaft) is a Zookeeper-free mode introduced in Kafka 2.8+ and production-ready in Kafka 3.3+.

Why KRaft?

  • Removes dependency on Zookeeper.
  • Improves scalability: No separate cluster needed.
  • Simplifies deployment: Kafka manages its metadata internally.

How KRaft Works:

  • Uses the Raft consensus algorithm for metadata replication and leader election.
  • A quorum of controller nodes maintains the cluster metadata; one of them is elected the active controller.
  • Brokers stay up to date by replaying the replicated metadata log.
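The property Raft relies on is majority agreement: a candidate becomes the active controller only with votes from a strict majority of the quorum. A tiny sketch of that rule:

```python
def has_majority(votes, quorum_size):
    """True when votes form a strict majority of the controller quorum."""
    return votes > quorum_size // 2

# With controller.quorum.voters listing 3 nodes, 2 votes elect a leader:
assert has_majority(2, 3)
assert not has_majority(1, 3)
# A 5-node quorum tolerates 2 failures, since 3 voters remain:
assert has_majority(3, 5)
```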

5. How to Set Up Kafka with KRaft (Without Zookeeper)

Step 1: Install Kafka

Download Kafka:

wget https://downloads.apache.org/kafka/3.5.0/kafka_2.13-3.5.0.tgz
tar -xzf kafka_2.13-3.5.0.tgz
cd kafka_2.13-3.5.0

Step 2: Generate KRaft Metadata

Run:

bin/kafka-storage.sh format -t $(bin/kafka-storage.sh random-uuid) -c config/kraft/server.properties

This initializes metadata storage.

Step 3: Configure Kafka for KRaft

Edit config/kraft/server.properties:

process.roles=controller,broker
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
log.dirs=/tmp/kafka-logs

Step 4: Start Kafka (Without Zookeeper)

bin/kafka-server-start.sh config/kraft/server.properties

Step 5: Create a Topic

bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

Step 6: Start a Producer

bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092

Type a few messages and press Enter; each line is sent as one message to test-topic.

Step 7: Start a Consumer

bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092

Conclusion

  • Kafka is a high-throughput event streaming system.
  • Zookeeper was used for metadata management but had scalability issues.
  • KRaft (Kafka Raft) replaces Zookeeper with an internal Raft-based system.
  • Setting up Kafka with KRaft is easier and more scalable.

You can find more articles and tutorials about Kafka here.
