1. What is Kafka?
Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data streaming. It is used for publish-subscribe messaging, event sourcing, log processing, and real-time analytics.
Key Features:
- Scalability: Distributes data across multiple brokers.
- Durability: Stores data persistently.
- High Throughput: Handles millions of messages per second.
- Fault Tolerance: Replicates data across nodes.
Core Components:
- Producers: Send messages (events) to Kafka.
- Topics: Logical channels where messages are stored.
- Partitions: Sub-divisions of topics for parallel processing.
- Consumers: Read messages from topics.
- Brokers: Kafka servers that store and manage data.
- Zookeeper (or KRaft in newer versions): Manages metadata and leader election.
2. How Kafka Works (Step-by-Step)
Step 1: Producers Publish Messages
- Producers send messages to a topic.
- Kafka distributes messages among partitions (using round-robin or key-based distribution).
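The two distribution strategies can be sketched in a few lines of Python. This is a simplified model, not the real client's partitioner (Kafka's Java client uses murmur2 hashing; Python's `hash()` here is only illustrative):

```python
import itertools

NUM_PARTITIONS = 3

def partition_for(key, _counter=itertools.count()):
    """Pick a partition the way a producer might: key-based when a key
    is present, round-robin otherwise. Simplified illustration only --
    the real Kafka client uses murmur2 hashing, not Python's hash()."""
    if key is not None:
        return hash(key) % NUM_PARTITIONS    # same key -> same partition
    return next(_counter) % NUM_PARTITIONS   # keyless -> spread evenly

# Messages with the same key always land in the same partition,
# which is what preserves per-key ordering.
assert partition_for("user-42") == partition_for("user-42")
```

Key-based distribution matters when ordering is important: all events for `user-42` go to one partition, so one consumer sees them in order.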
Step 2: Brokers Store Messages
- Brokers receive data and store it in partition logs.
- Messages persist based on retention policies.
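Time-based retention can be pictured as pruning everything older than a window (the broker default is `log.retention.hours=168`, i.e. seven days). A deliberately simplified sketch, since real Kafka deletes whole log segments rather than individual records:

```python
import time

RETENTION_SECONDS = 7 * 24 * 3600   # mirrors log.retention.hours=168

def apply_retention(log, now=None):
    """Drop (timestamp, message) pairs older than the retention window.
    Simplified: real Kafka prunes whole log segments, not single records."""
    now = time.time() if now is None else now
    return [(ts, msg) for ts, msg in log if now - ts <= RETENTION_SECONDS]

old = (0, "ancient-event")
new = (1_000_000_000, "recent-event")
assert apply_retention([old, new], now=1_000_000_100) == [new]
```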
Step 3: Consumers Subscribe to Topics
- Consumers read messages sequentially from partitions.
- Kafka tracks each consumer's offset (the position of the next message to read).
- Consumers can commit offsets for fault tolerance.
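An offset is essentially a cursor into the partition log. A minimal toy model (purely illustrative, no real Kafka client; committed offsets actually live in the internal `__consumer_offsets` topic):

```python
class PartitionLog:
    """Toy model of one partition: an append-only list plus a
    committed consumer offset."""
    def __init__(self):
        self.messages = []
        self.committed = 0          # offset of the next message to read

    def append(self, msg):
        self.messages.append(msg)

    def poll(self):
        """Return messages not yet read past, without committing."""
        return self.messages[self.committed:]

    def commit(self, offset):
        """Record progress so a restarted consumer resumes here."""
        self.committed = offset

log = PartitionLog()
for m in ["order-1", "order-2", "order-3"]:
    log.append(m)

batch = log.poll()          # all three messages
log.commit(len(batch))      # consumer processed them
assert log.poll() == []     # a restarted consumer resumes after order-3
```

Committing only after processing is what gives at-least-once delivery: if the consumer crashes before the commit, it re-reads the batch on restart.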
Step 4: Message Processing
- Consumers process messages and may send them to other Kafka topics, databases, or services.
Example: Kafka in Action
Imagine an e-commerce platform:
- Order Service (Producer) sends order events to orders-topic.
- Inventory Service (Consumer) updates stock.
- Shipping Service (Consumer) prepares delivery.
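This fan-out works because each service reads the topic in its own consumer group, and every group receives every event. A minimal in-memory sketch (service names taken from the example above; no real broker involved):

```python
from collections import defaultdict

class Topic:
    """Toy topic: each consumer group keeps its own read cursor,
    so every group independently sees all events (publish-subscribe)."""
    def __init__(self):
        self.log = []
        self.cursors = defaultdict(int)   # group name -> next offset

    def publish(self, event):
        self.log.append(event)

    def consume(self, group):
        events = self.log[self.cursors[group]:]
        self.cursors[group] = len(self.log)
        return events

orders = Topic()
orders.publish({"order_id": 1, "item": "book"})

# Both services receive the same event, because they are separate groups.
inventory = orders.consume("inventory-service")
shipping = orders.consume("shipping-service")
assert inventory == shipping == [{"order_id": 1, "item": "book"}]
```

Consumers *within* one group, by contrast, split the partitions among themselves, which is how Kafka scales a single service horizontally.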
3. What is Zookeeper in Kafka?
Apache Zookeeper is a distributed coordination service used by Kafka for:
- Broker Metadata Management: Keeps track of brokers in the cluster.
- Leader Election: Ensures high availability by selecting leader brokers.
- Topic and Partition Management: Stores information about topics and their partitions.
Limitations of Zookeeper:
- Single Point of Failure: If Zookeeper becomes unavailable, the Kafka cluster cannot elect leaders or update metadata.
- Operational Overhead: Requires separate setup and maintenance.
- Scalability Issues: Slower performance in large clusters.
4. KRaft (Kafka Raft) – Replacing Zookeeper
Kafka Raft (KRaft) is a Zookeeper-free mode introduced in Kafka 2.8 and production-ready since Kafka 3.3.
Why KRaft?
- Removes dependency on Zookeeper.
- Improves scalability: No separate cluster needed.
- Simplifies deployment: Kafka manages its metadata internally.
How KRaft Works:
- Uses the Raft consensus algorithm for leader election and replication.
- One broker acts as the Controller to manage metadata.
- Other brokers synchronize metadata through replicated logs.
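The core Raft idea that KRaft relies on can be reduced to one rule: a metadata record is committed once a majority of the controller quorum has replicated it. A deliberately simplified sketch of just that rule (not the real protocol, which also involves terms, elections, and log matching):

```python
def majority(voters):
    """Smallest number of voters that forms a majority."""
    return len(voters) // 2 + 1

def is_committed(acks, voters):
    """A record is committed once a majority of quorum voters have
    acknowledged it (simplified; real Raft also checks leader terms)."""
    return len(set(acks) & set(voters)) >= majority(voters)

voters = {1, 2, 3}                      # cf. controller.quorum.voters
assert not is_committed({1}, voters)    # 1 of 3 acks: not committed
assert is_committed({1, 2}, voters)     # 2 of 3 acks: committed
```

The majority rule is why a 3-voter quorum tolerates one controller failure, and a 5-voter quorum tolerates two.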
5. How to Set Up Kafka with KRaft (Without Zookeeper)
Step 1: Install Kafka
Download Kafka:
wget https://downloads.apache.org/kafka/3.5.0/kafka_2.13-3.5.0.tgz
tar -xzf kafka_2.13-3.5.0.tgz
cd kafka_2.13-3.5.0
Step 2: Generate KRaft Metadata
Run:
bin/kafka-storage.sh format -t $(bin/kafka-storage.sh random-uuid) -c config/kraft/server.properties
This initializes metadata storage.
Step 3: Configure Kafka for KRaft
Edit config/kraft/server.properties:
process.roles=controller,broker
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kafka-logs
Step 4: Start Kafka (Without Zookeeper)
bin/kafka-server-start.sh config/kraft/server.properties
Step 5: Create a Topic
bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Step 6: Start a Producer
bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
Type messages and press Enter; each line is published as an event.
Step 7: Start a Consumer
bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
Conclusion
- Kafka is a high-throughput event streaming system.
- Zookeeper was used for metadata management but had scalability issues.
- KRaft (Kafka Raft) replaces Zookeeper with an internal Raft-based system.
- Setting up Kafka with KRaft is easier and more scalable.
You can find more articles and tutorials about Kafka here.