3.2 Key Components
Understanding brokers, producers, consumers, and topics in Kafka.
Topics
A topic is where data is stored in Kafka. Topics are divided into partitions to enable scalability and parallel processing.
Topic Structure
- Think of a topic like a folder
- Partitions are like sections within that folder
- Data is stored and managed within partitions
- Partitions enable horizontal scaling, as the creation sketch below shows
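The number of partitions is fixed when the topic is created. Below is a minimal sketch using the confluent-kafka Python client (one client among several; its use here is an assumption, not something the lesson mandates). It assumes a cluster reachable at localhost:9092 with at least two brokers, since a replication factor of 2 needs two copies of each partition.

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Broker address is an assumption for this sketch
admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# A topic ("folder") split into 2 partitions ("sections"),
# each copied onto 2 brokers for fault tolerance
topic = NewTopic("payments", num_partitions=2, replication_factor=2)

# create_topics() is asynchronous and returns one future per topic
for name, future in admin.create_topics([topic]).items():
    future.result()  # blocks; raises if creation failed
    print(f"Created topic '{name}'")
```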
Replication for Fault Tolerance
Kafka replicates each partition across different brokers, so if a broker fails, replicas on the surviving brokers keep the data available.
Single Partition Topic
For a topic with one partition (the sketch after this list shows how to inspect this layout):
- Partition 1 has two replicas
- Replica 1_1 on Broker A (leader by default)
- Replica 1_2 on Broker B (follower)
- All reads and writes for the partition go through the leader
- If Broker A fails, Replica 1_2 on Broker B becomes the new leader
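Which replica currently leads a partition is visible in the cluster metadata. A short sketch, again with confluent-kafka and assuming the payments topic created above:

```python
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Fetch metadata for one topic (timeout in seconds)
metadata = admin.list_topics(topic="payments", timeout=10)

for pid, part in metadata.topics["payments"].partitions.items():
    # part.leader: id of the broker serving reads and writes
    # part.replicas: every broker holding a copy of this partition
    # part.isrs: replicas currently in sync with the leader
    print(f"Partition {pid}: leader={part.leader}, "
          f"replicas={part.replicas}, in-sync={part.isrs}")
```

After a broker failure, rerunning this shows the surviving follower promoted to leader.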
Multi-Partition Topics
When a topic has multiple partitions, data distribution improves performance:
Example with 2 Partitions:
- Partition 1: Replicated across Broker A and Broker B
- Partition 2: Replicated across Broker A and Broker C
This distribution ensures:
- If Broker B fails, Partition 1 data remains available on Broker A
- If Broker C fails, Partition 2 data remains available on Broker A
- Multiple consumers can read from different partitions simultaneously
- Better load balancing across the cluster
Brokers
Brokers are the backbone of Kafka - the intermediaries between producers and consumers.
Broker Responsibilities
- Store messages sent by producers
- Distribute messages to consumers
- Manage partitions and replicas
- Handle data replication and persistence
- Ensure reliability through redundancy
Broker Architecture
```
Producer → Broker1 → Consumer
              ↓ Replicates
           Broker2
```

Brokers work together to ensure data is:
- Properly stored
- Replicated for fault tolerance
- Available for consumer access
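The same metadata call used earlier also lists the brokers themselves, which is a quick way to confirm what the cluster looks like. A sketch under the same localhost:9092 assumption:

```python
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
metadata = admin.list_topics(timeout=10)

# Every broker in the cluster, with its id and address
for broker_id, broker in metadata.brokers.items():
    print(f"Broker {broker_id}: {broker.host}:{broker.port}")
```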
Metadata Management
Zookeeper (Current)
Zookeeper acts as the coordinator for Kafka clusters:
Key Functions:
- Manages broker coordination
- Handles controller election
- Stores topic configurations
- Tracks cluster state
- Ensures synchronization across distributed system
Zookeeper is essential for:
- High availability
- Fault tolerance
- Broker leadership management
KRaft Mode (Future)
KRaft is Kafka's self-managed metadata system that will replace Zookeeper:
Benefits:
- Kafka handles metadata internally
- Simpler architecture
- Improved performance
- No external dependency
- Reduced operational complexity
Comparison:
| Feature | Zookeeper Mode | KRaft Mode |
|---------|---------------|------------|
| Metadata Management | External (Zookeeper) | Internal (Kafka) |
| Architecture Complexity | Higher | Lower |
| Operational Overhead | More | Less |
| Performance | Good | Better |
Producers and Consumers
Producers
Producers send data to Kafka brokers:
- Push messages to specific topics
- Data is routed to appropriate partitions
- Can specify a message key so related messages land in the same partition, preserving their order (see the sketch below)
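A minimal producer sketch with confluent-kafka; the topic name payments and the key customer-42 are illustrative:

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Invoked once the broker confirms (or rejects) the write
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to partition {msg.partition()} at offset {msg.offset()}")

# Messages with the same key hash to the same partition,
# so all of customer-42's events stay in order
producer.produce("payments", key="customer-42",
                 value='{"amount": 19.99}', callback=on_delivery)
producer.flush()  # block until every queued message is delivered
```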
Consumers
Consumers retrieve data from Kafka brokers:
- Pull messages from topics
- Read from partition leaders
- Track their position using offsets
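A matching consumer sketch; the group id payment-processors is an assumed name:

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "payment-processors",  # consumers sharing this id split the partitions
    "auto.offset.reset": "earliest",   # where to start if no offset is committed yet
})
consumer.subscribe(["payments"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # pull model: the consumer asks for data
        if msg is None:
            continue
        if msg.error():
            print(f"Error: {msg.error()}")
            continue
        # The offset is this consumer's position within the partition
        print(f"partition={msg.partition()} offset={msg.offset()} value={msg.value()}")
finally:
    consumer.close()  # commits offsets and leaves the group cleanly
```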
Partition-Based Processing
Kafka uses partitions to distribute load and improve performance:
- Multiple consumers can read from different partitions simultaneously
- Each partition maintains message order
- Parallelism improves throughput
Payment Topic Example
Single Partition Configuration
For a Payment Topic with one partition:
Components:
- Replica 1_1 on Broker A (leader)
- Replica 1_2 on Broker B (follower)
- Zookeeper manages metadata
- Producer sends payment data to leader
- Consumer reads from leader
Limitation: Within a consumer group, only one consumer instance can be active; additional instances in the same group sit idle.
Two Partition Configuration
For a Payment Topic with two partitions:
Partition 1:
- Replica 1_1 on Broker A (leader)
- Replica 1_2 on Broker B (follower)
Partition 2:
- Replica 2_1 on Broker B (leader)
- Replica 2_2 on Broker C (follower)
Metadata Management:
- Zookeeper tracks which replica is the leader
- Handles automatic failover if a broker fails
- Ensures continuity of service
Data Distribution:
- Odd-numbered transactions → Partition 1
- Even-numbered transactions → Partition 2
- Better load distribution (a routing sketch follows this list)
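One caveat: Kafka's default partitioner hashes the message key rather than splitting odd from even, so the routing described above does not happen on its own. A sketch that reproduces it by pinning the partition explicitly (Kafka numbers partitions from 0, so "Partition 1" here is index 0):

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

for txn_id in range(1, 7):
    # Odd transaction ids go to index 0 ("Partition 1"),
    # even ids to index 1 ("Partition 2")
    target = 0 if txn_id % 2 == 1 else 1
    producer.produce("payments", key=str(txn_id),
                     value=f'{{"txn": {txn_id}}}', partition=target)

producer.flush()
```

In practice, keying by something like a customer id and letting the default partitioner hash it is more common; explicit routing is shown only to match the example.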
Consumer Parallelism:
- Two consumer instances can run simultaneously
- Each consumer reads from one partition
- Parallel processing improves throughput
- With a single consumer, that one instance reads from both partitions
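To see that parallelism, run the consumer below in two terminals with the same group id; Kafka assigns one partition to each instance (topic and group names are again assumed from the earlier sketches):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "payment-processors",  # identical in every instance
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])

# With two partitions, two instances each own one partition;
# a third instance in the same group would sit idle
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    print(f"handling partition {msg.partition()}: {msg.value()}")
```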
Summary
Kafka's key components work together to provide:
- Topics and Partitions: Organize and distribute data
- Brokers: Store, replicate, and serve data
- Replication: Ensure fault tolerance
- Metadata Management: Coordinate cluster operations (Zookeeper/KRaft)
- Producers: Send data to topics
- Consumers: Read data from topics
- Parallel Processing: Enable scalability through partitions
This architecture allows Kafka to handle high-throughput, fault-tolerant data streaming at scale.