Principle:DataTalksClub Data engineering zoomcamp Kafka Consumer Pattern
| Page Metadata | |
|---|---|
| Knowledge Sources | DataTalksClub/data-engineering-zoomcamp (07-streaming) |
| Domains | Data_Engineering, Stream_Processing |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Consuming messages from a distributed log with consumer groups enables subscribers to poll for new messages with automatic offset management and group-based load balancing across partitions.
Description
The Kafka consumer pattern is the read side of a streaming pipeline. Consumers subscribe to one or more topics and continuously poll the broker for new messages. The pattern introduces several important concepts:
- Consumer groups: Each consumer belongs to a named group. Kafka distributes topic partitions across the consumers in a group so that each partition is read by exactly one consumer. This provides horizontal scalability -- adding more consumers to a group increases throughput, up to the number of partitions.
- Offset management: Kafka tracks the position (offset) of each consumer group within each partition. The consumer can start from the earliest available message, the latest message, or a specific offset. Automatic offset commits simplify the programming model by periodically saving the consumer's position.
- Poll-based consumption: Unlike push-based messaging systems, Kafka consumers explicitly poll for messages. This gives the consumer control over its processing rate and allows it to handle backpressure naturally. The poll timeout controls how long the consumer waits for new messages before returning an empty result.
- Deserialization: The consumer must convert the raw bytes received from the broker back into application-level objects. Key and value deserializers are configured at consumer initialization, typically mirroring the serializers used by the producer.
- Graceful shutdown: The consumer poll loop must handle interruption signals (e.g., KeyboardInterrupt) and close the consumer cleanly to commit final offsets and release partition assignments.
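The concepts above can be sketched in Python. This is a minimal, illustrative sketch assuming the kafka-python library; the topic name, broker address, and group id are placeholders, and the deserializers assume a producer that sends integer keys as UTF-8 strings and values as UTF-8 JSON:

```python
# Minimal consumer sketch (assumes the kafka-python package; topic name,
# broker address, and group id below are illustrative, not from the source).
import json

def deserialize_key(raw: bytes) -> int:
    # Mirrors a producer that encoded integer keys as UTF-8 strings.
    return int(raw.decode("utf-8"))

def deserialize_value(raw: bytes) -> dict:
    # Mirrors a producer that serialized values as UTF-8 JSON.
    return json.loads(raw.decode("utf-8"))

def main():
    # Imported here so the deserializers above can be reused (and tested)
    # without a running broker.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "rides",                               # hypothetical topic name
        bootstrap_servers=["localhost:9092"],  # assumed broker address
        group_id="my-consumer-group",
        auto_offset_reset="earliest",          # new groups read from the beginning
        enable_auto_commit=True,               # periodic offset commits
        key_deserializer=deserialize_key,
        value_deserializer=deserialize_value,
    )
    try:
        for message in consumer:               # polls the broker under the hood
            print(message.key, message.value)
    except KeyboardInterrupt:
        pass                                   # interruption signal: fall through
    finally:
        consumer.close()  # commits final offsets, releases partition assignments

if __name__ == "__main__":
    main()
```

Note that the `try/finally` around the loop is what makes the shutdown graceful: `close()` runs whether the loop ends normally or via interrupt.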
Usage
Use this principle when:
- You need to read messages from a Kafka topic for processing, storage, or forwarding.
- You want automatic load balancing across multiple consumer instances.
- You need at-least-once or exactly-once delivery semantics with offset tracking.
- You are building a service that reacts to events in near real-time.
Theoretical Basis
The consumer pattern follows a subscribe-poll-process loop:
FUNCTION consume_messages(topics, broker_config):
    consumer = create_consumer(
        bootstrap_servers = broker_config.servers,
        group_id = "my-consumer-group",
        auto_offset_reset = "earliest",
        enable_auto_commit = TRUE,
        key_deserializer = FUNCTION(bytes): decode_as_integer(bytes),
        value_deserializer = FUNCTION(bytes): decode_as_domain_object(bytes)
    )
    consumer.subscribe(topics)
    LOOP FOREVER:
        TRY:
            messages = consumer.poll(timeout=1.0 seconds)
            IF messages IS EMPTY:
                CONTINUE
            FOR EACH partition, message_batch IN messages:
                FOR EACH message IN message_batch:
                    PROCESS(message.key, message.value)
        CATCH InterruptSignal:
            BREAK
    consumer.close()  -- commits final offsets and releases partitions
The design principles of this pattern are:
- Pull-based flow control: By polling with a timeout, the consumer controls its own consumption rate. If processing is slow, the consumer simply polls less frequently, providing natural backpressure.
- Group coordination: The consumer group protocol handles partition assignment, rebalancing when consumers join or leave, and failover when a consumer crashes. This is transparent to the application code.
- Offset semantics: With auto_offset_reset='earliest', a new consumer group starts reading from the beginning of the topic, ensuring no messages are missed. With enable_auto_commit=True, offsets are committed periodically, providing at-least-once delivery semantics.
- Symmetric deserialization: The consumer's deserializers must be the inverse of the producer's serializers. If the producer encodes keys as UTF-8 strings of integers, the consumer must decode bytes to UTF-8 and then parse as integers.
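The symmetric-deserialization principle can be made concrete as a round trip: each consumer-side function must invert its producer-side counterpart, so that deserialize(serialize(x)) == x. A small sketch assuming the key/value encodings described above (integer keys as UTF-8 strings, JSON values):

```python
# Producer-side serializers and their consumer-side inverses.
# Encodings are assumptions matching the example in the text:
# integer keys as UTF-8 strings, values as UTF-8 JSON.
import json

def serialize_key(key: int) -> bytes:        # producer side
    return str(key).encode("utf-8")

def deserialize_key(raw: bytes) -> int:      # consumer side (inverse)
    return int(raw.decode("utf-8"))

def serialize_value(obj) -> bytes:           # producer side
    return json.dumps(obj).encode("utf-8")

def deserialize_value(raw: bytes):           # consumer side (inverse)
    return json.loads(raw.decode("utf-8"))
```

If either side changes its encoding (say, keys become big-endian integers), the other side must change in lockstep, or every poll yields garbage or deserialization errors.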
Related Pages
- Implementation:DataTalksClub_Data_engineering_zoomcamp_JsonConsumer_Implementation
- Principle:DataTalksClub_Data_engineering_zoomcamp_Streaming_Data_Model
- Principle:DataTalksClub_Data_engineering_zoomcamp_Kafka_Infrastructure_Setup
- Principle:DataTalksClub_Data_engineering_zoomcamp_Kafka_Producer_Pattern
- Heuristic:DataTalksClub_Data_engineering_zoomcamp_Kafka_Consumer_Poll_Timeout