
Principle:DataTalksClub Data engineering zoomcamp Kafka Consumer Pattern

From Leeroopedia


Page Metadata
Knowledge Sources DataTalksClub/data-engineering-zoomcamp (07-streaming)
Domains Data_Engineering, Stream_Processing
Last Updated 2026-02-09 14:00 GMT

Overview

Consuming messages from a distributed log via consumer groups lets subscribers poll for new messages with automatic offset management and group-based load balancing across partitions.

Description

The Kafka consumer pattern is the read side of a streaming pipeline. Consumers subscribe to one or more topics and continuously poll the broker for new messages. The pattern introduces several important concepts:

  • Consumer groups: Each consumer belongs to a named group. Kafka distributes topic partitions across the consumers in a group so that each partition is read by exactly one consumer. This provides horizontal scalability -- adding more consumers to a group increases throughput, up to the number of partitions.
  • Offset management: Kafka tracks the position (offset) of each consumer group within each partition. The consumer can start from the earliest available message, the latest message, or a specific offset. Automatic offset commits simplify the programming model by periodically saving the consumer's position.
  • Poll-based consumption: Unlike push-based messaging systems, Kafka consumers explicitly poll for messages. This gives the consumer control over its processing rate and allows it to handle backpressure naturally. The poll timeout controls how long the consumer waits for new messages before returning an empty result.
  • Deserialization: The consumer must convert the raw bytes received from the broker back into application-level objects. Key and value deserializers are configured at consumer initialization, typically mirroring the serializers used by the producer.
  • Graceful shutdown: The consumer poll loop must handle interruption signals (e.g., KeyboardInterrupt) and close the consumer cleanly to commit final offsets and release partition assignments.
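The consumer-group behavior described above can be sketched as a toy assignment function. This is illustrative only: real Kafka uses pluggable partition assignors (range, round-robin, sticky) negotiated by the group coordinator, and the function name here is hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment: each partition is read by exactly one
    consumer; consumers beyond len(partitions) sit idle."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Three consumers sharing four partitions: throughput scales with
# added consumers only up to the partition count.
print(assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3"]))
# → {'c1': [0, 3], 'c2': [1], 'c3': [2]}
```

Note that a fourth consumer added to this group would receive no partitions, which is why topic partition count caps the useful group size.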

Usage

Use this principle when:

  • You need to read messages from a Kafka topic for processing, storage, or forwarding.
  • You want automatic load balancing across multiple consumer instances.
  • You need at-least-once or exactly-once delivery semantics with offset tracking.
  • You are building a service that reacts to events in near real-time.

Theoretical Basis

The consumer pattern follows a subscribe-poll-process loop:

FUNCTION consume_messages(topics, broker_config):
    consumer = create_consumer(
        bootstrap_servers  = broker_config.servers,
        group_id           = "my-consumer-group",
        auto_offset_reset  = "earliest",
        enable_auto_commit = TRUE,
        key_deserializer   = FUNCTION(bytes): decode_as_integer(bytes),
        value_deserializer = FUNCTION(bytes): decode_as_domain_object(bytes)
    )

    consumer.subscribe(topics)

    LOOP FOREVER:
        TRY:
            messages = consumer.poll(timeout=1.0 seconds)

            IF messages IS EMPTY:
                CONTINUE

            FOR EACH partition, message_batch IN messages:
                FOR EACH message IN message_batch:
                    PROCESS(message.key, message.value)

        CATCH InterruptSignal:
            BREAK

    consumer.close()   -- commits final offsets and releases partitions

The design principles of this pattern are:

  1. Pull-based flow control: By polling with a timeout, the consumer controls its own consumption rate. If processing is slow, the consumer simply polls less frequently, providing natural backpressure.
  2. Group coordination: The consumer group protocol handles partition assignment, rebalancing when consumers join or leave, and failover when a consumer crashes. This is transparent to the application code.
  3. Offset semantics: With auto_offset_reset='earliest', a consumer group with no previously committed offsets starts reading from the beginning of the topic, ensuring no messages are missed; once offsets exist, the group resumes from them. With enable_auto_commit=True, offsets are committed periodically, providing at-least-once delivery semantics: a crash between processing and commit causes reprocessing, never loss.
  4. Symmetric deserialization: The consumer's deserializers must be the inverse of the producer's serializers. If the producer encodes keys as UTF-8 strings of integers, the consumer must decode bytes to UTF-8 and then parse as integers.
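Symmetric deserialization (principle 4) can be checked without a broker by writing the producer-side serializers and consumer-side deserializers as inverse pairs. The formats here (UTF-8 integer keys, JSON values) and the sample payload are assumptions for illustration:

```python
import json

# Producer side (assumed formats): int key as UTF-8 digits, dict value as JSON
key_serializer = lambda k: str(k).encode("utf-8")
value_serializer = lambda v: json.dumps(v).encode("utf-8")

# Consumer side: the exact inverses of the above
key_deserializer = lambda b: int(b.decode("utf-8"))
value_deserializer = lambda b: json.loads(b.decode("utf-8"))

key, value = 42, {"ride_id": "abc", "duration": 12.5}
wire_key, wire_value = key_serializer(key), value_serializer(value)

assert key_deserializer(wire_key) == key          # round-trips exactly
assert value_deserializer(wire_value) == value
```

If the pairs are not inverses (e.g., the producer switches to Avro while the consumer still expects JSON), the consumer fails at deserialization time, not at the broker, so such mismatches surface only on the read side.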
