Workflow:Heibaiying BigData Notes Kafka Producer Consumer Pipeline

From Leeroopedia


Knowledge Sources
Domains Big_Data, Stream_Processing, Kafka
Last Updated 2026-02-10 10:00 GMT

Overview

End-to-end process for building a Kafka messaging pipeline with producers that publish messages and consumers that read them using various offset commit strategies.

Description

This workflow covers the complete lifecycle of producing and consuming messages in Apache Kafka. On the producer side, it demonstrates synchronous and asynchronous message sending, custom partitioners for controlling message routing, and callback-based delivery confirmation. On the consumer side, it covers consumer group subscription, poll-based message retrieval, and multiple offset management strategies including auto-commit, synchronous commit, asynchronous commit, and per-partition offset tracking. The workflow also addresses consumer rebalancing and graceful shutdown.

Usage

Execute this workflow when you need to build a reliable message-passing system between distributed applications. Use it when you need to publish events to Kafka topics and consume them with configurable delivery guarantees, whether for log aggregation, event-driven architectures, or data pipeline ingestion.

Execution Steps

Step 1: Configure and Create Kafka Producer

Set up the producer configuration properties including broker addresses, key and value serializers, acknowledgment level, retry count, and batch settings. Initialize the KafkaProducer instance with these properties.

Key considerations:

  • Set bootstrap.servers to the Kafka cluster addresses
  • Choose appropriate serializers for key and value types
  • Configure acks level (0, 1, or all) based on durability requirements
  • Tune batch.size and linger.ms for throughput vs latency tradeoff
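The settings above can be collected into a `Properties` object before constructing the producer. A minimal sketch in Java (the source repository's language), using string-literal config keys and hypothetical broker addresses; the resulting properties would be passed to `new KafkaProducer<>(props)`:

```java
import java.util.Properties;

public class ProducerConfigExample {
    public static Properties producerProps() {
        Properties props = new Properties();
        // Kafka cluster entry points (placeholder addresses)
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        // Serializers for String keys and values
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "all": wait for all in-sync replicas to acknowledge (strongest durability)
        props.put("acks", "all");
        // Retry transient send failures
        props.put("retries", "3");
        // Batch up to 16 KB per partition, waiting up to 5 ms for a batch to fill
        props.put("batch.size", "16384");
        props.put("linger.ms", "5");
        return props;
    }
}
```

With `acks=0` the producer does not wait for any acknowledgment (fastest, least durable); `acks=1` waits for the partition leader only.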

Step 2: Send Messages with Delivery Confirmation

Construct ProducerRecord objects with topic, optional key, and value. Send messages using either synchronous mode (blocking for broker acknowledgment) or asynchronous mode (with callback for non-blocking confirmation). Optionally configure a custom partitioner to route messages to specific partitions based on business logic.

What happens:

  • Synchronous send: call send().get() to block until acknowledgment
  • Asynchronous send: provide a Callback to handle success or failure
  • Custom partitioner: implement Partitioner interface to control partition selection
  • Messages are serialized, partitioned, batched, and sent to the broker
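Both send modes can be sketched as follows, assuming a broker at `localhost:9092` and a topic named `my-topic` (both placeholders); running it requires the kafka-clients dependency and a reachable broker:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendModes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // A custom Partitioner implementation would be registered here:
        // props.put("partitioner.class", "com.example.MyPartitioner");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("my-topic", "key-1", "hello kafka");

            // Synchronous send: get() blocks until the broker acknowledges
            RecordMetadata meta = producer.send(record).get();
            System.out.printf("sync: partition=%d, offset=%d%n",
                    meta.partition(), meta.offset());

            // Asynchronous send: the callback fires on acknowledgment or failure
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // delivery failed after retries
                } else {
                    System.out.printf("async: partition=%d, offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes any buffered records before returning
    }
}
```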

Step 3: Configure and Create Kafka Consumer

Set up consumer configuration properties including broker addresses, key and value deserializers, consumer group ID, and offset reset policy. Initialize the KafkaConsumer instance and subscribe to one or more topics.

Key considerations:

  • Set group.id to assign the consumer to a consumer group
  • Configure auto.offset.reset (latest or earliest) for new consumer groups
  • Choose between group subscription (subscribe) and standalone manual assignment (assign) of specific partitions
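A consumer-setup sketch under the same placeholder names as above (`localhost:9092`, `my-topic`, and a hypothetical group id `my-group`):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerSetup {
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Consumers sharing a group.id divide the topic's partitions among themselves
        props.put("group.id", "my-group");
        // Starting position when the group has no committed offset yet
        props.put("auto.offset.reset", "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));
        // A standalone consumer would instead assign partitions directly:
        // consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
        return consumer;
    }
}
```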

Step 4: Poll and Process Messages

Implement the main consumer loop that continuously polls Kafka for new records. Process each ConsumerRecord by extracting topic, partition, offset, key, and value fields. Handle the data according to application logic.

What happens:

  • Call poll() with a timeout duration in a continuous loop
  • Iterate through returned ConsumerRecords batch
  • Extract and process each record's data
  • Handle empty poll results gracefully
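The poll loop above reduces to a few lines; this sketch assumes a consumer already configured and subscribed as in Step 3:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            // Blocks for up to 100 ms; returns an empty batch if no records arrive,
            // in which case the for-loop body simply never executes
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("topic=%s, partition=%d, offset=%d, key=%s, value=%s%n",
                        record.topic(), record.partition(), record.offset(),
                        record.key(), record.value());
            }
        }
    }
}
```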

Step 5: Manage Offset Commits

Choose and implement an offset commit strategy to track which messages have been processed. Options range from automatic commits to fine-grained per-partition manual commits, each offering different tradeoffs between simplicity and delivery guarantees.

Offset strategies:

  • Auto-commit: simplest, but commits on a timer rather than per record, so a crash can skip or redeliver messages
  • Synchronous commit (commitSync): blocks until the broker confirms; safest, at the cost of throughput
  • Asynchronous commit (commitAsync): non-blocking, with an optional callback for failure handling
  • Combined approach: commitAsync during normal processing, commitSync before shutdown
  • Per-partition offset commit: finest-grained control over committed positions (note that commits alone do not provide exactly-once delivery; that requires idempotent processing or transactions)
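The combined async/sync pattern and the per-partition variant can be sketched as below, assuming `enable.auto.commit=false` and a consumer subscribed as in Step 3; `handle()` is a placeholder for application logic:

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    handle(record); // application-specific processing (placeholder)
                }
                // Non-blocking commit on the happy path; a failed attempt is
                // superseded by the next successful commit
                consumer.commitAsync((offsets, exception) -> {
                    if (exception != null) {
                        System.err.println("async commit failed: " + exception);
                    }
                });
            }
        } finally {
            try {
                // Blocking commit so the final offsets are not lost on shutdown
                consumer.commitSync();
            } finally {
                consumer.close();
            }
        }
    }

    // Per-partition variant: the committed offset is that of the *next*
    // record to read, hence the +1
    static void commitRecord(KafkaConsumer<String, String> consumer,
                             ConsumerRecord<String, String> record) {
        consumer.commitSync(Collections.singletonMap(
                new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1)));
    }

    static void handle(ConsumerRecord<String, String> record) { /* ... */ }
}
```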

Step 6: Handle Consumer Rebalancing and Shutdown

Implement a ConsumerRebalanceListener to handle partition reassignment when consumers join or leave the group. Implement graceful shutdown using the wakeup() mechanism to cleanly exit the poll loop and commit final offsets.

Key considerations:

  • Commit offsets for revoked partitions in onPartitionsRevoked callback
  • Use consumer.wakeup() from a shutdown hook to interrupt polling
  • Catch WakeupException and close the consumer in a finally block
  • Ensure all resources are properly released on shutdown
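Putting the rebalance listener and the wakeup-based shutdown together, a sketch under the same placeholder topic name (`my-topic`):

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;

public class GracefulConsumer {
    public static void run(KafkaConsumer<String, String> consumer) {
        Map<TopicPartition, OffsetAndMetadata> processed = new HashMap<>();

        consumer.subscribe(Collections.singletonList("my-topic"),
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        // Commit before another group member takes over these partitions
                        consumer.commitSync(processed);
                    }
                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // Optionally seek to externally stored offsets here
                    }
                });

        // wakeup() is the one thread-safe KafkaConsumer method; it makes a
        // blocked poll() throw WakeupException
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // Track the next offset to read for each partition
                    processed.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                }
            }
        } catch (WakeupException e) {
            // Expected during shutdown; fall through to cleanup
        } finally {
            consumer.commitSync(processed);
            consumer.close();
        }
    }
}
```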

Execution Diagram

GitHub URL

Workflow Repository