Workflow:Heibaiying BigData Notes Kafka Producer Consumer Pipeline

From Leeroopedia


Knowledge Sources
Domains Big_Data, Stream_Processing, Kafka
Last Updated 2026-02-10 10:00 GMT

Overview

End-to-end process for building a Kafka messaging pipeline with producers that publish messages and consumers that read them using various offset commit strategies.

Description

This workflow covers the complete lifecycle of producing and consuming messages in Apache Kafka. On the producer side, it demonstrates synchronous and asynchronous message sending, custom partitioners for controlling message routing, and callback-based delivery confirmation. On the consumer side, it covers consumer group subscription, poll-based message retrieval, and multiple offset management strategies including auto-commit, synchronous commit, asynchronous commit, and per-partition offset tracking. The workflow also addresses consumer rebalancing and graceful shutdown.

Usage

Execute this workflow when you need to build a reliable message-passing system between distributed applications. Use it when you need to publish events to Kafka topics and consume them with configurable delivery guarantees, whether for log aggregation, event-driven architectures, or data pipeline ingestion.

Execution Steps

Step 1: Configure and Create Kafka Producer

Set up the producer configuration properties including broker addresses, key and value serializers, acknowledgment level, retry count, and batch settings. Initialize the KafkaProducer instance with these properties.

Key considerations:

  • Set bootstrap.servers to the Kafka cluster addresses
  • Choose appropriate serializers for key and value types
  • Configure acks level (0, 1, or all) based on durability requirements
  • Tune batch.size and linger.ms for throughput vs latency tradeoff
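The settings above can be collected into a `Properties` object before constructing the producer. A minimal sketch in Java (the source repository's language), using string-literal config keys and hypothetical broker addresses; the resulting properties would be passed to `new KafkaProducer<>(props)`:

```java
import java.util.Properties;

public class ProducerConfigExample {
    public static Properties producerProps() {
        Properties props = new Properties();
        // Kafka cluster entry points (placeholder addresses)
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        // Serializers for String keys and values
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "all": wait for all in-sync replicas to acknowledge (strongest durability)
        props.put("acks", "all");
        // Retry transient send failures
        props.put("retries", "3");
        // Batch up to 16 KB per partition, waiting up to 5 ms for a batch to fill
        props.put("batch.size", "16384");
        props.put("linger.ms", "5");
        return props;
    }
}
```

With `acks=0` the producer does not wait for any acknowledgment (fastest, least durable); `acks=1` waits for the partition leader only.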

Step 2: Send Messages with Delivery Confirmation

Construct ProducerRecord objects with topic, optional key, and value. Send messages using either synchronous mode (blocking for broker acknowledgment) or asynchronous mode (with callback for non-blocking confirmation). Optionally configure a custom partitioner to route messages to specific partitions based on business logic.

What happens:

  • Synchronous send: call send().get() to block until acknowledgment
  • Asynchronous send: provide a Callback to handle success or failure
  • Custom partitioner: implement Partitioner interface to control partition selection
  • Messages are serialized, partitioned, batched, and sent to the broker
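Both send modes can be sketched as follows, assuming a broker at `localhost:9092` and a topic named `my-topic` (both placeholders); running it requires the kafka-clients dependency and a reachable broker:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendModes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // A custom Partitioner implementation would be registered here:
        // props.put("partitioner.class", "com.example.MyPartitioner");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("my-topic", "key-1", "hello kafka");

            // Synchronous send: get() blocks until the broker acknowledges
            RecordMetadata meta = producer.send(record).get();
            System.out.printf("sync: partition=%d, offset=%d%n",
                    meta.partition(), meta.offset());

            // Asynchronous send: the callback fires on acknowledgment or failure
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // delivery failed after retries
                } else {
                    System.out.printf("async: partition=%d, offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes any buffered records before returning
    }
}
```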

Step 3: Configure and Create Kafka Consumer

Set up consumer configuration properties including broker addresses, key and value deserializers, consumer group ID, and offset reset policy. Initialize the KafkaConsumer instance and subscribe to one or more topics.

Key considerations:

  • Set group.id to assign the consumer to a consumer group
  • Configure auto.offset.reset (latest or earliest) for new consumer groups
  • Choose between group subscription (subscribe) and standalone manual assignment (assign) of specific partitions
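A consumer-setup sketch under the same placeholder names as above (`localhost:9092`, `my-topic`, and a hypothetical group id `my-group`):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerSetup {
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Consumers sharing a group.id divide the topic's partitions among themselves
        props.put("group.id", "my-group");
        // Starting position when the group has no committed offset yet
        props.put("auto.offset.reset", "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));
        // A standalone consumer would instead assign partitions directly:
        // consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
        return consumer;
    }
}
```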

Step 4: Poll and Process Messages

Implement the main consumer loop that continuously polls Kafka for new records. Process each ConsumerRecord by extracting topic, partition, offset, key, and value fields. Handle the data according to application logic.

What happens:

  • Call poll() with a timeout duration in a continuous loop
  • Iterate through returned ConsumerRecords batch
  • Extract and process each record's data
  • Handle empty poll results gracefully
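The poll loop above reduces to a few lines; this sketch assumes a consumer already configured and subscribed as in Step 3:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            // Blocks for up to 100 ms; returns an empty batch if no records arrive,
            // in which case the for-loop body simply never executes
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("topic=%s, partition=%d, offset=%d, key=%s, value=%s%n",
                        record.topic(), record.partition(), record.offset(),
                        record.key(), record.value());
            }
        }
    }
}
```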

Step 5: Manage Offset Commits

Choose and implement an offset commit strategy to track which messages have been processed. Options range from automatic commits to fine-grained per-partition manual commits, each offering different tradeoffs between simplicity and delivery guarantees.

Offset strategies:

  • Auto-commit: simplest, but commits on a timer rather than per record, so a crash can skip or redeliver messages
  • Synchronous commit (commitSync): blocks until the broker confirms; safest, at the cost of throughput
  • Asynchronous commit (commitAsync): non-blocking, with an optional callback for failure handling
  • Combined approach: commitAsync during normal processing, commitSync before shutdown
  • Per-partition offset commit: finest-grained control over committed positions (note that commits alone do not provide exactly-once delivery; that requires idempotent processing or transactions)
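The combined async/sync pattern and the per-partition variant can be sketched as below, assuming `enable.auto.commit=false` and a consumer subscribed as in Step 3; `handle()` is a placeholder for application logic:

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    handle(record); // application-specific processing (placeholder)
                }
                // Non-blocking commit on the happy path; a failed attempt is
                // superseded by the next successful commit
                consumer.commitAsync((offsets, exception) -> {
                    if (exception != null) {
                        System.err.println("async commit failed: " + exception);
                    }
                });
            }
        } finally {
            try {
                // Blocking commit so the final offsets are not lost on shutdown
                consumer.commitSync();
            } finally {
                consumer.close();
            }
        }
    }

    // Per-partition variant: the committed offset is that of the *next*
    // record to read, hence the +1
    static void commitRecord(KafkaConsumer<String, String> consumer,
                             ConsumerRecord<String, String> record) {
        consumer.commitSync(Collections.singletonMap(
                new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1)));
    }

    static void handle(ConsumerRecord<String, String> record) { /* ... */ }
}
```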

Step 6: Handle Consumer Rebalancing and Shutdown

Implement a ConsumerRebalanceListener to handle partition reassignment when consumers join or leave the group. Implement graceful shutdown using the wakeup() mechanism to cleanly exit the poll loop and commit final offsets.

Key considerations:

  • Commit offsets for revoked partitions in onPartitionsRevoked callback
  • Use consumer.wakeup() from a shutdown hook to interrupt polling
  • Catch WakeupException and close the consumer in a finally block
  • Ensure all resources are properly released on shutdown
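Putting the rebalance listener and the wakeup-based shutdown together, a sketch under the same placeholder topic name (`my-topic`):

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;

public class GracefulConsumer {
    public static void run(KafkaConsumer<String, String> consumer) {
        Map<TopicPartition, OffsetAndMetadata> processed = new HashMap<>();

        consumer.subscribe(Collections.singletonList("my-topic"),
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        // Commit before another group member takes over these partitions
                        consumer.commitSync(processed);
                    }
                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // Optionally seek to externally stored offsets here
                    }
                });

        // wakeup() is the one thread-safe KafkaConsumer method; it makes a
        // blocked poll() throw WakeupException
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // Track the next offset to read for each partition
                    processed.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                }
            }
        } catch (WakeupException e) {
            // Expected during shutdown; fall through to cleanup
        } finally {
            consumer.commitSync(processed);
            consumer.close();
        }
    }
}
```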

Execution Diagram

GitHub URL

Workflow Repository