Workflow: Heibaiying BigData Notes Kafka Producer Consumer Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Big_Data, Stream_Processing, Kafka |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
End-to-end process for building a Kafka messaging pipeline with producers that publish messages and consumers that read them using various offset commit strategies.
Description
This workflow covers the complete lifecycle of producing and consuming messages in Apache Kafka. On the producer side, it demonstrates synchronous and asynchronous message sending, custom partitioners for controlling message routing, and callback-based delivery confirmation. On the consumer side, it covers consumer group subscription, poll-based message retrieval, and multiple offset management strategies including auto-commit, synchronous commit, asynchronous commit, and per-partition offset tracking. The workflow also addresses consumer rebalancing and graceful shutdown.
Usage
Execute this workflow when you need to build a reliable message-passing system between distributed applications. Use it when you need to publish events to Kafka topics and consume them with configurable delivery guarantees, whether for log aggregation, event-driven architectures, or data pipeline ingestion.
Execution Steps
Step 1: Configure and Create Kafka Producer
Set up the producer configuration properties including broker addresses, key and value serializers, acknowledgment level, retry count, and batch settings. Initialize the KafkaProducer instance with these properties.
Key considerations:
- Set bootstrap.servers to the Kafka cluster addresses
- Choose appropriate serializers for key and value types
- Configure acks level (0, 1, or all) based on durability requirements
- Tune batch.size and linger.ms for throughput vs latency tradeoff
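The configuration above can be sketched as follows. This is a minimal example assuming the `kafka-clients` library is on the classpath and String keys and values; the broker address `hadoop001:9092` is a placeholder for your cluster.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSetup {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        // Placeholder broker list -- replace with your cluster addresses
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop001:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");       // wait for all in-sync replicas (most durable)
        props.put(ProducerConfig.RETRIES_CONFIG, 3);        // retry transient send failures
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // max bytes per partition batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);      // wait up to 5 ms to fill a batch
        return new KafkaProducer<>(props);
    }
}
```

Raising `linger.ms` trades a small latency increase for larger batches and better throughput; `acks=all` trades latency for durability.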
Step 2: Send Messages with Delivery Confirmation
Construct ProducerRecord objects with topic, optional key, and value. Send messages using either synchronous mode (blocking for broker acknowledgment) or asynchronous mode (with callback for non-blocking confirmation). Optionally configure a custom partitioner to route messages to specific partitions based on business logic.
What happens:
- Synchronous send: call send().get() to block until acknowledgment
- Asynchronous send: provide a Callback to handle success or failure
- Custom partitioner: implement Partitioner interface to control partition selection
- Messages are serialized, partitioned, batched, and sent to the broker
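The three send variants above can be sketched as below, assuming a producer from Step 1. The topic name `my-topic` and the `VipPartitioner` routing rule (keys starting with `vip` go to partition 0) are illustrative, not part of the Kafka API.

```java
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.Cluster;

public class SendModes {
    /** Hypothetical custom partitioner: route "vip" keys to partition 0. */
    public static class VipPartitioner implements Partitioner {
        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionCountForTopic(topic);
            if (key != null && key.toString().startsWith("vip")) {
                return 0;
            }
            // Mask the sign bit to keep the result non-negative
            return (key == null ? 0 : key.hashCode() & 0x7fffffff) % numPartitions;
        }
        @Override public void close() {}
        @Override public void configure(Map<String, ?> configs) {}
    }

    public static void demo(KafkaProducer<String, String> producer) throws Exception {
        ProducerRecord<String, String> record =
                new ProducerRecord<>("my-topic", "key1", "hello");

        // Synchronous send: get() blocks until the broker acknowledges, or throws
        RecordMetadata meta = producer.send(record).get();
        System.out.printf("sync: partition=%d, offset=%d%n", meta.partition(), meta.offset());

        // Asynchronous send: the callback fires on acknowledgment or failure
        producer.send(record, (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace(); // e.g. timeout, broker unavailable
            } else {
                System.out.printf("async: partition=%d, offset=%d%n",
                        metadata.partition(), metadata.offset());
            }
        });
        producer.flush(); // drain buffered records before returning
    }
}
```

The custom partitioner is registered via `props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, VipPartitioner.class.getName())` in the Step 1 configuration.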
Step 3: Configure and Create Kafka Consumer
Set up consumer configuration properties including broker addresses, key and value deserializers, consumer group ID, and offset reset policy. Initialize the KafkaConsumer instance and subscribe to one or more topics.
Key considerations:
- Set group.id to assign the consumer to a consumer group
- Configure auto.offset.reset (latest or earliest) for new consumer groups
- Choose between topic subscription and manual partition assignment
- Standalone consumers can manually assign specific partitions
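A minimal consumer setup matching these considerations might look like the following; the broker address, group ID, and topic name are placeholders.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSetup {
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop001:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");                  // consumer group membership
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Where to start when the group has no committed offset yet
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Group-managed subscription: the coordinator assigns partitions
        consumer.subscribe(Arrays.asList("my-topic"));

        // Alternative for a standalone consumer: manual assignment, no rebalancing
        // consumer.assign(Arrays.asList(new TopicPartition("my-topic", 0)));
        return consumer;
    }
}
```

Note that `subscribe()` and `assign()` are mutually exclusive on the same consumer instance.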
Step 4: Poll and Process Messages
Implement the main consumer loop that continuously polls Kafka for new records. Process each ConsumerRecord by extracting topic, partition, offset, key, and value fields. Handle the data according to application logic.
What happens:
- Call poll() with a timeout duration in a continuous loop
- Iterate through returned ConsumerRecords batch
- Extract and process each record's data
- Handle empty poll results gracefully
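The poll loop described above can be sketched as follows, assuming a consumer created as in Step 3.

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            // poll() blocks for up to the timeout; returns an empty batch if no records arrived
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            if (records.isEmpty()) {
                continue; // nothing to process this round
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("topic=%s, partition=%d, offset=%d, key=%s, value=%s%n",
                        record.topic(), record.partition(), record.offset(),
                        record.key(), record.value());
            }
        }
    }
}
```

Keep per-record processing fast relative to `max.poll.interval.ms`, or the coordinator will assume the consumer has died and trigger a rebalance.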
Step 5: Manage Offset Commits
Choose and implement an offset commit strategy to track which messages have been processed. Options range from automatic commits to fine-grained per-partition manual commits, each offering different tradeoffs between simplicity and delivery guarantees.
Offset strategies:
- Auto-commit: simplest but may lose or duplicate messages on failure
- Synchronous commit (commitSync): blocks until confirmed, safest
- Asynchronous commit (commitAsync): non-blocking with optional callback
- Combined approach: async during normal processing, sync before shutdown
- Per-partition offset commit: finest-grained control, minimizing reprocessing after a failure or rebalance
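The combined and per-partition strategies can be sketched together: track the next offset per partition, commit asynchronously during normal processing, and fall back to a blocking commit on shutdown. The processing step is elided here.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitStrategies {
    public static void run(KafkaConsumer<String, String> consumer) {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // ... process record ...
                    // Track the NEXT offset to consume, per partition (hence +1)
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                }
                // Non-blocking commit during normal operation; inspect failures in the callback
                consumer.commitAsync(offsets, (committed, exception) -> {
                    if (exception != null) {
                        System.err.println("commit failed: " + exception.getMessage());
                    }
                });
            }
        } finally {
            try {
                // Blocking commit before shutdown; retried internally on recoverable errors
                consumer.commitSync(offsets);
            } finally {
                consumer.close();
            }
        }
    }
}
```

Auto-commit, by contrast, needs no code at all: set `enable.auto.commit=true` (the default) and tune `auto.commit.interval.ms`, accepting possible duplicates or losses around failures.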
Step 6: Handle Consumer Rebalancing and Shutdown
Implement a ConsumerRebalanceListener to handle partition reassignment when consumers join or leave the group. Implement graceful shutdown using the wakeup() mechanism to cleanly exit the poll loop and commit final offsets.
Key considerations:
- Commit offsets for revoked partitions in onPartitionsRevoked callback
- Use consumer.wakeup() from a shutdown hook to interrupt polling
- Catch WakeupException and close the consumer in a finally block
- Ensure all resources are properly released on shutdown
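These shutdown and rebalance considerations can be combined in one sketch; the topic name is a placeholder, and the rebalance listener here simply commits current offsets before partition ownership is lost.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;

public class GracefulConsumer {
    public static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("my-topic"),
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        // Commit before losing ownership so the new owner
                        // resumes without reprocessing
                        consumer.commitSync();
                    }
                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // Optionally seek to application-managed offsets here
                    }
                });

        final Thread mainThread = Thread.currentThread();
        // wakeup() is the only thread-safe KafkaConsumer method; it makes a
        // blocked poll() throw WakeupException
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumer.wakeup();
            try {
                mainThread.join(); // let the poll loop finish cleanup first
            } catch (InterruptedException ignored) { }
        }));

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(r -> System.out.println(r.value()));
            }
        } catch (WakeupException e) {
            // Expected on shutdown; fall through to cleanup
        } finally {
            try {
                consumer.commitSync(); // final blocking commit
            } finally {
                consumer.close();      // leave the group and release resources
            }
        }
    }
}
```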