Principle:Heibaiying BigData Notes Kafka Message Polling
| Knowledge Sources | |
|---|---|
| Domains | Messaging, Distributed_Systems |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
The consumer poll loop is the fundamental consumption pattern in Kafka, where the consumer repeatedly calls poll() to fetch batches of records, handle heartbeats, and participate in group rebalancing.
Description
Kafka consumers retrieve records by calling the poll() method in a continuous loop. Each call to poll() performs several internal operations:
- Fetching records: The consumer requests records from the broker for its assigned partitions. Records are returned in a ConsumerRecords collection, which may contain records from multiple partitions.
- Heartbeat management: The poll call triggers heartbeat signals to the group coordinator, indicating that the consumer is alive and processing. If the consumer fails to call poll() within the max.poll.interval.ms window, it is considered dead and its partitions are reassigned.
- Rebalance participation: When the group coordinator detects membership changes, it triggers a rebalance during the next poll() call. The consumer receives its new partition assignment as part of the poll response.
The poll() method accepts a Duration timeout parameter. If no records are available, the consumer blocks for up to the specified duration before returning an empty ConsumerRecords. A shorter timeout enables more responsive shutdown handling; a longer timeout reduces CPU usage during idle periods.
Each ConsumerRecord in the returned batch contains the topic, partition, offset, timestamp, key, and value. The application processes these records and then (depending on the offset management strategy) commits the offsets.
Usage
Use the poll loop as the core structure of every Kafka consumer application. The loop should run continuously until a shutdown signal is received. Processing logic should be fast enough to complete within max.poll.interval.ms to avoid triggering unnecessary rebalances.
Theoretical Basis
The poll loop follows this pattern:
1. while (running) {
2. ConsumerRecords records = consumer.poll(timeout);
3. for each record in records {
4. process(record.topic, record.partition, record.offset, record.key, record.value);
5. }
6. // (Optional) commit offsets
7. }
8. consumer.close();
Key configuration parameters affecting poll behavior:
- max.poll.records: The maximum number of records returned per poll call. Limiting this value prevents long processing times that could exceed max.poll.interval.ms.
- max.poll.interval.ms: The maximum delay between poll invocations. Exceeding this triggers a rebalance.
- fetch.min.bytes: The minimum amount of data the broker should return per fetch request. The broker waits until this threshold is met or fetch.max.wait.ms elapses.
- fetch.max.wait.ms: The maximum time the broker waits to fill fetch.min.bytes before responding.
The ConsumerRecords object provides iteration methods including iterator(), records(TopicPartition) for per-partition access, and count() for the total number of records.