Principle:Heibaiying BigData Notes Kafka Message Polling

From Leeroopedia


Knowledge Sources
Domains Messaging, Distributed_Systems
Last Updated 2026-02-10 10:00 GMT

Overview

The consumer poll loop is the fundamental consumption pattern in Kafka: the consumer repeatedly calls poll() to fetch batches of records, demonstrate liveness to its consumer group, and participate in group rebalancing.

Description

Kafka consumers retrieve records by calling the poll() method in a continuous loop. Each call to poll() performs several internal operations:

  • Fetching records: The consumer requests records from the broker for its assigned partitions. Records are returned in a ConsumerRecords collection, which may contain records from multiple partitions.
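  • Heartbeat management: In Java clients since Kafka 0.10.1, heartbeats to the group coordinator are sent from a background thread and are governed by session.timeout.ms. Calling poll() is what shows that the application itself is still making progress. If the consumer fails to call poll() again within the max.poll.interval.ms window, it is considered failed, leaves the group, and its partitions are reassigned.
  • Rebalance participation: When the group coordinator detects membership changes, it initiates a rebalance, and the consumer takes part in it from inside a subsequent poll() call. The new partition assignment takes effect during that call, where any registered ConsumerRebalanceListener callbacks are also invoked; it is not returned as part of the ConsumerRecords.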

The poll() method accepts a Duration timeout parameter. If no records are available, the consumer blocks for up to the specified duration before returning an empty ConsumerRecords. A shorter timeout enables more responsive shutdown handling; a longer timeout reduces CPU usage during idle periods.

Each ConsumerRecord in the returned batch contains the topic, partition, offset, timestamp, key, and value. The application processes these records and then (depending on the offset management strategy) commits the offsets.

Usage

Use the poll loop as the core structure of every Kafka consumer application. The loop should run continuously until a shutdown signal is received. Processing of each returned batch must complete, and poll() must be called again, within max.poll.interval.ms; otherwise the consumer is removed from the group and an unnecessary rebalance is triggered.
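The timing constraint above can be turned into a rough sizing rule for max.poll.records. The sketch below is only a back-of-the-envelope aid: it assumes records are processed sequentially and that a worst-case per-record processing time is known. PollBudget, maxSafeRecords, and the headroom parameter are hypothetical names introduced for illustration and are not part of the Kafka client.

```java
import java.time.Duration;

/** Back-of-the-envelope sizing for max.poll.records (illustrative helper, not a Kafka API). */
final class PollBudget {
    private PollBudget() {}

    /**
     * Largest batch size whose worst-case sequential processing time fits
     * into the fraction `headroom` (0 < headroom <= 1) of the poll interval.
     */
    static long maxSafeRecords(Duration maxPollInterval,
                               Duration worstCasePerRecord,
                               double headroom) {
        if (worstCasePerRecord.isZero() || worstCasePerRecord.isNegative()) {
            throw new IllegalArgumentException("worstCasePerRecord must be positive");
        }
        if (!(headroom > 0.0 && headroom <= 1.0)) {
            throw new IllegalArgumentException("headroom must be in (0, 1]");
        }
        // Work in nanoseconds so sub-millisecond per-record times do not divide by zero.
        long budgetNanos = (long) (maxPollInterval.toNanos() * headroom);
        return budgetNanos / worstCasePerRecord.toNanos();
    }
}
```

For example, with max.poll.interval.ms at 300000 (5 minutes) and a worst case of 200 ms per record, reserving half the interval as headroom gives a limit of 750 records per poll. Without headroom, a batch of 500 records leaves at most 600 ms per record.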

Theoretical Basis

The poll loop follows this pattern:

while (running) {
    // Blocks for up to `timeout` if no records are available
    ConsumerRecords<K, V> records = consumer.poll(timeout);
    for (ConsumerRecord<K, V> record : records) {
        process(record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
    // (Optional) commit offsets
}
consumer.close();
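The pattern above can be sketched as a small, self-contained Java class. To keep the example runnable without a broker or the Kafka client library, the record source is modeled as a plain function from a timeout to a batch; this is an assumption made for illustration. PollLoop, its source, handler, and onClose parameters, and shutdown() are all hypothetical names, not Kafka APIs.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Function;

/** Sketch of a poll loop; `source` is a stand-in for a call such as consumer.poll(timeout). */
final class PollLoop<R> {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final Function<Duration, Iterable<R>> source;
    private final Consumer<R> handler;
    private final Runnable onClose;

    PollLoop(Function<Duration, Iterable<R>> source, Consumer<R> handler, Runnable onClose) {
        this.source = source;
        this.handler = handler;
        this.onClose = onClose;
    }

    /** Requests that the loop stop after the current iteration; callable from another thread. */
    void shutdown() {
        running.set(false);
    }

    /** Runs until shutdown() is called and returns the number of records handled. */
    long run(Duration timeout) {
        long handled = 0;
        try {
            while (running.get()) {
                // The source may block for up to `timeout` when nothing is available.
                for (R record : source.apply(timeout)) {
                    handler.accept(record);
                    handled++;
                }
                // (Optional) commit offsets here, after the batch has been processed.
            }
        } finally {
            // Mirrors consumer.close(): always release resources, even on an exception.
            onClose.run();
        }
        return handled;
    }
}
```

The running flag is only checked between polls, so the poll timeout bounds how long a shutdown request can go unnoticed. With the real client, a blocked poll() can additionally be interrupted from another thread by calling consumer.wakeup(), which causes poll() to throw WakeupException.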

Key configuration parameters affecting poll behavior:

  • max.poll.records: The maximum number of records returned per poll call. Limiting this value prevents long processing times that could exceed max.poll.interval.ms.
  • max.poll.interval.ms: The maximum allowed delay between consecutive poll() invocations. If it is exceeded, the consumer is considered failed and leaves the group, which triggers a rebalance.
  • fetch.min.bytes: The minimum amount of data the broker should return per fetch request. The broker waits until this threshold is met or fetch.max.wait.ms elapses.
  • fetch.max.wait.ms: The maximum time the broker waits to fill fetch.min.bytes before responding.
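As an illustration, the four parameters above can be set through a java.util.Properties object using their plain string keys. The values below are illustrative assumptions, not recommendations, and PollTuning is a hypothetical helper name. A real consumer configuration also needs connection and deserializer settings such as bootstrap.servers, group.id, key.deserializer, and value.deserializer.

```java
import java.util.Properties;

/** Illustrative poll-related settings; the values are assumptions, not tuned recommendations. */
final class PollTuning {
    private PollTuning() {}

    static Properties pollTuning() {
        Properties props = new Properties();
        // Cap the batch size so one batch can be processed well within the poll interval.
        props.setProperty("max.poll.records", "200");
        // Maximum allowed gap between consecutive poll() calls (5 minutes).
        props.setProperty("max.poll.interval.ms", "300000");
        // Ask the broker to wait for at least 1 KiB of data...
        props.setProperty("fetch.min.bytes", "1024");
        // ...but for no longer than 500 ms before responding.
        props.setProperty("fetch.max.wait.ms", "500");
        return props;
    }
}
```

Raising fetch.min.bytes trades latency for fewer, larger fetch responses, while fetch.max.wait.ms bounds the extra latency that trade can add.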

The ConsumerRecords object provides iteration methods including iterator(), records(TopicPartition) for per-partition access, and count() for the total number of records.
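Per-partition access matters mainly when offsets are committed manually: the position to commit for a partition is the offset of the last processed record plus one, that is, the next record to read. The sketch below shows that bookkeeping with standard-library types only. Rec is a hypothetical stand-in for the topic, partition, and offset fields of ConsumerRecord, and the "topic-partition" string key stands in for TopicPartition.

```java
import java.util.Map;
import java.util.TreeMap;

/** Computes commit positions per partition (illustrative; not a Kafka API). */
final class CommitOffsets {
    private CommitOffsets() {}

    /** Hypothetical stand-in for the ConsumerRecord fields used here. */
    static final class Rec {
        final String topic;
        final int partition;
        final long offset;

        Rec(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    /** Maps each "topic-partition" key to the highest processed offset plus one. */
    static Map<String, Long> nextOffsets(Iterable<Rec> batch) {
        Map<String, Long> next = new TreeMap<>();
        for (Rec r : batch) {
            String key = r.topic + "-" + r.partition;
            // Keep the largest (offset + 1) seen so far for this partition.
            next.merge(key, r.offset + 1, Math::max);
        }
        return next;
    }
}
```

With the real client, the same result is typically obtained by iterating records.partitions(), reading the last element of records.records(partition), and passing a map of TopicPartition to OffsetAndMetadata to commitSync().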

Related Pages

Implemented By
