
Principle:Heibaiying BigData Notes Kafka Offset Management

From Leeroopedia


Knowledge Sources
Domains Messaging, Distributed_Systems
Last Updated 2026-02-10 10:00 GMT

Overview

Managing consumer offset commits controls how consumption progress is tracked, directly determining whether messages are processed at-most-once, at-least-once, or exactly-once after failures or rebalances.

Description

An offset is a sequential identifier for each record within a partition. When a consumer commits an offset, it is telling Kafka "I have processed all records up to and including this offset." Committed offsets are stored in an internal Kafka topic called __consumer_offsets. When a consumer restarts or a rebalance assigns a partition to a new consumer, consumption resumes from the last committed offset.
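These commit-and-resume semantics can be sketched with a minimal in-memory model (plain Java, illustration only; `OffsetStore` and its methods are hypothetical stand-ins, not the Kafka client API): the value stored for a partition is the last processed offset plus one, and that value is exactly where a restarted or newly assigned consumer resumes.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory model of commit/resume semantics (illustration only,
// not the Kafka client API). The committed value for a partition is the
// position from which a new consumer of that partition resumes reading.
class OffsetStore {
    // partition -> next offset to read
    private final Map<String, Long> committed = new HashMap<>();

    // Committing "processed through lastProcessedOffset" stores lastProcessedOffset + 1.
    void commit(String partition, long lastProcessedOffset) {
        committed.put(partition, lastProcessedOffset + 1);
    }

    // A restarted or reassigned consumer resumes here (0 if nothing was committed).
    long resumePosition(String partition) {
        return committed.getOrDefault(partition, 0L);
    }
}
```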

Kafka provides several offset commit strategies:

  • Automatic commit (enable.auto.commit=true): Offsets are committed periodically in the background at the interval specified by auto.commit.interval.ms. This is simple but can lead to message loss (if a commit happens before processing completes) or duplicate processing (if processing completes but the next auto-commit has not yet fired).
  • Synchronous commit (commitSync()): Blocks the calling thread until the broker confirms the offset commit. This guarantees that offsets are durably stored before processing continues. It is the safest option but reduces throughput due to blocking on every commit.
  • Asynchronous commit (commitAsync()): Sends the commit request without blocking. An OffsetCommitCallback is invoked when the commit succeeds or fails. This provides higher throughput but introduces a risk: if an async commit fails, a subsequent successful commit with a higher offset masks the failure, which is usually harmless. If the consumer crashes after the failed commit but before the next successful one, however, records may be reprocessed.
  • Per-partition commit: Rather than committing the offsets for all partitions at once, the consumer can commit offsets for specific TopicPartition instances using a Map<TopicPartition, OffsetAndMetadata>. This is useful in batch processing scenarios where different partitions may be processed at different rates.
  • Combined sync and async: A common production pattern combines commitAsync() for regular commits during the poll loop (for throughput) with a final commitSync() on shutdown or error (for safety).
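The failure-masking risk of commitAsync() described above is commonly handled by tagging each commit attempt with a monotonically increasing sequence number and retrying a failed commit only if no later attempt has been issued since, so a stale retry can never rewind a newer offset. A minimal sketch in plain Java (AsyncCommitGuard is a hypothetical helper, not part of the Kafka API):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a sequence-number guard for commitAsync() retries (plain Java,
// no broker; hypothetical helper). A failed attempt is retried only if it
// is still the most recent attempt; otherwise a newer commit has already
// superseded it and retrying would risk rewinding the committed offset.
class AsyncCommitGuard {
    private final AtomicLong sequence = new AtomicLong();

    // Call before each commitAsync() and remember the returned number.
    long nextAttempt() {
        return sequence.incrementAndGet();
    }

    // Call from the failure callback with the attempt's number.
    boolean shouldRetry(long attemptSeq) {
        return attemptSeq == sequence.get(); // stale attempts are dropped
    }
}
```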

Usage

Use automatic commits only for simple, loss-tolerant consumers. Use synchronous commits when processing correctness is critical and throughput is not the primary concern. Use asynchronous commits with a final synchronous commit for production workloads that need both throughput and safety. Use per-partition commits when batch sizes or processing rates differ across partitions.
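The per-partition case can be sketched as follows (plain Java; PerPartitionProgress and the String partition keys are hypothetical stand-ins for TopicPartition and OffsetAndMetadata, and the resulting map is what would be handed to a per-partition commitSync(Map) call): each partition tracks its own last processed offset, and the commit map holds that offset plus one per partition, so fast and slow partitions advance independently.

```java
import java.util.HashMap;
import java.util.Map;

// Per-partition commit sketch (illustration only; String keys stand in for
// TopicPartition and the Long values for OffsetAndMetadata). Each partition
// commits its own last-processed offset + 1, independently of the others.
class PerPartitionProgress {
    private final Map<String, Long> lastProcessed = new HashMap<>();

    // Record progress; keeps the highest offset seen per partition.
    void markProcessed(String topicPartition, long offset) {
        lastProcessed.merge(topicPartition, offset, Math::max);
    }

    // Build the map a per-partition commit would send: next offset to read.
    Map<String, Long> offsetsToCommit() {
        Map<String, Long> out = new HashMap<>();
        lastProcessed.forEach((tp, off) -> out.put(tp, off + 1));
        return out;
    }
}
```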

Theoretical Basis

The offset commit strategies and their guarantees:

Automatic Commit:
  - Offsets committed every auto.commit.interval.ms
  - Risk: commit before processing = message loss (at-most-once)
  - Risk: processing before commit = duplicate processing (at-least-once)

Synchronous Commit:
  while (running) {
      records = consumer.poll(timeout)
      process(records)
      consumer.commitSync()  // blocks until broker confirms
  }
  - Guarantee: at-least-once (records are processed before commit)
  - Trade-off: reduced throughput due to blocking

Asynchronous Commit:
  while (running) {
      records = consumer.poll(timeout)
      process(records)
      consumer.commitAsync(callback)  // non-blocking
  }
  - Guarantee: at-least-once (with possible reprocessing on failure)
  - Trade-off: higher throughput, but commit failures may go unnoticed

Combined Pattern:
  try {
      while (running) {
          records = consumer.poll(timeout)
          process(records)
          consumer.commitAsync()  // fast, non-blocking
      }
  } finally {
      try {
          consumer.commitSync()   // safe, blocking on shutdown
      } finally {
          consumer.close()        // runs even if the final commit fails
      }
  }

The committed offset for a partition is always the offset of the next record to be read, which is the offset of the last processed record plus one.

Related Pages

Implemented By
