
Heuristic:Fede1024 Rust rdkafka Transaction Error Recovery

From Leeroopedia




Knowledge Sources
Domains: Messaging, Debugging
Last Updated: 2026-02-07 19:30 GMT

Overview

Transaction errors in librdkafka fall into three categories, each requiring a different recovery strategy: retriable (retry the operation), abort-required (abort and restart the transaction), and fatal (terminate the application).

Description

The transactional producer API has an unusual error-handling model. With a typical Rust `Result`, the caller decides how to react to an error; librdkafka transaction errors instead carry metadata that dictates the required recovery action. The `RDKafkaError` type exposes three classification methods: `is_retriable()`, `txn_requires_abort()`, and `is_fatal()`. Applications must check these flags and respond accordingly, or risk data corruption, duplicate messages, or hung producers.

Usage

Use this heuristic when implementing transactional produce-consume patterns or handling errors from any `Producer::*_transaction` method. Every call to `begin_transaction`, `commit_transaction`, `abort_transaction`, and `send_offsets_to_transaction` can return errors that need this three-way classification.

The Insight (Rule of Thumb)

  • Action: Whenever a transaction method returns an error, classify it by checking `is_retriable()`, `txn_requires_abort()`, and `is_fatal()`, in that order.
  • Value:
    • Retriable: Retry the same operation (with backoff).
    • Abort-required: Call `abort_transaction()`, then `begin_transaction()` to start fresh.
    • Fatal: Stop the producer and terminate the application.
  • Trade-off: This error classification adds complexity but is essential for exactly-once semantics. Ignoring it leads to silent data loss or duplication.
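The three-way check above can be sketched as a small classifier. This is a minimal illustration only: `TxnErrorFlags`, `Recovery`, `classify`, and `FakeError` are hypothetical names introduced here so the sketch compiles without the rdkafka crate. The trait simply mirrors the three method names the source attributes to `RDKafkaError`. Treating an error with no flag set as terminal is an assumption of this sketch, not documented behavior.

```rust
/// Hypothetical stand-in for the three classification methods that the
/// source attributes to `RDKafkaError`. Not part of the rdkafka crate.
pub trait TxnErrorFlags {
    fn is_retriable(&self) -> bool;
    fn txn_requires_abort(&self) -> bool;
    fn is_fatal(&self) -> bool;
}

/// Recovery action implied by the error flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    /// Retry the same operation, ideally with backoff.
    Retry,
    /// Abort the current transaction, then begin a new one.
    AbortAndRestart,
    /// Stop the producer and terminate the application.
    Terminate,
}

/// Checks the flags in the order given in the rule of thumb:
/// retriable, then abort-required, then fatal.
pub fn classify<E: TxnErrorFlags>(err: &E) -> Recovery {
    if err.is_retriable() {
        Recovery::Retry
    } else if err.txn_requires_abort() {
        Recovery::AbortAndRestart
    } else {
        // Covers `is_fatal()`. Assumption of this sketch: an error with
        // no flag set is also treated as terminal, the conservative choice.
        Recovery::Terminate
    }
}

/// Test double used only to exercise `classify`.
pub struct FakeError {
    pub retriable: bool,
    pub requires_abort: bool,
    pub fatal: bool,
}

impl TxnErrorFlags for FakeError {
    fn is_retriable(&self) -> bool {
        self.retriable
    }
    fn txn_requires_abort(&self) -> bool {
        self.requires_abort
    }
    fn is_fatal(&self) -> bool {
        self.fatal
    }
}
```

Returning an enum rather than acting inside the check keeps the decision separate from the side effects (sleeping, aborting, exiting), which makes the policy easy to unit test.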

Reasoning

The transactional protocol in Kafka is stateful. A retriable error (like a temporary network issue) means the broker can still accept the same operation. An abort-required error means the transaction's state on the broker is inconsistent and must be rolled back. A fatal error means the producer's internal state is unrecoverable (e.g., producer fenced by a newer instance with the same `transactional.id`). These states map directly to Kafka's transaction coordinator protocol.
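The recovery paths described above might be wired into a commit loop along the following lines. This is a sketch under stated assumptions, not the crate's API: `TxnFailure`, `Outcome`, and `commit_with_recovery` are hypothetical names, and the `commit` and `abort` closures stand in for whatever calls the application makes on its producer. Two policy choices are assumptions of this sketch: an exhausted retry budget falls back to aborting, and a failed abort is reported as fatal.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical, already-classified failure of a transaction call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TxnFailure {
    Retriable,
    RequiresAbort,
    Fatal,
}

/// What the caller should do next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// The commit succeeded.
    Committed,
    /// The transaction was rolled back; begin a new one and redo the work.
    Aborted,
    /// Unrecoverable; stop the producer and terminate the application.
    Fatal,
}

/// Drives `commit` until it succeeds, needs an abort, or fails fatally.
/// Retriable failures are retried with exponential backoff.
pub fn commit_with_recovery<C, A>(
    mut commit: C,
    mut abort: A,
    max_retries: u32,
    base_backoff: Duration,
) -> Outcome
where
    C: FnMut() -> Result<(), TxnFailure>,
    A: FnMut() -> Result<(), TxnFailure>,
{
    let mut attempt: u32 = 0;
    loop {
        match commit() {
            Ok(()) => return Outcome::Committed,
            Err(TxnFailure::Retriable) if attempt < max_retries => {
                // Double the delay each attempt; cap the shift to avoid overflow.
                sleep(base_backoff * (1u32 << attempt.min(16)));
                attempt += 1;
            }
            Err(TxnFailure::Fatal) => return Outcome::Fatal,
            // Abort-required, or retriable with the retry budget exhausted
            // (assumption of this sketch): roll the transaction back.
            Err(_) => {
                return match abort() {
                    Ok(()) => Outcome::Aborted,
                    // Simplification: any abort failure is treated as fatal.
                    Err(_) => Outcome::Fatal,
                };
            }
        }
    }
}
```

On `Outcome::Aborted` the caller would begin a new transaction and reprocess the batch; on `Outcome::Fatal` it would shut down instead of retrying.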

Code Evidence

Error classification from `src/producer/mod.rs:105-114`:

//! ### Errors
//!
//! Errors returned by transaction methods may:
//!
//! * be retriable ([`RDKafkaError::is_retriable`]), in which case the operation
//!   that encountered the error may be retried.
//! * require abort ([`RDKafkaError::txn_requires_abort`]), in which case the
//!   current transaction must be aborted and a new transaction begun.
//! * be fatal ([`RDKafkaError::is_fatal`]), in which case the producer must be
//!   stopped and the application terminated.
