Principle:Fede1024 Rust rdkafka Client Statistics Monitoring
| Knowledge Sources | |
|---|---|
| Domains | Monitoring, Observability, Performance |
| Last Updated | 2026-02-07 19:00 GMT |
Overview
Principle of continuous client health monitoring through periodic JSON statistics emitted by librdkafka, providing hierarchical metrics for brokers, topics, partitions, consumer groups, and transactional state.
Description
librdkafka periodically emits a comprehensive JSON statistics document when statistics.interval.ms is configured. This document contains a hierarchical tree of metrics: top-level client aggregate counters (messages produced/consumed, bytes sent/received), per-broker connection and latency metrics with HDR histogram percentiles, per-topic batch size statistics, per-partition offset tracking with consumer lag calculation, consumer group rebalance state, and exactly-once semantics (EOS) transactional producer state. The Client Statistics Monitoring principle involves deserializing this JSON into strongly-typed structures for programmatic monitoring, alerting, and dashboarding.
Usage
Apply this principle when you need operational visibility into Kafka client behavior: monitoring consumer lag across partitions, tracking broker connection health and round-trip times, alerting on rebalance storms, measuring producer queue depth, or debugging transactional producer state transitions. Enable by setting statistics.interval.ms (e.g., 5000 for 5-second intervals) and implementing the ClientContext::stats callback.
Theoretical Basis
The statistics monitoring model follows a hierarchical aggregation pattern:
Metric Hierarchy:
Statistics (client-level)
├── Aggregate counters (tx, rx, msg_cnt, msg_size)
├── Brokers (HashMap<String, Broker>)
│ ├── Connection state (INIT, DOWN, CONNECT, AUTH, UP)
│ ├── Request/response counters
│ ├── Latency histograms (Window: min, max, avg, p50-p99.99)
│ │ ├── int_latency (internal queue latency)
│ │ ├── outbuf_latency (request queue latency)
│ │ ├── rtt (round-trip time)
│ │ └── throttle (broker throttling)
│ └── Topic-partition assignments
├── Topics (HashMap<String, Topic>)
│ ├── Batch size/count statistics
│ └── Partitions (HashMap<i32, Partition>)
│ ├── Offset tracking (committed, stored, hi, lo, eof)
│ ├── Consumer lag (hi_offset - committed_offset)
│ └── Message counters (tx, rx, inflight)
├── ConsumerGroup (Optional)
│ ├── State, join_state
│ ├── Rebalance tracking (age, count, reason)
│ └── Assignment size
└── EOS (Optional)
├── Idempotent producer state
├── Transactional state
└── Producer ID/epoch
Key Monitoring Patterns:
- Consumer lag: hi_offset - committed_offset per partition
- Broker health: Connection state transitions and RTT percentiles
- Producer backpressure: msg_cnt vs msg_max ratio
- Rebalance frequency: rebalance_cnt and rebalance_age