
Principle:DataTalksClub Data engineering zoomcamp Kafka Producer Pattern

From Leeroopedia


Page Metadata

  • Knowledge Sources: DataTalksClub/data-engineering-zoomcamp (07-streaming)
  • Domains: Data_Engineering, Stream_Processing
  • Last Updated: 2026-02-09 14:00 GMT

Overview

A producer publishes messages to a distributed log, serializing domain objects into bytes and using key-based partitioning to route related events to the same partition for ordered processing.

Description

The Kafka producer pattern is the entry point for data into a streaming pipeline. A producer takes domain objects, serializes them into bytes, and sends them to a named topic on the broker. The pattern involves several key concepts:

  • Topic selection: The producer writes to a specific topic, which acts as a named, append-only log. Topics are the primary organizational unit in Kafka.
  • Key-based partitioning: Each message can carry a key. Kafka hashes the key to determine which partition receives the message. All messages with the same key land on the same partition, which guarantees ordering for that key. This is critical when downstream consumers need to process events for a given entity in sequence.
  • Serialization: The producer must convert both the message key and the message value from application-level objects to byte arrays. Common strategies include encoding keys as UTF-8 strings and values as JSON. Custom serializers are configured at producer initialization.
  • Asynchronous sends with confirmation: The send() call is typically asynchronous, returning a future. The producer can call .get() on the future to block until the broker confirms receipt, or it can register a callback for non-blocking confirmation.
  • Error handling: Network issues, broker unavailability, and serialization failures must be caught and handled. Timeout errors are particularly common during broker restarts or network partitions.
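The serialization and key-based partitioning concepts above can be sketched with the standard library alone. This is an illustration, not Kafka's actual implementation: Kafka's default partitioner hashes keys with murmur2, and CRC32 stands in for it here purely to show that a deterministic hash of the key bytes pins each key to one partition. The key name and partition count are made up for the example.

```python
import json
import zlib

NUM_PARTITIONS = 4  # hypothetical topic partition count

def serialize_key(key: str) -> bytes:
    # Keys are commonly encoded as UTF-8 strings.
    return key.encode("utf-8")

def serialize_value(obj: dict) -> bytes:
    # Values are commonly encoded as JSON.
    return json.dumps(obj).encode("utf-8")

def choose_partition(key_bytes: bytes, num_partitions: int) -> int:
    # Deterministic hash of the key bytes modulo the partition count.
    # (Kafka's real default partitioner uses murmur2, not CRC32.)
    return zlib.crc32(key_bytes) % num_partitions

# Two events with the same key always land on the same partition.
p1 = choose_partition(serialize_key("location-42"), NUM_PARTITIONS)
p2 = choose_partition(serialize_key("location-42"), NUM_PARTITIONS)
assert p1 == p2
```

Because the partition depends only on the key bytes, any producer anywhere computes the same placement without coordination.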

Usage

Use this principle when:

  • You need to ingest batch data (CSV files, database extracts) into a streaming pipeline.
  • You want to ensure that related events are co-located on the same partition for ordered processing.
  • You need to transform domain objects into serialized messages for downstream consumption.
  • You are building a bridge between a file-based or request-based data source and a real-time processing system.
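The file-to-stream bridge in the first and last bullets can be sketched with the standard library: each CSV row becomes one message value, and one column supplies the partition key. The column names (`location_id`, `rides`) are hypothetical.

```python
import csv
import io

def read_source_data(csv_text: str) -> list[dict]:
    # Each CSV row becomes a dict, ready to serialize as a message value.
    return list(csv.DictReader(io.StringIO(csv_text)))

sample = "location_id,rides\n42,7\n42,3\n17,1\n"
records = read_source_data(sample)

# The entity identifier column serves as the partition key, so both
# events for location 42 would land on the same partition.
keys = [row["location_id"] for row in records]
```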

Theoretical Basis

The producer pattern follows a read-serialize-send loop:

FUNCTION produce_messages(source_path, topic, broker_config):
    producer = create_producer(
        bootstrap_servers = broker_config.servers,
        key_serializer   = FUNCTION(key): encode_as_string(key),
        value_serializer = FUNCTION(obj): encode_as_json(obj.to_dict())
    )

    records = read_source_data(source_path)

    FOR EACH record IN records:
        partition_key = extract_partition_key(record)
        TRY:
            future = producer.send(
                topic = topic,
                key   = partition_key,
                value = record
            )
            confirmation = future.get()   -- block until broker acknowledges
            LOG "Record sent to offset {confirmation.offset}"
        CATCH TimeoutError:
            LOG "Broker unreachable, message not delivered"

    producer.flush()
    producer.close()
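The loop above maps closely onto the kafka-python client, which the zoomcamp's streaming module uses. A minimal sketch follows; the broker address, topic name, and record fields are assumptions, and the `__main__` block only works with a reachable broker.

```python
import json

def serialize_key(key: str) -> bytes:
    return key.encode("utf-8")

def serialize_value(obj: dict) -> bytes:
    return json.dumps(obj).encode("utf-8")

def produce_messages(producer, topic: str, records: list[dict], key_field: str) -> None:
    """Read-serialize-send loop; `producer` is a kafka-python KafkaProducer."""
    from kafka.errors import KafkaTimeoutError  # pip install kafka-python
    for record in records:
        try:
            future = producer.send(topic, key=record[key_field], value=record)
            metadata = future.get(timeout=10)  # block until broker acknowledges
            print(f"Record sent to partition {metadata.partition}, offset {metadata.offset}")
        except KafkaTimeoutError:
            print("Broker unreachable, message not delivered")
    producer.flush()

if __name__ == "__main__":
    from kafka import KafkaProducer
    producer = KafkaProducer(
        bootstrap_servers=["localhost:9092"],  # assumed local broker
        key_serializer=serialize_key,
        value_serializer=serialize_value,
    )
    produce_messages(producer, "rides", [{"location_id": "42", "rides": 7}], "location_id")
    producer.close()
```

Note that the serializers are plain functions passed at producer construction, matching the separation-of-concerns principle below: transport and serialization stay decoupled.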

The design principles of this pattern are:

  1. Separation of concerns: The producer class handles transport (connecting to brokers, sending bytes). Serialization is delegated to configurable serializer functions. The domain model is responsible for its own structure.
  2. Idempotent key selection: The partition key should be a natural identifier from the domain (e.g., a location ID or user ID). This ensures deterministic partitioning without maintaining external state.
  3. Backpressure via synchronous confirmation: Calling .get() on the send future provides natural backpressure -- the producer does not outrun the broker. For high-throughput scenarios, batching and asynchronous callbacks are preferred.
  4. Graceful degradation: Timeout handling prevents the producer from crashing when the broker is temporarily unavailable, allowing for retry logic or dead-letter queuing.
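The non-blocking alternative mentioned in point 3 looks like this with kafka-python's chained callbacks. The broker address and topic are assumptions; the callbacks run later on the producer's background I/O thread rather than in the send loop.

```python
def on_send_success(metadata):
    # Invoked when the broker acknowledges the write.
    print(f"delivered to partition {metadata.partition} at offset {metadata.offset}")

def on_send_error(exc):
    # Invoked on failure; a real pipeline might retry or dead-letter here.
    print(f"delivery failed: {exc}")

if __name__ == "__main__":
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers=["localhost:9092"])  # assumed broker
    future = producer.send("rides", key=b"location-42", value=b'{"rides": 7}')
    future.add_callback(on_send_success).add_errback(on_send_error)
    producer.flush()  # waits for all in-flight sends to complete
```

The trade-off: callbacks keep the send loop fast but give up the per-message backpressure that blocking `.get()` provides.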
