
Environment:DataTalksClub Data engineering zoomcamp Kafka Confluent Environment

From Leeroopedia


Knowledge Sources
Domains Stream_Processing, Infrastructure
Last Updated 2026-02-09 07:00 GMT

07-streaming/python/docker/kafka/docker-compose.yml 07-streaming/python/requirements.txt

Overview

Docker Compose environment with Confluent Platform 7.2.0 (Kafka, Zookeeper, Schema Registry, Control Center) and Python streaming libraries (kafka-python, confluent_kafka, PySpark) for stream processing.

Description

This environment provides the Confluent Platform Kafka cluster for stream processing workflows. It includes a full Confluent stack with a Kafka broker, Zookeeper, Schema Registry for Avro/JSON schema management, Control Center for monitoring, and Kafka REST Proxy. The Python layer includes kafka-python for JSON producer/consumer patterns and PySpark Structured Streaming for stream-to-stream transformations. The Kafka broker listens on two ports: 29092 for internal (inter-container) traffic and 9092 for external (host) access.
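As a sketch of the kafka-python JSON producer pattern mentioned above (the topic name `rides` and the sample payload are illustrative, not taken from the repository):

```python
import json


def serialize_json(value: dict) -> bytes:
    """JSON-encode a message value the way a kafka-python producer would."""
    return json.dumps(value).encode("utf-8")


if __name__ == "__main__":
    # Requires the broker from docker-compose.yml to be running.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["localhost:9092"],  # host-side listener
        value_serializer=serialize_json,
    )
    producer.send("rides", value={"vendor_id": 1, "trip_distance": 2.5})
    producer.flush()
```

The same serializer works for a `JsonConsumer` in reverse (`json.loads` on the raw bytes); the consumer side follows the same bootstrap-server rules described under Compatibility Notes.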

Usage

Use this environment for any stream processing workflow involving Kafka message production, consumption, or PySpark Structured Streaming. It is the mandatory prerequisite for running the Kafka_Docker_Compose_Setup, Ride_Data_Model, JsonProducer_Implementation, JsonConsumer_Implementation, PySpark_Kafka_ReadStream, and PySpark_WriteStream_Sink implementations.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows with Docker | Docker Desktop or Docker Engine required |
| RAM | 8 GB minimum | Confluent Platform is memory-intensive (multiple JVM services) |
| Disk | ~5 GB free | For Docker images and Kafka log segments |
| Ports | 9092, 2181, 8081, 8082, 9021 | Kafka, Zookeeper, Schema Registry, REST Proxy, Control Center |

Dependencies

Container Images

  • `confluentinc/cp-kafka:7.2.0` (Kafka broker)
  • `confluentinc/cp-zookeeper:7.2.0` (Zookeeper)
  • `confluentinc/cp-schema-registry:7.2.0` (Schema Registry)
  • `confluentinc/cp-enterprise-control-center:7.2.0` (Monitoring UI)
  • `confluentinc/cp-kafka-rest:7.2.0` (REST Proxy)

Python Packages

From `07-streaming/python/requirements.txt`:

  • `kafka-python==1.4.6`
  • `confluent_kafka`
  • `requests`
  • `avro`
  • `faust`
  • `fastavro`

Additional for PySpark Streaming

  • `pyspark` (with Kafka connector jars)
  • Java JDK 8 or 11

Credentials

No authentication credentials are required for the development setup; Kafka runs without SASL/SSL in development mode.

Quick Install

# Create the Docker network
docker network create kafka-spark-network

# Start Kafka cluster
cd 07-streaming/python/docker/kafka
docker compose up -d

# Install Python dependencies
pip install kafka-python==1.4.6 confluent_kafka requests avro faust fastavro
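To confirm the stack actually came up, a quick TCP probe of the published ports can help. This is a minimal sketch (the service-to-port mapping mirrors the Ports row in System Requirements; it only checks that something is listening, not that each service is healthy):

```python
import socket

# Ports published by the Docker Compose stack (see System Requirements).
SERVICES = {
    "Kafka broker": ("localhost", 9092),
    "Zookeeper": ("localhost", 2181),
    "Schema Registry": ("localhost", 8081),
    "REST Proxy": ("localhost", 8082),
    "Control Center": ("localhost", 9021),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if port_open(host, port) else "down"
        print(f"{name:16} {host}:{port} {status}")
```

Control Center in particular can take a minute or two to start; re-run the probe if it initially reports down.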

Code Evidence

Kafka broker image from `docker-compose.yml:8`:

  broker:
    image: confluentinc/cp-kafka:7.2.0

Kafka listener configuration from `docker-compose.yml:18-21`:

      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://broker:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092

Python kafka-python dependency from `requirements.txt:1`:

kafka-python==1.4.6

PySpark Kafka connection from `streaming.py:9-16`:

    df_stream = spark \
        .readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "localhost:9092,broker:29092") \
        .option("subscribe", consume_topic) \
        .option("startingOffsets", "earliest") \
        .option("checkpointLocation", "checkpoint") \
        .load()
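The stream read above is typically paired with a `writeStream` sink. A hedged sketch of a console sink follows (the function name is illustrative; the repository's actual sink lives in the PySpark_WriteStream_Sink implementation). Note that in Structured Streaming the `checkpointLocation` option belongs on the write side of the query:

```python
def start_console_sink(df_stream):
    """Attach a console sink to a streaming DataFrame and start the query.

    Sketch of the Structured Streaming write pattern: the checkpoint
    directory is configured on writeStream, where Spark reads it.
    """
    return (
        df_stream.writeStream
        .outputMode("append")
        .format("console")
        .option("checkpointLocation", "checkpoint")
        .start()
    )
```

Calling `start()` returns a `StreamingQuery`; block on it with `awaitTermination()` to keep the stream running.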

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `NoBrokersAvailable` | Kafka broker not running or wrong bootstrap servers | Start Docker Compose and use `localhost:9092` from the host or `broker:29092` from containers |
| `SchemaRegistryError` | Schema Registry not reachable | Ensure the Schema Registry container is running on port 8081 |
| Network `kafka-spark-network` not found | Docker network not created | Run `docker network create kafka-spark-network` before starting |
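Because the broker JVM takes a while to accept connections after `docker compose up`, a transient `NoBrokersAvailable` right after startup is common. A hypothetical retry helper (the function and its parameters are illustrative, not part of the repository):

```python
import time


def connect_with_retry(factory, attempts=5, delay=2.0):
    """Call factory() until it succeeds, retrying on any exception
    (e.g. kafka.errors.NoBrokersAvailable while the broker is starting).

    Returns the first successful result; re-raises the last error if
    all attempts fail.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return factory()
        except Exception as exc:  # narrow to NoBrokersAvailable in real use
            last_error = exc
            time.sleep(delay)
    raise last_error
```

Usage would look like `producer = connect_with_retry(lambda: KafkaProducer(bootstrap_servers=["localhost:9092"]))`.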

Compatibility Notes

  • Dual listener setup: Kafka broker exposes two listeners: `PLAINTEXT://broker:29092` for inter-container communication and `PLAINTEXT_HOST://localhost:9092` for host access. Use the correct address based on your context.
  • Schema Registry deprecation: The `SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL` setting is deprecated; the compose file uses `SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS` instead.
  • kafka-python pinned: The `kafka-python` library is pinned to 1.4.6. Newer versions may have breaking API changes.
  • Replication factor: Set to 1 (single-broker development). Not suitable for production.
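The dual-listener note above is the most common stumbling block, so a small address-selection helper can make the choice explicit. This is a sketch; the `RUNNING_IN_DOCKER` environment variable is a hypothetical convention, not something the repository defines:

```python
import os


def kafka_bootstrap_servers() -> str:
    """Pick the Kafka address for the dual-listener setup.

    Containers on kafka-spark-network reach the broker at broker:29092;
    processes on the host use localhost:9092. RUNNING_IN_DOCKER is an
    assumed convention you would set in your own container definitions.
    """
    if os.environ.get("RUNNING_IN_DOCKER"):
        return "broker:29092"
    return "localhost:9092"
```

Using the wrong address from the wrong context is exactly what produces the `NoBrokersAvailable` error listed under Common Errors.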
