Environment:DataTalksClub Data engineering zoomcamp Kafka Confluent Environment
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, Infrastructure |
| Last Updated | 2026-02-09 07:00 GMT |
- `07-streaming/python/docker/kafka/docker-compose.yml`
- `07-streaming/python/requirements.txt`
Overview
Docker Compose environment with Confluent Platform 7.2.0 (Kafka, Zookeeper, Schema Registry, Control Center) and Python streaming libraries (kafka-python, confluent_kafka, PySpark) for stream processing.
Description
This environment provides the Confluent Platform Kafka cluster for stream processing workflows. It includes a full Confluent stack with Kafka broker, Zookeeper, Schema Registry for Avro/JSON schema management, Control Center for monitoring, and Kafka REST Proxy. The Python layer includes kafka-python for JSON producer/consumer patterns and PySpark Structured Streaming for stream-to-stream transformations. Kafka broker listens on both internal (29092) and external (9092) ports.
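The JSON producer/consumer pattern described above comes down to encoding records as UTF-8 JSON bytes on the wire. A minimal sketch of the serializer/deserializer pair, standard library only; the `ride` field names are illustrative, not the exact schema of the zoomcamp's Ride_Data_Model:

```python
import json


def serialize_ride(ride: dict) -> bytes:
    """Encode a ride record as UTF-8 JSON bytes, the wire format
    kafka-python produces when given a value_serializer like this."""
    return json.dumps(ride).encode("utf-8")


def deserialize_ride(payload: bytes) -> dict:
    """Decode the bytes a consumer receives back into a dict."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical ride record with illustrative field names.
ride = {"vendor_id": 1, "passenger_count": 2, "trip_distance": 3.5}
payload = serialize_ride(ride)
assert deserialize_ride(payload) == ride
```

With kafka-python, functions like these would typically be wired in as `value_serializer=serialize_ride` on `KafkaProducer` and `value_deserializer=deserialize_ride` on `KafkaConsumer`.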
Usage
Use this environment for any stream processing workflow involving Kafka message production, consumption, or PySpark Structured Streaming. It is the mandatory prerequisite for running the Kafka_Docker_Compose_Setup, Ride_Data_Model, JsonProducer_Implementation, JsonConsumer_Implementation, PySpark_Kafka_ReadStream, and PySpark_WriteStream_Sink implementations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows with Docker | Docker Desktop or Docker Engine required |
| RAM | 8GB minimum | Confluent Platform is memory-intensive (multiple JVM services) |
| Disk | ~5GB free | For Docker images and Kafka log segments |
| Ports | 9092, 2181, 8081, 8082, 9021 | Kafka, Zookeeper, Schema Registry, REST Proxy, Control Center |
Dependencies
Container Images
- `confluentinc/cp-kafka:7.2.0` (Kafka broker)
- `confluentinc/cp-zookeeper:7.2.0` (Zookeeper)
- `confluentinc/cp-schema-registry:7.2.0` (Schema Registry)
- `confluentinc/cp-enterprise-control-center:7.2.0` (Monitoring UI)
- `confluentinc/cp-kafka-rest:7.2.0` (REST Proxy)
Python Packages
From `07-streaming/python/requirements.txt`:
- `kafka-python==1.4.6`
- `confluent_kafka`
- `requests`
- `avro`
- `faust`
- `fastavro`
Additional for PySpark Streaming
- `pyspark` (with Kafka connector jars)
- Java JDK 8 or 11
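PySpark does not bundle the Kafka source, so the connector jar has to be supplied at launch. A hedged example using `spark-submit --packages`; the Scala (`2.12`) and Spark (`3.3.1`) version suffixes here are assumptions and must match your installed versions:

```shell
# Pull the Kafka connector for Spark Structured Streaming from Maven Central.
# Replace 2.12 (Scala) and 3.3.1 (Spark) with the versions you actually run.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1 \
  streaming.py
```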
Credentials
No authentication credentials are required for the development setup; Kafka runs in plaintext (no SASL/SSL) in development mode.
Quick Install
```bash
# Create the Docker network
docker network create kafka-spark-network

# Start the Kafka cluster
cd 07-streaming/python/docker/kafka
docker compose up -d

# Install Python dependencies
pip install kafka-python==1.4.6 confluent_kafka requests avro fastavro
```
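The Confluent stack can take a while to come up, so before running producers it helps to confirm the broker is actually listening. A small standard-library check; the host and port are the defaults from this compose file:

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    a quick proxy for 'is the Kafka broker up yet?'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # After `docker compose up -d`, this should turn True once the
    # broker finishes starting (which can take tens of seconds).
    print("broker reachable:", port_open("localhost", 9092))
```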
Code Evidence
Kafka broker image from `docker-compose.yml:8`:
```yaml
broker:
  image: confluentinc/cp-kafka:7.2.0
```
Kafka listener configuration from `docker-compose.yml:18-21`:
```yaml
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://broker:9092
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
```
Python kafka-python dependency from `requirements.txt:1`:
```
kafka-python==1.4.6
```
PySpark Kafka connection from `streaming.py:9-16`:
```python
df_stream = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092,broker:29092") \
    .option("subscribe", consume_topic) \
    .option("startingOffsets", "earliest") \
    .option("checkpointLocation", "checkpoint") \
    .load()
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `NoBrokersAvailable` | Kafka broker not running or wrong bootstrap servers | Start Docker Compose and use `localhost:9092` from host or `broker:29092` from containers |
| `SchemaRegistryError` | Schema Registry not reachable | Ensure Schema Registry container is running on port 8081 |
| Network `kafka-spark-network` not found | Docker network not created | Run `docker network create kafka-spark-network` before starting |
Compatibility Notes
- Dual listener setup: the Kafka broker exposes two listeners, `PLAINTEXT://broker:29092` for inter-container communication and `PLAINTEXT_HOST://localhost:9092` for host access. Use the address that matches where your client runs.
- Schema Registry deprecation: The `SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL` setting is deprecated; the compose file uses `SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS` instead.
- kafka-python pinned: The `kafka-python` library is pinned to 1.4.6. Newer versions may have breaking API changes.
- Replication factor: Set to 1 (single-broker development). Not suitable for production.
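The dual-listener rule above can be encoded in a tiny helper so the same script works unchanged on the host and inside a container. A sketch, assuming Docker's conventional `/.dockerenv` marker file is a good-enough in-container signal; the helper name and override variable are hypothetical:

```python
import os


def bootstrap_servers() -> str:
    """Pick the Kafka address for the current context: containers
    reach the broker on the internal listener, the host on the
    advertised localhost listener."""
    # Hypothetical override for non-standard setups.
    override = os.environ.get("KAFKA_BOOTSTRAP_SERVERS")
    if override:
        return override
    # Docker creates /.dockerenv inside containers.
    if os.path.exists("/.dockerenv"):
        return "broker:29092"
    return "localhost:9092"
```

A producer would then use it as, for example, `KafkaProducer(bootstrap_servers=bootstrap_servers())`.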
Related Pages
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Kafka_Docker_Compose_Setup
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Ride_Data_Model
- Implementation:DataTalksClub_Data_engineering_zoomcamp_JsonProducer_Implementation
- Implementation:DataTalksClub_Data_engineering_zoomcamp_JsonConsumer_Implementation
- Implementation:DataTalksClub_Data_engineering_zoomcamp_PySpark_Kafka_ReadStream
- Implementation:DataTalksClub_Data_engineering_zoomcamp_PySpark_WriteStream_Sink