Implementation:Apache Kafka Connect Distributed Script
| Knowledge Sources | |
|---|---|
| Domains | Kafka_Connect, Distributed_Systems, CLI |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
External tool documentation for launching Kafka Connect in distributed mode with the connect-distributed.sh shell script.
Description
The connect-distributed.sh script is the primary entry point for running Kafka Connect in distributed mode. It is a shell wrapper that configures Log4j2 logging defaults, JVM heap settings, and process naming before delegating to kafka-run-class.sh to launch the org.apache.kafka.connect.cli.ConnectDistributed Java class. Distributed mode enables multiple Connect workers to coordinate via a Kafka cluster, providing scalability and fault tolerance for connector tasks.
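The "defaults before delegation" behavior described above can be illustrated with a short sketch. This is a simplified illustration of the default-only-if-unset pattern, not the script's verbatim source. The function name apply_connect_defaults and its config-directory parameter are hypothetical, and /opt/kafka is a placeholder install path.

```shell
#!/bin/bash
# Simplified sketch of the "apply a default only when unset" pattern.
# apply_connect_defaults is a hypothetical helper, not part of Kafka.

# apply_connect_defaults CONFIG_DIR
# Fills in logging and heap settings only when the caller has not set them.
apply_connect_defaults() {
  local config_dir="$1"
  if [ -z "$KAFKA_LOG4J_OPTS" ]; then
    export KAFKA_LOG4J_OPTS="-Dlog4j2.configurationFile=$config_dir/connect-log4j2.yaml"
  fi
  if [ -z "$KAFKA_HEAP_OPTS" ]; then
    export KAFKA_HEAP_OPTS="-Xms256M -Xmx2G"
  fi
}

# Example: a caller-supplied heap setting is kept, the logging default is filled in.
unset KAFKA_LOG4J_OPTS
KAFKA_HEAP_OPTS="-Xms512M -Xmx4G"
apply_connect_defaults "/opt/kafka/config"   # placeholder path
echo "heap:  $KAFKA_HEAP_OPTS"
echo "log4j: $KAFKA_LOG4J_OPTS"
```

Because each default is applied only when the variable is empty, exporting either variable before calling the script is enough to override it.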
Usage
Use this script to start a Kafka Connect worker in distributed mode, where multiple workers share connector and task assignments across a cluster. This is the recommended deployment model for production.
Code Reference
Source Location
- Repository: Apache_Kafka
- File: bin/connect-distributed.sh
- Lines: 1-45
Signature
#!/bin/bash
# Usage: connect-distributed.sh [-daemon] connect-distributed.properties
# Environment variables:
# KAFKA_LOG4J_OPTS - Log4j2 configuration (default: -Dlog4j2.configurationFile=.../config/connect-log4j2.yaml)
# KAFKA_HEAP_OPTS - JVM heap settings (default: -Xms256M -Xmx2G)
# EXTRA_ARGS - Additional arguments for kafka-run-class.sh (default: -name connectDistributed)
# Delegates to:
exec $(dirname $0)/kafka-run-class.sh $EXTRA_ARGS org.apache.kafka.connect.cli.ConnectDistributed "$@"
Import
# No import required; invoke the script from the Kafka installation root directory:
bin/connect-distributed.sh config/connect-distributed.properties
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| properties_file | File path | Yes | Path to the Connect distributed worker configuration file (e.g., connect-distributed.properties) |
| -daemon | Flag | No | Run the Connect worker as a background daemon process; must be the first argument |
| KAFKA_LOG4J_OPTS | Env var | No | JVM options that select the Log4j2 configuration; defaults to the bundled connect-log4j2.yaml |
| KAFKA_HEAP_OPTS | Env var | No | JVM heap settings; defaults to -Xms256M -Xmx2G |
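The properties_file input is an ordinary Java properties file. As a minimal sketch, the helper below writes a small worker configuration suitable for a single-broker development setup. The function name write_worker_config is hypothetical, and the broker address, group id, and topic names are placeholder values; the keys themselves are standard distributed-worker settings.

```shell
#!/bin/bash
# write_worker_config FILE
# Writes a minimal distributed-worker configuration to FILE.
# Hypothetical helper; all values below are placeholders for local development.
write_worker_config() {
  cat > "$1" <<'EOF'
# Kafka brokers used for group coordination and the internal topics
bootstrap.servers=localhost:9092
# Workers that share the same group.id form one Connect cluster
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Internal topics that store connector offsets, configurations, and status
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
# Replication factor 1 only works for single-broker development clusters
offset.storage.replication.factor=1
config.storage.replication.factor=1
status.storage.replication.factor=1
EOF
}

# Example: generate the file in a temporary location and show it.
config_file="$(mktemp)"
write_worker_config "$config_file"
cat "$config_file"
```

The generated file can then be passed to the script as its properties_file argument. Every worker that should join the same Connect cluster needs the same group.id and the same three storage topics.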
Outputs
| Name | Type | Description |
|---|---|---|
| Connect worker process | JVM process | A running Kafka Connect distributed worker that joins the Connect cluster |
| Log files | Files | Log output as configured by Log4j2 (connect-log4j2.yaml by default, or the file named in KAFKA_LOG4J_OPTS) |
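One way to confirm that the worker process listed above has actually started is to poll the worker's REST interface. The sketch below assumes the worker listens on the default REST port 8083 on localhost, which the worker configuration can change. The function name wait_for_worker and its retry parameters are hypothetical.

```shell
#!/bin/bash
# wait_for_worker [URL] [ATTEMPTS]
# Polls the Connect REST endpoint once per second until it answers or
# ATTEMPTS is exhausted. Hypothetical helper; assumes the default REST
# listener at http://localhost:8083/ unless a different URL is given.
wait_for_worker() {
  local url="${1:-http://localhost:8083/}"
  local attempts="${2:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP error responses
    if curl --silent --fail --output /dev/null "$url"; then
      echo "worker is up after $i attempt(s)"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "worker did not respond at $url" >&2
  return 1
}
```

A typical use is to call wait_for_worker right after starting the script with -daemon, and to inspect the log files if it returns a non-zero status.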
Usage Examples
Start Connect in Distributed Mode
# Start a Kafka Connect distributed worker with default settings
bin/connect-distributed.sh config/connect-distributed.properties
Start as Daemon
# Start Connect distributed worker in the background
bin/connect-distributed.sh -daemon config/connect-distributed.properties
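The handling of the -daemon flag can be sketched as follows. This is a simplified illustration rather than the script's verbatim source: the function name build_launch_command is hypothetical, it prints the command line instead of executing it, and the default -name connectDistributed value should be checked against the installed Kafka version.

```shell
#!/bin/bash
# build_launch_command ARGS...
# Prints the kafka-run-class.sh command line that the wrapper would run.
# Hypothetical helper for illustration; it does not start a JVM.
build_launch_command() {
  # Default process name, used only when the caller has not set EXTRA_ARGS
  local default_args="-name connectDistributed"
  local extra_args="${EXTRA_ARGS-$default_args}"

  # -daemon is recognized only as the first argument; it is moved into the
  # arguments for kafka-run-class.sh and removed from the worker arguments.
  if [ "$1" = "-daemon" ]; then
    extra_args="-daemon $extra_args"
    shift
  fi

  echo "kafka-run-class.sh $extra_args org.apache.kafka.connect.cli.ConnectDistributed $*"
}

# Example: show the command line produced for a daemon start.
build_launch_command -daemon config/connect-distributed.properties
```

Because only the first argument is inspected, placing -daemon after the properties file would pass it to the ConnectDistributed class instead of backgrounding the process.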
Custom Heap and Logging
# Override heap settings and logging configuration
export KAFKA_HEAP_OPTS="-Xms512M -Xmx4G"
export KAFKA_LOG4J_OPTS="-Dlog4j2.configurationFile=/path/to/custom-log4j2.yaml"
bin/connect-distributed.sh config/connect-distributed.properties