Principle:Apache Kafka Trogdor Invocation

From Leeroopedia


Knowledge Sources
Domains Testing, Fault_Injection, Chaos_Engineering
Last Updated 2026-02-09 12:00 GMT

Overview

Principle for launching the Trogdor distributed testing framework to perform fault injection and workload generation across a Kafka cluster.

Description

Trogdor Invocation is the principle of starting the components of Trogdor, the fault injection and workload generation framework that ships with Apache Kafka. Trogdor follows a coordinator-agent architecture in which a central coordinator distributes fault injection and workload tasks to agents running on cluster nodes. This enables systematic testing of Kafka's resilience to failure modes including network partitions, process kills, disk faults, and resource exhaustion.
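The coordinator-agent layout is typically captured in a single topology file shared by every process. The following is a minimal Python sketch that generates such a file. The key names (`platform`, `nodes`, `hostname`, `trogdor.agent.port`, `trogdor.coordinator.port`), the `BasicPlatform` class name, and the default ports follow the sample configuration in Kafka's TROGDOR.md as best recalled, so treat them as assumptions and check them against your Kafka version. The helper `build_platform_config` and the host names are hypothetical.

```python
import json


def build_platform_config(hosts, coordinator_node,
                          agent_port=8888, coordinator_port=8889):
    """Build a Trogdor-style topology dict (key names are assumptions).

    hosts: mapping of node name -> hostname; every node runs an agent.
    coordinator_node: the one node that also runs the coordinator.
    """
    if coordinator_node not in hosts:
        raise ValueError(f"unknown coordinator node: {coordinator_node!r}")
    nodes = {}
    for name, hostname in hosts.items():
        node = {"hostname": hostname, "trogdor.agent.port": agent_port}
        if name == coordinator_node:
            # Only the designated node advertises a coordinator port.
            node["trogdor.coordinator.port"] = coordinator_port
        nodes[name] = node
    return {
        "platform": "org.apache.kafka.trogdor.basic.BasicPlatform",
        "nodes": nodes,
    }


# Hypothetical two-node cluster; node0 hosts the coordinator.
config = build_platform_config(
    {"node0": "host0.example.com", "node1": "host1.example.com"},
    coordinator_node="node0",
)
print(json.dumps(config, indent=2))
```

Because every agent and the coordinator read the same file, the coordinator can locate each agent from the node list without a separate discovery step.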

The framework differs from the other bin/ scripts in that it requires the test JARs on the classpath. It supports four operating modes: agent (executes tasks on a node), coordinator (manages task distribution), client (a CLI for talking to the coordinator), and agent-client (a CLI for talking to an agent directly).
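As an illustration of the four modes, here is a small Python sketch that assembles the launcher command line. The action names and the `-c` (config file) and `-n` (node name) options follow the usage shown in Kafka's TROGDOR.md as best recalled; confirm them with `bin/trogdor.sh help` before relying on them. The helper `trogdor_argv` and the file paths are hypothetical.

```python
import shlex

# The four actions the launcher script is assumed to accept.
TROGDOR_ACTIONS = ("agent", "coordinator", "client", "agent-client")


def trogdor_argv(action, *options, script="./bin/trogdor.sh"):
    """Return an argv list for the launcher, rejecting unknown actions."""
    if action not in TROGDOR_ACTIONS:
        raise ValueError(f"unknown Trogdor action: {action!r}")
    return [script, action, *options]


# One agent and one coordinator, both reading the same topology file.
agent_cmd = trogdor_argv("agent", "-c", "./config/trogdor.conf", "-n", "node0")
coordinator_cmd = trogdor_argv(
    "coordinator", "-c", "./config/trogdor.conf", "-n", "node0"
)

# Print shell-safe command lines; pass the lists to subprocess.Popen to run them.
print(shlex.join(agent_cmd))
print(shlex.join(coordinator_cmd))
```

Building an argv list rather than a shell string avoids quoting problems when paths or node names contain unusual characters.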

Usage

Use this principle when performing chaos engineering, resilience testing, or performance benchmarking of a Kafka cluster. Trogdor is appropriate for pre-production validation of cluster behavior under failure conditions and for generating controlled produce/consume workloads.

Theoretical Basis

Trogdor implements a coordinator-agent distributed task execution model:

Pseudo-code Logic:

# Abstract algorithm description (NOT real implementation)

# 1. Start an agent on each cluster node
agent = Agent(platform_config, node_name)
agent.start()  # listens for task requests from the coordinator

# 2. Start the coordinator on a designated node
coordinator = Coordinator(platform_config, node_name)
coordinator.start()  # knows the agents from the node list in platform_config

# 3. Submit a task via the client
client = CoordinatorClient(coordinator_address)
task = FaultSpec(type="network_partition", target_nodes=["node1", "node2"])
client.create_task("partition0", task)

# 4. Coordinator distributes the task to the relevant agents
# 5. Agents execute the fault injection locally
# 6. Coordinator collects task status, which the client can query
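The distribution step (steps 4 to 6 above) can be made concrete with a runnable, in-memory model. This is a toy sketch, not Trogdor's code: the `Agent` and `Coordinator` classes, the `target_nodes` field, and the result dictionaries are all hypothetical, and no real fault is injected.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy stand-in for an agent process on one node."""
    node_name: str
    executed: list = field(default_factory=list)

    def run_task(self, task_id, spec):
        # A real agent would inject the fault locally; here we only record it.
        self.executed.append(task_id)
        return {"node": self.node_name, "task": task_id, "error": ""}


@dataclass
class Coordinator:
    """Toy coordinator that knows its agents up front (node name -> Agent)."""
    agents: dict

    def create_task(self, task_id, spec):
        targets = spec["target_nodes"]
        # Validate every target first so a bad spec is never partially applied.
        unknown = [n for n in targets if n not in self.agents]
        if unknown:
            raise KeyError(f"no agent configured for: {unknown}")
        # Dispatch only to the targeted agents and collect their results.
        return [self.agents[n].run_task(task_id, spec) for n in targets]


agents = {name: Agent(name) for name in ("node1", "node2", "node3")}
coordinator = Coordinator(agents)
results = coordinator.create_task(
    "partition0",
    {"type": "network_partition", "target_nodes": ["node1", "node2"]},
)
for result in results:
    print(result)
```

Only node1 and node2 record the task; node3 is untouched, which mirrors the idea that the coordinator routes work to the relevant agents rather than broadcasting it.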

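Tasks are defined as JSON specifications that describe the fault or workload class, the target nodes, the start time and duration, and any class-specific parameters. The coordinator tracks each task through its lifecycle (PENDING, RUNNING, STOPPING, DONE), records an error message on a task that fails rather than using a separate error state, and distributes work to the appropriate agents.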
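To show what such a specification might look like, the sketch below builds a network partition spec in Python and serializes it to JSON. The field names (`class`, `startMs`, `durationMs`, `partitions`) and the `NetworkPartitionFaultSpec` class name follow the example in Kafka's TROGDOR.md as best recalled, so verify them against your Kafka version. The helper `network_partition_spec` and its validation rules are hypothetical additions.

```python
import json


def network_partition_spec(partitions, start_ms, duration_ms):
    """Build a partition fault spec; each inner list is one side of the split."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    seen = set()
    for group in partitions:
        for node in group:
            # A node cannot sit on two sides of the same partition.
            if node in seen:
                raise ValueError(f"node listed twice: {node!r}")
            seen.add(node)
    return {
        "class": "org.apache.kafka.trogdor.fault.NetworkPartitionFaultSpec",
        "startMs": start_ms,
        "durationMs": duration_ms,
        "partitions": [list(group) for group in partitions],
    }


# Isolate node1 and node2 from node3 for 30 seconds.
spec = network_partition_spec(
    [["node1", "node2"], ["node3"]], start_ms=1000, duration_ms=30000
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The resulting JSON can be saved to a file and handed to the coordinator client when creating a task.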
